Data Science 261

Machine Learning at Scale

3 units

Course Description

This course teaches the underlying principles required to develop scalable machine learning pipelines for structured and unstructured data at the petabyte scale. Students will gain hands-on experience with Apache Hadoop and Apache Spark.

Skill Sets

Code up machine learning algorithms on single machines and on clusters of machines / Amazon AWS / Working on problems with terabytes of data / Machine learning pipelines for petabyte-scale data / Algorithmic design / Parallel computing

Tools

Apache Hadoop / Apache Spark

Current Course Designers

Kyle Hamilton
Former Lecturer, Alumni (MIDS 2017)

Original Course Designer

Previously listed as DATASCI W261.

Prerequisites

Data Science 205 & 207. Intermediate programming skills in an object-oriented language (e.g., Python). Master of Information and Data Science students only.

Video

datascience@berkeley | Machine Learning at Scale

If you require video captions for accessibility and this video does not have captions, click here to request video captioning.

Course History

Summer 2023

Instructor(s): James Shanahan
Instructor(s): Ramakrishna Gummadi
Instructor(s): Vinicio De Sola

Spring 2023

Instructor(s): James Shanahan
Instructor(s): Vinicio De Sola
Instructor(s): Ramakrishna Gummadi

Fall 2022

Instructor(s): Vinicio De Sola
Instructor(s): Ramakrishna Gummadi
Instructor(s): James Shanahan

Last updated:

October 6, 2022