Data Science 261

Machine Learning at Scale

3 units

Course Description

This course teaches the underlying principles required to develop scalable machine learning pipelines for structured and unstructured data at the petabyte scale. Students will gain hands-on experience with Apache Hadoop and Apache Spark.

Skill Sets

Coding machine learning algorithms on single machines and on clusters of machines / Amazon Web Services (AWS) / Working on problems with terabytes of data / Building machine learning pipelines for petabyte-scale data / Algorithmic design / Parallel computing

Tools

Apache Hadoop / Apache Spark

Current Course Designers

Dr. James G. Shanahan
Former Lecturer

Kyle Hamilton
Former Lecturer, Alumni (MIDS 2017)

Original Course Designer

Dr. James G. Shanahan
Former Lecturer

Previously listed as DATASCI W261.

Prerequisites

Data Science 205 & 207. Intermediate programming skills in an object-oriented language (e.g., Python). Master of Information and Data Science students only.

Video

datascience@berkeley | Machine Learning at Scale

Course History

Spring 2024

Instructor(s): Siinn Che
Instructor(s): Ramakrishna Gummadi
Instructor(s): Vinicio De Sola

Fall 2023

Instructor(s): Vinicio De Sola
Instructor(s): Ramakrishna Gummadi
Instructor(s): Siinn Che

Summer 2023

Instructor(s): James Shanahan

Last updated:

October 6, 2022