Data Science W261

Machine Learning at Scale

3 units

Course Description

This course builds on and goes beyond the collect-and-analyze phase of big data, focusing on how machine learning algorithms can be rewritten and extended to scale to petabytes of data, both structured and unstructured, and to generate sophisticated models used for real-time prediction. Conceptually, the course is divided into two parts. The first covers the fundamental concepts of MapReduce parallel computing through the lens of Hadoop, MrJob, and Spark, diving deep into Spark Core, DataFrames, the Spark shell, Spark Streaming, Spark SQL, MLlib, and more. The second focuses on hands-on algorithm design and development in parallel computing environments (Spark), covering supervised learning algorithms (decision tree learning), graph processing algorithms (PageRank, shortest path), gradient descent algorithms (support vector machines), and matrix factorization. Students will apply MapReduce parallel compute frameworks to industrial applications and deployments in fields including advertising, finance, healthcare, and search engines. Examples and exercises are provided in Python notebooks (Hadoop Streaming, MrJob, and PySpark).
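For a flavor of the MapReduce pattern the first part of the course covers, the sketch below shows a minimal PySpark word count: a map phase emits (word, 1) pairs and a reduce phase sums them per key. The input path "input.txt" is a placeholder; any plain-text file works.

    # Minimal MapReduce-style word count in PySpark (illustrative sketch;
    # "input.txt" is a placeholder path, not a course-supplied dataset).
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("WordCountSketch").getOrCreate()
    sc = spark.sparkContext

    lines = sc.textFile("input.txt")                    # RDD of text lines
    counts = (lines.flatMap(lambda line: line.split())  # map: emit each word
                   .map(lambda word: (word, 1))         # map: (word, 1) pairs
                   .reduceByKey(lambda a, b: a + b))    # reduce: sum per word

    for word, count in counts.take(10):                 # inspect a sample
        print(word, count)

    spark.stop()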

Skill Sets

Implementing machine learning algorithms on single machines and on clusters of machines / Amazon Web Services (AWS) / Working on problems with terabytes of data / Machine learning pipelines for petabyte-scale data / Algorithm design / Parallel computing

Tools

Apache Hadoop / Apache Spark

Prerequisites

Data Science W205 & W207. Intermediate programming skills in an object-oriented language (e.g., Python). Master of Information and Data Science students only.

Last updated: September 19, 2019