Machine Learning at Scale

Data Science

3 units

Course Description

This course teaches the underlying principles required to develop scalable machine learning pipelines for structured and unstructured data at petabyte scale. Students will gain hands-on experience with Apache Hadoop and Apache Spark.

Skill Sets

Implementing machine learning algorithms on single machines and on clusters of machines / Amazon Web Services (AWS) / Working on problems with terabytes of data / Building machine learning pipelines for petabyte-scale data / Algorithmic design / Parallel computing


Apache Hadoop / Apache Spark

Current Course Designers

James Shanahan
Former Lecturer
Kyle Hamilton
Alumni (MIDS 2017)
Former Lecturer

Original Course Designer

James Shanahan
Former Lecturer

Previously listed as DATASCI W261.


Prerequisites

Data Science 205 and 207. Intermediate programming skills in an object-oriented language (e.g., Python). Master of Information and Data Science students only.
Last updated: October 6, 2022