Data Science 261
Machine Learning at Scale
3 units
Course Description
This course teaches the underlying principles required to develop scalable machine learning pipelines for structured and unstructured data at the petabyte scale. Students will gain hands-on experience with Apache Hadoop and Apache Spark.
Skill Sets
Coding machine learning algorithms on single machines and on clusters of machines / Amazon Web Services (AWS) / Working on problems with terabytes of data / Building machine learning pipelines for petabyte-scale data / Algorithmic design / Parallel computing
Tools
Apache Hadoop / Apache Spark
Current Course Designers
Original Course Designer
Previously listed as DATASCI W261.
Prerequisites
Course History
Fall 2023
Summer 2023
Spring 2023