Applied Data Engineering
Data Science 204
3 units
Course Description
Storing, managing, and processing data are core to data science. This course introduces essential data engineering concepts needed to work effectively with data systems. Students learn how data pipelines are designed, built, and maintained, including data flows, storage, and processing. The course connects technical foundations to real-world business use cases, showing how organizations create value from data. Through hands-on interaction with data at different pipeline stages, students explore key tools, platforms, and architectures used to support scalable data science applications.
Student Learning Outcomes
- Articulate and justify the role of data engineering in the machine learning lifecycle, and explain how it supports downstream analytics, modeling, and deployment.
- Design and implement data ingestion workflows to collect data from multiple sources (APIs, files, databases, streaming systems).
- Model and develop datasets appropriately based on data type and scale, including structured, semi-structured, and unstructured data.
- Build reproducible data pipelines that transform raw data into analysis-ready datasets using standard techniques such as normalization.
- Integrate multiple data sources into a unified dataset through joins, resolving schema differences and data quality issues, and validate the result.
- Design scalable storage formats suitable for analytical and machine learning workloads.
- Evaluate and select appropriate data infrastructure and tools based on use case, scale, cost, and performance constraints.
- Prepare reliable, usable datasets for downstream machine learning systems.
Students will receive no credit for this course after completing DATASCI 205.
