Special Topics in Technology
In this course you’ll learn industry-standard agile and lean software development techniques such as test-driven development, refactoring, pair programming, and specification by example. You’ll also learn good object-oriented programming style. We’ll cover the theory and principles behind agile engineering practices, such as continuous integration and continuous delivery.
This class will be taught in a flipped-classroom format, with students programming in class. We'll use the Java programming language. Students need not be expert programmers, but should be enthusiastic about learning to program and comfortable writing simple programs in Java (or a Java-like language such as C#). Please bring a laptop to class with IntelliJ IDEA Community Edition installed.
This class will cover the principles and practices of managing data at scale, with a focus on use cases in data analysis and machine learning. We will cover the entire life cycle of data management and data science, ranging from data preparation to exploration, visualization, and analysis, to machine learning and collaboration.
The class will balance foundational concerns with exposure to practical languages, tools, and real-world concerns. We will study the foundations of prevalent data models in use today, including relations, tensors, and dataframes, and the mappings between them. We will study SQL as a means to query and manipulate data at scale, including performance concerns like views and indexes, query processing and optimization, and transactions, all from a user perspective. We will study the foundations and realities of data preparation, including hands-on work with real-world data using standard Python and SQL frameworks. We will explore data exploration modalities for non-programmers, including the fundamentals behind spreadsheet systems and interactive visual analytics packages. We will look at approaches for managing the machine learning life cycle: data preparation, model selection and training, and model serving and monitoring. Time permitting, we will look at technologies for moving, sharing, and caching data, including event streaming systems, key-value/document stores, log analytics, and search engines.
This class will be a modern take on a traditional database class, covering the basics of dealing with data at scale from a user-centered perspective, over the entire life cycle of data management, ranging from data cleaning, extraction, and integration, to analysis and exploration, to machine learning and collaboration. The class will mix traditional lectures and assignments with student paper presentations and a class project. Experience with programming and a basic understanding of computer systems, data structures, and algorithms are expected.
Students will build tools to explore and apply theories of information organization and retrieval. Students will implement various concepts covered in the concurrent 202 course through small projects on topics like controlled vocabularies, the semantic web, and corpus analysis. We will also experiment with topics suggested by students during the course. Students will develop skills in rapid prototyping of web-based projects using Python, XML, and jQuery.