Info C8. Foundations of Data Science (4 units)
Foundations of data science from three perspectives: inferential thinking, computational thinking, and real-world relevance. Given data arising from some real-world phenomenon, how does one analyze that data so as to understand that phenomenon? The course teaches critical concepts and skills in computer programming and statistical inference, in conjunction with hands-on analysis of real-world datasets, including economic data, document collections, geographical data, and social networks. It delves into social and legal issues surrounding data analysis, including issues of privacy and data ownership. Also listed as Computer Science C8 and Statistics C8.
Info W10. Introduction to Information (3 units)
Two hours of web-based lecture and one hour of web-based discussion per week. This lower-division survey course will provide an introduction to the study of information, an interdisciplinary science that draws on aspects of computer science, sociology, economics, business, law, library studies, cognitive science, psychology, and communication. The course is organized into modules that may cover topics such as social bookmarking, networks and web security, human-computer interaction, interface design, technology and poverty, law and policy, business models, and entrepreneurship.
A fast-paced introduction to the Python programming language geared toward students of data science. The course introduces a range of Python objects and control structures, then builds on these with classes on object-oriented programming. The last section of the course is devoted to Python’s system of packages for data analysis. Students will gain experience in different styles of programming, including scripting, object-oriented design, test-driven design, and functional programming. Aside from Python, the course also covers use of the command line, coding and presentation with Jupyter notebooks, and source control with Git and GitHub. This is an online course; students will attend regular live online sessions as well as reviewing recorded material.
This course is conducted entirely online, so students do not need to be on the Berkeley campus in order to participate. However, live session attendance via our course management software is required, so students will attend class as a group on a weekly basis. Students will also view recorded content as a supplement to the live session meetings. Exact section days and times will be announced soon.
Info 88A. Data and Ethics (2 units)
This course provides an introduction to critical and ethical issues surrounding data and society. It blends social and historical perspectives on data with ethics, policy, and case examples to help students develop a workable understanding of current ethical issues in data science. Ethical and policy-related concepts addressed include: research ethics; privacy and surveillance; data and discrimination; and the “black box” of algorithms. Importantly, these issues will be addressed throughout the lifecycle of data — from collection to storage to analysis and application. Course assignments will emphasize researcher and practitioner reflexivity, allowing students to explore their own social and ethical commitments.
Three hours of lecture per week. Must be taken on a passed/not passed basis. An introduction to high-level computer programming languages covering their basis in mathematics and logic. This course will guide students through the elements that compose any programming language including expressions, control of flow, data structures, and modularity via functions and/or objects. Covers traditional contemporary programming paradigms including sequential, event-based, and object-oriented programming; multi-person programming projects and debugging strategies.
Course may be repeated for credit. One to four hours of directed group study per week. Must be taken on a passed/not passed basis. Lectures and small group discussions focusing on topics of interest, varying from semester to semester.
Info C103. History of Information (3 units)
Prerequisites: Upper level undergraduates. This course explores the history of information and associated technologies, uncovering why we think of ours as "the information age." We will select moments in the evolution of production, recording, and storage from the earliest writing systems to the world of Short Message Service (SMS) and blogs. In every instance, we'll be concerned with both what and when and how and why, and we'll keep returning to the question of technological determinism: how do technological developments affect society and vice versa? Also listed as History C192, Media Studies C104C, and Cognitive Science C103.
Info 114. User Experience Research (3 units)
Three hours of lecture per week. Methods and concepts of creating design requirements and evaluating prototypes and existing systems. Emphasis on computer-based systems, including mobile systems and ubiquitous computing, but may be suitable for students interested in other domains of design for end-users. Includes quantitative and qualitative methods as applied to design, usually for short-term studies intended to provide guidance for designers. Students will receive no credit for 114 after taking 214.
Two hours of lecture per week, one hour of discussion per week. Open to all undergraduate students and designed for those with little technical background. In this course students will first gain an understanding of the basics of how search engines work, and then explore how search engine design impacts business and culture. Topics include search advertising and auctions, search and privacy, search ranking, internationalization, anti-spam efforts, local search, peer-to-peer search, and search of blogs and online communities.
Info 146. Foundations of New Media (3 units)
Three hours of lecture per week. Prerequisites: No prior New Media production experience required. Introduction to interdisciplinary study and design of New Media. Survey of theoretical and practical foundations of New Media including theory and history; analysis and reception; computational foundations; social implications; interaction, visual, physical, and narrative design. Instruction combines lectures and project-based learning using case studies from everyday technology (e.g., telephone, camera, web).
Three hours of lecture per week. Prerequisites: Introductory programming experience. This course looks at the quickly developing landscape of mobile applications. It focuses on Web-based mobile applications, and thus covers issues of Web service design (RESTful service design), mobile platforms (iPhone, Android, Symbian/S60, WebOS, Windows Mobile, BlackBerry OS, BREW, JavaME/JavaFX, Flash Lite), and the specific constraints and requirements of user interface design for limited devices. The course combines a conceptual overview, design issues, and practical development issues.
Two hours of lecture and one hour of laboratory per week. Prerequisites: Introductory programming experience. This course focuses on understanding the Web as an information system, and how to use it for information management for personal and shared information. The Web is an open and constantly evolving system, which can make it hard to understand how the different parts of the landscape fit together. This course provides students with an overview of the Web as a whole, and how the individual parts fit together. It provides students with the understanding and skills to better navigate and use the landscape of Web information.
Three hours of lecture and one hour of laboratory per week. An introduction to high-level computer programming languages with emphasis on strings, modules, functions, and objects; sequential and event-based programming. Uses the Python language.
With the advent of virtual communities and online social networks, old questions about the meaning of human social behavior have taken on renewed significance. Using a variety of online social media simultaneously, and drawing upon theoretical literature in a variety of disciplines, this course delves into discourse about community across disciplines. This course will enable students to establish both theoretical and experiential foundations for making decisions and judgments regarding the relations between mediated communication and human community. Also listed as Sociology C167.
Info 181. Technology and Poverty (3 units)
Three hours of lecture per week. This course will encourage students to think broadly about the interplay between technological systems, social processes, economic activities, and political contingencies in efforts to alleviate poverty. Students will come to understand poverty not only in terms of high-level indicators, but from a ground-level perspective as 'the poor' experience and describe it for themselves. The role played by individuals and societies of the developing world as active agents in processes of technology adoption and use will be a central theme.
Info 190. Special Topics in Information (3 units)
Specific topics, hours and credit may vary from section to section, year to year. May be repeated for credit with change in content.
Course may be repeated for credit. One to four hours of lecture per week. Meetings to be arranged. Must be taken on a passed/not passed basis.
Info 199. Individual Study (1-4 units)
Course may be repeated for credit. Must be taken on a passed/not passed basis. Individual study of topics in information management and systems under faculty supervision.
Three hours of lecture per week. Organization, representation, and access to information. Categorization, indexing, and content analysis. Data structures. Design and maintenance of databases, indexes, classification schemes, and thesauri. Use of codes, formats, and standards. Analysis and evaluation of search and navigation techniques.
Info 203. Social Issues of Information (4 units)
Three hours of lecture per week. Prerequisite: Consent of instructor required for non-majors. The influence of information and of information systems, technology, practices, and artifacts on how people organize their work, interact, and understand experience. Individual, group, organizational, and societal issues in information production and use, information systems design and management, and information and communication technologies. Social science research methods for understanding information issues.
Info 205. Information Law and Policy (3 units)
Three hours of lecture per week. Course must be completed for a letter grade to fulfill degree requirements. Prerequisites: Consent of instructor required for non-majors. Law is one of a number of policies that mediate the tension between free flow and restrictions on the flow of information. This course introduces students to copyright and other forms of legal protection for databases, licensing of information, consumer protection, liability for insecure systems and defective information, privacy, and national and international information policy.
Three hours of lecture and one hour of laboratory per week. Prerequisites: An introductory programming course and consent of instructor for non-majors. Course must be completed for a letter grade to fulfill degree requirements. Technological foundations for computing and communications: computer architecture, operating systems, networking, middleware, security. Programming paradigms: object-oriented design, design and analysis of algorithms, data structures, formal languages. Distributed-system architectures and models, inter-process communications, concurrency, system performance.
As information becomes increasingly strategic for all organizations, technology professionals must also develop the core business skills required to build personal brand, expand influence, build high-quality relationships, and deliver on critical enterprise projects. Using a combination of business and academic readings, case discussions and guest speakers, this course will explore a series of critical business topics that apply to both start-up and Fortune 500 enterprises. Subjects include: communication and presentation skills, software and product development methodologies, negotiation skills, employee engagement, organizational structures and career paths, successful interviewing, and CV preparation.
Info 212. Information in Society (3 units)
Three hours of lecture per week. The role of information and information technology in organizations and society. Topics include societal needs and demands, sociology of knowledge and science, diffusion of knowledge and technology, information seeking and use, information and culture, and technology and culture.
Three hours of lecture per week. User interface design and human-computer interaction. Examination of alternative designs. Tools and methods for design and development. Methods for measuring and evaluating interface quality.
Info 214. Needs and Usability Assessment (3 units)
Concepts and methods of needs and usability assessment. Understanding users' needs and practices and translating them into design decisions. Topics include methods of identifying and describing user needs and requirements; user-centered design; user and task analysis; contextual design; heuristic evaluation; surveys, interviews, and focus groups; usability testing; naturalistic/ethnographic methods; managing usability in organizations; and universal usability.
Three hours of lecture per week. This course covers the practical and theoretical issues associated with computer-mediated communication (CMC) systems (e.g., email, newsgroups, wikis, online games, etc.). We will focus on the analysis of CMC practices, the relationship between technology and behavior, and the design and implementation issues associated with constructing CMC systems. This course primarily takes a social scientific approach (including research from social psychology, economics, sociology, and communication).
Info 218. Concepts of Information (3 units)
Three hours of lecture per week. Prerequisites: Graduate standing. As it's generally used, "information" is a collection of notions, rather than a single coherent concept. In this course, we'll examine conceptions of information based in information theory, philosophy, social science, economics, and history. Issues include: How compatible are these conceptions; can we talk about "information" in the abstract? What roles do these various notions play in discussions of literacy, intellectual property, advertising, and the political process? And where does this leave "information studies" and "the information society"?
Three hours of lecture per week. Prerequisite: 206 or consent of instructor. Policy and technical issues related to ensuring the accuracy and privacy of information. Encoding and decoding techniques including public and private key encryption. Survey of security problems in networked information environments including viruses, worms, Trojan horses, and Internet address spoofing.
Info 221. Information Policy (3 units)
Three hours of lecture per week. An examination of the nature of corporate, non-profit, and governmental information policy. The appropriate role of the government in production and dissemination of information, the tension between privacy and freedom of access to information. Issues of potential conflicts in values and priorities in information policy.
Three hours of lecture per week. This course focuses on managing people in information-intensive firms and industries, such as information technology industries. Topics include managing knowledge workers; managing teams (including virtual ones); collaborating across disparate units; giving and receiving feedback; managing the innovation process (including in eco-systems); managing through networks; and managing when using communication tools (e.g., tele-presence). The course relies heavily on cases as a pedagogical form.
Three hours of lecture per week. Using a mix of theory and case studies, the course provides students with different backgrounds a unifying view of the design life cycle, making them more effective and versatile designers.
Info 231. Economics of Information (3 units)
Three hours of lecture per week. The measurement and analysis of the role information plays in the economy and of the resources devoted to production, distribution, and consumption of information. Economic analysis of the information industry. Macroeconomics of information.
Info 232. Applied Behavioral Economics (3 units)
"Behavioral Economics" is one important perspective on how information impacts human behavior. The goal of this class is to deploy a few important theories about the relationship between information and behavior, into practical settings — emphasizing the design of experiments that can now be incorporated into many 'applications' in day-to-day life. Truly 'smart systems' will have built into them precise, testable propositions about how human behavior can be modified by what the systems tell us and do for us. So let's design these experiments into our systems from the ground up! This class develops a theoretically informed, practical point of view on how to do that more effectively and with greater impact.
Three hours of lecture per week. Application of economic tools and principles, including game theory, industrial organization, information economics, and behavioral economics, to analyze business strategies and public policy issues surrounding information technologies and IT industries. Topics include: economics of information; economics of information goods, services, and platforms; strategic pricing; strategic complements and substitutes; competition models; network industry structure and telecommunications regulation; search and the "long tail"; network cascades and social epidemics; network formation and network structure; peer production and crowdsourcing; interdependent security and privacy.
Info 235. Cyberlaw (3 units)
Three hours of lecture per week. Introduction to legal issues in information management, antitrust, contract management, international law including intellectual property, trans-border data flow, privacy, libel, and constitutional rights.
Three hours of lecture per week. The philosophical, legal, historical, and economic analysis of the need for and uses of laws protecting intellectual property. Topics include types of intellectual property (copyright, patent, trade secrecy), the interaction between law and technology, various approaches (including compulsory licensing), and the relationship between intellectual property and compatibility standards.
Three hours of lecture per week. Prerequisites: 202 or consent of instructor. Theories and methods for searching and retrieval of text and bibliographic information. Analysis of relevance, utility. Statistical and linguistic methods for automatic indexing and classification. Boolean and probabilistic approaches to indexing, query formulation, and output ranking. Filtering methods. Measures of retrieval effectiveness and retrieval experimentation methodology.
Info 242. XML Foundations (3 units)
Three hours of lecture. The Extensible Markup Language (XML), with its ability to define formal structural and semantic definitions for metadata and information models, is the key enabling technology for information services and document-centric business models that use the Internet and its family of protocols. This course introduces XML syntax, transformations, schema languages, and the querying of XML databases. It balances conceptual topics with practical skills for designing, implementing, and handling conceptual models as XML schemas.
Three hours of lecture per week. Prerequisites: 202 or consent of instructor. Standards and practices for organization and description of bibliographic, textual, and non-textual collections. Design, selection, maintenance, and evaluation of cataloging, classification, indexing, and thesaurus systems for specific settings. Codes, formats, and standards for data representation and transfer of data.
Info 246. Multimedia Information (3 units)
Three hours of lecture per week. Prerequisites: 202, 203 or consent of instructor. Concepts and methods of design, management, creation, and evaluation of multimedia information systems. Theory and practice of digital media production, reception, organization, retrieval, and reuse. Review of applicable digital technology with special emphasis on digital video. Course will involve group projects in the design and development of digital media systems and applications.
Three hours of lecture and one hour of laboratory per week. Prerequisites: Information 206, Computer Science 160, or knowledge of programming and data structures with consent of instructor. The design and presentation of digital information. Use of graphics, animation, sound, visualization software, and hypermedia in presenting information to the user. Methods of presenting complex information to enhance comprehension and analysis. Incorporation of visualization techniques into human-computer interfaces.
Three hours of lecture per week. Prerequisites: 206 or equivalent. Communications concepts, network architectures, data communication software and hardware, networks (e.g., LAN, WAN), network protocols (e.g., TCP/IP), network management, distributed information systems. Policy and management implications of the technology.
252 cannot be taken for credit if the student has previously taken 152. Three hours of lecture per week. Prerequisites: 206 or consent of instructor. This course looks at the quickly developing landscape of mobile applications. It focuses on Web-based mobile applications, and thus covers issues of Web service design (RESTful service design), mobile platforms (iPhone, Android, Symbian/S60, WebOS, Windows Mobile, BlackBerry OS, BREW, JavaME/JavaFX, Flash Lite), and the specific constraints and requirements of user interface design for limited devices. The course combines a conceptual overview, design issues, and practical development issues.
Info 253. Web Architecture (3 units)
This course is a survey of Web technologies, ranging from the basic technologies underlying the Web (URI, HTTP, HTML) to more advanced technologies being used in the context of Web engineering, for example structured data formats and Web programming frameworks. The goal of this course is to provide an overview of the technical issues surrounding the Web today, and to provide a solid and comprehensive perspective of the Web's constantly evolving landscape.
Three hours of lecture per week. Letter grade to fulfill degree requirements. Prerequisites: Proficiency in Python programming (programs of at least 200 lines of code) and proficiency in basic statistics and probability. This course examines the state of the art in applied Natural Language Processing (also known as content analysis and language engineering), with an emphasis on how well existing algorithms perform and how they can be used (or not) in applications. Topics include part-of-speech tagging, shallow parsing, text classification, information extraction, incorporation of lexicons and ontologies into text analysis, and question answering. Students will apply and extend existing software tools to text-processing problems.
Info 257. Database Management (3 units)
Three hours of lecture per week. Introduction to relational, hierarchical, network, and object-oriented database management systems. Database design concepts, query languages for database applications (such as SQL), concurrency control, recovery techniques, database security. Issues in the management of databases. Use of report writers, application generators, high level interface generators.
Students will receive no credit for C262 after taking 290 section 4. Three hours of lecture and one hour of laboratory per week. This course explores the theory and practice of Tangible User Interfaces, a new approach to Human Computer Interaction that focuses on the physical interaction with computational media. The topics covered in the course include theoretical framework, design examples, enabling technologies, and evaluation of Tangible User Interfaces. Students will design and develop experimental Tangible User Interfaces using physical computing prototyping tools and write a final project report. Also listed as New Media C262.
Three hours of seminar per week. How does the design of new educational technology change the way people learn and think? How do we design systems that reflect our understanding of how we learn? This course explores issues in designing and evaluating technologies that support creativity and learning. The class will cover theories of creativity and learning, implications for design, as well as a survey of new educational technologies such as work in computer-supported collaborative learning, digital manipulatives, and immersive learning environments. Also listed as New Media C263.
Info C265. Interface Aesthetics (2 units)
Two hours of lecture per week. This course will cover new interface metaphors beyond desktops (e.g., for mobile devices, computationally enhanced environments, tangible user interfaces) but will also cover visual design basics (e.g., color, layout, typography, iconography) so that we have a systematic and critical understanding of aesthetically engaging interfaces. Students will get hands-on learning experience with these topics through course projects, design critiques, and discussion, in addition to lectures and readings.
Students will receive no credit for C265 after taking 290 section 6 (Spring 2009 or Fall 2010), New Media 290 section 1 (Spring 2009), or New Media 290 section 2 (Fall 2010).
Also listed as New Media C265.
Three hours of lecture per week. Introduction to many different types of quantitative research methods, with an emphasis on linking quantitative statistical techniques to real-world research methods. Introductory and intermediate topics include: defining research problems, theory testing, causal inference, probability and univariate statistics. Research design and methodology topics include: primary/secondary survey data analysis, experimental designs, and coding qualitative data for quantitative analysis. No prerequisites, though an introductory course in statistics is recommended.
Three hours of lecture per week. Theory and practice of naturalistic inquiry. Grounded theory. Ethnographic methods including interviews, focus groups, naturalistic observation. Case studies. Analysis of qualitative data. Issues of validity and generalizability in qualitative research.
Three hours of seminar per week. This seminar reviews current literature and debates regarding Information and Communication Technologies and Development (ICTD). This is an interdisciplinary and practice-oriented field that draws on insights from economics, sociology, engineering, computer science, management, public health, etc. Also listed as Energy and Resources Group C283.
Three hours of lecture per week. This class is focused on the creation of sustainable enterprises based on ICT (Information and Communications Technologies) innovations supporting international development. We take a broad view of entrepreneurship – including starting new businesses, non-profit initiatives and/or public sector projects. We will take a highly iterative, design-oriented, feedback-driven approach to developing and refining business plans for social enterprises.
Info 290. Special Topics in Information (1-4 units)
Course may be repeated for credit as topic varies. Two to six hours of lecture per week for seven and one-half weeks or one to four hours of lecture per week for 15 weeks. Prerequisites: Consent of instructor. Specific topics, hours, and credit may vary from section to section, year to year.
Info 290A. Special Topics in Information (1-2 units)
Course may be repeated for credit. One and one-half to two hours of lecture per week for eight weeks. Two hours of lecture per week for six weeks. Three hours of lecture per week for five weeks.
Info 290M. Special Topics in Management (1-4 units)
Course may be repeated for credit as topics in management vary. One to four hours of lecture per week; two to seven and one-half hours of lecture per week for seven weeks. Specific topics, hours, and credit may vary from section to section and year to year.
Info 290T. Special Topics in Technology (1-4 units)
Course may be repeated for credit as topics in technology vary. One to four hours of lecture per week; two to six hours of lecture per week for seven weeks. Specific topics, hours, and credit may vary from section to section and year to year.
Course may be repeated once. Must be taken on a satisfactory/unsatisfactory basis. This is a zero-unit independent study course for international students doing internships under the Curricular Practical Training program. The course will be individually supervised and must be approved by the head graduate adviser.
Info 295. Doctoral Colloquium (1 unit)
One hour colloquium per week. Must be taken on a satisfactory/unsatisfactory basis. Prerequisites: Ph.D. standing in the School of Information. Colloquia, discussion, and readings designed to introduce students to the range of interests of the school.
Info 296A. Seminar (2-4 units)
Course may be repeated for credit as topic varies. Two to four hours of seminar per week. Prerequisites: Consent of instructor. Topics in information management and systems and related fields. Specific topics vary from year to year.
Info 298. Directed Group Study (1-4 units)
Course may be repeated for credit as topic varies. Weekly group meetings. Prerequisites: Consent of instructor. Group projects on special topics in information management and systems.
Two hours of directed group study per week. Prerequisites: Consent of instructor. Course must be taken for a letter grade to fulfill degree requirements. The final project is designed to integrate the skills and concepts learned during the Information School master's program and helps prepare students to compete in the job market. It provides experience in formulating and carrying out a sustained, coherent, and significant course of work resulting in a tangible work product; in project management; in presenting work in both written and oral form; and, when appropriate, in working in a multidisciplinary team. Projects may take the form of research papers or professionally oriented applied work.
Info 299. Individual Study (1-12 units)
Course may be repeated for credit as topic varies. Format varies. Prerequisites: Consent of instructor. Individual study of topics in information management and systems under faculty supervision.
Info 375. Teaching Assistance Practicum (1-6 units)
Course may be repeated for credit as topic varies. Four hours of work per week per unit. Must be taken on a satisfactory/unsatisfactory basis. Formerly Information 310. Discussion, reading, preparation, and practical experience under faculty supervision in the teaching of specific topics within information management and systems. Does not count toward a degree.
Info 602. Individual Study for Doctoral Students (1-5 units)
Course may be repeated for credit. Must be taken on a satisfactory/unsatisfactory basis. Prerequisites: Consent of instructor. Individual study in consultation with the major field adviser, intended to provide an opportunity for qualified students to prepare themselves for the various examinations required of candidates for the Ph.D. degree.
Data Science Courses
(Data Science courses are restricted to students enrolled in the MIDS degree program.)
Introduces the data sciences landscape, with a particular focus on learning data science techniques to uncover and answer the questions students will encounter in industry. Lectures, readings, discussions, and assignments will teach how to apply disciplined, creative methods to ask better questions, gather data, interpret results, and convey findings to various audiences. The emphasis throughout is on making practical contributions to real decisions that organizations will and should make.
An introduction to many different types of quantitative research methods and statistical techniques for analyzing data. We begin with a focus on measurement, inferential statistics, and causal inference using the open-source statistics language R. Topics in quantitative techniques include descriptive and inferential statistics, sampling, experimental design, tests of difference, ordinary least squares regression, and general linear models.
Data science depends on data, and a core competency mandated by this reliance is knowing effective and efficient ways to manage, search, and compute over that data. This course is focused on how data can be stored, managed, and retrieved as needed for use in analysis or operations. The goal of this course is to provide students with both theoretical knowledge and practical experience leading to mastery of data management, storage, and retrieval with very large-scale data sets.
Machine learning is a rapidly growing field at the intersection of computer science and statistics concerned with finding patterns in data. It is responsible for tremendous advances in technology, from personalized product recommendations to speech recognition in cell phones. This course provides a broad introduction to the key ideas in machine learning. The emphasis will be on intuition and practical examples rather than theoretical results, though some experience with probability, statistics, and linear algebra will be important.
Communicating clearly and effectively about the patterns we find in data is a key skill for a successful data scientist. This course focuses on the design and implementation of complementary visual and verbal representations of patterns and analyses in order to convey findings, answer questions, drive decisions, and provide persuasive evidence supported by data. Assignments will give hands-on experience designing data graphics and visualizations, and reporting findings in prose.
Data Science W210. Capstone (3 units)
The capstone course will cement skills learned throughout the MIDS program — both core data science skills and “soft skills” like problem-solving, communication, influencing, and management — preparing students for success in the field. The centerpiece is a semester-long group project in which teams of students propose and select project ideas, conduct and communicate their work, receive and provide feedback (in informal group discussions and formal class presentations), and deliver compelling presentations along with a web-based final deliverable. Includes relevant readings, case discussions, and real-world examples and perspectives from panel discussions with leading data science experts and industry practitioners.
Introduction to the legal, policy, and ethical implications of data, including privacy, surveillance, security, classification, discrimination, decisional autonomy, and duties to warn or act. Examines legal, policy, and ethical issues throughout the full data-science life cycle (collection, storage, processing, analysis, and use), with case studies from criminal justice, national security, health, marketing, politics, education, employment, athletics, and development. Includes legal and policy constraints and considerations for specific domains and data types, collection methods, and institutions; technical, legal, and market approaches to mitigating and managing concerns; and the strengths and benefits of competing and complementary approaches.
This course introduces students to experimentation in the social sciences. This topic has increased considerably in importance since 1995, as researchers have learned to think creatively about how to generate data in more scientific ways, and advances in information technology have facilitated better data gathering. Key to this area of inquiry is the insight that correlation does not necessarily imply causality. In this course, we learn how to use experiments to establish causal effects and how to be appropriately skeptical of findings from observational data.
An overview of the contemporary toolkits for problems related to cloud computing and big data. Because the class is an advanced course, we generally assume familiarity with the concepts and spend more time on the implementation. Every lecture is followed by a hands-on assignment, where students get to experience some of the technologies covered in the lecture. By the time students complete the course, they should be able to name the big data problem they are facing, select proper tooling, and know enough to start applying it.
This course teaches the underlying principles required to develop scalable machine learning pipelines for structured and unstructured data at the petabyte scale. Students will gain hands-on experience in Apache Hadoop and Apache Spark.
Understanding language is fundamental to human interaction. Our brains have evolved language-specific circuitry that helps us learn it very quickly; however, this also means that we have great difficulty explaining how exactly meaning arises from sounds and symbols. This course is a broad introduction to linguistic phenomena and our attempts to analyze them with machine learning. We will cover a wide range of concepts with a focus on practical applications such as information extraction, machine translation, sentiment analysis, and summarization.
A continuation of Data Science W203 (Exploring and Analyzing Data), this course trains data science students to apply more advanced methods from regression analysis and time series models. Central topics include linear regression, causal inference, identification strategies, and a wide range of time series models that are frequently used by industry professionals. Throughout the course, we emphasize choosing, applying, and implementing statistical techniques to capture key patterns and generate insight from data. Students who successfully complete this course will be able to distinguish between appropriate and inappropriate techniques given the problem under consideration, the data available, and the given timeframe.