MIMS Final Project 2023

Jumping Ship in the Multiverse of Higher Education

Project Description

Higher education institutions have been promoting socioeconomic mobility by supporting student transfer from 2-year community colleges to 4-year degree-granting universities, as in the California public post-secondary system and the State University of New York (SUNY) system. However, several barriers stand in the way of a successful transfer. For students, the most critical issue is earning enough credits at a 2-year institution that count toward the transfer credits required by 4-year degree programs. For institutions, defining which course at one institution counts as credit for an equivalent course at another, known as course articulation, requires effort from both schools, and manually articulating every set of courses at every institution against every other is intractable. This has sown great confusion among students and institutions alike. According to the Community College Research Center, the credit transfer problem is a major contributing factor to the dismal graduation rates among transfer students [3].

This project aims to make the process of defining and maintaining course articulations tractable for a large system of colleges and universities (SYS1) with 58 campuses by leveraging the information contained in historic enrollment patterns and course catalog descriptions. Thus far, research has demonstrated that equivalent courses cluster similarly in the isolated latent spaces of each institution [1], a property also observed in the latent spaces of natural languages [2]. The ability to transform creative and freeform human activities into a mathematical "latent space" plays a key role in the success of representation learning algorithms in machine learning (ML). These numerical representations, called "embeddings", are useful only if their geometry corresponds meaningfully to real-life dynamics of the human world. Our research goal is to arrive at such a semantically comprehensive model for education. This project aims to explore the relationships between these isolated spaces and potentially bring them together into one unified latent space. Specifically, we aim to design Natural Language Processing (NLP) based machine learning methods that learn universal course embeddings from the enrollment sequences of students across all colleges and universities in the SYS1 system. We then use these embeddings to calculate course similarities and predict articulation pairs by finding, for a given source course, the most similar course at the destination university. The deliverables of the project will be project code and a research paper.
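The final prediction step described above, finding the most similar destination course for a given source course, can be illustrated with a minimal sketch. It assumes that course embeddings already live in one shared latent space and that similarity is measured with cosine similarity; the project description does not name a specific metric, so that choice is an assumption. The course names and vectors are made-up toy values, and the function names are hypothetical, not taken from the project code.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict_articulation(source_vec, destination_courses):
    """Return the name of the destination course whose embedding is most similar.

    destination_courses maps a course name to its embedding vector.
    """
    return max(
        destination_courses,
        key=lambda name: cosine_similarity(source_vec, destination_courses[name]),
    )


# Toy, hand-made embeddings (assumption: real ones would be learned from
# enrollment sequences and would have many more dimensions).
source_courses = {"CC MATH 1A": [0.9, 0.1, 0.0]}
destination_courses = {
    "UNI MATH 101": [0.8, 0.2, 0.1],
    "UNI HIST 10": [0.0, 0.1, 0.9],
    "UNI CS 61": [0.3, 0.9, 0.1],
}

best = predict_articulation(source_courses["CC MATH 1A"], destination_courses)
print(best)  # prints "UNI MATH 101"
```

In this toy setup the community college calculus course lands closest to the university math course because their vectors point in nearly the same direction. With learned embeddings, the same nearest-neighbor lookup would produce a ranked list of candidate articulation pairs for human review.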

We believe the success of this investigation will have a significant impact on the educational futures of thousands of community college students who wish to transfer to higher-tier research universities but are held back by the intractability of course articulation. We also hope to impact Massive Open Online Course (MOOC) recommendations and multilingual educational platforms across the world.


[1] Pardos, Zachary A., Hung Chau, and Haocheng Zhao. "Data-assistive course-to-course articulation using machine translation." Proceedings of the Sixth ACM Conference on Learning@ Scale. 2019.

[2] Chen, Xilun, and Claire Cardie. "Unsupervised multilingual word embeddings." Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018, pp. 261–270.

[3] Community College Transfer: July 2021 Policy Fact Sheet

[4] Image Courtesy: Bronx Community College, CUNY.

More Information


Cracking the course transferring problem in a university system using NLP


Latent Space Odyssey



Last updated:

May 11, 2023