MIDS Capstone Project Spring 2025

EMRI: Early Math Risk Identifier

Problem and Motivation

Across the 76 countries that participated in the 2022 Programme for International Student Assessment (PISA), 15.1 million 15-year-old students were not proficient in math. PISA is a large-scale international assessment coordinated by the Organisation for Economic Co-operation and Development (OECD) every 3 to 4 years. In the U.S. alone, the figure is 1.24 million students, or over a third of 15-year-olds, who are at risk of falling behind in math. Traditional tools for identifying struggling students are expensive and reactive, and they often flag students too late. EMRI (Early Math Risk Identifier) is a proactive, AI-powered tool designed to flag students at risk of low math performance before they fall behind, empowering educators and parents to intervene earlier and more equitably.

Data Sources & Data Science Approach

Our models are trained on the PISA 2022 dataset, which includes nearly 600,000 students across 76 countries and over 1,000 background variables. For each country in the dataset, we developed a neural network model that uses students’ background characteristics to identify those at risk of falling behind in math. We used SHAP (SHapley Additive exPlanations) to identify the top 20 predictive variables for each country, which simplified the models and improved their explainability. For students who are predicted to fall behind in math, EMRI offers recommendations to parents and educators on how to provide extra support.
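
The variable-selection step can be illustrated with a minimal sketch. It assumes SHAP values have already been computed for one country's model (for example, with the shap library) and ranks variables by mean absolute SHAP value, a common way to summarize global importance. Whether EMRI used exactly this ranking is an assumption; the function name, the column names, and the numbers below are hypothetical and for illustration only.

```python
def top_k_features(shap_values, feature_names, k=20):
    """Rank features by mean absolute SHAP value and keep the k largest.

    shap_values: one row per student, one column per feature
                 (assumed to be precomputed elsewhere).
    """
    n_rows = len(shap_values)
    mean_abs = [
        sum(abs(row[j]) for row in shap_values) / n_rows
        for j in range(len(feature_names))
    ]
    ranked = sorted(
        zip(feature_names, mean_abs),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:k]


# Hypothetical column names and made-up SHAP values for three students.
feature_names = [
    "devices_at_home",
    "class_periods_per_week",
    "gender",
    "books_at_home",
]
shap_values = [
    [0.30, -0.10, 0.02, 0.00],
    [-0.20, 0.25, -0.01, 0.05],
    [0.40, -0.15, 0.03, -0.05],
]

# Keep the 2 strongest variables here; EMRI keeps 20 per country.
selected = top_k_features(shap_values, feature_names, k=2)
for name, importance in selected:
    print(f"{name}: {importance:.3f}")
```

In a per-country setup, the same function would be called once for each country's SHAP matrix, so each country can end up with a different set of top variables.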

Evaluation

The performance of EMRI was evaluated using the area under the curve (AUC), specificity, and overall accuracy. The AUC values ranged from 0.76 to 0.93 across the countries, with the majority of the values exceeding 0.80. The specificity (i.e., how accurately the model identifies students who are actually not proficient in math) ranged between 90% and 91% for all countries because the binary classification threshold was calibrated for each country to meet this target. The overall accuracy ranged from 55% to 89% across the countries, with a median value of 74%.
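
The threshold calibration described above can be sketched as follows. This is an illustration under stated assumptions, not EMRI's actual code: labels are 0 for the negative class and 1 for the positive class, scores are predicted probabilities of the positive class, and the calibration picks the smallest threshold whose specificity reaches the target. Which outcome is treated as the negative class is left open here. All function names and the toy data are hypothetical.

```python
def specificity(y_true, scores, threshold):
    """True-negative rate: share of negatives (label 0) scored below the threshold."""
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    return sum(s < threshold for s in negatives) / len(negatives)


def accuracy(y_true, scores, threshold):
    """Share of all students whose thresholded prediction matches the label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, y_true)) / len(y_true)


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def calibrate_threshold(y_true, scores, target=0.90):
    """Smallest candidate threshold whose specificity reaches the target.

    Specificity never decreases as the threshold rises, so the first
    candidate that meets the target is the smallest one that does.
    """
    for t in sorted(set(scores)) + [float("inf")]:
        if specificity(y_true, scores, t) >= target:
            return t


# Made-up validation data for one hypothetical country: 10 negatives, 5 positives.
y_true = [0] * 10 + [1] * 5
scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.70,
          0.42, 0.55, 0.65, 0.80, 0.90]

threshold = calibrate_threshold(y_true, scores, target=0.90)
print("threshold:", threshold)
print("specificity:", specificity(y_true, scores, threshold))
print("accuracy:", round(accuracy(y_true, scores, threshold), 3))
print("AUC:", auc(y_true, scores))
```

Fixing specificity at a target and letting the threshold vary by country is one reason overall accuracy can differ widely between countries even when specificity is nearly identical: accuracy also depends on each country's class balance and on how many positives fall above the calibrated threshold.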

Key Learnings & Impact

Through building EMRI, we discovered just how powerful background characteristics can be in predicting student math performance, long before students begin to struggle visibly. Variables like the number of digital devices with screens at home, the total number of class periods per week, and gender consistently emerged as top predictors across the countries. One of our biggest takeaways was that we do not need hundreds of variables to generate reliable insights; just 20 carefully selected indicators per country were enough to produce AUC values that mostly exceeded 0.80.

Another key learning was the importance of explainability. Teachers and school leaders want more than just a score; they want to understand why a student was flagged as at-risk. By integrating SHAP into our workflow, we made EMRI’s risk scores transparent and trustworthy, giving educators the context they need to make confident, data-informed decisions.

Ultimately, EMRI isn’t just a tool. It’s a shift in approach. Instead of reacting once students have already fallen behind, schools can now get ahead of the curve, deploying resources early and equitably. The impact is far-reaching: students gain the support they need to succeed; schools improve performance and equity outcomes; and society benefits from a better-prepared, more inclusive workforce. EMRI proves that with the right data, early intervention can be both scalable and transformative.

Finally, note that EMRI is designed to assess the risk of a student falling behind in math using their background characteristics, much like genetic testing is used to assess a person’s risk of getting cancer based on their genes. Importantly, this assessment is not deterministic; it simply provides data-informed insights. Just as individuals can reduce their cancer risk by adjusting their diet and lifestyle, students identified as “at risk of falling behind in math” can benefit from targeted support from educators and parents. This early identification creates an opportunity to intervene and address academic struggles before it is too late.

Acknowledgment

This project was developed by Mick Dreeling, Selene Lee, Natacha Maheshe, Elijah Mercer, and YueFeng Xue as part of the UC Berkeley MIDS capstone course. We gratefully acknowledge the support of our instructors, peers, and AWS cloud resources. A special thanks to educators who helped guide the practical application of EMRI.

Last updated: April 21, 2025