NeuroBeacon
Mission Statement
Tenebris ad sinus custodi—Hold back the darkness.
That is why NeuroBeacon exists. In the face of a growing global crisis of cognitive decline, our mission is to empower individuals to maintain and enhance cognitive function across the lifespan. We believe that while no single solution exists, the power of consistent mental engagement—delivered thoughtfully—is profound.
NeuroBeacon is a research-driven initiative to develop adaptive, data-driven training tools grounded in behavioral science and machine learning. By integrating reinforcement learning algorithms with real-world inspired gameplay, we aim to deliver a platform that not only sharpens the mind, but does so through challenges that are engaging, meaningful, and joyful.
We are committed to combining technical rigor with elegant design to create accessible, effective, and ethically responsible tools. At the same time, we strive to provide caregivers and healthcare professionals with meaningful insights into cognitive performance—making NeuroBeacon a beacon for individuals and support networks alike.
NeuroBeacon
The problem:
Cognitive training is too often uninspiring—rigid, repetitive, and disconnected from real life. While studies support the benefits of mental engagement, traditional brain-training apps frequently fall short. They feel like chores, not challenges.
The solution:
NeuroBeacon is a game-based cognitive training platform that centers engagement and adaptability. Our interactive challenges—from math-based problem-solving to rapid-recall to visual memory and reaction time tests—are rooted in everyday cognitive demands. Each game is designed to feel intuitive, immersive, and rewarding.
What sets us apart:
At the core of NeuroBeacon is an adaptive difficulty engine powered by reinforcement learning. Rather than relying on fixed difficulty tiers, the platform dynamically adjusts to each player’s performance in real time—measuring accuracy, speed, and behavioral patterns to keep them in their optimal learning zone.
When a player excels, the challenge intensifies. If they stumble, the system scaffolds support and eases difficulty to rebuild confidence. This ensures that every session is both productive and engaging—tailored to the user, not the average.
The result:
A personalized cognitive training experience that is data-driven, responsive, and aligned with each user’s growth trajectory. NeuroBeacon isn’t just about training the brain—it’s about keeping players challenged, curious, and motivated.
Data
Our capstone project leverages reinforcement learning to personalize educational experiences through adaptive question difficulty. To support robust and transferable learning, we pre-train our model on two diverse datasets: EdNet, a large-scale dataset from an AI tutoring platform, and Duolingo's SLAM dataset, which focuses on second language acquisition modeling. This dual-source strategy ensures exposure to a wide range of learning behaviors and task types, promoting generalizability across educational domains.
Both datasets contain the following core variables:
- Timestamp (relative or absolute)
- User ID
- Question ID
- Question type
- Elapsed time to question completion
- Correctness of the response
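For reference, a single interaction from either dataset can be represented with a record like the following sketch; the field names are illustrative rather than the raw column names used by EdNet or SLAM:

```python
from dataclasses import dataclass

@dataclass
class InteractionRecord:
    """One question attempt, using the core fields shared by EdNet and SLAM.

    Field names are illustrative; the raw datasets use their own column names.
    """
    timestamp: float      # relative or absolute time of the attempt
    user_id: str          # learner identifier
    question_id: str      # item identifier
    question_type: str    # task or category of the question
    elapsed_time: float   # seconds taken to answer
    is_correct: bool      # whether the response was correct
```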
Data Pipeline
We processed the raw datasets to engineer a set of domain-specific state variables that reflect learner performance and behavior over time. These features were then scaled based on known minimum and maximum values. The resulting state representation encapsulates both short-term and long-term learner trends:
| State Variable | Description |
|---|---|
| prev_is_slow | Previous answer was slow |
| prev_is_correct | Previous answer was correct |
| questions_roll_ct | Rolling count of previous questions answered |
| correct_answers_roll_sum | Rolling count of previous questions answered correctly |
| percent_correct_roll | Rolling percentage of previous questions answered correctly |
| percent_correct_group_roll | Rolling percentage of previous similar questions answered correctly |
| elapsed_time_cum_avg | Average time to completion of previously answered questions |
| cumulative_reward | Overall “grade” so far, as a continuous value in [0, 1] |
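A minimal sketch of assembling and scaling this state vector, assuming min/max bounds chosen here purely for illustration (the user embedding described next is handled separately):

```python
import numpy as np

# Assumed bounds for min-max scaling of each state variable; the actual
# bounds used by NeuroBeacon may differ.
STATE_BOUNDS = {
    "prev_is_slow":               (0.0, 1.0),
    "prev_is_correct":            (0.0, 1.0),
    "questions_roll_ct":          (0.0, 100.0),
    "correct_answers_roll_sum":   (0.0, 100.0),
    "percent_correct_roll":       (0.0, 1.0),
    "percent_correct_group_roll": (0.0, 1.0),
    "elapsed_time_cum_avg":       (0.0, 60.0),   # seconds, assumed cap
    "cumulative_reward":          (0.0, 1.0),
}

def build_state(features: dict) -> np.ndarray:
    """Scale each engineered feature to [0, 1] using its known min/max."""
    state = []
    for name, (lo, hi) in STATE_BOUNDS.items():
        value = float(features[name])
        state.append((value - lo) / (hi - lo))
    return np.array(state, dtype=np.float32)
```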
To incorporate user-specific patterns, we engineered user embeddings from three variables: the percentage of correctly answered easy, medium, and hard questions. These features were passed through a small neural network (input dimension 3, output dimension 4), enabling us to represent users as part of the environment rather than creating one model per user. This approach supports generalization across users and allows the model to learn evolving user behavior over time.
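A sketch of such an embedding network in PyTorch; only the 3-input / 4-output shape comes from the design above, and the hidden layer and activation are assumptions:

```python
import torch
import torch.nn as nn

class UserEmbedding(nn.Module):
    """Maps a user's percent-correct on easy, medium, and hard questions
    (3 inputs) to a 4-dimensional user vector. Everything beyond the
    3-in / 4-out shape is assumed for illustration."""
    def __init__(self, hidden_dim: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 4),
        )

    def forward(self, pct_correct_by_difficulty: torch.Tensor) -> torch.Tensor:
        # pct_correct_by_difficulty: tensor of shape (..., 3) with values in [0, 1]
        return self.net(pct_correct_by_difficulty)
```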
The action space for our agent consists of selecting one of three discrete difficulty levels: easy, medium, or hard.
After iterating on several formulations, our final reward function is defined as follows:
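As a purely illustrative stand-in rather than the actual NeuroBeacon formulation, a reward that credits correct answers more on harder questions and penalizes slow responses might look like the sketch below; every weight and threshold here is a hypothetical assumption:

```python
def illustrative_reward(is_correct: bool, elapsed_time: float, difficulty: str,
                        slow_threshold: float = 20.0) -> float:
    """Illustrative reward only; NOT the actual NeuroBeacon reward function.

    Assumes reward grows with correctness on harder questions and shrinks
    when answers are slow. All weights and thresholds are hypothetical.
    """
    difficulty_weight = {"easy": 0.5, "medium": 0.75, "hard": 1.0}[difficulty]
    base = difficulty_weight if is_correct else 0.0
    speed_penalty = 0.25 * difficulty_weight if elapsed_time > slow_threshold else 0.0
    return max(0.0, base - speed_penalty)
```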
Model
At the core of our system is a Deep Q-Learning algorithm. It uses a neural network—referred to as the Primary Network—to estimate Q-values, which represent the expected reward of taking a given action in a specific learner state. This network is responsible for selecting question difficulty at each step.
We tuned the Primary Network’s architecture using Amazon SageMaker Autotune on the pretraining data. The final model includes:
- Three fully connected layers
- Gamma (discount factor): 0.9
- Adam optimizer
- Learning rate: 0.035
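A minimal PyTorch sketch of this configuration, assuming hidden-layer widths and the concatenation of the eight engineered state features with the 4-dimensional user embedding, neither of which is specified above:

```python
import torch
import torch.nn as nn

STATE_DIM = 8 + 4     # engineered state features plus the 4-d user embedding (assumed concatenation)
N_ACTIONS = 3         # easy, medium, hard

class PrimaryNetwork(nn.Module):
    """Q-network with three fully connected layers; hidden widths are assumed."""
    def __init__(self, state_dim: int = STATE_DIM, n_actions: int = N_ACTIONS):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(state_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, n_actions),   # one Q-value per difficulty level
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.layers(state)

primary_net = PrimaryNetwork()
optimizer = torch.optim.Adam(primary_net.parameters(), lr=0.035)  # learning rate from the tuned configuration
GAMMA = 0.9  # discount factor from the tuned configuration
```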
To stabilize training over time and mitigate model drift—particularly important since we continuously retrain on new data—we maintain a separate Target Network, which is a delayed copy of the Primary Network and updated only once every 7 days. This temporal separation reduces the risk of unstable feedback loops during Q-learning.
To train the model, we store each user interaction—consisting of the current state, selected action, received reward, and resulting next state—in a Replay Buffer. After every game session (a batch of 10 questions), we randomly sample from this buffer, mixing recent and older interactions. This random sampling strategy helps the model avoid overfitting to the latest data and improves overall learning stability.
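A minimal Replay Buffer sketch; the capacity shown is arbitrary and transitions are stored as simple (state, action, reward, next state) tuples:

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores (state, action, reward, next_state) transitions; capacity is assumed."""
    def __init__(self, capacity: int = 50_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size: int):
        # Uniform random sampling mixes recent and older interactions.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```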
During each training iteration, the sampled "current states" are passed through the Primary Network, while the corresponding "next states" are evaluated by the Target Network. Using the predicted Q-values, received rewards, and a fixed discount factor (gamma), we compute a Mean Squared Error loss to update the Primary Network. This reinforcement loop allows the model to learn effectively from noisy, real-world data in an incremental fashion.
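A sketch of one such training iteration, assuming actions are stored as difficulty indices (0, 1, 2) and that transitions come from the Replay Buffer above; the loss mirrors the description, with targets from the Target Network and a mean squared error objective:

```python
import numpy as np
import torch
import torch.nn.functional as F

def train_step(primary_net, target_net, optimizer, transitions, gamma: float = 0.9):
    """One DQN update on transitions sampled from the Replay Buffer."""
    states, actions, rewards, next_states = zip(*transitions)
    states = torch.as_tensor(np.stack(states), dtype=torch.float32)
    actions = torch.as_tensor(actions, dtype=torch.int64)
    rewards = torch.as_tensor(rewards, dtype=torch.float32)
    next_states = torch.as_tensor(np.stack(next_states), dtype=torch.float32)

    # Q-values of the actions actually taken, from the Primary Network.
    q_pred = primary_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # Bootstrapped targets from the delayed Target Network.
    with torch.no_grad():
        q_target = rewards + gamma * target_net(next_states).max(dim=1).values

    loss = F.mse_loss(q_pred, q_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Roughly once every 7 days, the Target Network is refreshed from the Primary Network:
# target_net.load_state_dict(primary_net.state_dict())
```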
Before any real-time interaction begins, the Primary Network is initialized with weights from the pre-trained model, giving it a strong foundation built from EdNet and SLAM. Once deployed, the model continues to learn after each game session through this reinforcement loop. Importantly, the Primary Network is responsible for selecting the difficulty level of the next question after each individual response, enabling a highly personalized and responsive learning experience.
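A sketch of how this per-response difficulty selection might look. The greedy choice over Q-values follows from the description above, while the small epsilon-greedy exploration term and its value are assumptions:

```python
import random
import torch

DIFFICULTIES = ["easy", "medium", "hard"]

def select_next_difficulty(primary_net, state, epsilon: float = 0.05) -> str:
    """Pick the next question's difficulty from the current learner state.

    Greedy selection over Q-values, with an assumed epsilon-greedy
    exploration term for occasional random difficulty choices.
    """
    if random.random() < epsilon:
        return random.choice(DIFFICULTIES)
    with torch.no_grad():
        q_values = primary_net(torch.as_tensor(state, dtype=torch.float32).unsqueeze(0))
    return DIFFICULTIES[int(q_values.argmax(dim=1).item())]
```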
App Architecture
Application Flow
Evaluation
To evaluate model performance on NeuroBeacon games, we used both synthetically generated user data and real user data from a beta test of the NeuroBeacon platform.
Synthetic
Synthetic data allowed us to test and measure how NeuroBeacon would perform for the various types of users we anticipate will engage with the platform. We used this data for validation and to improve transfer of the pretrained model to the NeuroBeacon-developed games.
We did this by evaluating NeuroBeacon performance against a set of predefined baseline models:
- Simple Model: increases the difficulty of the next question when a question is answered correctly and decreases it when answered incorrectly;
- Random Model: always outputs a random difficulty for the next question;
- Easy Model: always outputs an ‘Easy’ difficulty for the next question;
- Medium Model: always outputs a ‘Medium’ difficulty for the next question; and
- Hard Model: always outputs a ‘Hard’ difficulty for the next question.
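For concreteness, the Simple Model baseline can be expressed in a few lines; the other baselines simply return a fixed or random difficulty:

```python
def simple_baseline(current_difficulty: str, was_correct: bool) -> str:
    """Simple Model: step difficulty up after a correct answer, down after an incorrect one."""
    order = ["easy", "medium", "hard"]
    idx = order.index(current_difficulty)
    idx = min(idx + 1, 2) if was_correct else max(idx - 1, 0)
    return order[idx]
```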
We also compared the NeuroBeacon model's performance with and without retraining on new user input, as the deployed system would retrain with real-world users.
We began by defining 7 representative “user profiles,” each with probability distributions for accuracy and time to answer at each difficulty level. Profiles also included a learning rate and a non-linear fatigue effect to simulate how performance may change over time. For each profile, we simulated how a user would interact with a given model, with that model determining the difficulty of the next question shown to the user.
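A sketch of how such a profile can be simulated; all probabilities, learning rates, and fatigue parameters below are illustrative rather than the values used in our experiments:

```python
import random
from dataclasses import dataclass

@dataclass
class SyntheticUser:
    """One synthetic user profile with illustrative parameters."""
    p_correct: dict          # e.g. {"easy": 0.9, "medium": 0.7, "hard": 0.5}
    mean_time: dict          # mean seconds to answer, per difficulty
    learning_rate: float     # per-question improvement in accuracy
    fatigue_rate: float      # drives a non-linear decay in accuracy over a session

    def answer(self, difficulty: str, question_number: int):
        """Simulate one response: (is_correct, elapsed_time_seconds)."""
        fatigue = self.fatigue_rate * question_number ** 1.5
        p = min(0.99, self.p_correct[difficulty]
                + self.learning_rate * question_number) - fatigue
        is_correct = random.random() < max(0.01, p)
        elapsed = random.gauss(self.mean_time[difficulty], 2.0)
        return is_correct, max(0.5, elapsed)
```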
We ran simulations with 70 synthetic users, 10 from each profile type, and evaluated the cumulative reward achieved at the end of a set number of sessions. Examining how reward was distributed across the different models for each user type motivated further experimentation and fine-tuning.
We also evaluated the distribution of question difficulty served to users, broken out by user type. This helped us assess how the NeuroBeacon model adapted its behavior to individual users in order to optimize reward and learn alongside each user.
Our experimentation focused on adjusting the reward function, gamma value, and model entropy, the parameters that most directly guide model behavior. Our goal was to find a balance between allowing the model to explore the full state space and encouraging it to exploit previously discovered reward paths.
Based on initial results of the implemented pre-trained model, we identified that the NeuroBeacon model was learning to optimize reward by delivering primarily difficult questions to all users, as shown below in the violin plot on the left. After allowing the model to retrain on user interaction data, we can see it learn to serve more ‘easy’ and ‘medium’ questions, particularly to the users designed to model beginners and those in cognitive decline.
After experimentation and fine-tuning, we see that the NeuroBeacon model with retraining not only adapts with users but also reduces model loss over time and achieves the highest average cumulative reward of the models compared.
Beta Test
The second form of evaluation came from real users engaging with a beta test of the NeuroBeacon platform. The test was open for 2 weeks and included full access to the home Dashboard and all 5 games.
Data from this user interaction was used to establish initial baselines and thresholds for question difficulty and speed.
The beta also included a link to a survey to gather feedback on the user experience. The survey included questions surrounding:
- Perceived benefit of the platform,
- Perceived difficulty of each game,
- General likes and dislikes,
- Bugs encountered during play,
- Likelihood to continue play, and
- Likelihood to recommend the platform to others.
Overall, participants enjoyed using the NeuroBeacon beta platform, reporting that the games were challenging while still engaging, and that the dynamic scoring was motivating. Of the 14 survey respondents, 62% said they would continue using the platform on their own, and 79% would recommend it to others. Feedback from these users was incorporated into the platform after the beta test closed.
Key Learnings and Impact
Throughout the process of NeuroBeacon development, we learned a number of key concepts about reinforcement learning that we would like to highlight:
- First, the reinforcement learning paradigm is flexible, and its techniques can build on top of existing deep learning models.
- Second, developing unseen environments is critical for transferring and evaluating model learning.
- Third, system architecture design is critical to enable continuous adaptation of reinforcement models.
We have provided a framework for pre-training and transferring a learned policy to unseen environments. We have created new environments to evaluate and fine-tune the reward function, policy, and predicted actions during the reinforcement loop. Finally, we established system design approaches that enable rapid re-learning when deploying reinforcement models to web applications.
We are proud of these technical achievements, which we believe will enable user impact. Our real-world inspired games aim to draw connections to daily tasks and ultimately improve everyday life for our users. NeuroBeacon is personalized down to its core, learning and adapting as the user does and empowering a learning experience unique to the individual. With growing numbers of adults experiencing cognitive decline, we hope NeuroBeacon can be there to meet the needs of these future users and Hold Back the Darkness.
Acknowledgements
We would like to express our gratitude to our Capstone professors, Todd Holloway and Puya Vahabi, for their support and guidance during the development of NeuroBeacon. We would also like to thank Dr. Gina Calloway, who provided relevant cognitive science expertise in support of game development and survey creation.
Additionally, thank you to all of our MIDS peers and professors who empowered us with the technical skills as data scientists to bring NeuroBeacon to life.