NeuroBeacon
Mission Statement
Tenebris ad sinus custodi—Hold back the darkness.
That is why NeuroBeacon exists. In the face of a growing global crisis of cognitive decline, our mission is to empower individuals to maintain and enhance cognitive function across the lifespan. We believe that while no single solution exists, the power of consistent mental engagement—delivered thoughtfully—is profound.
NeuroBeacon is a research-driven initiative to develop adaptive, data-driven training tools grounded in behavioral science and machine learning. By integrating reinforcement learning algorithms with real-world inspired gameplay, we aim to deliver a platform that not only sharpens the mind, but does so through challenges that are engaging, meaningful, and joyful.
We are committed to combining technical rigor with elegant design to create accessible, effective, and ethically responsible tools. At the same time, we strive to provide caregivers and healthcare professionals with meaningful insights into cognitive performance—making NeuroBeacon a beacon for individuals and support networks alike.
NeuroBeacon
The problem:
Cognitive training is too often uninspiring—rigid, repetitive, and disconnected from real life. While studies support the benefits of mental engagement, traditional brain-training apps frequently fall short. They feel like chores, not challenges.
The solution:
NeuroBeacon is a game-based cognitive training platform that centers engagement and adaptability. Our interactive challenges—from math-based problem-solving to rapid-recall to visual memory and reaction time tests—are rooted in everyday cognitive demands. Each game is designed to feel intuitive, immersive, and rewarding.
What sets us apart:
At the core of NeuroBeacon is an adaptive difficulty engine powered by reinforcement learning. Rather than relying on fixed difficulty tiers, the platform dynamically adjusts to each player’s performance in real time—measuring accuracy, speed, and behavioral patterns to keep them in their optimal learning zone.
When a player excels, the challenge intensifies. If they stumble, the system scaffolds support and eases difficulty to rebuild confidence. This ensures that every session is both productive and engaging—tailored to the user, not the average.
The result:
A personalized cognitive training experience that is data-driven, responsive, and aligned with each user’s growth trajectory. NeuroBeacon isn’t just about training the brain—it’s about keeping players challenged, curious, and motivated.
Data
Our capstone project leverages reinforcement learning to personalize educational experiences through adaptive question difficulty. To support robust and transferable learning, we pre-train our model on two diverse datasets: EdNet, a large-scale dataset from an AI tutoring platform, and Duolingo's SLAM dataset, which focuses on second language acquisition modeling. This dual-source strategy ensures exposure to a wide range of learning behaviors and task types, promoting generalizability across educational domains.
Both datasets contain the following core variables:
- Timestamp (relative or absolute)
- User ID
- Question ID
- Question type
- Elapsed time to question completion
- Correctness of the response
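For reference, a single interaction from either dataset can be represented with a record like the following sketch; the field names are illustrative rather than the raw column names used by EdNet or SLAM:

```python
from dataclasses import dataclass

@dataclass
class InteractionRecord:
    """One question attempt, using the core fields shared by EdNet and SLAM.

    Field names are illustrative; the raw datasets use their own column names.
    """
    timestamp: float      # relative or absolute time of the attempt
    user_id: str          # learner identifier
    question_id: str      # item identifier
    question_type: str    # task or category of the question
    elapsed_time: float   # seconds taken to answer
    is_correct: bool      # whether the response was correct
```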
Data Pipeline
We processed the raw datasets to engineer a set of domain-specific state variables that reflect learner performance and behavior over time. These features were then scaled based on known minimum and maximum values. The resulting state representation encapsulates both short-term and long-term learner trends:
| State Variable | Description |
|---|---|
| prev_is_slow | Previous answer was slow |
| prev_is_correct | Previous answer was correct |
| questions_roll_ct | Rolling count of previous questions answered |
| correct_answers_roll_sum | Rolling count of previous questions answered correctly |
| percent_correct_roll | Rolling percentage of previous questions answered correctly |
| percent_correct_group_roll | Rolling percentage of previous similar questions answered correctly |
| elapsed_time_cum_avg | Average time to completion of previously answered questions |
| cumulative_reward | Overall “grade” so far, as a continuous value in [0, 1] |
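A minimal sketch of assembling and scaling this state vector, assuming min/max bounds chosen here purely for illustration (the user embedding described next is handled separately):

```python
import numpy as np

# Assumed bounds for min-max scaling of each state variable; the actual
# bounds used by NeuroBeacon may differ.
STATE_BOUNDS = {
    "prev_is_slow":               (0.0, 1.0),
    "prev_is_correct":            (0.0, 1.0),
    "questions_roll_ct":          (0.0, 100.0),
    "correct_answers_roll_sum":   (0.0, 100.0),
    "percent_correct_roll":       (0.0, 1.0),
    "percent_correct_group_roll": (0.0, 1.0),
    "elapsed_time_cum_avg":       (0.0, 60.0),   # seconds, assumed cap
    "cumulative_reward":          (0.0, 1.0),
}

def build_state(features: dict) -> np.ndarray:
    """Scale each engineered feature to [0, 1] using its known min/max."""
    state = []
    for name, (lo, hi) in STATE_BOUNDS.items():
        value = float(features[name])
        state.append((value - lo) / (hi - lo))
    return np.array(state, dtype=np.float32)
```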
To incorporate user-specific patterns, we engineered user embeddings from three variables: the percentage of correctly answered easy, medium, and hard questions. These features were passed through a small neural network (input dimension 3, output dimension 4), enabling us to represent users as part of the environment rather than creating one model per user. This approach supports generalization across users and allows the model to learn evolving user behavior over time.
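A sketch of such an embedding network in PyTorch; only the 3-input / 4-output shape comes from the design above, and the hidden layer and activation are assumptions:

```python
import torch
import torch.nn as nn

class UserEmbedding(nn.Module):
    """Maps a user's percent-correct on easy, medium, and hard questions
    (3 inputs) to a 4-dimensional user vector. Everything beyond the
    3-in / 4-out shape is assumed for illustration."""
    def __init__(self, hidden_dim: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 4),
        )

    def forward(self, pct_correct_by_difficulty: torch.Tensor) -> torch.Tensor:
        # pct_correct_by_difficulty: tensor of shape (..., 3) with values in [0, 1]
        return self.net(pct_correct_by_difficulty)
```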
The action space for our agent consists of selecting one of three discrete difficulty levels: easy, medium, or hard.
After iterating on several formulations, our final reward function is defined as follows:
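As a purely illustrative stand-in rather than the actual NeuroBeacon formulation, a reward that credits correct answers more on harder questions and penalizes slow responses might look like the sketch below; every weight and threshold here is a hypothetical assumption:

```python
def illustrative_reward(is_correct: bool, elapsed_time: float, difficulty: str,
                        slow_threshold: float = 20.0) -> float:
    """Illustrative reward only; NOT the actual NeuroBeacon reward function.

    Assumes reward grows with correctness on harder questions and shrinks
    when answers are slow. All weights and thresholds are hypothetical.
    """
    difficulty_weight = {"easy": 0.5, "medium": 0.75, "hard": 1.0}[difficulty]
    base = difficulty_weight if is_correct else 0.0
    speed_penalty = 0.25 * difficulty_weight if elapsed_time > slow_threshold else 0.0
    return max(0.0, base - speed_penalty)
```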
Model
At the core of our system is a Deep Q-Learning algorithm. It uses a neural network—referred to as the Primary Network—to estimate Q-values, which represent the expected reward of taking a given action in a specific learner state. This network is responsible for selecting question difficulty at each step.
We tuned the Primary Network’s architecture using Amazon SageMaker Autotune on the pretraining data. The final model includes:
- Three fully connected layers
- Gamma (discount factor): 0.9
- Adam optimizer
- Learning rate: 0.035
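A minimal PyTorch sketch of this configuration, assuming hidden-layer widths and the concatenation of the eight engineered state features with the 4-dimensional user embedding, neither of which is specified above:

```python
import torch
import torch.nn as nn

STATE_DIM = 8 + 4     # engineered state features plus the 4-d user embedding (assumed concatenation)
N_ACTIONS = 3         # easy, medium, hard

class PrimaryNetwork(nn.Module):
    """Q-network with three fully connected layers; hidden widths are assumed."""
    def __init__(self, state_dim: int = STATE_DIM, n_actions: int = N_ACTIONS):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(state_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, n_actions),   # one Q-value per difficulty level
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.layers(state)

primary_net = PrimaryNetwork()
optimizer = torch.optim.Adam(primary_net.parameters(), lr=0.035)  # learning rate from the tuned configuration
GAMMA = 0.9  # discount factor from the tuned configuration
```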
To stabilize training over time and mitigate model drift—particularly important since we continuously retrain on new data—we maintain a separate Target Network, which is a delayed copy of the Primary Network and updated only once every 7 days. This temporal separation reduces the risk of unstable feedback loops during Q-learning.
To train the model, we store each user interaction—consisting of the current state, selected action, received reward, and resulting next state—in a Replay Buffer. After every game session (a batch of 10 questions), we randomly sample from this buffer, mixing recent and older interactions. This random sampling strategy helps the model avoid overfitting to the latest data and improves overall learning stability.
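A minimal Replay Buffer sketch; the capacity shown is arbitrary and transitions are stored as simple (state, action, reward, next state) tuples:

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores (state, action, reward, next_state) transitions; capacity is assumed."""
    def __init__(self, capacity: int = 50_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size: int):
        # Uniform random sampling mixes recent and older interactions.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```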
During each training iteration, the sampled "current states" are passed through the Primary Network, while the corresponding "next states" are evaluated by the Target Network. Using the predicted Q-values, received rewards, and a fixed discount factor (gamma), we compute a Mean Squared Error loss to update the Primary Network. This reinforcement loop allows the model to learn effectively from noisy, real-world data in an incremental fashion.
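A sketch of one such training iteration, assuming actions are stored as difficulty indices (0, 1, 2) and that transitions come from the Replay Buffer above; the loss mirrors the description, with targets from the Target Network and a mean squared error objective:

```python
import numpy as np
import torch
import torch.nn.functional as F

def train_step(primary_net, target_net, optimizer, transitions, gamma: float = 0.9):
    """One DQN update on transitions sampled from the Replay Buffer."""
    states, actions, rewards, next_states = zip(*transitions)
    states = torch.as_tensor(np.stack(states), dtype=torch.float32)
    actions = torch.as_tensor(actions, dtype=torch.int64)
    rewards = torch.as_tensor(rewards, dtype=torch.float32)
    next_states = torch.as_tensor(np.stack(next_states), dtype=torch.float32)

    # Q-values of the actions actually taken, from the Primary Network.
    q_pred = primary_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # Bootstrapped targets from the delayed Target Network.
    with torch.no_grad():
        q_target = rewards + gamma * target_net(next_states).max(dim=1).values

    loss = F.mse_loss(q_pred, q_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Roughly once every 7 days, the Target Network is refreshed from the Primary Network:
# target_net.load_state_dict(primary_net.state_dict())
```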
Before any real-time interaction begins, the Primary Network is initialized with weights from the pre-trained model, giving it a strong foundation built from EdNet and SLAM. Once deployed, the model continues to learn after each game session through this reinforcement loop. Importantly, the Primary Network is responsible for selecting the difficulty level of the next question after each individual response, enabling a highly personalized and responsive learning experience.
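A sketch of how this per-response difficulty selection might look. The greedy choice over Q-values follows from the description above, while the small epsilon-greedy exploration term and its value are assumptions:

```python
import random
import torch

DIFFICULTIES = ["easy", "medium", "hard"]

def select_next_difficulty(primary_net, state, epsilon: float = 0.05) -> str:
    """Pick the next question's difficulty from the current learner state.

    Greedy selection over Q-values, with an assumed epsilon-greedy
    exploration term for occasional random difficulty choices.
    """
    if random.random() < epsilon:
        return random.choice(DIFFICULTIES)
    with torch.no_grad():
        q_values = primary_net(torch.as_tensor(state, dtype=torch.float32).unsqueeze(0))
    return DIFFICULTIES[int(q_values.argmax(dim=1).item())]
```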
App Architecture
Application Flow
Evaluation
To evaluate model performance on NeuroBeacon games, we used both synthetically generated user data and real user data from a beta test of the NeuroBeacon platform.
Synthetic
Synthetic data allowed us to test and measure how NeuroBeacon would perform for the various types of users we anticipate will engage with the platform. We used this data for validation and to improve transfer of the pretrained model to the NeuroBeacon-developed games.
We did this by evaluating NeuroBeacon performance against a set of predefined baseline models:
- Simple Model: increases the difficulty of the next question when a question is answered correctly and decreases it when answered incorrectly;
- Random Model: always outputs a random difficulty for the next question;
- Easy Model: always outputs an ‘Easy’ difficulty for the next question;
- Medium Model: always outputs a ‘Medium’ difficulty for the next question; and
- Hard Model: always outputs a ‘Hard’ difficulty for the next question.
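For concreteness, the Simple Model baseline can be expressed in a few lines; the other baselines simply return a fixed or random difficulty:

```python
def simple_baseline(current_difficulty: str, was_correct: bool) -> str:
    """Simple Model: step difficulty up after a correct answer, down after an incorrect one."""
    order = ["easy", "medium", "hard"]
    idx = order.index(current_difficulty)
    idx = min(idx + 1, 2) if was_correct else max(idx - 1, 0)
    return order[idx]
```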
We also compared the NeuroBeacon model's performance with and without retraining on new user input, as the deployed system would retrain with real-world users.
We began by defining 7 representative “user profiles,” each with probability distributions for accuracy and time to answer at each difficulty level. Profiles also included a learning rate and a non-linear fatigue effect to simulate how performance may change over time. For each profile, we simulated how a user would interact with a given model, with that model determining the difficulty of the next question shown to the user.
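A sketch of how such a profile can be simulated; all probabilities, learning rates, and fatigue parameters below are illustrative rather than the values used in our experiments:

```python
import random
from dataclasses import dataclass

@dataclass
class SyntheticUser:
    """One synthetic user profile with illustrative parameters."""
    p_correct: dict          # e.g. {"easy": 0.9, "medium": 0.7, "hard": 0.5}
    mean_time: dict          # mean seconds to answer, per difficulty
    learning_rate: float     # per-question improvement in accuracy
    fatigue_rate: float      # drives a non-linear decay in accuracy over a session

    def answer(self, difficulty: str, question_number: int):
        """Simulate one response: (is_correct, elapsed_time_seconds)."""
        fatigue = self.fatigue_rate * question_number ** 1.5
        p = min(0.99, self.p_correct[difficulty]
                + self.learning_rate * question_number) - fatigue
        is_correct = random.random() < max(0.01, p)
        elapsed = random.gauss(self.mean_time[difficulty], 2.0)
        return is_correct, max(0.5, elapsed)
```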
We ran simulations with 70 synthetic users, 10 from each profile type, and evaluated the cumulative reward achieved at the end of a set number of sessions. Examining how reward was distributed across the different models for each user type motivated further experimentation and fine-tuning.
We also evaluated the distribution of question difficulty served to users, broken out by user type. This helped us assess how the NeuroBeacon model adapted its behavior to individual users in order to optimize reward and learn alongside each user.
Our experimentation focused on adjusting the reward function, gamma value, and model entropy, the parameters that most directly guide model behavior. Our goal was to find a balance between allowing the model to explore the full state space and encouraging it to exploit previously discovered reward paths.
Based on initial results of the implemented pre-trained model, we identified that the NeuroBeacon model was learning to optimize reward by delivering primarily difficult questions to all users, as shown below in the violin plot on the left. After allowing the model to retrain on user interaction data, we can see it learn to serve more ‘easy’ and ‘medium’ questions, particularly to the users designed to model beginners and those in cognitive decline.
After experimentation and fine-tuning, we see that the NeuroBeacon model with retraining not only adapts with users but also reduces model loss over time and achieves the highest average cumulative reward of the models compared.
Beta Test
The second form of evaluation came from real users engaging with a beta test of the NeuroBeacon platform. The test was open for 2 weeks and included full access to the home Dashboard and all 5 games.
Data from this user interaction was used to establish initial baselines and thresholds for question difficulty and speed.
The beta also included a link to a survey to gather feedback on the user experience. The survey included questions surrounding:
- Perceived benefit of the platform,
- Perceived difficulty of each game,
- General likes and dislikes,
- Bugs encountered during play,
- Likelihood to continue play, and
- Likelihood to recommend the platform to others.
Overall, participants enjoyed using the NeuroBeacon beta platform, reporting that the games were challenging while still engaging, and that the dynamic scoring was motivating. Of the 14 survey respondents, 62% said they would continue using the platform on their own, and 79% would recommend it to others. Feedback from these users was incorporated into the platform after the beta test closed.
Key Learnings and Impact
Throughout the process of NeuroBeacon development, we learned a number of key concepts about reinforcement learning that we would like to highlight:
- First, the reinforcement learning paradigm is flexible, and its techniques can build on top of existing deep learning models.
- Second, developing unseen environments is critical for transferring and evaluating model learning.
- Third, system architecture design is critical to enable continuous adaptation of reinforcement models.
We have provided a framework for pre-training and transferring a learned policy to unseen environments. We have created new environments to evaluate and fine-tune the reward function, policy, and predicted actions during the reinforcement loop. Finally, we established system design approaches that enable rapid re-learning when deploying reinforcement models to web applications.
We are proud of these technical achievements, which we believe will enable user impact. Our real-world inspired games aim to draw connections to daily tasks and ultimately improve everyday life for our users. NeuroBeacon is personalized down to its core, learning and adapting as the user does and empowering a learning experience unique to the individual. With growing numbers of adults experiencing cognitive decline, we hope NeuroBeacon can be there to meet the needs of these future users and Hold Back the Darkness.
Acknowledgements
We would like to express our gratitude to our Capstone professors, Todd Holloway and Puya Vahabi, for their support and guidance during the development of NeuroBeacon. We would also like to thank Dr. Gina Calloway, who provided relevant cognitive science expertise in support of game development and survey creation.
Additionally, thank you to all of our MIDS peers and professors who empowered us with the technical skills as data scientists to bring NeuroBeacon to life.