IntegraMS: A Multimodal Clinical Intelligence Platform Anchored in Proteomics for Predicting MS Progression
Introduction
Problem & Motivation
Multiple Sclerosis (MS) diagnosis and management are multifactorial, yet real-world research workflows often resemble a patchwork of disconnected systems. One increasingly important tool in immunology research and drug discovery is phage display–based antibody profiling, a high-throughput technique that enables rapid, unbiased screening of patient antibodies against large “libraries” of proteins. These experiments give a detailed view of immune activity at scale by revealing which specific proteins—or patterns of proteins—are recognized by patient antibodies. In parallel, circulating biomarkers such as neurofilament light chain (NfL) are increasingly measured in blood as a proxy for neuronal injury.
In a recent study, Zamecnik, Sowa et al. (2024) used phage display to screen antibodies from patients with MS and demographically matched healthy controls against a library spanning the human proteome. The authors identified a subset of patients with MS whose antibodies were enriched for a specific protein pattern detectable in blood samples collected up to five years before symptom onset. This work was an important advance, demonstrating that antibody-based proteomic signatures can reflect disease-associated immune activity well before clinical presentation. Although the study cohort included clinical information, the relationship between these proteomic patterns and downstream clinical outcomes such as disability progression was not explored.
IntegraMS began as an effort to systematically link these high-dimensional proteomic measurements with available clinical data from this study cohort. As this work progressed, it became clear that the challenges encountered—including data harmonization, time alignment, interpretability, and scalability—were not unique to a single study, but instead reflected a broader field-wide gap between cutting-edge proteomic research and clinically relevant insights.
A bird’s eye view:
- Phage display experiments tell us how patient antibodies react to 500,000+ protein fragments (peptides) before and after diagnosis
- Neurofilament light chain (NfL) levels tell us how neuronal injury changes from before to after diagnosis
- Clinical information includes demographics, symptom history, and a disability severity score (DSS)
- DSS is the gold standard used in MS care to measure disability progression
Some challenges:
- Harmonizing these siloed datastreams
- Interpreting the high-dimensional phage display data
- Inconsistent temporal alignment across disease milestones
- No standardized pipeline to connect immune fingerprints to clinical outcomes
The results: fragmented and slow analysis; difficulties in translating research findings across cohorts and institutions.
IntegraMS addresses these challenges by providing a cloud-based dashboard and modeling framework that integrates proteomic and clinical data within a structured, reproducible workflow. By combining peptide-level filtering, clinical timeline alignment, cohort construction, and interpretable modeling, the platform enables clinician-scientists to explore relationships between immune activity and disability trajectories in consented research populations. IntegraMS is designed to bridge the gap between research-grade molecular data and clinically relevant insight—supporting discovery, hypothesis generation, and translational research.
Data Source & Preprocessing
Data Science Approach
The IntegraMS modeling pipeline focuses on a single primary prediction target: the current Disability Status Scale score (DSS_Cur) on its 0–10 integer scale. To generate these predictions, the model incorporates demographic variables, disease timeline features such as time from symptom onset to diagnosis and time since diagnosis, aligned DSS history, and a curated peptide-level feature set derived from phage display proteomics. By integrating these diverse inputs, the system is able to model disability status in a way that reflects both clinical progression and underlying proteomic signatures.
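As a rough illustration of how the inputs described above might be flattened into a single feature vector, here is a minimal Python sketch. The record fields, the peptide IDs, the 0.0 defaults, and the clip-and-round step are all assumptions made for illustration; the actual IntegraMS schema and model are not specified in this write-up.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class PatientRecord:
    # All field names are hypothetical, chosen only for this sketch.
    age_at_onset: float
    symptom_onset: date
    diagnosis: date
    as_of: date                       # date for which DSS_Cur is predicted
    dss_history: List[int]            # aligned prior DSS scores, oldest first
    peptide_scores: Dict[str, float]  # curated peptide ID -> enrichment value


def years_between(start: date, end: date) -> float:
    """Elapsed time in years, using an average year length of 365.25 days."""
    return (end - start).days / 365.25


def build_features(rec: PatientRecord, peptide_ids: List[str]) -> List[float]:
    """Flatten one record into a fixed-order numeric feature vector."""
    # Assumption: with no DSS history, fall back to 0.0 as a placeholder.
    last_dss = float(rec.dss_history[-1]) if rec.dss_history else 0.0
    timeline = [
        rec.age_at_onset,
        years_between(rec.symptom_onset, rec.diagnosis),  # onset -> diagnosis
        years_between(rec.diagnosis, rec.as_of),          # time since diagnosis
        last_dss,
    ]
    # Peptides missing from a record default to 0.0 so vectors share one length.
    peptides = [rec.peptide_scores.get(pid, 0.0) for pid in peptide_ids]
    return timeline + peptides


def to_dss_score(raw_prediction: float) -> int:
    """Map a continuous model output onto the 0-10 integer DSS scale."""
    return int(min(10, max(0, round(raw_prediction))))
```

Keeping the peptide order fixed by an explicit `peptide_ids` list means that every patient maps to the same column layout, whichever peptides happen to be present in a given upload.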
Beyond point prediction of current disability, IntegraMS also implements extrapolation techniques to forecast disability progression over time. A key component of this is a survival-based model parameterized with the Weibull_min distribution, which estimates the time until patients reach clinically meaningful disability thresholds such as DSS6 or DSS8. We chose the Weibull distribution because it is widely used in medical progression modeling and naturally captures how disease evolves over time, including accelerating, decelerating, or constant hazard patterns. In parallel, an ensemble progression module averages predictions from exponential, logistic, and power-law progression curves, providing a flexible family of trajectories that can represent both steady and abrupt clinical changes. Together, these methods allow the system not only to estimate current status, but also to project plausible future trajectories under multiple progression dynamics.
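To make the two forecasting ideas above concrete, the following Python sketch writes out the two-parameter Weibull survival and hazard functions and a simple average of exponential, logistic, and power-law curves. It uses the shape/scale convention of SciPy's `weibull_min` with `loc=0`; every numeric parameter is an illustrative placeholder rather than a fitted IntegraMS value, and clipping the ensemble to the 0-10 range is an assumption.

```python
import math


def weibull_survival(t: float, shape: float, scale: float) -> float:
    """Probability that the DSS threshold has NOT been reached by time t."""
    return math.exp(-((t / scale) ** shape))


def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Instantaneous rate of reaching the threshold at time t (t > 0).

    shape < 1: decelerating hazard; shape == 1: constant; shape > 1: accelerating.
    """
    return (shape / scale) * (t / scale) ** (shape - 1)


def weibull_median(shape: float, scale: float) -> float:
    """Time by which half of patients are expected to reach the threshold."""
    return scale * math.log(2) ** (1 / shape)


def ensemble_dss(t: float, baseline: float = 2.0) -> float:
    """Average three hypothetical progression curves, clipped to 0-10."""
    curves = [
        baseline * math.exp(0.08 * t),                # exponential growth
        10.0 / (1.0 + math.exp(-0.3 * (t - 12.0))),   # logistic, ceiling at 10
        baseline + 0.4 * t ** 0.9,                    # power law
    ]
    mean = sum(curves) / len(curves)
    return min(10.0, max(0.0, mean))


if __name__ == "__main__":
    # Illustrative parameters only: shape 1.5 (accelerating), scale 20 years.
    print("median years to threshold:", round(weibull_median(1.5, 20.0), 2))
    for year in (0, 5, 10, 20):
        print(year, round(weibull_survival(year, 1.5, 20.0), 3),
              round(ensemble_dss(year), 2))
```

In practice the shape and scale would be estimated from observed times to DSS6 or DSS8, for example with `scipy.stats.weibull_min.fit`, and the three curve families would each be fitted per patient before averaging.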
To enable robust learning, the pipeline trains on a combined dataset of real patient records from the Department of Defense Serum Repository (DoDSR) and a large synthetic cohort, with demographic distributions balanced through SMOTE (Synthetic Minority Over-sampling Technique). This blended dataset helps the model capture a broader range of clinical and demographic variation. The modeling approach emphasizes interpretability: researchers can see how clinical and proteomic features, as well as the parameters of the survival and trajectory models, contribute to disability predictions and progression forecasts, while the pipeline still takes advantage of modern machine-learning methods capable of handling high-dimensional data.
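For readers unfamiliar with SMOTE, the sketch below shows the interpolation step at its core: a synthetic sample is placed at a random position on the line segment between an under-represented sample and one of its neighbors. This is a simplified, standard-library illustration that uses only the single nearest neighbor (full SMOTE picks among k neighbors); it is not the implementation used in the IntegraMS pipeline, and the toy vectors are invented.

```python
import math
import random
from typing import List

Vector = List[float]


def nearest_neighbor(point: Vector, others: List[Vector]) -> Vector:
    """Return the sample in `others` closest to `point` (Euclidean distance)."""
    return min(others, key=lambda other: math.dist(point, other))


def smote_sample(minority: List[Vector], rng: random.Random) -> Vector:
    """Create one synthetic point between a minority sample and its neighbor."""
    i = rng.randrange(len(minority))
    base = minority[i]
    # Exclude the base point itself when searching for its neighbor.
    neighbor = nearest_neighbor(base, minority[:i] + minority[i + 1:])
    gap = rng.random()  # position along the segment, in [0, 1)
    return [b + gap * (n - b) for b, n in zip(base, neighbor)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    # Toy feature vectors (e.g. scaled age, scaled time since diagnosis).
    under_represented = [[0.0, 0.0], [1.0, 1.0], [10.0, 10.0]]
    synthetic = [smote_sample(under_represented, rng) for _ in range(5)]
    for row in synthetic:
        print([round(v, 3) for v in row])
```

Because each synthetic point lies between two real samples, it stays inside the range of observed values, which is one reason interpolation-based oversampling is a common choice for small clinical cohorts.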
Developing this model required overcoming several key challenges. One major hurdle was the extreme dimensionality and noise inherent in raw proteomic data, which necessitated careful filtering and feature engineering to distill meaningful biological signals. Another challenge was the inconsistent timing of clinical measurements, requiring rigorous temporal alignment to maintain a coherent disease timeline. The coarse, integer-based nature of DSS introduced additional modeling difficulty, as did the limited size of the real-world patient cohort used for validation. The current model configuration addresses these issues through targeted preprocessing, synthetic data augmentation, a carefully structured train/test split, and survival/curve-fitting procedures that are robust to sparse and irregularly spaced longitudinal data.
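One way to picture the temporal alignment step is to place irregularly timed DSS visits on a yearly grid measured from the diagnosis date. The sketch below does this with last observation carried forward; that fill rule, the yearly grid, and the example dates are assumptions for illustration, since the write-up does not state which alignment method IntegraMS uses.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple


def align_dss(
    visits: List[Tuple[date, int]],
    diagnosis: date,
    years: range,
) -> Dict[int, Optional[int]]:
    """Place irregular DSS visits on a yearly grid relative to diagnosis.

    Grid year k covers the interval [k, k + 1) years after diagnosis
    (negative k is before diagnosis). Each grid year gets the most recent
    score recorded before the end of that interval; years that precede the
    first visit stay None rather than being guessed.
    """
    ordered = sorted(visits)  # chronological order
    aligned: Dict[int, Optional[int]] = {}
    for year in years:
        cutoff_days = (year + 1) * 365.25
        latest: Optional[int] = None
        for visit_date, score in ordered:
            if (visit_date - diagnosis).days < cutoff_days:
                latest = score  # later visits overwrite earlier ones
        aligned[year] = latest
    return aligned


if __name__ == "__main__":
    # Invented example: three visits at uneven intervals.
    visits = [
        (date(2016, 1, 15), 2),
        (date(2014, 3, 10), 1),
        (date(2019, 9, 1), 4),
    ]
    timeline = align_dss(visits, diagnosis=date(2015, 6, 1), years=range(-3, 5))
    for year, score in timeline.items():
        print(f"year {year:+d}: DSS {score}")
```

Leaving early years as `None` keeps missing data explicit, so a downstream model or imputation step can decide how to treat it.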
Looking forward, several planned extensions aim to deepen the clinical usefulness and predictive strength of the system. Future iterations may incorporate richer survival or time-to-event modeling to refine estimates of when patients are likely to reach critical disability milestones such as DSS6 (when a patient requires a cane or walking assistance) or DSS8 (when a patient is restricted to a wheelchair), and to quantify uncertainty around those forecasts. Additional work will include systematic hyperparameter tuning, exploration of feature interactions to capture nuanced relationships between demographic and clinical factors, and grouping peptides into biologically meaningful pathways to enhance interpretability and clinical relevance. Together with the existing Weibull_min survival model and the exponential/logistic/power-law ensemble of progression curves, these enhancements will help evolve IntegraMS into a more powerful tool for understanding and forecasting MS progression.
System & Architecture
IntegraMS is implemented as a cloud-native biomedical research platform using AWS and Firebase. The frontend, built with React and Tailwind, serves as the main interface for researchers. Through it, users can upload patient-level data, inspect curated peptide feature summaries, and view model-generated DSS predictions.
The backend is written in Node.js and is responsible for authentication, data validation, preprocessing, and coordination with AWS. Firebase Authentication secures user access, and the backend enforces schema validation for uploaded CSVs containing clinical and proteomic information. Once validated, data are stored and prepared for model inference.
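The schema validation described above can be sketched as follows. The production backend is written in Node.js; this Python version only illustrates the logic, in the same language as the other examples in this write-up. The required column names and the 0-10 range check on a prior DSS column are hypothetical, since the actual upload schema is not documented here.

```python
import csv
import io
from typing import List

# Hypothetical required columns; the real upload schema may differ.
REQUIRED_COLUMNS = {"patient_id", "age_at_onset", "last_dss"}


def validate_csv(text: str) -> List[str]:
    """Return human-readable problems; an empty list means the file is valid."""
    reader = csv.DictReader(io.StringIO(text))
    header = set(reader.fieldnames or [])
    missing = REQUIRED_COLUMNS - header
    if missing:
        # Without the required columns, row-level checks are meaningless.
        return ["missing columns: " + ", ".join(sorted(missing))]

    errors: List[str] = []
    for line_no, row in enumerate(reader, start=2):  # header is line 1
        if not (row["patient_id"] or "").strip():
            errors.append(f"line {line_no}: empty patient_id")
        try:
            dss = int(row["last_dss"])
        except (TypeError, ValueError):
            errors.append(f"line {line_no}: last_dss is not an integer")
            continue
        if not 0 <= dss <= 10:
            errors.append(f"line {line_no}: last_dss {dss} outside 0-10")
    return errors


if __name__ == "__main__":
    sample = "patient_id,age_at_onset,last_dss\nP1,34,2\nP2,40,11\n"
    for problem in validate_csv(sample):
        print(problem)
```

Returning a list of problems, rather than failing on the first one, lets the dashboard show a researcher everything that needs fixing in a single upload attempt.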
Model inference is containerized and deployed on AWS ECS. Model images are stored on ECR, while SQS manages inference job queues and CloudWatch provides monitoring and logging. Raw and processed data—including proteomic inputs, model outputs, and intermediate artifacts—are stored in S3. Processed predictions and associated metadata are surfaced via Firestore to support responsive, queryable views in the dashboard.
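As an illustration of how an inference job might be described before it is placed on the queue, the sketch below builds a small JSON message that points at input and output locations in S3. The field names, the `predictions/` prefix, and the bucket and key values are hypothetical; the message format actually used by IntegraMS is not specified in this write-up.

```python
import json
import uuid
from typing import Dict


def build_inference_job(bucket: str, input_key: str) -> Dict[str, str]:
    """Describe one inference job: where to read input and write predictions."""
    job_id = str(uuid.uuid4())  # unique ID so results can be matched to jobs
    return {
        "job_id": job_id,
        "input_uri": f"s3://{bucket}/{input_key}",
        # Hypothetical output layout: one predictions file per job.
        "output_uri": f"s3://{bucket}/predictions/{job_id}.json",
    }


def to_message_body(job: Dict[str, str]) -> str:
    """Serialize the job as a JSON string suitable for a queue message body."""
    return json.dumps(job, sort_keys=True)


if __name__ == "__main__":
    job = build_inference_job("example-bucket", "uploads/cohort.csv")
    body = to_message_body(job)
    print(body)
    # With boto3, a body like this could be sent to SQS using
    # sqs_client.send_message(QueueUrl=<queue URL>, MessageBody=body).
```

Passing S3 locations instead of the data itself keeps queue messages small, which matters because proteomic inputs can be far larger than a queue message is meant to carry.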
The minimum viable product is designed to move researchers efficiently from raw data to model prediction:
- Upload a CSV file containing proteomic and clinical features.
- Allow the backend to validate, parse, and preprocess the input.
- Trigger the ECS-based inference service to generate DSS_Cur predictions.
- Display these predictions, along with key contextual information, in an integrated dashboard.
This architecture provides a reproducible, scalable environment for running MS disability prediction models on proteomic and clinical data, and it lays the groundwork for future expansion to additional cohorts, features, and prediction tasks.
Key Learnings & Impact
In building IntegraMS end to end, the team learned that synthetic data plays a critical role in strengthening models trained on small, specialized clinical cohorts. Aggressive yet biologically grounded peptide filtering is essential to transform hundreds of thousands of raw proteomic signals into a feature space that ML models can learn from reliably. The DSS outcome variable, while clinically useful, poses modeling challenges due to its coarse, integer-based scale and its collection at irregular intervals, making temporal alignment indispensable.
Interactions with clinicians and domain experts emphasized the importance of interpretability and biological plausibility, not just numerical performance. These insights shaped how features were engineered and how the pipeline was designed and documented. Finally, transitioning from exploratory notebooks to a fully deployed cloud system underscored the importance of robust infrastructure: containerization, monitoring, data management, and a usable frontend.
Taken together, these components make IntegraMS a practical, extensible platform for investigating MS disability progression through the lens of proteomics and clinical data, and a foundation for future work in precision neurology.
Achievements & Next Steps
Unifying Data. Forecasting Progression. Empowering Care. We have bridged the gap between scarce data and actionable intelligence. By fusing automated phage analysis with clinical forecasting, the IntegraMS platform delivers a comprehensive view of Multiple Sclerosis, predicting long-term DSS trajectories while identifying the top molecular drivers unique to each patient.
Next-Generation Clinical Decision Support. We are now scaling this foundation into a fully AI-driven ecosystem. Future iterations will leverage Generative AI to provide instant clinical summaries and a retrieval-augmented generation (RAG) engine that connects molecular findings to the latest research. Our goal is clear: to move beyond simple analysis and create a dynamic, self-improving decision-support system for the next era of MS diagnostics.
Acknowledgements and References
We thank Dr. Michael Wilson and the team at UCSF Neurology for their guidance and for providing access to the primary dataset. We also thank Dr. Mitchell Wallin at the Department of Veterans Affairs for his guidance and advice on interpreting the clinical data. We thank Joyce Shen and Korin Reid for their guidance and expertise throughout this project.
Zamecnik, C.R., Sowa, G.M., Abdelhak, A. et al. An autoantibody signature predictive for multiple sclerosis. Nat Med 30, 1300–1308 (2024). https://doi.org/10.1038/s41591-024-02938-3
Multiple sclerosis, Mayo Clinic. https://www.mayoclinic.org/diseases-conditions/multiple-sclerosis/symptoms-causes/syc-20350269
How Many People Live With Multiple Sclerosis?, National MS Society. https://www.nationalmssociety.org/understanding-ms/what-is-ms/who-gets-ms/how-many-people
Greenfield AL, Hauser SL. B-cell Therapy for Multiple Sclerosis: Entering an era. Ann Neurol. 2018 Jan;83(1):13-26. doi: 10.1002/ana.25119.
