Eligibot
Our Solution
Eligibot is your AI clinical trial buddy, designed to connect you with life-saving treatments. By entering your health details, Eligibot recommends the trials whose eligibility criteria you best match and additionally predicts each trial's risk of termination and adverse events. Eligibot helps advance medical research by guiding you through the process of enrolling in suitable trials quickly, transparently, and clearly.
Problem
Statistics show that fewer than 5% of adult cancer patients participate in clinical trials, despite surveys showing that up to 70% are willing. My grandfather was part of that 70%; he passed away from lung cancer when I was 16. As an immigrant who didn't speak English, he found the clinical trial system neither comprehensible nor navigable. For many patients with complex illnesses, this is also their reality.
Clinical trials are critical for advancing medical research, but enrolling patients in appropriate trials remains a significant challenge. The global clinical trials market is projected to reach $146 billion by 2033. However, almost half of clinical trials experience recruitment delays, often exceeding a month, which can be life-threatening for patients needing urgent treatment. Delays can cost up to $8M per day for a potential blockbuster drug, so a two-month delay may amount to an opportunity cost of $480M. Patient enrollment and recruitment remains the single biggest cost driver of clinical trials.
Data and Methods
Data Source
We use two main data sources for our models: PhysioNet's MIMIC-IV and the National Institutes of Health's ClinicalTrials.gov.
MIMIC-IV
We use MIMIC-IV to train and evaluate our models on real-world clinical documentation, focusing on discharge summaries, lab findings, and screening reports due to their consistency and depth. This data enables accurate extraction of medical conditions, procedures, and other relevant terms critical for trial matching.
ClinicalTrials.gov
Clinical trial data, from sources like clinicaltrials.gov, provides detailed information on study protocols, including eligibility criteria, interventions, and study objectives. This dataset allows our system to align patient-specific clinical features with trial requirements, ensuring precise and relevant matches. It is essential for enabling our AI models to assess trial suitability and rank options based on fit and predicted risk.
Data Pre-Processing
MIMIC-IV
We combine two main CSV files: discharge (331K encounters, 146K distinct patients) and radiology (2.3M reports, 237K distinct patients). Most patients have a single discharge encounter, some have multiple, and rare cases have frequent encounters within a span of two weeks or spread over more than 5-10 years. We opt to use a patient's first discharge and each subsequent discharge at least 12 months later. The discharge notes are long, semi-structured clinical text, so we did some work to condense the patient summary inside the note section. The radiology table covers 94% of the patients in the target table and includes detailed radiology and lab backgrounds and interpretations.
Because MIMIC-IV is de-identified, we extrapolate patient ages: they are estimated from the anonymized "anchor year" fields and averaged across all encounters for a given patient.
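As a rough illustration of this selection rule, the sketch below uses pandas to keep each patient's first discharge plus later discharges at least 12 months apart, and to estimate age from the anchor fields. The column names (subject_id, charttime, anchor_age, anchor_year) follow MIMIC-IV conventions but are assumptions here rather than an exact copy of our pipeline.

```python
import pandas as pd

# Minimal sketch of the discharge-selection rule described above.
# Column names follow MIMIC-IV conventions; verify against the actual files.
discharge = pd.read_csv("discharge.csv.gz", parse_dates=["charttime"])
patients = pd.read_csv("patients.csv.gz")

def select_discharges(group: pd.DataFrame, min_gap_days: int = 365) -> pd.DataFrame:
    """Keep the first discharge and each later discharge >= 12 months after the last kept one."""
    group = group.sort_values("charttime")
    kept, last_kept = [], None
    for _, row in group.iterrows():
        if last_kept is None or (row["charttime"] - last_kept).days >= min_gap_days:
            kept.append(row)
            last_kept = row["charttime"]
    return pd.DataFrame(kept)

selected = discharge.groupby("subject_id", group_keys=False).apply(select_discharges)

# Estimate age at each encounter from the de-identified anchor fields,
# then average per patient as described above.
selected = selected.merge(patients[["subject_id", "anchor_age", "anchor_year"]], on="subject_id")
selected["est_age"] = selected["anchor_age"] + (selected["charttime"].dt.year - selected["anchor_year"])
avg_age = selected.groupby("subject_id")["est_age"].mean()
```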
- MIMIC-IV has a fairly balanced gender distribution
- Each record has a "type of intervention" field for general categorization
- Condition matching relies on common keyword checks, which are not as robust as standardized diagnosis codes (e.g., ICD)
ClinicalTrials.gov
There are ~8,000 trial observations, accessible by API query or bulk download. We pre-process them to remove null values and format strings properly. The most descriptive columns for our use are study_title, brief_summary, conditions, and interventions.
- Many trials are in the early stages, as seen in the prevalence of Phase 1 and Phase 2 studies
- Most trials require patients to be adults or older adults
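The sketch below shows one way the fetch and column selection described above could look against the ClinicalTrials.gov v2 API. The specific JSON field names (protocolSection, briefSummary, and so on) and the example query parameters are assumptions to be checked against the API documentation, not a verbatim excerpt of our code.

```python
import requests
import pandas as pd

# Pull a page of trials from the ClinicalTrials.gov v2 API and keep the
# descriptive columns used above. Field names are assumptions from the API docs.
resp = requests.get(
    "https://clinicaltrials.gov/api/v2/studies",
    params={"query.cond": "lung cancer", "filter.overallStatus": "RECRUITING", "pageSize": 50},
    timeout=30,
)
resp.raise_for_status()

rows = []
for study in resp.json().get("studies", []):
    proto = study.get("protocolSection", {})
    rows.append({
        "study_title": proto.get("identificationModule", {}).get("briefTitle"),
        "brief_summary": proto.get("descriptionModule", {}).get("briefSummary"),
        "conditions": proto.get("conditionsModule", {}).get("conditions"),
        "interventions": [
            i.get("name")
            for i in proto.get("armsInterventionsModule", {}).get("interventions", [])
        ],
    })

# Drop rows missing the fields our pipeline relies on.
trials = pd.DataFrame(rows).dropna(subset=["study_title", "brief_summary"])
```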
Product Overview
High-level architecture overview of the application
Inside the orange box is our core backend system, where multiple models and data sources work together to generate clinical trial recommendations.
Starting on the left, the user provides input: this could be a patient's discharge summary, structured clinical data, or a free-form text description of their medical situation. That information goes into what we call the Retrieval Model. The Retrieval Model takes the user's medical context, automatically extracts important terms such as conditions, symptoms, and relevant treatments, and uses these keywords to query ClinicalTrials.gov, an NIH-maintained database of federally and privately supported clinical trials conducted in the United States and around the world. By matching the keywords derived from the user's data to active trials, we get a list of potentially relevant studies.
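To make the retrieval step concrete, the sketch below shows how an LLM call might extract search terms from a patient summary before querying ClinicalTrials.gov. The prompt wording, model name, and output format are illustrative assumptions; the deployed retrieval model is the fine-tuned variant described in the Models section.

```python
from openai import OpenAI

client = OpenAI()

def extract_keywords(patient_text: str) -> list[str]:
    """Ask an LLM for the key conditions, symptoms, and treatments in the summary."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model for illustration
        messages=[
            {"role": "system",
             "content": "Extract the key medical conditions, symptoms, and treatments "
                        "from the patient summary. Return a comma-separated list only."},
            {"role": "user", "content": patient_text},
        ],
        temperature=0,
    )
    return [t.strip() for t in response.choices[0].message.content.split(",") if t.strip()]

keywords = extract_keywords("72-year-old with stage III non-small cell lung cancer, prior chemotherapy")
query = " OR ".join(keywords)  # used as the search term for the ClinicalTrials.gov query
```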
Once we have a list of 5-10 potentially relevant studies, they are fed through an Eligibility Matching Model, which applies each study's inclusion and exclusion criteria to the user's medical profile. Some criteria are straightforward: 'Is the patient between 18 and 65 years old?' or 'Is the patient pregnant?' But others are more nuanced, like 'Has the patient had this specific type of therapy within the last six months?' or 'Are there any comorbid conditions that would conflict with the study's protocol?'
In parallel, these same shortlisted studies are fed to our Risk Prediction Model. This model relies on historical trial data to estimate the likelihood of early termination or adverse events for the patient’s specific profile. By combining both eligibility checks and risk assessment, we can deliver a final list of trials that not only meet the formal criteria but also align better with the patient’s overall clinical outlook.
Finally, these results are displayed on our Eligibot website. We built the front end with Streamlit, which allows us to present the findings in a simple, interactive UI that is easy to iterate on.
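For a sense of how little glue the front end needs, here is a minimal Streamlit sketch of the flow: collect free-text input, call the backend, and render ranked trials with their risk metrics. The run_pipeline function and the result fields are placeholders standing in for our backend calls, not the production code.

```python
import streamlit as st

def run_pipeline(text: str) -> list[dict]:
    """Placeholder for the backend: retrieval -> eligibility matching -> risk prediction."""
    return [{"title": "Example trial", "score": 0.82, "brief_summary": "Example summary",
             "termination_risk": 0.12, "adverse_event_risk": 35.0, "contact": "site@example.org"}]

st.title("Eligibot: Clinical Trial Matching")

patient_text = st.text_area("Paste a discharge summary or describe your medical situation:")

if st.button("Find trials") and patient_text:
    with st.spinner("Matching trials..."):
        results = run_pipeline(patient_text)
    for trial in results:
        with st.expander(f"{trial['title']} (match score {trial['score']:.0%})"):
            st.write(trial["brief_summary"])
            st.metric("Risk of premature termination", f"{trial['termination_risk']:.0%}")
            st.metric("Risk of adverse events", f"{trial['adverse_event_risk']:.0f}/100")
            st.write(f"Contact: {trial['contact']}")
```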
Models
Retrieval Model
Training
For our retrieval pipeline, we mainly leveraged OpenAI's API models and fine-tuned them on the MIMIC database, a rich source of de-identified patient data from ICUs. Each patient record in MIMIC includes discharge summaries, lab values, and radiology reports. Each record also comes with ICD codes, standardized diagnosis codes such as 'I10' for essential (primary) hypertension. To fine-tune the models, we used these codes as ground-truth labels. We used a small set of 50 random MIMIC entries, split into 40 training samples and 10 validation samples, as a proof-of-concept while also minimizing fine-tuning costs.
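A minimal sketch of this fine-tuning setup is shown below, assuming the OpenAI chat fine-tuning format: each MIMIC note becomes a chat example whose assistant turn is the record's ICD-coded diagnoses. The file names, base model snapshot, prompt wording, and placeholder sample data are assumptions for illustration.

```python
import json
from openai import OpenAI

mimic_samples = [  # placeholder: (note text, ICD-coded diagnoses) pairs drawn from MIMIC
    ("Discharge summary: 67F admitted with chest pain and elevated BP ...",
     ["I10 essential (primary) hypertension"]),
]

# Build chat-format fine-tuning examples: note in, ICD-based diagnoses out.
examples = []
for note_text, icd_diagnoses in mimic_samples:
    examples.append({
        "messages": [
            {"role": "system",
             "content": "Extract the patient's diagnoses as standardized ICD-based labels."},
            {"role": "user", "content": note_text},
            {"role": "assistant", "content": "; ".join(icd_diagnoses)},
        ]
    })

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

client = OpenAI()
train_file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    training_file=train_file.id,
    model="gpt-4o-mini-2024-07-18",  # assumed base snapshot
    hyperparameters={"n_epochs": 3},
)
```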
Results
To assess the performance of our fine-tuned retrieval model, we evaluated both training and validation losses over three epochs using a small, cost-efficient subset of the MIMIC dataset (40 training samples, 10 validation samples). The final model achieved a training loss of 0.3174 and a validation loss of 0.6167, indicating that the model successfully learned from the structured clinical data without overfitting.
More importantly, qualitative comparisons reveal a significant improvement over the base GPT-4o model. While the base model could identify general diagnoses from patient summaries, our fine-tuned version demonstrates a marked increase in diagnostic specificity. For instance, instead of returning vague descriptions like “heart issue” or “high blood pressure,” the fine-tuned model reliably outputs precise ICD-based labels such as “congestive heart failure” or “essential hypertension (I10).”
This increased granularity is essential in a clinical trial matching context, where eligibility often hinges on exact diagnoses. The added specificity from our fine-tuned model ensures that patients are matched to trials with greater accuracy and confidence, ultimately improving both recruitment efficiency and clinical outcomes.
Eligibility Matching Model
Training
For our eligibility matching model, we continue to leverage OpenAI's API models to evaluate the patient's records against the eligibility criteria of the retrieved trials. We tuned the model across multiple iterations to filter out noise, which we define as assumptions or trivial information for enrollment, e.g. that patients are evaluated prior to hospital admission or that patients consent to enrolling in a trial. Additionally, in analyzing the data from ClinicalTrials.gov, we observed that many trials contain inverse criteria statements, e.g. inclusion: "Patient is over the age of 18" versus exclusion: "Patient is 18 years old or younger"; the model can filter out one of these redundant statements so that later scoring remains accurate.
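The sketch below illustrates the kind of prompt this step relies on: the model labels each inclusion or exclusion criterion against the patient profile and is instructed to ignore trivial or administrative criteria. The label set (MET / NOT_MET / UNKNOWN), model name, and prompt wording are assumptions for illustration.

```python
from openai import OpenAI

client = OpenAI()

def assess_criteria(patient_profile: str, criteria: list[str]) -> list[str]:
    """Return one label per criterion describing whether the patient satisfies it."""
    numbered = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(criteria))
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model for illustration
        messages=[
            {"role": "system",
             "content": "For each criterion, answer MET, NOT_MET, or UNKNOWN for this patient. "
                        "Ignore trivial or administrative criteria (e.g. consent to enroll). "
                        "Answer one label per line, in order."},
            {"role": "user", "content": f"Patient:\n{patient_profile}\n\nCriteria:\n{numbered}"},
        ],
        temperature=0,
    )
    return [line.strip() for line in response.choices[0].message.content.splitlines() if line.strip()]
```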
Results
For the eligibility matching assessment, we created a custom evaluation and scoring system. Each team member manually labeled 50 trials with criteria assessments reflecting how we expected the model to perform; these manual labels act as our ground truth. We then ran the same trials through the model and measured how many of the model's criterion labels overlapped with our own labeled data, giving a direct measure of how well the model's assessed eligibility criteria aligned with our expectations. The labels are then mapped to numerical values depending on whether they belong to inclusion or exclusion criteria, and custom scoring is applied to rank trials.
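As a hedged sketch of this scoring step, the snippet below maps criterion labels to numeric values, aggregates them into a per-trial score, and measures agreement against our manual labels. The specific numeric mapping is an assumption; the description above only specifies that labels are mapped to numbers by criterion type and combined into a ranking score.

```python
# Assumed mapping: a satisfied inclusion criterion or an avoided exclusion
# criterion raises the trial's score; the opposite lowers it; unknowns are neutral.
LABEL_SCORES = {
    ("inclusion", "MET"): 1.0,
    ("inclusion", "NOT_MET"): -1.0,
    ("exclusion", "MET"): -1.0,      # patient hits an exclusion criterion
    ("exclusion", "NOT_MET"): 1.0,
    ("inclusion", "UNKNOWN"): 0.0,
    ("exclusion", "UNKNOWN"): 0.0,
}

def trial_score(labels: list[tuple[str, str]]) -> float:
    """labels: (criterion_type, model_label) pairs for one trial; higher is a better fit."""
    return sum(LABEL_SCORES.get(pair, 0.0) for pair in labels) / max(len(labels), 1)

def label_agreement(model_labels: list[str], manual_labels: list[str]) -> float:
    """Fraction of criteria on which the model matched our manual ground truth."""
    matches = sum(m == g for m, g in zip(model_labels, manual_labels))
    return matches / max(len(manual_labels), 1)
```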
Risk Prediction Model
The risk prediction models provide users with two additional metrics that inform the patient of the risks associated with participating in each of the returned trials.
The risk metrics are:
- Risk of Premature Termination
- Risk of Adverse Events
The risk of premature termination metric is a probability of the trial ending before planned, indicating the potential for incomplete treatment or evaluation for the patient. Premature termination can be caused by funding issues, recruitment issues, change in business objectives, and even safety issues. As such, the metric can help our user consider and research the risk of participating in a trial for which these issues may be present.
The risk of adverse events is a metric, scaled from 0 to 100, that indicates the risk of undesirable health events. While adverse events (AEs) are an inherent aspect of clinical trials, the probability and severity of these events may vary across trial interventions and designs. This metric provides the user the ability to consider their personal risk appetite in comparing potential clinical trials.
Training
To predict the risk metrics for clinical trials, we train on data from ClinicalTrials.gov (the same source used by our retrieval model). We select trials that have ended, including both those that completed successfully and those terminated early; this outcome is used as the target variable for premature termination.
For the adverse events outcome, we also process the results from completed trials to determine the count of adverse event reports. Adverse events are categorized as "serious" or "other", with serious AEs defined as those causing death, life-threatening circumstances, or long-term impairment. The AE risk score for a trial is calculated as the weighted count of adverse events divided by the number of subjects in the trial, with "other" AEs weighted at 1/3. The scores are then capped at 1.33, corresponding to one "serious" and one "other" AE per patient. Note that results information is not available for all trials, as administrative requirements dictate which trials post results publicly. This leads to a potential limitation of distributional shift between the training instances and the actively recruiting trials we present to users.
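This target construction can be written directly as a small function. The helper names below are ours, but the arithmetic (serious AEs weighted 1, "other" AEs weighted 1/3, divided by subject count and capped at 1.33) follows the description above, with the percentage normalization used later in the UI.

```python
def adverse_event_score(serious_count: int, other_count: int, n_subjects: int) -> float:
    """AE risk target: serious AEs weighted 1, 'other' AEs weighted 1/3,
    divided by enrolled subjects and capped at 1.33 (one serious + one other per patient)."""
    if n_subjects <= 0:
        return 0.0
    raw = (serious_count + other_count / 3) / n_subjects
    return min(raw, 1.33)

def to_percent(score: float, cap: float = 1.33) -> float:
    """Normalize the capped score to the 0-100 risk scale shown to users."""
    return 100 * score / cap
```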
Our trial data for the MVP spans January 2018 to January 2025. We extract key trial features, including start date, randomization strategy, allocation, phase, purpose, and government oversight status. To account for the COVID pandemic, which had a major impact through both operational disruption and strict administrative regulation, we also include the Oxford Stringency Index.
Additionally, most of the data available on ClinicalTrials.gov is free text written by the researchers, describing their study design and motivations, intervention, conditions of interest, outcomes measured, and criteria. We split this text into three sections: Introduction, Outcomes, and Criteria.
To handle the complexity and sparsity of this text, we train a deep neural network on 50k trials, leveraging basic key features as well as the rich text written by researchers. The three text sections for each trial are separately encoded with trainable BioBERT embeddings and pooled into three separate representations. Modality weighting is used to learn the importance of each text section and of the key features.
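A sketch of such a model is shown below, assuming PyTorch with a BioBERT encoder (e.g. dmis-lab/biobert-base-cased-v1.1 from Hugging Face), mean pooling for each text section, and a softmax over learnable modality weights. The shared encoder, layer sizes, pooling choice, and two-headed output are our assumptions for illustration, not the exact architecture.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class TrialRiskModel(nn.Module):
    def __init__(self, n_key_features: int, text_dim: int = 768, hidden: int = 256):
        super().__init__()
        self.encoder = AutoModel.from_pretrained("dmis-lab/biobert-base-cased-v1.1")
        self.feature_proj = nn.Linear(n_key_features, text_dim)
        # One learnable weight per modality: intro, outcomes, criteria, key features.
        self.modality_logits = nn.Parameter(torch.zeros(4))
        self.head = nn.Sequential(
            nn.Linear(text_dim, hidden), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(hidden, 2),  # [termination logit, adverse-event score]
        )

    def pool(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        mask = attention_mask.unsqueeze(-1).float()
        return (out * mask).sum(1) / mask.sum(1).clamp(min=1e-6)  # mean pooling

    def forward(self, intro, outcomes, criteria, key_features):
        # Each text argument is a dict with "input_ids" and "attention_mask" tensors.
        pooled = torch.stack([
            self.pool(**intro), self.pool(**outcomes), self.pool(**criteria),
            self.feature_proj(key_features),
        ], dim=1)                                         # (batch, 4, text_dim)
        weights = torch.softmax(self.modality_logits, 0)  # learned modality importance
        fused = (pooled * weights.view(1, 4, 1)).sum(1)
        logits = self.head(fused)
        termination_prob = torch.sigmoid(logits[:, 0])
        ae_score = 1.33 * torch.sigmoid(logits[:, 1])     # bounded to [0, 1.33]
        return termination_prob, ae_score
```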
Results
In practice, roughly 10% of clinical trials end prematurely. Given the nature of predicting risk, we favor recall, successfully detecting cases where premature termination risk is high. We achieve an F1 score of 0.42 with a recall of 75%. For comparison, a random classifier that flags 10% of trials as terminating early would, in expectation, achieve a recall and precision of only 10%.
In predicting adverse event relative risk, we achieve an R² score of 0.42, indicating a moderate capture of variability. The prediction is capped to the range [0, 1.33] and normalized to a percent score.
Ultimately, we present users with a percentage, which offers soft interpretability and lets them weigh relative risk across different trials. This also makes individual classification errors less consequential.
Key Learnings & Impact
Impact
This AI-powered clinical trial matching system represents a major leap forward in personalized medicine and research accessibility. By using large language models to interpret complex, unstructured data from electronic health records, the system enables a highly accurate and scalable way to identify relevant clinical trials. The integration of an LLM to assess eligibility criteria, along with a risk prediction model, ensures that recommendations are not only precise but also tailored to individual patient risk profiles. This dramatically reduces the time and effort required by clinicians to find suitable trials, increases enrollment rates, and improves the chances of patients accessing cutting-edge therapies. Ultimately, the product demonstrates how advanced AI can close the gap between patients and research, making trial participation faster, smarter, and more equitable.
Top Technical Challenges
- Medical terms are difficult to understand and decode, both to non-medical personnel and to LLMs.
- Using medical codes and their associated terms to help simplify semantics.
- Objectively defining "risk" and implementing it in models that produce prediction scores for premature termination and adverse events.
- Working with limited computational resources and finding solutions to combat this barrier for both web development and model deployment, especially with long text inputs and large models.
- Medical free text written by administrators from different institutions and backgrounds varies enormously, and the space of medical interventions and strategies is sparse and vast. This makes for noisy data and unclear boundaries for risk prediction. In fact, clinical trials exist for the very purpose of pushing the frontiers of the unknown, which simply makes prediction a process to be forever improved.
Future Work
For future work, we aim to implement a notification system for patients. This system would securely store patient input even if no matching clinical trial is found immediately, and would automatically notify them if a suitable trial becomes available later. This ensures patients stay informed about new opportunities without needing to re-enter their information. Our goal is to make the clinical trial matching process more accessible and continuous.
Second, we plan to improve our user login system by offering more secure and flexible authentication options beyond just Google login. Our goal is to make the platform more accessible, user-friendly, and secure for all patients.
Furthermore, we plan to implement continuous learning from newly incoming clinical trial data and from user behavior. We want the model to keep evolving, learning about new interventions and clinical trial trends for more effective prediction. The system will also learn from user behavior, understanding how users evaluate the trials and metrics presented to them and how they click into next steps for these trials, such as the contact information provided inside Eligibot.
Acknowledgements
We would like to sincerely thank our professors, Kira Wetzel and Zona Kostic. Their enthusiasm and dedication to teaching the course truly inspired us. Their thoughtful guidance and insightful feedback played a key role in shaping our project. We especially appreciated their willingness to share their expertise and offer support at every step. This project wouldn’t be what it is without their invaluable contributions.
We would like to extend our heartfelt thanks to our TA, Billy Fong, for his incredible support during the AWS deployment. His quick response and willingness to help on such short notice made a huge difference. The technical resources he provided were essential in getting our model up and running. His guidance helped us navigate what initially felt like an overwhelming task. We’re truly grateful for his expertise and support.
We sincerely appreciate collaborating with Joseph Chen, who provided a thorough audit of our project’s potential ethical issues. His insights helped us think critically about the broader implications of our work. Joseph’s thoughtful approach and attention to detail greatly strengthened this important part of our project. Working with him was both productive and inspiring. We truly appreciate his contribution to our team.
References
- Comis RL, Miller JD, Aldige’ CR, Krebs L, Stoval E. Public attitudes toward participation in cancer clinical trials. J Clin Oncol. 2003 Mar 1;21(5):830–5. doi: 10.1200/JCO.2003.02.105.
- Deloitte. Intelligent Clinical Trials. (2020). https://www2.deloitte.com/content/dam/insights/us/articles/22934_intell…
- Deloitte. Patient recruitment is often the holy grail for clinical research. could virtual trials improve our chances of reaching it? (2020). https://www2.deloitte.com/us/en/blog/health-care-blog/2020/patient-recr…
- Murthy VH, Krumholz HM, Gross CP. Participation in cancer clinical trials: race-, sex-, and age-based disparities. JAMA. 2004 Jun 9;291(22):2720–6. doi: 10.1001/jama.291.22.2720.
- Stillman B. (2023). Please be dying, but not too quickly: a clinical trial story. Substack. https://bessstillman.substack.com/p/please-be-dying-but-not-too-quickly
- Unger, J. M., Cook, E., Tai, E., & Bleyer, A. (2016). The Role of Clinical Trial Participation in Cancer Research: Barriers, Evidence, and Strategies. American Society of Clinical Oncology educational book. American Society of Clinical Oncology. Annual Meeting, 35, 185–198. https://doi.org/10.1200/EDBK_156686