MIDS Capstone Project Spring 2024

Managing Drug Shortages


A drug shortage is a period of time when the demand or projected demand for a drug within the U.S. exceeds the supply of the drug. As of November 2023 there were 309 ongoing drug shortages in the U.S., the highest number in nearly a decade. Critical medications for ADHD, chemotherapy and diabetes have been persistently in short supply. Drug shortages harm consumers, healthcare providers and pharmaceutical companies. The average drug shortage affects 500,000 consumers and results in a 16.6% increase in the price of the drug in shortage.

Operations managers are responsible for maintaining an adequate supply of prescription drugs at patient care facilities such as hospitals. To maintain an adequate drug supply, operations managers analyze drug inventory levels, usage rates, and drug supply data to determine the appropriate amount of drugs to stock. Today, there are limited tools that offer insight into historical and future drug shortages. Operations managers use manual processes that involve visiting multiple web sites to determine if a drug is at risk for shortage in the near future.

Related Work

Due to the economic and patient impacts of drug shortages, several academic studies have attempted to build models that predict drug shortages. Liu et al. (2021) used purchasing, formulary and historical shortage data to create multiple logistic regression models that classify drugs as high shortage risk or low shortage risk. They found that drug attributes, including route and drug type, can be useful in predicting drug shortages. Pall et al. (2023) used demand-side pharmacy data and historical drug shortage data to build gradient boosted models to predict drug shortages. They report 69% accuracy in classifying shortage risk across four categories without access to supply-side manufacturing data.

Our modeling approach differs in that we use only publicly available data sources from the Internet to create a predictive drug shortage model. Using only publicly available data poses a challenge since many drug shortages occur due to demand-side and supply-side factors that are not publicly disclosed. We attempt to overcome these challenges by engineering time-series features that will improve drug shortage prediction accuracy.


Our product assists operations managers at hospitals or group purchasing organizations in managing drug shortages to reduce patient health impacts by highlighting drug shortage risk scores, predicting drug shortages, and presenting publicly available FDA information in a more digestible manner. Our product:

  • Aggregates data from disparate public FDA data sources into a single tool that provides insights into historic and predicted drug shortages
  • Includes a predictive drug shortage model to provide timely insights into the likelihood of a shortage over a four-week time horizon
  • Provides a user-friendly analysis tool that enables Operations Managers to quickly assess drug shortage risk across various dimensions


Data and Pre-Processing

Our product uses only publicly available data from several Internet sources. Drug manufacturers are required to report drug shortages to the FDA, which posts point-in-time drug shortage data to a publicly available web site. Our team collected data from the FDA shortage web site for a 2-year period to capture a historical record of drug shortages. Our product combines this historical drug shortage information with the FDA's National Drug Code (NDC) database and data from HIPAASpace.com. The result is a data set that merges drug attributes such as manufacturer, dosage, route, and active ingredients with historical shortage data.
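The merge described above can be pictured as a left join of shortage records onto drug attributes. The following is a minimal sketch only: the join key (an NDC string), the field names, and the function name merge_by_ndc are illustrative assumptions, not the project's actual schema or pipeline.

```python
def merge_by_ndc(shortage_records, ndc_attributes):
    """Left-join shortage records onto drug attributes keyed by NDC.

    shortage_records: list of dicts, each with a hypothetical "ndc" key.
    ndc_attributes:   dict mapping an NDC string to a dict of attributes.
    Records without a matching NDC are kept and flagged, not dropped,
    so gaps in the attribute data stay visible downstream.
    """
    merged = []
    for record in shortage_records:
        attrs = ndc_attributes.get(record["ndc"])
        row = dict(record)
        if attrs is not None:
            row.update(attrs)
        row["matched"] = attrs is not None
        merged.append(row)
    return merged


# Made-up placeholder records, for illustration only.
shortages = [
    {"ndc": "NDC-A", "status": "Current"},
    {"ndc": "NDC-B", "status": "Resolved"},
]
attributes = {"NDC-A": {"manufacturer": "Example Pharma", "route": "ORAL"}}
rows = merge_by_ndc(shortages, attributes)
```

Keeping unmatched records is one possible design choice; an inner join that drops them would be simpler but could silently shrink the shortage history.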

To maintain adequate drug supplies on-hand, operations managers typically use a four-week time horizon when evaluating drug shortage risk. Our data pre-processing pipeline converts the historical drug shortage information and drug attributes into time-series data. We engineer several time-series features including historical shortages and current shortages per manufacturer to use as predictors.
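As a rough illustration of this conversion, the sketch below turns one drug's weekly shortage status into rows with a forward-looking four-week label and two simple history features. The weekly 0/1 status encoding, the feature names, and the function make_examples are assumptions made for this example; the per-manufacturer features mentioned above are not shown.

```python
HORIZON_WEEKS = 4  # four-week horizon, as described in the text


def make_examples(status, horizon=HORIZON_WEEKS):
    """Build one row per week from a weekly 0/1 shortage status list.

    status[t] is 1 if the drug is in shortage during week t, else 0.
    The label for week t is 1 if the drug is in shortage in any of the
    next `horizon` weeks. The last `horizon` weeks are skipped because
    their future is not fully observed.
    """
    rows = []
    weeks_in_shortage = 0
    for t in range(len(status) - horizon):
        weeks_in_shortage += status[t]
        future = status[t + 1 : t + 1 + horizon]
        rows.append(
            {
                "week": t,
                "in_shortage_now": status[t],
                "past_shortage_weeks": weeks_in_shortage,
                "label": int(any(future)),
            }
        )
    return rows


# Made-up status history: a two-week shortage starting in week 5.
examples = make_examples([0, 0, 0, 0, 0, 1, 1, 0, 0, 0])
```

Each feature uses only information available up to and including week t, while the label looks strictly ahead, which keeps the features free of future information.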


We modeled the drug shortage prediction problem as a time-series binary classification problem that attempts to determine if a given drug will enter a shortage in the next four weeks. The training data was split using a time-series cross-validation scheme designed to prevent future information from leaking into the training set. We evaluated several types of models, including logistic regression, random forest, and gradient boosting (XGBoost and LightGBM). The Hyperopt library was used to tune hyperparameters, and each experiment was tracked in MLflow. Finally, we experimented with several model calibration techniques to appropriately set decision thresholds for predicted probabilities.
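The exact cross-validation scheme is not specified here. One common way to avoid leakage in this setting is an expanding-window split with a gap between training and validation weeks, sketched below. The function name, the fold sizes, and the gap of four weeks (chosen to match the label horizon) are assumptions for illustration.

```python
def expanding_window_splits(n_weeks, n_folds, val_weeks, gap=4):
    """Return (train_weeks, val_weeks) index lists for each fold.

    Training weeks always precede validation weeks. The gap removes the
    training weeks whose forward-looking labels would overlap the
    validation period.
    """
    splits = []
    for fold in range(n_folds):
        val_end = n_weeks - (n_folds - 1 - fold) * val_weeks
        val_start = val_end - val_weeks
        train_end = val_start - gap
        if train_end <= 0:
            raise ValueError("not enough weeks for the requested folds")
        train = list(range(train_end))
        val = list(range(val_start, val_end))
        splits.append((train, val))
    return splits


# Example: 20 weeks of data, 3 folds, 4 validation weeks per fold.
for train, val in expanding_window_splits(20, 3, 4):
    print(f"train weeks 0-{train[-1]}, validate weeks {val[0]}-{val[-1]}")
```

With a four-week label horizon, a training row from week t carries information about weeks t+1 through t+4, so a gap of at least four weeks keeps training labels from overlapping the validation window.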

Model Evaluation

We evaluated the shortage prediction model using several metrics, including precision, recall and AUC. The best-performing model prior to calibration was the logistic regression model, with a cross-validated AUC of 94.7%. Post-calibration, the random forest model had the highest precision, recall and AUC across all models, with 0.1%, 72.1% and 93.2% respectively. The strongest predictors of a drug shortage were adverse events and drug age.
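For readers unfamiliar with these metrics, the sketch below computes them from true labels and predicted probabilities in plain Python. It is illustrative only: the function names and toy data are assumptions, and the project's own evaluation code is not shown here.

```python
def precision_recall(y_true, y_prob, threshold=0.5):
    """Precision and recall when predicting a shortage at p >= threshold."""
    pairs = list(zip(y_true, y_prob))
    tp = sum(1 for y, p in pairs if p >= threshold and y == 1)
    fp = sum(1 for y, p in pairs if p >= threshold and y == 0)
    fn = sum(1 for y, p in pairs if p < threshold and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def roc_auc(y_true, y_prob):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data, not project results.
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, y_prob))
print(precision_recall(y_true, y_prob, threshold=0.5))
print(precision_recall(y_true, y_prob, threshold=0.3))
```

AUC does not depend on a threshold, while precision and recall do. Lowering the threshold in the toy data raises recall and lowers precision, which is why the choice of decision threshold matters when reading these metrics.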


Our MVP combines a predictive shortage model with a user interface designed to enable operations managers to assess the risk of and respond to future drug shortages. Our product includes several Tableau dashboards that allow operations managers to see drugs that have the highest likelihood of going on shortage as well as key drug attributes such as compliance events and inspections. We also provide the ability for operations managers to visualize drug shortage trends over time.

The MVP includes functionality that allows operations managers to search for specific drugs and view detailed information such as drug attributes, shortage history and potential substitutes that can be used during periods of high shortage risk. By combining historical, current and predictive modeling data into a single user interface, we provide operations managers with a set of tools to automate the current workflow for managing shortage risk and ensuring adequate drug supplies. 



Data Sources

Managing Drug Shortages
Screen Example

Last updated:

April 19, 2024