MIDS Capstone Project Summer 2023

MRAI: AI for Radiologists

Empowering radiologists in delivering timely and precise MRI scan interpretations

Problem Statement & Motivation

Magnetic resonance imaging (MRI) is a powerful medical imaging modality that can be used to diagnose a wide range of diseases. A typical MRI scan generates anywhere from 50 to a few hundred image slices. Radiologists manually segment tissues and detect potential diseases by stepping through each of a patient's image slices. This process is extremely time-intensive and subject to inter- and intra-observer variation, compounded by factors such as image artifacts and anatomical differences, which limits the routine use of MRI in clinical practice.

Deep learning has the potential to address many of the challenges associated with MRI. With deep learning, we can develop algorithms that perform tasks such as tissue segmentation and pathology detection. Tissue segmentation partitions the image into segments that correspond to different tissue types (e.g., femoral cartilage, tibial cartilage). Pathology detection identifies abnormalities (e.g., ligament tear, meniscal tear). Our MVP serves two purposes. First, it acts as an end-to-end pipeline with automated tissue segmentation analysis and pathology detection algorithms for radiologists. Second, it acts as an integrable, assistive solution that enhances the diagnostic capabilities of MRI scanners. This could shorten scan review times, improve diagnostic accuracy, and help address the widening gap between the number of radiologists and the number of scans that need interpretation.

Data Source

The SKM-TEA dataset is a publicly available dataset acquired clinically at Stanford Healthcare. It contains 24,800 2D knee MRI slices generated with the qDESS sequence on one of two 3 Tesla (3T) GE MR750 scanners for 155 anonymized patients. Each patient's scan contains 160 knee images, which allows radiologists to view the patient's knee, a 3D object, as a series of 2D images. The dataset includes a variety of MRI sequences, including T1-weighted and T2-weighted images, along with manually annotated tissue bounding boxes and segmentation masks.

Additional details:

  • Training: 86 scans
  • Validation: 33 scans
  • Test: 36 scans
  • 4 Tissue labels for segmentation: patellar cartilage, femoral cartilage, meniscus, tibial cartilage
  • 2 Pathology labels derived from annotations:
    • abnormal (cartilage lesion, ligament tear, meniscal tear, effusion)
    • normal (no abnormality)
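
As a quick consistency check on the figures above, the short sketch below recomputes the total number of scans and slices from the split sizes and the 160 slices per scan. The names `SPLITS`, `SLICES_PER_SCAN`, and `summarize_splits` are illustrative and not part of the project code.

```python
# Split sizes and slices-per-scan exactly as stated in the text above.
SPLITS = {"training": 86, "validation": 33, "test": 36}
SLICES_PER_SCAN = 160


def summarize_splits(splits, slices_per_scan):
    """Return total scans, total slices, and each split's share of the scans."""
    total_scans = sum(splits.values())
    total_slices = total_scans * slices_per_scan
    shares = {name: count / total_scans for name, count in splits.items()}
    return total_scans, total_slices, shares


total_scans, total_slices, shares = summarize_splits(SPLITS, SLICES_PER_SCAN)
print(total_scans, total_slices)  # 155 24800
for name, share in shares.items():
    print(f"{name}: {share:.1%} of scans")
```

The totals agree with the 155 patients and 24,800 slices quoted for the dataset.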

Data Science Approach

We built an end-to-end architecture on AWS infrastructure. The web interface runs on an Apache web server deployed on an EC2 instance. All services interact through FastAPI endpoints. For model compute, an AWS Lambda function triggers the SageMaker inference endpoint. Lambda can handle millions of requests, which makes the architecture scalable. We used a SageMaker notebook instance to train and deploy the model. For storage, we use S3 to hold all raw and processed images, and all model outputs as JSON files.
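
The Lambda-to-SageMaker step described above might look roughly like the following. This is a minimal sketch, not the project's actual handler: the endpoint name, the request fields (`bucket`, `key`), and the JSON response shape are all assumptions made for illustration. The only AWS call used is `invoke_endpoint` on the boto3 `sagemaker-runtime` client; the client can be injected so the handler can be exercised without AWS credentials.

```python
import json

# Hypothetical endpoint name; the real name is not given in the write-up.
ENDPOINT_NAME = "mrai-inference-endpoint"


def build_request(event):
    """Extract the (assumed) S3 location of a scan from the incoming event."""
    body = event.get("body") or "{}"
    if isinstance(body, str):
        body = json.loads(body)
    return {"bucket": body["bucket"], "key": body["key"]}


def lambda_handler(event, context, runtime_client=None):
    """Forward a scan reference to the SageMaker endpoint and return its JSON."""
    if runtime_client is None:
        # Imported lazily so the module can be loaded where boto3 is absent.
        import boto3
        runtime_client = boto3.client("sagemaker-runtime")

    try:
        payload = build_request(event)
    except (KeyError, TypeError, ValueError) as exc:
        message = {"error": f"bad request: {exc}"}
        return {"statusCode": 400, "body": json.dumps(message)}

    response = runtime_client.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    # The response body is a stream; it is assumed here to contain JSON.
    result = json.loads(response["Body"].read())
    return {"statusCode": 200, "body": json.dumps(result)}
```

Passing only an S3 reference, rather than the image bytes, keeps the request small; whether the project did this is not stated, so treat it as one possible design.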


For the segmentation model, we evaluated performance using per-tissue-label accuracy and an overall recall score. For the pathology detection model, we used recall, since it measures the share of truly abnormal cases that the model flags. Because our model will be used as an add-on to enhance existing MRI scanners and expedite scan reading for radiologists, we ideally want to catch all segmentation masks and pathology detections so they can be manually reviewed later. We also performed a qualitative evaluation by visually inspecting the segmented and classified images.
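
To make the two metrics concrete, here is a minimal pure-Python sketch. It assumes masks are flattened lists of integer tissue labels (0 for background) and defines per-label accuracy as the fraction of pixels on which prediction and ground truth agree about membership in that label. The write-up does not spell out its exact definition, so this is one plausible reading, and the function names are illustrative.

```python
def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that were predicted positive: TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


def per_label_accuracy(true_mask, pred_mask, labels):
    """For each label, the share of pixels where 'is this label' matches."""
    if len(true_mask) != len(pred_mask):
        raise ValueError("masks must have the same number of pixels")
    scores = {}
    for label in labels:
        agree = sum(
            1 for t, p in zip(true_mask, pred_mask)
            if (t == label) == (p == label)
        )
        scores[label] = agree / len(true_mask)
    return scores


# Toy segmentation example: 6 pixels, labels 1 and 2 are two tissue types.
true_mask = [0, 1, 1, 2, 2, 0]
pred_mask = [0, 1, 2, 2, 2, 0]
tissue_scores = per_label_accuracy(true_mask, pred_mask, labels=[1, 2])

# Toy pathology example: 1 = abnormal, 0 = normal.
pathology_recall = recall([1, 1, 0, 1], [1, 0, 0, 1])

print(tissue_scores)
print(round(pathology_recall, 3))
```

In the pathology example one of three abnormal scans is missed, so recall is 2/3; a missed abnormality is the error this metric penalizes, which matches the stated goal of catching every case for later manual review.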

Key Learnings

1. There are mixed opinions when it comes to using AI and deep learning in medical diagnostics. It is important to talk to subject matter experts early on in the project to understand their needs and concerns.

2. Large datasets can be challenging to work with. We dealt with 1.6 TB of raw data and ran two models over 160 images for each patient. It is important to have a good data processing pipeline in place to efficiently load, clean, and prepare the data for training.

3. We understand that radiologists prefer DICOM because it comes directly from the scanner and contains the patient's information. We made the tradeoff to use H5 because it helps keep our product HIPAA compliant and is better suited to compute.


We would like to acknowledge the following people for their contributions to this project. We are grateful for their support and guidance. 

  • DataSci 210 Instructors: Joyce Shen, Cornelia Ilin
  • The SMEs who provided us with valuable feedback:
    • SMEs (External):
      • Dr. Julie Bauml
      • Robert Lim MD
      • Dr. Akshay Chaudhari
      • Jeremy Heit MD, PhD
    • SMEs (MIDS):
      • Fred Nugen
      • Dr. Joseph Schwab
      • March Saper
  • DataSci 231 Classmates: Jerry Qian & Nicholas Lee
  • DataSci 210 Classmates & TAs AWS Support





Last updated:

August 15, 2023