AerMAE
Vision Transformers for Aerial Greenspace Segmentation and Change Detection
Problem & Motivation
Urban greenspaces—such as parks, tree cover, and open land—are vital for environmental health, public well-being, and climate resilience. However, tracking how these areas evolve over time is difficult due to the scarcity of labeled data and the complexity of interpreting historical aerial imagery.
Data Source & Data Science Approach
AerMAE addresses this gap with a self-supervised learning approach that segments greenspaces and detects land cover changes—without relying on manual annotations. Using historical black-and-white imagery from the U.S. Geological Survey (USGS), we pretrain a Vision Transformer (ViT) with a masked autoencoder (MAE) objective, enabling the model to learn spatial features by reconstructing masked portions of each image. We then fine-tune the model on a smaller labeled dataset to perform pixel-level segmentation of greenspaces.
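The core of MAE pretraining is randomly hiding most image patches and training the model to reconstruct them. The sketch below illustrates only the random-masking step; the 75% mask ratio and the 196-patch count (a 224×224 image split into 16×16 patches) follow the standard MAE recipe and are assumptions, not confirmed details of AerMAE's configuration.

```python
import random

def random_mask_patches(num_patches, mask_ratio=0.75, seed=0):
    """MAE-style random masking: split patch indices into visible and masked sets.

    Only the visible patches are fed to the ViT encoder; the decoder is
    trained to reconstruct the pixels of the masked patches.
    """
    rng = random.Random(seed)
    ids = list(range(num_patches))
    rng.shuffle(ids)
    num_visible = int(num_patches * (1 - mask_ratio))
    visible = sorted(ids[:num_visible])
    masked = sorted(ids[num_visible:])
    return visible, masked

# A 224x224 image with 16x16 patches yields 14 * 14 = 196 patches;
# at a 75% mask ratio, only 49 patches remain visible to the encoder.
visible, masked = random_mask_patches(196, mask_ratio=0.75)
```

Because the encoder only processes the small visible subset, pretraining is cheap relative to full-image training, which is part of what makes MAE practical for large unlabeled archives like the USGS collection.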
Evaluation
We benchmarked AerMAE against leading segmentation models, including SegFormer-B3, ResNet50+UNet, and UNet. AerMAE outperformed all baselines, achieving an F1-score of 0.844 and an IoU of 0.732. It accurately captured dense tree canopy in forests and structured greenspaces in urban settings, though small vegetation patches in densely built environments remain a challenge.
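For a binary segmentation task like greenspace vs. non-greenspace, both reported metrics come directly from the pixel-level confusion counts. A minimal sketch (the toy masks are illustrative, not project data):

```python
def confusion_counts(pred, target):
    """Pixel-wise TP/FP/FN for flattened binary masks (1 = greenspace)."""
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    return tp, fp, fn

def f1_score(tp, fp, fn):
    # Equivalent to the Dice coefficient for binary masks.
    return 2 * tp / (2 * tp + fp + fn)

def iou(tp, fp, fn):
    # Intersection over Union (Jaccard index).
    return tp / (tp + fp + fn)

# Toy example: two 8-pixel flattened masks.
pred   = [1, 1, 0, 1, 0, 0, 1, 0]
target = [1, 0, 0, 1, 1, 0, 1, 0]
tp, fp, fn = confusion_counts(pred, target)  # tp=3, fp=1, fn=1
```

For counts pooled over a whole test set, the two metrics are linked by the identity IoU = F1 / (2 − F1), so they always rank models the same way; reporting both mainly aids comparison with papers that use one or the other.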
Fine-tuning Comparison
Sample segmentation prediction on urban environment
Sample segmentation prediction on forested environment
Key Learnings & Impact
This project shows how self-supervised learning—originally developed for natural language processing—can be effectively adapted for visual tasks in environmental monitoring. AerMAE enables scalable, label-free analysis of greenspace evolution, with applications in urban planning, climate resilience, and environmental justice.
Acknowledgements
We thank Dr. Cornelia Paulik for her guidance and feedback, and the U.S. Geological Survey (USGS) for providing open access to historical aerial imagery. We also acknowledge the authors of the MAE and ViT architectures for open-sourcing their work, which enabled this project.