MIDS Capstone Project Fall 2023

Wildfire Prophet

Team members

Wildfires are growing in both intensity and frequency. Alarming statistics show rising numbers of fire-related deaths and extensive property damage, and wildfires therefore drive a surge in insurance claims for property damage and losses. We wanted to see whether we could build a generalizable model that assesses the probability of wildfire in a specific area from satellite imagery, because such images are a valuable resource for assessing land cover, vegetation, and more.

We used two Kaggle datasets: a Canadian wildfire dataset that consists only of images, and a US wildfire dataset that is a spreadsheet of data associated with each fire. From the US spreadsheet we generated positive and negative labels, then queried the Mapbox API for satellite images of those locations so that the US data matched the format of the Canadian dataset. We generated the negative labels by randomly sampling GPS coordinates and keeping only those at least 1 mile away from every positive coordinate. Finally, we used two additional datasets (one for Canada, one for the US) to determine whether each coordinate lies in an urban or non-urban area.
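
The negative-sampling step can be sketched as rejection sampling: draw a random coordinate inside a bounding box and keep it only if its great-circle (haversine) distance to every wildfire location is at least 1 mile. This is a minimal sketch under stated assumptions, not the project's code: the bounding box, the uniform sampling in latitude/longitude degrees, the brute-force distance check, and the helper names `haversine_miles` and `sample_negatives` are all hypothetical.

```python
import math
import random

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def sample_negatives(positives, n, bbox, min_miles=1.0, seed=0, max_tries=100_000):
    """Draw n random (lat, lon) points in bbox that are >= min_miles from every positive.

    Hypothetical helper. bbox is (min_lat, max_lat, min_lon, max_lon); each candidate
    is checked against all positives by brute force.
    """
    rng = random.Random(seed)
    min_lat, max_lat, min_lon, max_lon = bbox
    negatives = []
    tries = 0
    while len(negatives) < n:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not find enough points far from all positives")
        lat = rng.uniform(min_lat, max_lat)
        lon = rng.uniform(min_lon, max_lon)
        # Reject the candidate if it falls within min_miles of any wildfire location.
        if all(haversine_miles(lat, lon, p_lat, p_lon) >= min_miles for p_lat, p_lon in positives):
            negatives.append((lat, lon))
    return negatives
```

For example, `sample_negatives([(34.05, -118.25), (37.77, -122.42)], 5, bbox=(32.5, 42.0, -124.4, -114.1))` returns five coordinates that are all at least 1 mile from both fires. The brute-force check costs one distance computation per fire for every candidate; with many fires, a spatial index such as scikit-learn's `BallTree` with the haversine metric would scale better.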

Next, we extracted local binary patterns (LBP) from each image and used them as features for a model that classifies whether the image shows a wildfire location; we then measured the accuracy of those classifications.
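
LBP features are commonly built by comparing each pixel with its neighbours, packing the comparison results into a binary code, and summarizing the image as a histogram of those codes. The write-up does not state which LBP variant was used, so this sketch assumes the basic 8-neighbour, radius-1 operator on a grayscale image and a normalized 256-bin histogram; `lbp_histogram` is a hypothetical helper. scikit-image's `skimage.feature.local_binary_pattern` provides this and other variants (for example, rotation-invariant and "uniform" patterns).

```python
import numpy as np


def lbp_histogram(gray):
    """Basic 8-neighbour, radius-1 LBP codes of a 2-D grayscale array as a normalized 256-bin histogram."""
    gray = np.asarray(gray, dtype=np.float64)
    if gray.ndim != 2 or min(gray.shape) < 3:
        raise ValueError("expected a 2-D grayscale array of at least 3x3 pixels")
    h, w = gray.shape
    center = gray[1:-1, 1:-1]  # border pixels lack a full neighbourhood and are skipped
    # Neighbour offsets (row, col), clockwise from top-left; each one sets one bit of the code.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dr, dc) in enumerate(offsets):
        neighbour = gray[1 + dr : h - 1 + dr, 1 + dc : w - 1 + dc]
        codes |= (neighbour >= center).astype(np.int64) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()
```

The resulting 256-element vector can be passed to any standard classifier; the write-up does not name the model, so the classifier choice remains open here. Satellite images would first need converting to grayscale, which is also an assumption of this sketch.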

When we examined which images the model classified correctly and which it misclassified, we noticed that many of the images appear to show urban areas. To test whether the model was learning information about wildfires or merely learning to distinguish urban from non-urban areas, we examined the correlation between the urban and wildfire labels and then compared accuracies with the urban coordinates removed.
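
Because both the urban flag and the wildfire label are binary, one reasonable way to measure their correlation is the phi coefficient, which equals the Pearson correlation of two 0/1 variables. The write-up does not say which correlation measure was used, so this is an assumed choice, and `phi_coefficient` is a hypothetical helper.

```python
import math


def phi_coefficient(xs, ys):
    """Phi coefficient (Pearson correlation) of two equal-length binary sequences."""
    if len(xs) != len(ys):
        raise ValueError("inputs must have the same length")
    # Cells of the 2x2 contingency table.
    n11 = sum(1 for x, y in zip(xs, ys) if x and y)
    n10 = sum(1 for x, y in zip(xs, ys) if x and not y)
    n01 = sum(1 for x, y in zip(xs, ys) if not x and y)
    n00 = sum(1 for x, y in zip(xs, ys) if not x and not y)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return math.nan  # undefined when either variable is constant
    return (n11 * n00 - n10 * n01) / denom
```

Calling it as `phi_coefficient(is_urban, is_wildfire)` (hypothetical per-coordinate label lists) gives a value in [-1, 1]; a value far from 0 means urban status alone carries information about the wildfire label, which is the shortcut this analysis is meant to detect.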

We trained models on the Canada dataset, the US dataset, and a combination of both. The model trained on the combined data achieved comparable or higher accuracy than the models trained on the US or Canada data alone, reaching 92% on the Canada test set and 83% on the combined test set. However, when we removed the urban coordinates from the Canada training set, accuracy on the test set, which mixes urban and rural coordinates, plummeted. This suggests that the model trained on the Canada dataset may actually be learning to classify urban coordinates rather than wildfire risk.
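
The urban-removal experiment can be sketched as an ablation: train one model on every training example and another on the non-urban examples only, then score both on the same mixed test set. This is a hedged sketch; the `(features, label, is_urban)` record layout, the pluggable `fit` callable, and the helper names are hypothetical, because the write-up does not name the classifier. The per-subgroup (urban/rural) scores are an added breakdown not reported above.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical record layout: (feature vector, wildfire label 0/1, is_urban flag).
Example = Tuple[Sequence[float], int, bool]
Predictor = Callable[[Sequence[float]], int]
Fitter = Callable[[List[Tuple[Sequence[float], int]]], Predictor]


def accuracy(predict: Predictor, examples: List[Example]) -> float:
    """Fraction of examples whose predicted label matches the true label."""
    if not examples:
        return float("nan")
    return sum(predict(x) == y for x, y, _ in examples) / len(examples)


def urban_ablation(train: List[Example], test: List[Example], fit: Fitter) -> Dict[str, Dict[str, float]]:
    """Train on all data and on non-urban data only; score both on the same mixed test set."""
    urban_test = [e for e in test if e[2]]
    rural_test = [e for e in test if not e[2]]
    variants = {
        "all": train,
        "rural_only": [e for e in train if not e[2]],  # urban coordinates removed
    }
    results = {}
    for name, subset in variants.items():
        predict = fit([(x, y) for x, y, _ in subset])
        results[name] = {
            "overall": accuracy(predict, test),
            "urban": accuracy(predict, urban_test),
            "rural": accuracy(predict, rural_test),
        }
    return results
```

A sharp drop in `results["rural_only"]["overall"]` relative to `results["all"]["overall"]` is the pattern described above for the Canada dataset; the subgroup scores help show whether the drop is concentrated on urban test images.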

Course

Data Science 210. Capstone, Fall 2023


More Information

Wildfire Prophet

GitHub Repository

presentation.pdf

Last updated: December 14, 2023
