MIDS Capstone Project Spring 2025

Mo’ Data sNO’w Problem

Team members

Motivation

Figure 1: Snow and the economy during the 2020 California drought. Taken from Escriva-Bou et al., 2022.

Snowmelt provides up to 70% of the total annual water supply for many regions in the Western United States (Foster et al., 2011). As a result, years with below-average snowfall can have serious economic impacts that extend beyond residential water use. For example, a low snowpack in the Sierra Nevada in 2020 contributed to a water shortage of 85 thousand acre-feet and a $68 million economic loss in the San Joaquin watershed alone (Fig 1; Escriva-Bou et al., 2022). Each year, water resource managers face the important and challenging responsibility of using snow measurements to forecast streamflow and support water allocations for the many uses within their basins. As water becomes scarcer in these arid regions, water resource managers require more accurate snow estimates and, in turn, more accurate streamflow forecasts.

Data

In mountainous regions, wind, topography, and precipitation make the snowpack variable in space and time. This variability makes it difficult to measure snow properties and predict their distribution across a landscape. The snow variable of interest in this project is snow-water equivalent (SWE), the depth of water that would result if the snowpack melted. Water managers often multiply this depth by an area to estimate a volume of water (e.g. acre-ft or thousands of acre-ft), which translates a snowpack into terms relevant to streamflow forecasts. SWE can be measured manually or at snow-telemetry (snow pillow) stations through a combination of snow depth and weight measurements (Fig 2). The California Department of Water Resources operates a sparse network of approximately 130 snow pillows throughout the state that provide important observations of SWE at point locations. Over larger geographic regions, the Airborne Snow Observatory (ASO; Painter et al., 2016) measures snow depth with high-resolution lidar and converts it to SWE. ASO is considered the gold standard of SWE observations. However, because of its high acquisition cost, ASO data are very sparse in time and limited to select regions. Additionally, atmospheric data (e.g. precipitation, air temperature, wind speed, humidity) from reanalysis products can help describe the spatial distribution of snow properties at coarser resolutions.
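The depth-to-volume conversion that water managers perform can be sketched in a few lines. This is a minimal illustration, assuming SWE is reported in inches and basin area in acres; the function name and the example numbers are hypothetical and are not taken from our data.

```python
INCHES_PER_FOOT = 12.0


def swe_volume_acre_feet(mean_swe_inches: float, area_acres: float) -> float:
    """Convert a mean SWE depth over an area into a water volume in acre-feet.

    One acre-foot is the volume of water covering one acre to a depth of one
    foot, so the volume is simply depth (in feet) times area (in acres).
    """
    if mean_swe_inches < 0 or area_acres < 0:
        raise ValueError("SWE depth and area must be non-negative")
    return (mean_swe_inches / INCHES_PER_FOOT) * area_acres


# Hypothetical example: 18 inches of mean SWE over a 100,000-acre area.
volume = swe_volume_acre_feet(18.0, 100_000.0)
print(f"{volume / 1000:.0f} thousand acre-ft")
```

Because the conversion is linear, any error in the mean SWE estimate carries over proportionally to the estimated water volume.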

Figure 2: Snow pillow diagram (left) and time series of the Tuolumne Meadows snow pillow and dates of ASO flights within the Tuolumne basin (right).

Goals

We conducted a research project with two workstreams, each aimed at developing a workflow that helps water managers make more accurate snow predictions using data and machine learning in two California basins (Tuolumne and San Joaquin). The two workflows (hypotheses) are described below. Through this research we also developed recommendations and directions for future research on imputation strategies that handle the spatiotemporal dimensions of the snow pillow network. The website containing our Minimum Viable Product (MVP) is provided in a URL link below.

Hypothesis 1 - Less Data (Mo' Data Mo' Problems)

Develop simple machine learning techniques to predict ASO mean SWE using the snow pillow network.
The Less Data approach favors interpretability by using simple models and smaller datasets that are less vulnerable to data quality issues and noise.
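To give a sense of what a simple, interpretable model can look like, here is a minimal sketch that assumes a single feature (an average of snow pillow SWE on each flight date) and an ordinary least squares line. It is not necessarily the model used in this project, the names are hypothetical, and the numbers are made up for illustration only.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up values (inches of SWE), one pair per hypothetical ASO flight date.
pillow_mean_swe = [10.0, 20.0, 30.0, 40.0]  # average across snow pillows
aso_mean_swe = [6.0, 11.0, 16.0, 21.0]      # basin-mean SWE label

intercept, slope = fit_line(pillow_mean_swe, aso_mean_swe)

# Predict basin-mean SWE for a new date from the pillow average alone.
prediction = intercept + slope * 25.0
print(f"intercept={intercept:.2f}, slope={slope:.2f}, prediction={prediction:.2f}")
```

With only two fitted parameters, a water manager can read directly how a change in the pillow average translates into a change in predicted basin-mean SWE.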

Hypothesis 2 - Mo' Data (Mo' Data sNO'w Problem)

Increased model and data complexity will lead to the most accurate predictions.
In the age of information, more is better!

Benchmark

The benchmarks below are the most heavily referenced publicly available distributed snow datasets.

Snow Data Assimilation System (SNODAS) developed by the National Oceanic and Atmospheric Administration (NOAA)
- https://nsidc.org/data/g02158/versions/1
University of Arizona SWE (UASWE)
- https://snowview.arizona.edu/

Key Performance Metrics

Mean absolute error (MAE) within 10% of label (ASO).
More accurate predictions compared to benchmark competitors (SNODAS and UASWE).
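The two metrics above can be checked with a short script. This sketch assumes that "within 10% of label" means the MAE divided by the mean ASO SWE is at most 0.10; that reading, the function names, and the example values are assumptions made for illustration.

```python
def mean_absolute_error(predicted, observed):
    """Average absolute difference between predictions and labels."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def relative_mae(predicted, observed):
    """MAE expressed as a fraction of the mean label value."""
    return mean_absolute_error(predicted, observed) / (sum(observed) / len(observed))


# Made-up basin-mean SWE values (inches) on three hypothetical flight dates.
aso_label = [20.0, 10.0, 30.0]
model_pred = [22.0, 9.0, 30.0]
benchmark_pred = [25.0, 13.0, 26.0]

model_rel = relative_mae(model_pred, aso_label)
meets_target = model_rel <= 0.10
beats_benchmark = mean_absolute_error(model_pred, aso_label) < mean_absolute_error(
    benchmark_pred, aso_label
)
print(f"relative MAE={model_rel:.2%}, meets target={meets_target}, "
      f"beats benchmark={beats_benchmark}")
```

Normalizing by the mean label makes the 10% threshold comparable across basins and dates with very different amounts of snow.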

Results

Less Data: showed promise at predicting mean SWE within two test basins (Tuolumne and San Joaquin).
- MAE within 10% of ASO.
- More accurate than SNODAS.
- Less accurate than UASWE for total mean SWE but more accurate for SWE above 10,000 feet of elevation. As a result, we expect this product to be better than UASWE in the melt season when a greater proportion of the snow is located at higher elevations in the basins.
- SWE is highly correlated within these basins, so few observations and features are needed to make accurate predictions.
- The imputation strategy is important for optimizing predictions.
Mo' Data: showed promise at predicting mean SWE within two test basins (Tuolumne and San Joaquin).
- MAE within 10% of ASO.
- More accurate than both SNODAS and UASWE.

Given our findings, we recommend the Mo' Data workflow where water managers have the technical skills and computational resources to run and interpret these models, as they proved most accurate for our test basins. We also see promise in using the Less Data workflow to predict mean SWE, depending on the reliability of the snow pillow network and/or the imputation strategy. In addition, the interpretability of this approach lends itself to optimizing the use of snow pillows for water resource management.

Key Learnings

In a field that has historically relied on complicated physically-based models or simple statistical models, water resource managers can leverage Less Data and Mo' Data machine learning approaches to better estimate SWE and improve their streamflow forecasts.
Imputation is often treated as a one-size-fits-all task driven by missingness rates or dataset size. In contrast, our work shows that metadata, station-specific spatiotemporal traits, and domain semantics demand a more customized and context-aware approach.

Future Work

Extend Less Data and Mo' Data to other basins throughout the Western United States that encompass different snow classifications (Sturm and Liston, 2021) and degrees of snow pillow information to better understand how these models generalize.
Develop context-aware and symmetry-informed imputation strategies for snow pillow time series (see attached abstracts for more details).

Acknowledgements

We would like to thank Taylor Winchell from Denver Water for their input on our research approach.

Citations and Data References

California Department of Water Resources (CDWR). (2025). California Data Exchange Center - Snow. https://cdec.water.ca.gov/snow/current/snow/index.html

Escriva-Bou, A., Medellin-Azuara, J., Hanak, E., Abatzoglou, J., Viers, J.: Policy Brief: Drought and California's Agriculture. Public Policy Institute of California. April 2022. https://www.ppic.org/publication/policy-brief-drought-and-californias-a…

Foster, J. L., Hall, D. K., Eylander, J. B., Riggs, G. A., Nghiem, S. V., Tedesco, M., Kim, E., Montesano, P. M., Kelly, R. E. J., Casey, K. A., and Choudhury, B.: A blended global snow product using visible, passive microwave and scatterometer satellite data, Int. J. Remote Sens., 32, 1371–1395, https://doi.org/10.1080/01431160903548013, 2011.

Painter, T. H., et al. (2016). The Airborne Snow Observatory: Fusion of scanning lidar, imaging spectrometer, and physically-based modeling for mapping snow water equivalent and snow albedo. Remote Sensing of Environment, 184, 139-152.

Sturm, M., and G. E. Liston, 2021: Revisiting the Global Seasonal Snow Classification: An Updated Dataset for Earth System Applications. J. Hydrometeor., 22, 2917–2938, https://doi.org/10.1175/JHM-D-21-0070.1.

U.S. Department of Agriculture (USDA) Natural Resources Conservation Service. (2025). SNOwpack TELemetry Network (SNOTEL). NRCS. https://data.nal.usda.gov/dataset/snowpack-telemetry-network-snotel.

For additional scientific references see the attached pdf documents.