SpaceChase
Our mission is to make LA more functional through intelligent parking solutions, giving Angelenos time back for life's meaningful moments.
The Problem
Finding parking in Los Angeles is notoriously difficult, leading to frustration and wasted time. Drivers often struggle to predict where and when parking will be available, especially in busy areas and during peak hours. Existing parking solutions focus on real-time crowdsourcing but lack predictive insights that help drivers plan ahead. SpaceChase uses machine learning to forecast on-street parking availability in downtown LA, so drivers can decide where to park before they leave instead of searching on arrival.
By surveying Los Angeles residents, we learned that Angelenos were frustrated by how long it took to find parking. Most respondents spend 10 to 20 minutes searching for a spot in Downtown LA, and 64% said they would likely use an app that forecasts parking availability. Prediction of open spots, time limits, price information, and distance to the final destination were among the most requested features.
The Data
Fortunately, the Los Angeles Department of Transportation (LADOT) maintains data on parking availability through its parking meters. First, there is a static file with meter IDs, rate ranges, rate types, addresses, and coordinates for nearly 34,000 meters across Los Angeles. However, not all of these meters have sensors that detect and report parking availability. A subset of about 5,500 meters regularly updates with availability as reported by sensors on the meter. The sensors detect the presence of a car in a spot, so occupancy is not inferred from the time remaining on the meter. This is more accurate than a payment-based definition of occupancy, because people can leave their cars for shorter or longer stints than they paid for. The meters with sensor data are found in a few neighborhoods but are most concentrated in Downtown LA. The sensor data was available in real time through an API, and there was also a historical database with second-by-second occupancy for every sensor-equipped meter over the past year. This historical occupancy data became the basis of our model, which forecasts meter-level availability by time of day.
Minimum Viable Product
Putting this all together, SpaceChase is a web application that predicts street parking meter availability in Downtown Los Angeles using machine learning. We have focused on the most feasible solution for proof of concept and have ideas on how to continuously improve on that. With that said, our MVP allows users to:
- Input their destination address
- Select day and time for their parking needs
- Pinpoint the closest parking meter and compare its location with the desired destination
- See hourly forecasts of parking probability to reassess if needed
- Redirect navigation to the desired parking location
Per our surveys, this MVP offers several of the features most sought by respondents, namely prediction of open spots and distance to destination. The static meter information gives only a range of prices rather than a price schedule, so we would need to physically inspect each meter to document its pricing. This was not feasible for an MVP, so we don't yet have features to support users' desire for price information.
Click here for a recorded demo of our site, or click here to explore the site yourself!
Technical Solution
Data Source
We use historical archived parking meter sensor data provided by the City of Los Angeles. This archive contains high-frequency records – each meter reports its availability every second based on embedded vehicle presence sensors. These records include:
- SpaceID: Unique meter identifier
- EventTime: Timestamp (local time)
- Is_available: Binary label indicating whether the space was vacant or occupied.
We also include a supplemental dataset – LADOT’s Parking Meter Inventory Reference – which provides static metadata on each meter, such as SpaceID, Address, Latitude, and Longitude.
Data Preprocessing
- Cleaning and Concatenation
  - All monthly datasets from the Parking Meter Availability Archive for 2024 were cleaned, standardized, and concatenated into a single dataset. Timestamp strings were parsed into datetime objects and sorted by SpaceID and time.
- Handling Missing Values
  - Some meters had gaps in sensor availability data. In the LSTM model, we padded these sequences and relied on built-in sequence masking so that the padding did not influence training. This preserved time-series integrity while accommodating meters with partial data.
- Merging Datasets
  - The availability data was merged with the Inventory Reference dataset using SpaceID as the join key, enriching each availability record with its corresponding geolocation and address.
- Filtering for Downtown LA
  - The merged dataset was pared down to only include meters located within the geographic boundaries of downtown Los Angeles, using bounding box logic on latitude and longitude.
- Train-Test Split
  - Data was sorted by SpaceID and then by time, and split chronologically (details under Model Development).
- Feature Engineering
  - A variety of temporal features were engineered, including:
    - hour, day_of_week, month: basic temporal cycles
    - Trigonometric encodings: hour_sin, hour_cos (cyclical encodings of time to preserve continuity)
    - Occupancy-based trends:
      - rolling_avg (6-hour rolling average per meter)
      - hourly_occupancy (historical average availability by hour)
      - hourly_occupancy_by_day (rolling trends by weekday/hour)
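The merge, bounding-box filter, and cyclical-encoding steps above might look roughly like the following. This is a minimal sketch using only the Python standard library. The bounding-box coordinates, the sample records, the helper names, and the choice to compute rolling_avg from the previous six hourly observations (excluding the current one) are all illustrative assumptions, not the project's actual code or values.

```python
import math
from collections import deque
from datetime import datetime

# Hypothetical bounding box for downtown LA (assumed values, not the project's).
DTLA_BBOX = {"lat_min": 34.03, "lat_max": 34.07, "lon_min": -118.27, "lon_max": -118.23}

def in_bbox(lat, lon, bbox=DTLA_BBOX):
    """Return True if the coordinate falls inside the bounding box."""
    return (bbox["lat_min"] <= lat <= bbox["lat_max"]
            and bbox["lon_min"] <= lon <= bbox["lon_max"])

def merge_and_filter(events, inventory):
    """Join availability events to meter metadata on SpaceID; keep downtown meters only."""
    merged = []
    for ev in events:
        meta = inventory.get(ev["SpaceID"])
        if meta is None or not in_bbox(meta["Latitude"], meta["Longitude"]):
            continue
        merged.append({**ev, **meta})
    return merged

def add_temporal_features(rows, window=6):
    """Add calendar fields, cyclical hour encodings, and a per-meter rolling average.

    Assumes one row per meter per hour. rolling_avg uses only earlier rows,
    so the current label never feeds its own feature.
    """
    history = {}  # SpaceID -> most recent `window` availability values
    out = []
    for row in sorted(rows, key=lambda r: (r["SpaceID"], r["EventTime"])):
        ts = row["EventTime"]
        past = history.setdefault(row["SpaceID"], deque(maxlen=window))
        angle = 2 * math.pi * ts.hour / 24  # map the 24-hour clock onto a circle
        out.append({
            **row,
            "hour": ts.hour,
            "day_of_week": ts.weekday(),
            "month": ts.month,
            "hour_sin": math.sin(angle),
            "hour_cos": math.cos(angle),
            "rolling_avg": sum(past) / len(past) if past else None,
        })
        past.append(row["is_available"])
    return out

# Made-up sample data: one downtown meter and one outside the box.
inventory = {
    "A1": {"Address": "100 Example St", "Latitude": 34.05, "Longitude": -118.25},
    "B2": {"Address": "200 Example Ave", "Latitude": 34.20, "Longitude": -118.40},
}
events = [
    {"SpaceID": "A1", "EventTime": datetime(2024, 1, 1, h), "is_available": h % 2}
    for h in range(4)
] + [{"SpaceID": "B2", "EventTime": datetime(2024, 1, 1, 0), "is_available": 1}]

features = add_temporal_features(merge_and_filter(events, inventory))
```

Encoding the hour as a sine/cosine pair keeps 23:00 and 00:00 close together in feature space, which a raw hour integer does not.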
Model Selection
Several candidate models were considered during experimentation:
- Deep Learning Models
  - LSTM: Designed for sequential time-series data, LSTMs are well-suited to capturing patterns in meter usage over time. However, training individual LSTM models for each SpaceID was computationally infeasible, and shared models reduced performance due to loss of meter-specific patterns.
  - Bayesian Flipout: A neural network variant using variational inference to estimate predictive uncertainty. While promising for expressing model confidence and mitigating overfitting, it took significantly longer to train and required substantial memory for large datasets. Flipout is better suited to low-throughput, high-stakes domains than to fast, scalable availability prediction.
- Gradient Boosting Models
  - XGBoost: An optimized gradient boosting framework offering high predictive power. However, training was slower and more resource-intensive than LightGBM, with no significant improvement in accuracy.
  - CatBoost: Another gradient boosting method optimized for categorical data. While comparable to LightGBM in performance and ease of use, it had slightly longer training times and offered no measurable advantage on our parking dataset, which had relatively few categorical fields.
  - LightGBM: A gradient boosting algorithm that is fast and efficient with large datasets. It supports categorical features natively and handles class imbalance and feature sparsity well. This made it a strong candidate for structured parking data with engineered temporal features.
- Stacked Ensemble Approach
  - XGBoost, LightGBM, and Logistic Regression
While individual models showed moderate performance, none surpassed a baseline AUC threshold across all validation subsets. Ultimately, the most promising results were achieved through a meta-model ensemble, which combined the strengths of:
- LightGBM: Captures structured features and historical trends effectively.
- LSTM: Models sequential dependencies and temporal patterns.
- Logistic Regression: Serves as the meta-learner that blends the two base models.
This architecture achieved the highest validation and test AUC, indicating robust generalization and reliable predictions for real-world user queries.
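To make the blending idea concrete, here is a minimal sketch of a logistic-regression meta-learner fit on two base-model probabilities. Everything in it is an assumption for illustration: the synthetic scores stand in for LightGBM and LSTM outputs, and the hand-rolled gradient descent stands in for whatever library implementation the project used.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_meta_learner(base_preds, labels, lr=0.5, epochs=500):
    """Fit logistic-regression weights on base-model probabilities (batch gradient descent)."""
    n_features = len(base_preds[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(labels)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(base_preds, labels):
            # Gradient of log loss w.r.t. the linear score is (prediction - label).
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def blend(weights, bias, x):
    """Blended availability probability for one row of base-model outputs."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

# Synthetic stand-ins for validation-set outputs of the two base models.
random.seed(0)
labels = [random.randint(0, 1) for _ in range(200)]

def noisy_score(y, noise=0.4):
    """A made-up base-model probability: informative about y, but noisy."""
    return min(1.0, max(0.0, 0.5 + (y - 0.5) * 0.6 + random.uniform(-noise, noise)))

base_preds = [[noisy_score(y), noisy_score(y)] for y in labels]  # [model_a, model_b]

weights, bias = fit_meta_learner(base_preds, labels)
blended = [blend(weights, bias, x) for x in base_preds]
```

The meta-learner only sees the base models' predictions, so it should be fit on data the base models were not trained on; otherwise it learns to trust whichever model overfit the most.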
Model Evaluation
We use Area Under the ROC Curve (AUC) instead of simple accuracy. AUC measures the ability of the model to distinguish between occupied and vacant states across all thresholds, making it better suited for imbalanced datasets where the ratio of vacant to occupied meters can vary dramatically. Accuracy alone can be misleading: predicting “occupied” for every meter might result in high accuracy but poor real-world usefulness.
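The point about accuracy can be illustrated with a tiny, made-up example. The sketch below computes AUC directly from its ranking definition (the probability that a randomly chosen vacant meter is scored above a randomly chosen occupied one) using plain Python; the labels and scores are hypothetical.

```python
def accuracy(labels, preds):
    """Fraction of hard predictions that match the labels."""
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)

def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative), counting ties as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical snapshot: 9 of 10 meters occupied (0), 1 vacant (1).
labels = [0] * 9 + [1]

# A "model" that always says occupied: same score for every meter.
constant_scores = [0.0] * 10
constant_preds = [0] * 10

# A model whose scores actually rank the vacant meter highest.
informed_scores = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.4, 0.2, 0.3, 0.9]

print(accuracy(labels, constant_preds))    # 0.9
print(roc_auc(labels, constant_scores))    # 0.5
print(roc_auc(labels, informed_scores))    # 1.0
```

The always-occupied model is 90% accurate yet has an AUC of 0.5, the same as random guessing, because it cannot rank the one vacant meter above the others.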
Model Development
The model pipeline was constructed as follows:
- Train-Test Split
  - The dataset was split into training (70%), validation (15%), and test (15%) subsets. Data was first sorted by SpaceID and then by datetime to preserve meter-specific temporal order before splitting. Feature engineering was applied after the split to reduce the risk of data leakage. This structure ensured that trends in future availability did not influence training data.
- LightGBM Training
  - A gradient boosting model was trained on engineered features (hour, day_of_week, rolling_avg, etc.) using binary classification to predict is_available. It was tuned using AUC as the evaluation metric, with early stopping based on validation loss.
- LSTM Training
  - A two-layer LSTM network with dropout and layer normalization was developed to process sequential slices of 12-hour time windows. It learned from sequences of historical availability to infer temporal trends per meter.
- Meta-Model Blending
  - Outputs from both LightGBM and LSTM models on the validation set were fed into a Logistic Regression model after normalization. This final blending step yielded predictions with significantly higher accuracy by leveraging the complementary strengths of the two base models.
- Evaluation
  - The blended model achieved a high AUC on holdout test data, outperforming individual models and previous ensemble iterations. Feature importance tests further validated the value of occupancy-based engineered features.
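The 70/15/15 chronological split described in the pipeline above could be sketched as follows. This is one possible reading: it cuts each meter's own timeline into train, validation, and test segments. Whether the project split per meter or over the whole sorted table is not stated, so the per-meter cut, the function name, and the sample rows are assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def chronological_split(rows, train_pct=70, val_pct=15):
    """Split each meter's records by time: earliest -> train, middle -> val, latest -> test."""
    by_meter = defaultdict(list)
    for row in rows:
        by_meter[row["SpaceID"]].append(row)

    train, val, test = [], [], []
    for meter_rows in by_meter.values():
        meter_rows.sort(key=lambda r: r["EventTime"])
        n = len(meter_rows)
        # Integer arithmetic avoids floating-point surprises at the cut points.
        train_end = n * train_pct // 100
        val_end = n * (train_pct + val_pct) // 100
        train.extend(meter_rows[:train_end])
        val.extend(meter_rows[train_end:val_end])
        test.extend(meter_rows[val_end:])
    return train, val, test

# Made-up data: two meters with 20 hourly readings each.
start = datetime(2024, 1, 1)
rows = [
    {"SpaceID": sid, "EventTime": start + timedelta(hours=h), "is_available": h % 2}
    for sid in ("A1", "B2")
    for h in range(20)
]

train, val, test = chronological_split(rows)
print(len(train), len(val), len(test))  # 28 6 6
```

Because every training row for a meter precedes every validation and test row for that meter, features computed afterwards from the training segment cannot see future availability.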
Future Improvements
If we had more time or continued this project outside of Capstone, we would focus on improving user experience by:
- Including pricing and time limit information in the "Parking Tips" banner at the bottom of our site. Users sought info on meter price, time limits, and known restrictions, which we could provide for a selected meter.
- Incorporating live data into our model. LADOT has an API with live data. In theory, we should be able to increase near-term accuracy by incorporating real-time data into our prediction.
- Sharing the most-available nearby meters. Today we share the availability of the nearest meter, but there may be a more-available meter close by. We can strike a balance by working to define the 'most available meter nearby', or share a map with the availability of the k-nearest meters for some number k.
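As a rough illustration of the k-nearest idea, the sketch below ranks meters by great-circle (haversine) distance from the destination and then picks the one with the highest predicted availability among the k closest. The field name p_available, the sample coordinates, and the helper names are hypothetical; this is not how the current site selects a meter.

```python
import heapq
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def k_nearest_meters(dest, meters, k=3):
    """Return the k meters closest to dest = (lat, lon), nearest first."""
    return heapq.nsmallest(
        k, meters,
        key=lambda m: haversine_m(dest[0], dest[1], m["Latitude"], m["Longitude"]),
    )

def most_available_nearby(dest, meters, k=3):
    """Among the k nearest meters, pick the one with the highest predicted availability."""
    return max(k_nearest_meters(dest, meters, k), key=lambda m: m["p_available"])

# Made-up meters with hypothetical predicted availability.
meters = [
    {"SpaceID": "A", "Latitude": 34.0501, "Longitude": -118.2500, "p_available": 0.20},
    {"SpaceID": "B", "Latitude": 34.0510, "Longitude": -118.2500, "p_available": 0.80},
    {"SpaceID": "C", "Latitude": 34.0600, "Longitude": -118.2500, "p_available": 0.95},
]
dest = (34.0500, -118.2500)

nearest_two = [m["SpaceID"] for m in k_nearest_meters(dest, meters, k=2)]
best = most_available_nearby(dest, meters, k=2)["SpaceID"]
print(nearest_two, best)  # ['A', 'B'] B
```

Here meter A is closest but unlikely to be free, while C is very likely free but too far to make the cut with k=2, so B is suggested. Choosing k is the trade-off between walking distance and the chance of finding a spot.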
Ultimately, our biggest opportunity is in better incorporating user feedback. Creating a pipeline for user feedback is the best way to ensure our product is achieving its objective. Further, crowdsourced data from users can create a proprietary dataset on things like construction or temporary parking restrictions which might not be available through the LADOT data.
References + Links
https://ladot.lacity.gov/projects/parking-la
https://data.lacity.org/Transportation/LADOT-Parking-Meter-Occupancy/e7h6-4a3e/about_data
https://data.lacity.org/Transportation/LADOT-Metered-Parking-Inventory-Policies/s49e-q6j2/about_data
