Perceive AI
Our Mission
At Perceive AI, we believe accessibility is a right—not a luxury. Our goal is to remove barriers for the visually impaired, starting with one of the most essential tasks: grocery shopping.
What We Do
Perceive AI is an assistive shopping app that uses real-time grocery item recognition to guide users through stores with precision. Unlike conventional tools that flood users with unnecessary information, Perceive AI focuses solely on items the user actually needs—those on their shopping list.
Through clear directional audio guidance and a screen-reader-friendly interface, the app ensures a smooth, frustration-free experience. With built-in voice command functionality, users can interact hands-free and with full autonomy.
Our Team
We’re a group of five students from UC Berkeley’s Master of Data Science program, driven by a shared passion for using AI to make the world more inclusive. This app is the culmination of our capstone project—a product of research, experimentation, and deep collaboration.
The Challenge
Grocery shopping shouldn’t require help.
Yet for millions of people with visual impairments, navigating produce aisles often means relying on others — or enduring clunky, noisy tech that bombards them with audio overload.
📉 Existing tools are either too limited or too loud.
💸 Visually impaired shoppers face third-party shopping fees averaging $48 per trip.
🧑‍🦯 Over 300 million people worldwide live with visual impairment.
Our Solution: Perceive AI
A seamless way to shop with confidence.
Perceive AI is an intuitive mobile app that:
🛒 Uses voice commands to create your shopping list.
👓 Alerts you when you’re near a produce item from your list.
🧠 Combines powerful computer vision + LLMs to detect, confirm, and guide you to items in real time.
How It Works
Create a voice-powered list before you head to the store.
Walk through the produce section.
Receive audio alerts when an item is nearby.
Check it off. Keep going.
No camera handling. No guessing. Total independence.
🛍️ MVP tested at Trader Joe’s
🧠 Powered by YOLO, AWS SageMaker, and VideoLLaMA
📱 Built for iOS, coming to smart glasses in 2025
Data Science Approach
- Problem Definition & Objective: Develop an accurate and efficient object detection model specifically for identifying produce items in real-world scenarios (e.g., grocery stores), suitable for on-device deployment.
- Data Acquisition & Preparation:
  - Collected a large-scale image dataset relevant to the target domain (produce items).
  - Performed manual annotation (labeling) to create ground-truth bounding boxes for model training.
  - Curated this labeled data into a custom training and validation dataset comprising thousands of images.
- Model Selection & Rationale:
  - Selected the YOLOv11x architecture.
  - Rationale: Chosen for its established state-of-the-art performance, offering a strong balance between detection accuracy and computational efficiency that suits resource-constrained, on-device deployment.
- Training & Fine-tuning:
  - Employed a fine-tuning strategy, leveraging pre-trained YOLOv11x weights and adapting the model to the custom produce dataset.
  - Training was conducted over 50 epochs (an illustrative fine-tuning sketch follows this list).
- Evaluation Metrics & Monitoring:
  - Primary Metric: Mean Average Precision (mAP).
  - Tracked mAP at an IoU threshold of 0.50 (mAP50) to assess general detection performance.
  - Also tracked mAP across a range of IoU thresholds (mAP50-95) for a stricter evaluation of localization accuracy.
  - Monitored these metrics throughout the 50 epochs to observe convergence and learning progress.
- Quantitative Results:
  - Achieved a final mAP50 score of 0.85 on the validation set, indicating strong detection performance according to the chosen metric (see the validation sketch after this list).
- Qualitative Validation & Performance Analysis:
  - Assessed model performance on representative real-world image examples (detection grids).
  - Validated the model's ability to:
    - Identify multiple produce items correctly with high confidence scores.
    - Handle challenging visual conditions such as background clutter and motion blur.
    - Distinguish between visually similar classes (e.g., lemons vs. limes).
  - This qualitative analysis supports the practical reliability suggested by the quantitative mAP score (a detection-grid sketch follows this list).
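To make the training step concrete, here is a minimal fine-tuning sketch using the Ultralytics Python API. It assumes a hypothetical produce.yaml dataset config pointing at our labeled train/validation images, the publicly released yolo11x checkpoint as the starting weights, and a placeholder input resolution; it is an illustration of the approach, not our exact training script.

```python
# Minimal fine-tuning sketch (assumptions: produce.yaml dataset config,
# yolo11x.pt pre-trained checkpoint, 640 px input size).
from ultralytics import YOLO

# Start from COCO pre-trained weights rather than training from scratch.
model = YOLO("yolo11x.pt")

# Fine-tune on the custom labeled produce dataset for 50 epochs.
model.train(
    data="produce.yaml",  # dataset config: image paths + class names (lemon, lime, ...)
    epochs=50,            # training budget described above
    imgsz=640,            # assumed input resolution
)
```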
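The two reported metrics (mAP50 and mAP50-95) can be read back from a validation run as in the sketch below; the checkpoint path shown is the Ultralytics default output location and is used only for illustration.

```python
# Minimal validation sketch: report mAP50 and mAP50-95 on the validation split.
from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/best.pt")  # default Ultralytics output path (assumed)
metrics = model.val(data="produce.yaml")           # evaluate on the validation split

print(f"mAP50   : {metrics.box.map50:.3f}")        # detection quality at IoU 0.50
print(f"mAP50-95: {metrics.box.map:.3f}")          # stricter localization, IoU 0.50-0.95
```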
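The detection grids used for the qualitative checks can be reproduced with a short script like the sketch below; the sample image filenames are illustrative placeholders, not files from our dataset.

```python
# Minimal detection-grid sketch: run the fine-tuned model on a few shelf photos
# and tile the annotated outputs for visual inspection. Filenames are placeholders.
import matplotlib.pyplot as plt
from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/best.pt")
samples = ["shelf_lemons.jpg", "shelf_limes.jpg", "cart_motion_blur.jpg", "cluttered_bin.jpg"]

fig, axes = plt.subplots(2, 2, figsize=(10, 10))
for ax, path in zip(axes.flat, samples):
    result = model.predict(path, conf=0.5, verbose=False)[0]
    ax.imshow(result.plot()[..., ::-1])  # plot() returns a BGR array; flip to RGB
    ax.set_title(path)
    ax.axis("off")
fig.tight_layout()
fig.savefig("detection_grid.png")
```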
Technical Innovation
Our application runs an on-device YOLO model, custom-trained to recognize the items on the user's grocery list. When a listed item is detected, it triggers a lightweight pipeline that sends only minimal data to an LLM to generate audible feedback, which is delivered back to the user over a responsive WebSocket connection whose sessions are managed in DynamoDB. This selective-upload approach keeps costs low, while the modular architecture allows easy integration of different LLMs and future scaling to platforms such as augmented reality.
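To make the selective-upload idea concrete, here is a minimal Python sketch of the gating logic. The production app runs on iOS, so this is an illustration of the flow rather than our client code; the shopping list contents, the custom weight filename, and the payload fields are all assumptions.

```python
# Sketch of the selective-upload gate: run the on-device detector on every frame,
# but only detections that match the user's shopping list produce a small payload
# for the LLM/audio-feedback backend. Weight filename and payload shape are assumed.
import json
from ultralytics import YOLO

SHOPPING_LIST = {"lemon", "avocado", "broccoli"}   # built via voice commands
model = YOLO("produce_yolo11x.pt")                 # custom fine-tuned weights (assumed name)

def frame_to_payload(frame):
    """Return a compact JSON payload for list items only, or None if nothing relevant."""
    result = model.predict(frame, conf=0.6, verbose=False)[0]
    hits = []
    for box in result.boxes:
        label = result.names[int(box.cls)]
        if label not in SHOPPING_LIST:             # selective upload: ignore everything else
            continue
        hits.append({
            "item": label,
            "confidence": round(float(box.conf), 2),
            "bbox": [round(v, 1) for v in box.xyxy[0].tolist()],
        })
    return json.dumps({"detections": hits}) if hits else None
```

Only when frame_to_payload() returns a payload does the app open its WebSocket round trip, which keeps upload volume and LLM cost proportional to actual list hits.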
Next Steps: App Development
Our development focuses on optimizing performance for speed and battery longevity, crucial for a seamless shopping experience. Accessibility is paramount, extending beyond basics to include robust screen reader integration, intuitive voice commands, and refined audio feedback. We are also refining the UI/UX for simplicity and clarity, reducing cognitive load for effortless interaction. Guided by rigorous testing and valuable feedback from the visually impaired community, our goal is to ensure the app is practical, empowering, and truly functional in real-world grocery environments.
Product Roadmap
Having been accepted into Berkeley SkyDeck's Pad-13 incubator program, we are leveraging its mentorship and resources over the next four months, focusing on two key pillars: deepening value through direct grocery store partnerships, and enhancing the user experience through hardware integration and market expansion. Our immediate priority is partnering with grocery chains to unlock precise in-store navigation and richer product data. Longer term, we aim to expand into larger retailers like Target and Walmart and are evaluating smart glasses integration for a seamless, hands-free option, working to solidify perceive.ai as an essential tool for the visually impaired community.
We’re building beyond produce. Here’s what’s next:
- 🔗 Grocery store partnerships for enhanced navigation
- 🍇 Expanded produce and product recognition
- 🧠 Smart glasses integration for hands-free AR
- 🧪 Continued testing with our user community