LlamaLeftovers
Problem & Motivation
At-home food costs for consumers have increased each year since 2020, and prices are predicted to rise again in 2025. In addition, 92 billion pounds of food is wasted every year in the United States, amounting to $473 billion in lost food (Feeding America). It is more important than ever to use the food we already have and reduce our food waste.
Our Solution
Small changes made at the individual level are a great way to build a cumulative impact and shift mindsets. By combining a computer vision model with a generative AI model, our product creates recipes based on images of food users already have at home. This provides an easy and effective tool to help users reduce their food waste while making creative, enjoyable meals.
Data Source & Data Science Approach
We implemented our project on AWS. An EC2 instance hosts a Streamlit application that provides the front-end user interaction. When an image is uploaded for classification, the app invokes the SageMaker endpoint hosting our trained YOLOv8m image classification model, and the resulting ingredient classifications are sent back to the Streamlit app.
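As a rough illustration of this call path, the sketch below shows how a Streamlit app can invoke a SageMaker endpoint with boto3. The endpoint name, payload content type, and response schema are assumptions for illustration, not our exact implementation.

```python
import json
import boto3
import streamlit as st

# Hypothetical endpoint name; the real name is assigned at deployment time.
ENDPOINT_NAME = "yolov8m-ingredient-classifier"

runtime = boto3.client("sagemaker-runtime")

uploaded = st.file_uploader("Upload a photo of your fridge", type=["jpg", "png"])
if uploaded is not None:
    # Send the raw image bytes to the endpoint for classification.
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/x-image",
        Body=uploaded.getvalue(),
    )
    # Assumes the endpoint returns a JSON list of detections.
    detections = json.loads(response["Body"].read())
    st.write(detections)
```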
Once the user has verified the image classifications, added any other desired recipe inputs, and requested a recipe, Streamlit makes an API Gateway call to a Lambda function. The Lambda function retrieves relevant excerpts from the recipes stored in Kendra, constructs the final prompt, and invokes the SageMaker endpoint for our Llama 3 8B model. Finally, the recipe is returned to Streamlit for the user.
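A minimal sketch of the front-end side of this step, assuming the Streamlit app calls the API Gateway stage over HTTPS; the URL and payload fields are hypothetical placeholders.

```python
import requests

# Hypothetical API Gateway stage URL for the recipe-generation Lambda.
API_URL = "https://example.execute-api.us-west-2.amazonaws.com/prod/generate"

def request_recipe(ingredients: list[str], preferences: str) -> dict:
    """Send the verified ingredients and user preferences to the Lambda via API Gateway."""
    payload = {"ingredients": ingredients, "preferences": preferences}
    response = requests.post(API_URL, json=payload, timeout=60)
    response.raise_for_status()
    return response.json()  # JSON recipe produced by the Lambda / Llama 3 pipeline
```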
Data Pipeline
The pipeline starts with the user-uploaded image. The image is preprocessed by resizing it to 800x800 and converting it to a byte array. The array is sent to the YOLOv8m image classification model, which generates a list entry for each classified ingredient containing the bounding box coordinates, the classification, and the resulting confidence score.
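A minimal preprocessing sketch, assuming Pillow is used for resizing and that the model payload is raw JPEG bytes (the exact payload format is an assumption):

```python
import io
from PIL import Image

def preprocess_image(file_bytes: bytes, size: int = 800) -> bytes:
    """Resize the uploaded image to 800x800 and return it as a JPEG byte array."""
    image = Image.open(io.BytesIO(file_bytes)).convert("RGB")
    image = image.resize((size, size))
    buffer = io.BytesIO()
    image.save(buffer, format="JPEG")
    return buffer.getvalue()
```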
The prompt inputs are created from the ingredients selected from the image and the user's manual inputs. These inputs, together with excerpts retrieved from the Kendra index, produce the engineered prompt for the Llama 3 8B model. A LangChain Pydantic parser then returns a consistently formatted JSON for front-end display.
Data Source
Most of our image data comes from Roboflow, a platform designed to accelerate computer vision workflows using high-quality, annotated datasets. We used three datasets:
- Dataset 1: ~8,800 multi-ingredient fridge images served as our baseline for multi-class classification.
- Dataset 2: ~4,500 single-ingredient images helped balance class representation and improve precision.
- Dataset 3: ~250 live fridge photos taken by team members to help with real-world classification (lighting, angle, space, etc.)
Early inspection revealed mislabeled and corrupt images, so we scanned both training and validation directories to clean annotations and remove invalid data. To correct class imbalance, we merged datasets and applied augmentation techniques. After cleaning and rebalancing, we achieved significantly improved validation results.
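As an illustration of the cleaning step, here is a minimal sketch of scanning an image directory for corrupt files with Pillow; the directory layout and file extension are assumptions.

```python
from pathlib import Path
from PIL import Image

def find_corrupt_images(image_dir: str) -> list[Path]:
    """Return image files that cannot be opened and fully decoded."""
    corrupt = []
    for path in Path(image_dir).rglob("*.jpg"):
        try:
            with Image.open(path) as img:
                img.verify()  # raises if the file is truncated or corrupt
        except Exception:
            corrupt.append(path)
    return corrupt

# Example: scan the train and valid splits, then review or remove the offenders.
for split in ("train/images", "valid/images"):
    print(split, find_corrupt_images(split))
```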
Model Selection / Evaluation
We evaluated several models, starting with MobileNetV2 and ResNet50, but their accuracy fell short. Transitioning to YOLOv8 offered major improvements. We progressed from YOLOv8s to YOLOv8m, which delivered the best performance.
Key improvements included:
- Merging and cleaning additional datasets
- Balancing class distributions
- Increasing image size to 800x800
- Extending training from 15 to 25 epochs
These changes led to a mAP50 of 73.3% and mAP50-95 of 56.5%, with a smooth loss curve and reduced misclassifications. The model demonstrated strong generalization without overfitting, making YOLOv8m our final choice. Key observations from the training curves (a minimal training sketch follows this list):
- All loss values are steadily decreasing → model is converging.
- Precision, recall, and mAP are increasing → model is learning effectively and generalizing well.
- Well balanced training process without signs of overfitting (train/val metrics are consistent).
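For reference, a minimal training sketch using the Ultralytics API with the configuration described above; the dataset YAML path and batch size are assumptions, not our exact setup.

```python
from ultralytics import YOLO

# Hypothetical dataset config; the YAML lists the 54 ingredient classes.
model = YOLO("yolov8m.pt")      # start from pretrained YOLOv8m weights
results = model.train(
    data="ingredients.yaml",    # merged, cleaned, and rebalanced dataset
    imgsz=800,                  # matches our 800x800 preprocessing
    epochs=25,                  # extended from the original 15
    batch=16,
)
metrics = model.val()           # reports precision, recall, mAP50, mAP50-95
```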
Image Classification
Our final model, YOLOv8m, was trained on over 10,500 images and is capable of accurately detecting 54 commonly used ingredients. It performs especially well when items are clearly visible, evenly lit, and photographed from a head-on angle—consistently delivering high-confidence predictions for everyday household foods.
In the example shown, the model demonstrates high-confidence predictions on a diverse set of ingredients. This result highlights the model's ability to classify ingredients with precision across different shapes, colors, and packaging. Clean lighting and head-on views enhance detection, enabling accurate classification that supports our recipe generation system.
Recipe Generation
Recipe generation in our system relies on a Retrieval‐Augmented Generation (RAG) pipeline that merges the strengths of Amazon Kendra for retrieving relevant recipe snippets and a Llama 3 8B Instruct large‐language model for final recipe creation. First, we store a curated set of 1,000 recipes—sourced from RecipeNLG, Epicurious‐scraped data, and other publicly available texts—in individual files on Amazon S3. Amazon Kendra automatically ingests these recipes and applies semantic plus keyword indexing to facilitate efficient top‐K retrieval.
When a user requests a recipe, our Lambda function constructs a query that combines user‐selected ingredients (including those detected via YOLOv8m) and any additional preferences. This query is sent to Kendra, which returns relevant snippets—such as matched ingredients and cooking steps—from the stored recipes. The Lambda function then injects these references, along with the user’s specific requirements, into a carefully engineered prompt.
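A minimal sketch of the retrieval step inside the Lambda handler, assuming a boto3 Kendra query; the index ID, query wording, and result parsing are illustrative assumptions.

```python
import boto3

kendra = boto3.client("kendra")
KENDRA_INDEX_ID = "REPLACE_WITH_INDEX_ID"  # hypothetical placeholder

def retrieve_recipe_excerpts(ingredients: list[str], top_k: int = 3) -> list[str]:
    """Query Kendra for recipe snippets that match the user's ingredients."""
    response = kendra.query(
        IndexId=KENDRA_INDEX_ID,
        QueryText="recipe using " + ", ".join(ingredients),
    )
    # Keep the top-K excerpt texts to inject into the prompt.
    return [
        item["DocumentExcerpt"]["Text"]
        for item in response["ResultItems"][:top_k]
    ]
```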
That prompt is passed to a SageMaker endpoint hosting the Llama 3 8B model. We leverage LangChain’s prompt templating to ensure the prompt is well‐structured, and use a Pydantic Output Parser to enforce valid JSON output containing fields like recipe title, ingredient list, instructions, and estimated calories. By explicitly labeling sections—such as “REFERENCE RECIPES FROM DATABASE”—we anchor the model’s responses to real data and reduce hallucinations. This approach produces consistent, coherent, and semi‐structured recipes that incorporate the user’s available ingredients. The system further benefits from AWS’s scale‐to‐zero functionality, making it cost‐effective while still delivering high‐quality recipe recommendations.
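A minimal sketch of the prompt templating and output parsing, assuming a recent LangChain version with its PydanticOutputParser; the schema field names and prompt wording are illustrative, not our exact prompt.

```python
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from pydantic import BaseModel, Field

class Recipe(BaseModel):
    """Fields required in every generated recipe (names are illustrative)."""
    title: str = Field(description="Name of the recipe")
    ingredients: list[str] = Field(description="Ingredient list with quantities")
    instructions: list[str] = Field(description="Ordered cooking steps")
    estimated_calories: int = Field(description="Approximate calories per serving")

parser = PydanticOutputParser(pydantic_object=Recipe)

prompt = PromptTemplate(
    template=(
        "You are a helpful chef.\n"
        "REFERENCE RECIPES FROM DATABASE:\n{references}\n\n"
        "Create a recipe using: {ingredients}\n{format_instructions}"
    ),
    input_variables=["references", "ingredients"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

# After invoking the Llama 3 8B endpoint with prompt.format(...),
# the raw model text is validated into a Recipe object:
# recipe = parser.parse(llm_output_text)
```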
Key Learnings & Impact
Throughout this project, we encountered several real-world challenges that significantly shaped our solution. Early on, we discovered widespread corrupt and mismatched labels, which led us to build a clean, custom-labeled dataset from scratch. A severe class imbalance hindered model generalization, so we supplemented our training data and applied targeted augmentations to boost minority classes. We also found that real-world detection struggled in cluttered fridge environments. To address this, we incorporated curated fridge images into our validation set and standardized our preprocessing by resizing all images to 800x800 and converting them to byte arrays—resulting in measurable performance gains.
We implemented an active learning loop that enables the system to improve over time by retraining on misclassified examples. For deployment, we integrated the solution with AWS's scale-to-zero functionality to ensure both cost-efficiency and reliable performance. Additionally, we developed prompt-tuning and parsing logic to support semi-structured outputs—unlocking features like multi-ingredient upload and classification with roughly 80% precision. This structure allows for a flexible user experience, whether uploading ingredients individually or in combination, while maintaining strong model accuracy.
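A highly simplified sketch of the active learning idea, assuming misclassified images and their user-corrected YOLO-format labels are staged in a folder that seeds the next retraining run; the folder layout and label format are assumptions.

```python
from pathlib import Path
import shutil

# Hypothetical staging area consumed by the next training run.
QUEUE = Path("active_learning/queue")

def queue_misclassification(image_path: str, yolo_label_lines: list[str]) -> None:
    """Stage a misclassified image and its corrected YOLO-format labels for retraining."""
    (QUEUE / "images").mkdir(parents=True, exist_ok=True)
    (QUEUE / "labels").mkdir(parents=True, exist_ok=True)
    name = Path(image_path).stem
    shutil.copy(image_path, QUEUE / "images" / f"{name}.jpg")
    (QUEUE / "labels" / f"{name}.txt").write_text("\n".join(yolo_label_lines))
```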
Acknowledgements
- Instructors: Joyce Shen, Korin Reid
- TAs: Billy Fong, Robert Wang (AWS)
- AWS for credits to build, train, and operate MVP
- User Feedback - Anonymous
- Family and Friends for support
- UC Berkeley for the education