MIDS Capstone Project Summer 2024

EduCreate

Problem & Motivation

Teachers face a daunting challenge every day.

They must craft lessons that captivate each student and equip them with essential skills, all while juggling diverse learning styles and limited planning time. Traditional education materials often fall short, leaving students uninspired and teachers overwhelmed. How can we make learning more engaging and effective for every student?

Our Solution

We built EduCreate to help teachers address this challenge.

EduCreate provides interactive and user-friendly solutions, including a Comic Generator for creating short comics and written summaries based on input text and lesson prompts, and a Lesson Planner for designing comprehensive, personalized lesson plans.

As a GenAI-assistive resource, EduCreate’s Comic Generator transforms text-based history content into comic strips and creative summaries, saving teachers time and money on lesson planning. This innovative approach empowers teachers to be content creators and invites students to engage with history material through action-packed, lifelike stories.

Additionally, EduCreate's Lesson Planner can transcribe any video or YouTube content and turn it into a structured lesson plan based on the user's prompts. This functionality allows teachers to create follow-up work for students, ensuring continuity of learning, and provides make-up resources for those who miss class.

The platform features a novel data pipeline that integrates several types of GenAI components: a Retrieval Augmented Generation (RAG) pipeline plus LLM, Text2Image, and Speech2Text models.

EduCreate aims to tap into the fastest-growing segment of the $400B+ edtech market: interactive, personalized learning solutions.¹ It also seeks to serve the expanding population of high school teachers worldwide, which stood at over 1 million in the U.S. alone in 2022.² By revolutionizing the way educational content is created and delivered, EduCreate strives to make learning more engaging and effective for every student.

Data Science Approach

In Version 1, EduCreate features two key components: (1) the Comic Generator and (2) the Lesson Planner. The data and model pipelines for these features are described below:

Comic Generator

  • The pipeline begins with user inputs, which include: (a) the main lesson objective, (b) the comic style, (c) an optional file upload (in PDF, TXT, JPG, or JPEG format), and (d) the choice of LLM.
  • These inputs are processed by the Retrieval Augmented Generation (RAG) model, built using the LangChain framework. We selected LangChain for its modular approach and seamless integration with third-party tools, which simplifies both development and management of the architectural components.
  • We have tuned the RAG model’s hyperparameters – including chunk size, chunk overlap, vector store, retriever, top K, embedding model, and LLM settings – based on extensive model evaluations.
  • If a user uploads a document, the entire document is passed directly through the LLM’s context window. If no document is uploaded, the RAG model retrieves relevant text chunks from a pre-indexed Qdrant vector store, and these chunks are then used as context for generating output through the LLM (see the sketch after this list).
  • Users can choose between Anthropic’s Claude 3.5 Sonnet and OpenAI’s GPT-4o Mini for the LLM. Our evaluations show that while both models perform comparably, OpenAI’s GPT-4o Mini is significantly more cost-efficient.
  • The RAG model produces two outputs: (a) a text summary aligned with the main lesson objective, and (b) a dictionary of prompts and captions derived from the summary. Output (b) is then fed into the Text2Image model.
  • We selected Stability AI’s Stable Diffusion 3 as our primary Text2Image model due to its open-source nature, high image quality, efficient use of memory and GPU resources, and stronger control features. Despite these advantages, Stable Diffusion 3, like other state-of-the-art models, still struggles to maintain consistent characters across panels.
  • Finally, the generated images from the Text2Image model are stitched together with the captions to create the comic strip, which users can then download.
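
To make the retrieval-and-generation step concrete, here is a minimal sketch of how such a pipeline can be wired together with LangChain, a Qdrant vector store, and GPT-4o Mini. The collection name, top K value, temperature, and prompt wording are illustrative assumptions, not EduCreate's exact settings:

    # Sketch of the RAG step: retrieve context from Qdrant, then generate
    # the summary and panel prompts/captions with an LLM.
    from langchain_openai import OpenAIEmbeddings, ChatOpenAI
    from langchain_qdrant import QdrantVectorStore
    from langchain_core.prompts import ChatPromptTemplate

    embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
    vector_store = QdrantVectorStore.from_existing_collection(
        collection_name="history_lessons",  # hypothetical collection name
        embedding=embeddings,
        url="http://localhost:6333",
    )
    retriever = vector_store.as_retriever(search_kwargs={"k": 4})  # top K = 4

    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.7)
    prompt = ChatPromptTemplate.from_template(
        "Using the context below, write a summary aligned with the lesson "
        "objective, followed by a numbered list of panel prompts and captions.\n"
        "Objective: {objective}\n\nContext: {context}"
    )

    def generate_comic_script(objective: str) -> str:
        docs = retriever.invoke(objective)  # retrieve the top K relevant chunks
        context = "\n\n".join(doc.page_content for doc in docs)
        return (prompt | llm).invoke(
            {"objective": objective, "context": context}
        ).content

The final stitching step can be sketched in a similar spirit with Pillow, stacking each generated panel above its caption; the layout constants here are placeholders:

    # Sketch of panel stitching: stack panels vertically, drawing each
    # caption beneath its panel.
    from PIL import Image, ImageDraw, ImageFont

    def stitch_comic(panels, captions, caption_height=60):
        width = max(panel.width for panel in panels)
        height = sum(panel.height + caption_height for panel in panels)
        strip = Image.new("RGB", (width, height), "white")
        draw = ImageDraw.Draw(strip)
        font = ImageFont.load_default()
        y = 0
        for panel, caption in zip(panels, captions):
            strip.paste(panel, (0, y))
            y += panel.height
            draw.text((10, y + 10), caption, fill="black", font=font)
            y += caption_height
        return strip  # a PIL image ready to be saved and downloaded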

Lesson Planner

  • The pipeline starts with user inputs, which include: (a) optional instructions for lesson plan creation, (b) the type of video, and (c) the URL of the video, which must be accessible online.
  • Audio is extracted from the video using tools like pydub and yt_dlp. This audio is then transcribed by OpenAI’s Whisper Speech2Text model.
  • The transcribed text, combined with the user inputs, is processed by the LLM (Anthropic’s Claude 3.5 Sonnet) to generate the lesson plan, which the user can then download (see the sketch below).
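
For readers who want to trace the flow end to end, here is a minimal sketch of this pipeline; the Whisper model size, file names, and prompt wording are assumptions for illustration:

    # Sketch of the Lesson Planner: download audio with yt_dlp, transcribe
    # it with Whisper, then draft the plan with Claude 3.5 Sonnet.
    import yt_dlp
    import whisper
    import anthropic

    def build_lesson_plan(video_url: str, instructions: str) -> str:
        # 1. Extract audio from the video (requires ffmpeg).
        ydl_opts = {
            "format": "bestaudio/best",
            "outtmpl": "lesson_audio.%(ext)s",  # hypothetical file name
            "postprocessors": [
                {"key": "FFmpegExtractAudio", "preferredcodec": "mp3"}
            ],
        }
        with yt_dlp.YoutubeDL(ydl_opts) as ydl:
            ydl.download([video_url])

        # 2. Transcribe the extracted audio with OpenAI's Whisper.
        transcript = whisper.load_model("base").transcribe("lesson_audio.mp3")["text"]

        # 3. Generate the lesson plan with the LLM.
        client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
        message = client.messages.create(
            model="claude-3-5-sonnet-20240620",
            max_tokens=2048,
            messages=[{
                "role": "user",
                "content": f"Create a structured lesson plan.\n"
                           f"Instructions: {instructions}\n\nTranscript: {transcript}",
            }],
        )
        return message.content[0].text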

Evaluation

We evaluated our models using three distinct methods:

  1. Quality of RAG Output: To assess and optimize the RAG model’s hyperparameters for improved output quality, we created a synthetic dataset to serve as a repository of “ground truths,” generated by the LLM through few-shot prompting with a small number of actual Q&A examples. We then tuned the hyperparameters against two key metrics: (a) Semantic Similarity, measured as the cosine similarity between sentence embeddings (encoded by Sentence Transformers) of generated responses and ground-truth responses, and (b) LLM Judge, in which a Large Language Model evaluates or ranks responses against predefined criteria (see the sketches after this list).
  2. CLIP / BLIP Evaluation of Generated Images: To evaluate the generated images against their corresponding prompts (derived from the RAG output), we employed the CLIP/BLIP model architecture. Each image panel was processed through the BLIP model to generate a set of descriptions, which CLIP then ranked by how well they matched the image. We used cosine similarity to compare these descriptions with the prompts that the RAG model produced and the Text2Image model consumed during comic generation.
  3. User Testing: The MVP was shared with teachers and interested users for feedback and acceptance testing. This feedback loop is used for further model refinements.
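
As an illustration of metric (1a), here is a minimal sketch of the semantic-similarity computation with Sentence Transformers; the specific encoder (all-MiniLM-L6-v2) is an assumption:

    # Sketch of the Semantic Similarity metric: cosine similarity between
    # sentence embeddings of a generated response and a ground truth.
    from sentence_transformers import SentenceTransformer, util

    encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder

    def semantic_similarity(generated: str, ground_truth: str) -> float:
        embeddings = encoder.encode([generated, ground_truth],
                                    convert_to_tensor=True)
        return util.cos_sim(embeddings[0], embeddings[1]).item()

Method (2) can be sketched for a single panel as follows; the model checkpoints are common public defaults, not necessarily the exact ones used in EduCreate:

    # Sketch of the CLIP/BLIP check: BLIP describes the generated panel,
    # CLIP embeds that description and the original RAG prompt, and the
    # cosine similarity of the two text embeddings scores the match.
    from PIL import Image
    from transformers import (BlipProcessor, BlipForConditionalGeneration,
                              CLIPProcessor, CLIPModel)

    blip_proc = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
    blip = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
    clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
    clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")

    def panel_score(panel: Image.Image, rag_prompt: str) -> float:
        inputs = blip_proc(panel, return_tensors="pt")
        caption = blip_proc.decode(blip.generate(**inputs, max_new_tokens=30)[0],
                                   skip_special_tokens=True)
        text_inputs = clip_proc(text=[caption, rag_prompt],
                                return_tensors="pt", padding=True)
        features = clip.get_text_features(**text_inputs)
        features = features / features.norm(dim=-1, keepdim=True)
        return float(features[0] @ features[1])  # cosine similarity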

Evaluating generative AI models is inherently difficult, as there are currently no universal standards for model assessment. Our approach aims to provide a more holistic evaluation of the end-to-end pipelines.

Key Learnings & Impact

One of our key learnings is recognizing the immense power of data science to reimagine how we approach education and drive positive change. Our mission with EduCreate is to democratize access to innovative teaching tools. By empowering the content creator within every teacher and igniting the imagination of every student, we aim to make a lasting, positive impact on education. After all, sharing knowledge is the ultimate act of caring.

Our technical and human-centric learnings are set out in the respective sections below.

Technical Challenges

  • Developing an end-to-end data and model pipeline: Achieving our project aims within a short time frame of 14 weeks required efficient planning and execution.
  • Maintaining consistency of characters: Ensuring that characters remain consistent across the different images that make up a comic strip was a significant challenge.
  • Evaluating hyperparameter trade-offs: Carefully tuning hyperparameters across the pipeline stack involved weighing various trade-offs to optimize performance.
  • Assessing quality without a universal standard: Evaluating the quality of RAG and image generation outputs posed difficulties due to the lack of a universal standard.
  • Transforming research into production: Moving from research (Google Colab code development) to production (Streamlit) and developing a product-centric website required a focus on scalability, usability, and user stories.

Human-Centric Challenges

  • Designing and pitching a market-ready tool: Developing and presenting a tool that has the potential to succeed in the market involved addressing user needs and effectively demonstrating its value.
  • Ensuring tool flexibility in an ambiguous problem space: Designing a tool that remains adaptable and useful despite uncertainties and varying user requirements required careful consideration and innovation.

A Few Final Words to All Those Interested in EduCreating

There were several reasons we built this tool. Many are featured above or on our official website (below). We’ll share a few additional reasons here. Today’s news is dominated by stories of students using AI tools to (seemingly) “shut off their brains” and skate through classwork (e.g., writing their essays, correcting their homework assignments, providing answers to take-home exams, etc.). We wanted to build something that would serve teachers in return. More specifically, we wanted to show that this technology can be harnessed for good and for brain-engaging learning-centric experiences if handled responsibly and perceptively.

As you’ll probably see when you interact with EduCreate, we also wanted to show that even today’s state-of-the-art AI models can make some mistakes – like generating images that aren’t consistent over a series of comic panels or that look a little “off” (having extra fingers, arms in weird places, etc.) when examined closely. We think teachers can lean into these imperfections and use them as the basis of a lesson for both themselves and their students: while AI tools are great and helpful (we made this tool because we clearly believe they are!), they’re ultimately not a perfect substitute for human creativity, intellect, craft, and knowledge. So, when used, they should be highly scrutinized and evaluated on both accuracy and value.

That’s exactly what we desire with EduCreate. We hope it encourages both teachers and students to actively engage in the process of learning and to maximize the studying, investigating, analyzing, and discussing that are so fundamental to a solid educational experience.

Acknowledgements

We would like to extend a special thanks to Mark Butler for his invaluable wisdom in the GenAI space, to our instructors Danielle Cummings and Fred Nugen for their thoughtful feedback and guidance throughout our journey, and to each of the educators, teachers, and colleagues we spoke with along the way. Your insights have been instrumental in helping us bring our vision for EduCreate to life. Thank you!

Join Us in Revolutionizing Education

Footnotes

¹ Yelenevych, A. (2022, December 27). Council post: The future of EdTech. Forbes Business Council. https://www.forbes.com/sites/forbesbusinesscouncil/2022/12/26/the-future-of-edtech/?sh=4ade250a6c2f.

² Occupational Outlook Handbook. (2024, April 17). High school teachers. U.S. Bureau of Labor Statistics. https://www.bls.gov/ooh/education-training-and-library/high-school-teachers.htm.

References

Aaron Humphrey. (2020). “The Pedagogy and Potential of Educational Comics.” International Journal of Comic Art, 22(2).

Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, and Ilya Sutskever. (2022). “Robust Speech Recognition via Large-Scale Weak Supervision.” arXiv preprint arXiv:2212.04356.

Ben Proven-Bessel, Zilong Zhao, and Lydia Chen. (2021). “ComicGAN: Text-to-Comic Generative Adversarial Network.” arXiv preprint arXiv:2109.09120.

Jay Hosler and K. B. Boomer. (2011). “Are comic books an effective way to engage nonmajors in learning and appreciating science?” CBE Life Sciences Education, 10(3), 309–317. doi:10.1187/cbe.10-07-0090.

Mohit Iyyer, Varun Manjunatha, Anupam Guha, Yogarshi Vyas, Jordan Boyd-Graber, Hal Daumé III, and Larry Davis. (2017). “The Amazing Mysteries of the Gutter: Drawing Inferences Between Panels in Comic Book Narratives.” arXiv preprint arXiv:1611.05118.

Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, et al. (2024). “Scaling Rectified Flow Transformers for High-Resolution Image Synthesis.” arXiv preprint arXiv:2403.03206.

Tim Hodges. (2018). “School Engagement Is More Than Just Talk.” Gallup – Education. https://www.gallup.com/education/244022/school-engagement-talk.aspx.

Yupeng Zhou, Daquan Zhou, Ming-Ming Cheng, Jiashi Feng, and Qibin Hou. (2024). “StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation.” arXiv preprint arXiv:2405.01434.

Zhen Huang, Zengzhi Wang, Shijie Xia, and Pengfei Liu. (2024). “OlympicArena Medal Ranks: Who Is the Most Intelligent AI So Far?” arXiv preprint arXiv:2406.16772. 

 

Last updated: September 10, 2024