MIDS Capstone Project Fall 2023

NLP in the Classroom

Problem & Motivation

Providing teachers with timely feedback on their instructional practice is difficult, especially in public school districts where administrators face multiple competing demands. One way to improve this feedback loop is to use transcripts of classroom recordings to examine the fine-grained data that emerge from a classroom. With natural language processing, transcripts can be analyzed to measure features of a conversation such as talking time, how often teachers follow up on a student response, and how teachers and students participate in turn-taking. We use an anonymized dataset shared by Professor Dora Demszky of the Stanford Graduate School of Education, which has been made available for research purposes. The dataset includes 1,660 anonymized classroom transcripts of lessons, labeled by subject experts. Our research examines how well NLP tools can predict teacher discourse moves from classroom transcripts. The potential impact includes improved teaching and learning outcomes.
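As a rough illustration of the conversation features mentioned above, the sketch below computes each speaker's share of talk and the number of speaker changes for a short transcript. It is not the project's pipeline: the function name `talk_stats`, the `(speaker, text)` tuple format, the use of word counts as a stand-in for talking time, and the toy transcript are all assumptions made for this example, and the transcript is invented rather than drawn from the dataset.

```python
from collections import Counter


def talk_stats(turns):
    """Return (share of words per speaker, number of speaker changes).

    `turns` is an ordered list of (speaker, text) tuples. Word count is
    used as a proxy for talking time, which is an assumption of this sketch.
    """
    words = Counter()
    for speaker, text in turns:
        words[speaker] += len(text.split())

    total = sum(words.values())
    share = {s: n / total for s, n in words.items()} if total else {}

    # A speaker change occurs whenever consecutive turns have different speakers.
    switches = sum(1 for prev, cur in zip(turns, turns[1:]) if prev[0] != cur[0])
    return share, switches


# Invented toy transcript, for illustration only.
transcript = [
    ("teacher", "What is one half plus one quarter?"),
    ("student", "Three quarters."),
    ("teacher", "How did you get three quarters?"),
    ("student", "I changed one half into two quarters."),
]

share, switches = talk_stats(transcript)
print(share, switches)
```

Simple counts like these cover talking time and turn-taking; identifying discourse moves such as a teacher following up on a student response requires a trained model.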

Models

We used three large language models (LLMs) to try to achieve results comparable to those of the original study (Demszky et al. 2023): Llama 2 (an LLM developed by Meta), GPT (an LLM developed by OpenAI and the “brain” behind ChatGPT), and RoBERTa-large (Liu et al. 2019), a variant of BERT (an LLM developed by Google), fine-tuned using LoRA (Hu et al. 2021).
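To give a sense of why LoRA makes fine-tuning inexpensive, the sketch below compares the number of trainable parameters in a full weight matrix with the number in its low-rank LoRA update. LoRA freezes the original matrix W (d_out × d_in) and trains two small matrices B (d_out × r) and A (r × d_in) whose product is added to W. The 1024 × 1024 shape matches the hidden size of RoBERTa-large; the rank of 8 and the helper name `lora_param_counts` are assumptions for illustration, not the settings used in this project.

```python
def lora_param_counts(d_in, d_out, rank):
    """Trainable parameters for full fine-tuning vs. a LoRA update of one matrix."""
    full = d_in * d_out            # every entry of W is trained
    lora = rank * (d_in + d_out)   # only A (rank x d_in) and B (d_out x rank) are trained
    return full, lora


# One 1024 x 1024 projection matrix; rank 8 is an assumed value.
full, lora = lora_param_counts(d_in=1024, d_out=1024, rank=8)
ratio = lora / full
print(f"full: {full:,}  LoRA: {lora:,}  ratio: {ratio:.4f}")
```

Under these assumptions, the LoRA update trains about 1.6% as many parameters as the full matrix.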

Results

Overall, RoBERTa-large with LoRA outperformed the other models, including those in the original study by Demszky et al. (2023), in terms of accuracy on Teacher On Task, High Uptake, and Focusing Question. The RoBERTa model consistently outperformed Llama 2 in predicting every discourse move, indicating that a larger model does not necessarily perform better.

Acknowledgments

We acknowledge Professor Dora Demszky et al. for their foundational work on this problem and for providing easy access to such a large dataset. Our project would not have been possible otherwise. Thanks also to Google, OpenAI, Meta, and the countless data scientists, linguists, computer scientists, and statisticians who have worked to develop the field of natural language processing, and on whose shoulders our own work rests. Finally, we thank Professors Danielle Cummings and Fred Nugen, as well as the University of California, Berkeley, for providing the support and framework for this project.

Last updated:

December 14, 2023