AI Fiction, Real People: The Complexities of Studying LLM-User Conversation Data
“Data, Power and the Ethics of Knowledge: A Digital Humanities Series” is presented by the Summer Program in Digital Humanities and co-sponsored by the Digital Humanities Library and the School of Information.
The release of AI-user conversation datasets, like WildChat, has offered new opportunities to understand how people are using LLMs in their daily lives. This talk will discuss an ongoing project that draws on WildChat, which contains millions of conversations collected with user consent, focusing specifically on how users generate fiction and stories. I will reflect on the many ethical and privacy considerations that are raised when working with this user data, and share some insights that can be gleaned from analyzing it. This talk will consider the significance of real-world data for AI research, as well as the threats that are posed by the misuse or abuse of this data, whether in research or industry.
Speaker
Melanie Walsh
Melanie Walsh is an assistant professor in the Information School and an adjunct assistant professor in the English Department at the University of Washington. She received her Ph.D. in English literature from Washington University in St. Louis, before becoming a postdoctoral associate in information science at Cornell University. She is co-PI of the AI for Humanists project (twice funded by the NEH) and co-editor of the Post45 Data Collective (funded by the NEH) and Responsible Datasets in Context (funded by the Mozilla Foundation).
She is currently at work on a book, When Postwar American Fiction Went Viral: Protest, Profit, and Popular Readers in the 21st Century, which explores how social media transformed the circulation and representation of literary works. Her recent research on AI and literature has been published or is forthcoming in NLP and humanities venues including EMNLP and Modern Fiction Studies.
About the Series
Data shapes what we know. Power determines who gets to know it. And the ethics of knowledge is not a question the data can answer for itself. We live in a moment when AI is reshaping every field, and the pressure to be proficient has never been greater. But proficiency alone does not tell you what your data means, whose stories it erases, or what you are responsible for when you use it. That is not a gap in technology; it is where technology ends and the thinking begins.
Over six evenings this summer, this series brings together scholars and practitioners chosen for the rigor, depth, and ethical commitment of their multidisciplinary work. The diversity of their backgrounds, across academia, industry, and public engagement, and across disciplines that rarely share the same room, is itself a methodological choice. In the age of AI, the decisions we make about data (how we collect it, interpret it, and act on it) demand collective inquiry and shared accountability.
This series is an initiative of the Digital Humanities Summer Program at UC Berkeley, a space committed to fostering dialogue across disciplines, communities, and ways of knowing. Our goal is not to arrive at answers, but to think together — to ask what responsible data use looks like in practice, what ethical AI means beyond compliance, and how humanistic inquiry can help us navigate the choices that technology alone cannot make for us. We hope every participant leaves not just informed, but challenged and better equipped to act with care and intention in their own fields.
Digital humanities is not a discipline. It is a practice and a commitment to interpretation, to accountability, to the ethics of knowledge production. What does it mean to build digital archives that don’t reproduce the erasures of colonialism? How do computational methods change the stories we can tell and the ones we can no longer ignore? Who gets counted in our datasets, and who disappears? How do we recognize when AI-generated narratives become the infrastructure of misinformation? These are not technical questions. They are ethical ones, and it is precisely the responsibility of Digital Humanities to hold them open, to refuse easy answers, and to ask not just what we can do with data and digital tools, but what we should do, and for whom.
Each event is a free public lecture followed by Q&A, open to the entire Berkeley community. No prior knowledge required. Only curiosity.
