I School Research Exchange

Building an Annotated Dataset of Literary Entities and Events

Wednesday, February 20, 2019
12:00 pm to 2:00 pm
David Bamman

Literary novels push the limits of natural language processing. While much work in NLP has been heavily optimized toward the narrow domain of contemporary English newswire (and now, increasingly, social media), literary novels are an entirely different animal. Their long, complex sentences strain syntactic parsers with super-linear computational complexity; their use of figurative language challenges representations of meaning based on neo-Davidsonian semantics; and their length (ca. 100,000 words on average) rules out existing solutions for problems like coreference resolution, which expect a small set of candidate antecedents.

At the same time, fiction drives computational research questions that are uniquely interesting to that domain. In this talk, I will focus on developing computational models that lay the groundwork for a uniquely literary problem: plot. While "plot" itself is a complex abstraction, one contribution of our work is to decompose it into solvable sub-problems, each of which can be researched and evaluated on its own terms. At the very least, plot involves people (characters), places (the setting where action takes place), time (when those actions take place), and things (objects that are important), all interacting through depicted events.

I'll outline our progress to date on two of these sub-problems: recognizing the entities in literary texts (characters, locations, and other spaces of interest) and identifying the events that are depicted as having transpired. Both efforts involve annotating 200,000 words drawn evenly from 100 different English-language literary texts and building computational models to automatically identify each phenomenon.

(This is joint work with Matt Sims, Jerry Park, Sejal Popat and Sheng Shen.)

David Bamman is an assistant professor in the School of Information working on natural language processing and machine learning.


The I School Research Exchange offers I School faculty and Ph.D. students opportunities to learn about, discuss, and contribute to research developing within the school, across the campus, and in the region.

Lunch, for those who have signed up, will be served at 12:00. The talk starts at 12:30, and we try to wrap up between 1:45 and 2:00.

Meetings are open to I School faculty, I School Ph.D. students, I School visiting scholars, and invited guests.

Last updated: February 13, 2019