Featured Faculty Member

David Bamman

Assistant Professor

Professor David Bamman is a scholar on the cutting edge of natural language processing, digital humanities, and computational social sciences.

Earlier this year, Professor Bamman’s work using machine learning and data science for literary analysis received considerable attention in the Journal of Cultural Analytics, the media, and across the Twitter-sphere.

Professor Bamman recently received a $500,000 National Science Foundation (NSF) grant for “Building Subjective Knowledge Bases by Modeling Viewpoints” (as co-PI with Professor Brendan O’Connor of the University of Massachusetts Amherst).

Professor Bamman answered a few questions about his life and work:

What are you currently working on?

One dimension of my ongoing research is developing more sophisticated computational models of plot in literary texts using methods in natural language processing and machine learning. Most of the attention in NLP over the past few decades has focused on a relatively small set of domains (like newswire or product reviews), but literary texts present a host of interesting challenges for text understanding that you don’t see elsewhere. Plot is one of these; while it is itself a complex abstraction, at the very least it involves people (characters), places (the setting where action takes place), and time (when those actions take place), all interacting through the depicted events that constitute the action of the story. In my group, we’re decomposing plot into solvable sub-problems, each of which can be researched and evaluated on its own terms. Our current area of focus is setting: trying to reconstruct the physical geography of a novel by grounding every event in the location where it takes place.

“The most interesting work on the research side is not purely in the modeling and coding phase of these problems; it’s in defining and theoretically motivating what a good measurement looks like.”

On the computational social science side, one project we’re starting up now is developing what we’re calling “subjective” knowledge bases. Lots of work in NLP over the past five years has focused on “open information extraction” — trying to read through all of the text on the web to learn facts like “Obama was president in 2014.” Of course, people say lots of things on the web, and many of them aren’t factual, so this work is designed instead to learn a set of opinions and viewpoints as they’re asserted in text, and use those assertions to build a subjective knowledge base that can accommodate contradictory and conflicting statements from different authors. We’ll be looking at attitudes expressed on Twitter, Reddit and in a collection of 5 million historical books.

What research questions do you find most compelling?

The questions I work on all see text as a form of data — using literary novels to measure the amount of attention given to characters as a function of their gender; using political speeches to measure rhetorical strategies to solicit applause. In all of this work, measurement is an important concept — how do we design an algorithmic instrument that can measure some abstract and often ill-defined quantity like “attention” or “rhetorical strategy” from raw text?  Many of the methods we have in NLP can ultimately be seen through this lens when considering text as data — in some cases that instrument may already exist (in mature technologies like named entity recognition or syntactic parsing), but in the most interesting cases, we need to design a new one from the bottom up. The most interesting work on the research side is not purely in the modeling and coding phase of these problems; it’s in defining and theoretically motivating what a good measurement looks like. I tend to gravitate towards questions where that’s not clear from the outset.

“I work on empirical problems in the social sciences and humanities and collaborate with other researchers in those fields, and it’s invigorating to be at a place like the I School where the students often have a depth of knowledge and critical insight that comes from those disciplines.”

What makes the I School and I School students unique?

The problems that my students work on are pretty technical on the NLP side, but the problems aren’t just algorithmic; they holistically involve every aspect of experimental design (from theoretically motivating a research problem to implementing its solution and designing what real validation looks like). I School students are great at this entire process, and I think their interdisciplinary backgrounds are an important part of that; many of them majored in computer science plus something else (like music or comparative literature). I work on empirical problems in the social sciences and humanities and collaborate with other researchers in those fields, and it’s invigorating to be at a place like the I School where the students often have a depth of knowledge and critical insight that comes from those disciplines.

How did you get into your field?

I started out as a Classics major in college; what put me on the path to my field was working as a researcher at the Perseus Project (a digital library of Greek and Latin at Tufts University, one of the flagships of the digital humanities) for a few years before getting my Ph.D. I had a background in computational linguistics at that point, but that experience really opened my eyes to what computational and empirical methods can do for the research questions asked in a discipline as traditional as Classics (there I worked on automatic syntactic parsing for Greek and Latin, building bilingual dictionaries using techniques from machine translation, and automatically identifying allusions in Latin poetry).

Last updated:

November 8, 2018