Reasoning about Social Dynamics in Language
Humans reason about social dynamics when navigating everyday situations. Yet because existing NLP approaches have limited expressivity, reasoning about the biased and harmful social dynamics in language remains a challenge, and systems that fail at it can backfire against certain populations.
In the first part of the talk, I will analyze a failure case of NLP systems, namely, racial bias in automatic hate speech detection. We uncover severe racial skews in training corpora and show that models trained on hate speech corpora acquire and propagate these racial biases. As a result, tweets by self-identified African Americans are up to two times more likely to be labelled as offensive compared to tweets by others. We propose ways to reduce these biases by making a tweet's dialect more explicit during the annotation process.

Then, I will introduce Social Bias Frames, a conceptual formalism that models the pragmatic frames in which people project social biases and stereotypes onto others, in order to reason about biased or harmful implications in language. Using a new corpus of 150k structured annotations, we show that models can learn to reason about the high-level offensiveness of statements, but struggle to explain why a statement might be harmful. I will conclude with future directions for better reasoning about biased social dynamics.
Maarten Sap is a Ph.D. student at the University of Washington, advised by Noah Smith and Yejin Choi. He is interested in natural language processing for social understanding: specifically, how NLP can help us understand human behavior, and how we can endow NLP systems with social intelligence, social commonsense, or theory of mind.
In the past, he has interned at AI2 on project Mosaic, working on social commonsense for artificial intelligence systems, and at Microsoft Research, working on long-term memory and storytelling with Eric Horvitz.