MIDS Capstone Project Summer 2022

Neutrally: Detecting and Replacing Inappropriate Subjectivity

Welcome to Neutrally

We have developed a language model that acts as a bias checker: it takes in text, detects inappropriate subjectivity, and replaces it with language that is more neutral in point of view.

Our Mission

We aim to reduce inappropriate subjectivity in media and literature by replacing it with language that is more neutral in point of view. Inappropriate subjectivity is language that introduces attitudes via framing, presupposing truth, or casting doubt in contexts that should be free from opinion.

Impact

In the United States, the assumption of bias in information erodes public trust: nearly half of all Americans think the media is very biased, and 60% believe bias is present in their own media choices. Neutrally is the first tool of its kind that can begin to remove bias in text and suggest neutral language replacements.

Examples

Here are a few examples of our model in action:

UC Berkeley is the greatest University of all time. -> UC Berkeley has been described by many as the greatest University of all time.

We have surprisingly met the deadline for our project. -> We have met the deadline for our project.

Mankind may eventually destroy Earth. -> Humanity may eventually destroy Earth.

The fake news media is going crazy. -> The mainstream news media is going crazy.

How It Works

Neutrally is powered by a T5 language model fine-tuned on task-specific data. T5 (“Text-to-Text Transfer Transformer”) is an encoder-decoder transformer model built by Google that performs well on many natural language processing tasks. It is pre-trained on a dataset called C4 (“Colossal Clean Crawled Corpus”).
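
To make the model's role concrete, here is a minimal inference sketch using the Hugging Face transformers library. The public "t5-base" checkpoint stands in for our fine-tuned weights, and the "neutralize:" task prefix is an illustrative assumption rather than our exact configuration.

```python
# Minimal T5 inference sketch (Hugging Face transformers).
# NOTE: "t5-base" and the "neutralize:" prefix are illustrative stand-ins
# for the fine-tuned Neutrally checkpoint and its actual task prefix.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

text = "UC Berkeley is the greatest University of all time."
inputs = tokenizer("neutralize: " + text, return_tensors="pt")

# Encoder-decoder generation: the encoder reads the input sentence and
# the decoder emits a rewritten sentence token by token.
output_ids = model.generate(**inputs, max_length=64, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```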

Our T5 model is fine-tuned on the Wiki Neutrality Corpus (WNC), created by Pryzant et al. The WNC was built from Wikipedia revisions tagged with a “Neutral Point of View” justification. Each of the 55,503 sentence pairs includes an unrevised sentence and a “neutralized” version of that same sentence. Fine-tuning T5 on these pairs trains it for our specific task of reducing inappropriate subjectivity in text.
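
For readers curious about the mechanics, the sketch below shows a single fine-tuning step on one WNC-style sentence pair. The learning rate, task prefix, and single-pair batch are illustrative assumptions, not our exact training setup.

```python
# One fine-tuning step on a (biased, neutralized) WNC-style sentence pair.
# NOTE: the hyperparameters and "neutralize:" prefix are illustrative.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# One sentence pair in the WNC format: original vs. neutralized.
source = "neutralize: We have surprisingly met the deadline for our project."
target = "We have met the deadline for our project."

inputs = tokenizer(source, return_tensors="pt")
labels = tokenizer(target, return_tensors="pt").input_ids

# Passing `labels` makes T5 return a cross-entropy loss against the
# neutralized target; backpropagation updates the full model.
loss = model(**inputs, labels=labels).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
```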

Last updated: July 30, 2022