MIMS Final Project 2010

Migratory Words: Computational Tools for Investigative Political Research

Eric Kansa

With recent progress toward more transparent government systems in the United States, new problems arise around how citizens can meaningfully engage with and understand such a huge quantity of raw and often context-less data. The new challenge isn't just finding the needle in the haystack; it's finding the patterns that bind together individuals, organizations, and policy.

A common public sentiment is that the political process is ruled simply by polling trends and money. In contrast, our goal is to shed light on the knowledge-sharing process that exists inside the Beltway, to partially expose the transfer of talking points, and to highlight organizations that are particularly influential.

For example, in November of 2009, journalist Robert Pear of the New York Times revealed an instance where some 42 members of Congress used text derived from a Roche/Genentech lobbying email in their comments to the Congressional Record. In this case, the emails provided by Genentech were compared manually against the Congressional Record, which required Pear to intelligently target his search at a particular set of data.

In response, our project seeks to reveal statistical relationships between the language used in influential text in the media, corporate position papers, and think tank publications, and to highlight the subsequent propagation of this text to the Congressional Record. Informed by qualitative interviews with journalists, researchers, and professionals in the field of open government, the output of our computational system will be a dynamic web interface built for a mainstream audience, offering meaningful visualization and categorization of our results.

To achieve this end, our project will build on existing word correlation and sentiment analysis tools developed by UC Berkeley's StatNews Project and Intel Labs Berkeley.
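
As a rough illustration of the kind of comparison described above, the following minimal Python sketch scores how much of a source text reappears word for word in another document, using overlapping word n-grams. It is an assumption made for illustration only, not the method used by the StatNews Project or Intel Labs Berkeley tools; the function names (shingles, containment) and the sample sentences are hypothetical.

```python
import re


def shingles(text, n=5):
    """Return the set of lowercase word n-grams (shingles) found in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(source, target, n=5):
    """Fraction of the source's n-grams that also appear in the target."""
    src = shingles(source, n)
    if not src:
        # Source is shorter than n words, so there is nothing to compare.
        return 0.0
    return len(src & shingles(target, n)) / len(src)


# Hypothetical snippets, invented for illustration only.
talking_point = (
    "This bill will create jobs and promote innovation "
    "in the biotechnology sector."
)
statement_a = (
    "Mr. Speaker, this bill will create jobs and promote innovation "
    "in the biotechnology sector."
)
statement_b = "Mr. Speaker, I rise to honor the teachers of my district."

score_a = containment(talking_point, statement_a, n=4)
score_b = containment(talking_point, statement_b, n=4)
print(score_a, score_b)
```

A high score suggests that a statement reuses the source nearly verbatim, while a low score suggests little direct borrowing. Paraphrased talking points would need a looser statistical measure than exact n-gram matching.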

External advisor: Laurent El Ghaoui (EECS)


Last updated:

March 30, 2017