MIDS Capstone Project Summer 2019


Document dumps of emails from organizations and individuals have become more common in recent years. Journalists and researchers face the daunting task of sifting through these documents, often under the pressure of a short deadline. One obstacle is that the software tools these groups typically use (text editors) are not well suited to the task. Rather than exploring the data, they resort to searching for keywords that may be relevant.

Docsund offers a solution for handling these types of document collections. It is an interactive, web-based tool for rapid exploration of large, unstructured document collections. It provides a full-text search feature for quickly finding words across all the documents, and an intuitive browser for viewing results. The entity browser shows the relationships between entities (people, organizations, money, and time) in the documents. Finally, the topic explorer automatically finds latent topics within the documents, giving the user a high-level overview of the collection as well as starting points for further exploration.

More Information

Last updated: August 6, 2019