Cultural institutions such as museums and libraries house large collections of biographical information in the form of archival
context and name authority file records. However, today these institutional collections are complemented by sizable, but less
authoritative, online resources such as Freebase and IMDB. These online collections contain significant information that is either not
easily accessible through the institutional collections or absent from them altogether. BioGraph is the result of our effort to connect and
enrich such institutional collections with online resources and to provide tools to explore the latent social network present in these
collections.
In BioGraph, we have developed new techniques and improved existing ones. These include hybrid machine learning techniques to
semi-automatically merge the collections and techniques to de-duplicate entities across collections. We have also designed
and implemented a visual query language that allows users to intuitively express structural queries and explore social networks of
the entities in our combined biographical collection.
More specifically, we use these techniques to enrich the biographical information generated by the Social Networks and Archival Context
Project (SNAC, http://socialarchive.iath.virginia.edu/xtf/search), which includes curated information from the Library of Congress, the
Online Archive of California and other cultural institutions, by connecting it with the Freebase and IMDB collections. We report on the
challenges faced in merging these diverse collections and discuss how our visual interface can be used to query the rich social network
information present in the combined and enriched collection.