Mapping Folklore: hGIS, Machine Learning and the Danish Folklore Archive
The project has multiple parts, and the goal is eventually to get them all working together. The problem is large. Folklore, which emerges from the dialectic tension between individual and tradition, is conditioned by social networks. It reflects individuals' use of the resource of tradition to understand changes in the physical and man-made environment, and to negotiate their shifting status in rapidly changing economic and social environments. The Danish folklore archive material with which I work is based on 200+ field trips made by Evald Tang Kristensen from 1867 to 1910, during which he collected 250,000+ stories, songs, games, rhymes, cures, observations of daily life, etc., from 6,500+ named informants.
The goal of my work is to attach all of these collections back to the individuals who told them, and to situate these individuals in a data-rich environment in which their stories, and the patterns that emerge in the storytelling, can be interrogated not only as broad phenomena but also in depth (drilling down to the individual story). By using information from other sources (census data, church records, voting patterns, parish in- and out-migration), I hope to present a far richer interpretive environment for the study of folklore while, at the same time, making the voices of the generally disenfranchised available to other researchers. You can play with the current interface (which is more of a book project than a research interface) at: dev.cdh.ucla.edu/danishfolklore/bin/mainview.html
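At its core, attaching stories to tellers and tellers to parish-level sources is a join on shared keys. The following is a minimal sketch under stated assumptions: the `Informant` and `Story` classes, their fields, and the parish-population table are hypothetical illustrations, not the project's actual schema. It shows how a broad count by parish and genre can be drilled down to the individual stories and tellers behind it, and how a census-style figure can be joined on the same parish key.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical record layout; field names are illustrative only.
@dataclass(frozen=True)
class Informant:
    informant_id: str
    name: str
    parish: str
    gender: str


@dataclass(frozen=True)
class Story:
    story_id: str
    informant_id: str  # link back to the person who told the story
    genre: str
    text: str


def _index(informants):
    """Map informant_id -> Informant for quick lookups."""
    return {i.informant_id: i for i in informants}


def genre_counts_by_parish(stories, informants):
    """Broad view: number of stories per (parish, genre)."""
    by_id = _index(informants)
    return Counter((by_id[s.informant_id].parish, s.genre) for s in stories)


def drill_down(stories, informants, parish, genre):
    """Close view: the individual stories and tellers behind one (parish, genre) cell."""
    by_id = _index(informants)
    return [
        (by_id[s.informant_id].name, s.story_id)
        for s in stories
        if s.genre == genre and by_id[s.informant_id].parish == parish
    ]


def stories_per_thousand(stories, informants, parish_population):
    """Join story counts to a parish-level table (e.g. census population) on the parish key."""
    by_id = _index(informants)
    counts = Counter(by_id[s.informant_id].parish for s in stories)
    return {p: 1000 * counts[p] / pop for p, pop in parish_population.items() if pop}
```

Church records, voting data, or migration figures could be attached the same way, as long as each table shares a parish (or informant) key with the story records.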
Some of the patterns are discernible using fairly simple math on the graphs drawn in ArcGIS, while machine learning techniques (particularly unsupervised machine learning) allow for the discovery of other patterns based on the texts themselves rather than simply on the places mentioned in them. Projecting the results of machine learning back into the hGIS environment allows for a further set of secondary evaluations of the "clusters" using standard GIS tools (for example, comparing the wide circles and narrow ellipses that result when informants are grouped by gender).
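The "wide circle versus narrow ellipse" comparison rests on a standard deviational ellipse: the mean centre of a set of points plus their spread along two perpendicular axes. The sketch below, assuming points already projected to planar coordinates, computes such an ellipse from the 2x2 covariance matrix, whose eigenvalues give the variance along the major and minor axes. This is a generic formulation, not ArcGIS's own code; the Directional Distribution (Standard Deviational Ellipse) tool uses a related formula with its own scaling conventions, so its figures will differ. The group labels and coordinates are hypothetical.

```python
import math
from typing import Iterable, NamedTuple, Tuple


class Ellipse(NamedTuple):
    cx: float         # mean centre (x)
    cy: float         # mean centre (y)
    major: float      # one standard deviation along the major axis
    minor: float      # one standard deviation along the minor axis
    angle_deg: float  # major-axis direction, counter-clockwise from the x-axis

    @property
    def elongation(self) -> float:
        """About 1 for a roughly circular spread; larger values mean a narrower ellipse."""
        return self.major / self.minor if self.minor > 0 else math.inf


def deviational_ellipse(points: Iterable[Tuple[float, float]]) -> Ellipse:
    pts = list(points)
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two points")
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    # Population (co)variances of the planar coordinates.
    sxx = sum((x - cx) ** 2 for x, _ in pts) / n
    syy = sum((y - cy) ** 2 for _, y in pts) / n
    sxy = sum((x - cx) * (y - cy) for x, y in pts) / n
    # Closed-form eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    mid = (sxx + syy) / 2
    half_gap = math.hypot((sxx - syy) / 2, sxy)
    lam_major = mid + half_gap
    lam_minor = max(mid - half_gap, 0.0)  # guard against tiny negative rounding error
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return Ellipse(cx, cy, math.sqrt(lam_major), math.sqrt(lam_minor), math.degrees(angle))


if __name__ == "__main__":
    # Hypothetical projected coordinates (e.g. metres) for two groups of locations.
    groups = {
        "group A": [(0, 0), (10, 0), (0, 10), (10, 10)],
        "group B": [(0, 0), (10, 1), (20, 0), (30, 1)],
    }
    for label, pts in groups.items():
        e = deviational_ellipse(pts)
        print(label, round(e.elongation, 2), round(e.angle_deg, 1))
```

Comparing the `elongation` of the ellipses for two groups (for instance, story locations from female versus male informants) gives a single number for the circle-versus-ellipse contrast, and `angle_deg` shows the direction in which a narrow distribution is stretched.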
Machine learners (particularly supervised learners) could also help us infer social networks, based either on comparisons of individual stories or on comparisons of repertoires (e.g., all of the fairy tales told by a single informant).
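Repertoire comparison can be illustrated with a deliberately simple stand-in. The sketch below is not a supervised learner: it pools each informant's repertoire into word counts, scores every pair of informants by cosine similarity, and keeps pairs above a threshold as candidate ties. The informant names, the English placeholder texts, and the 0.5 threshold are all hypothetical; a supervised model trained on known relationships could take the place of the fixed threshold.

```python
import math
import re
from collections import Counter
from itertools import combinations


def bag_of_words(texts):
    """Pool one informant's repertoire into a single word-count vector."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"\w+", text.lower()))
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def repertoire_network(repertoires, threshold=0.5):
    """Return candidate ties (informant, informant, score) whose repertoires overlap strongly."""
    vectors = {name: bag_of_words(texts) for name, texts in repertoires.items()}
    edges = []
    for a, b in combinations(sorted(vectors), 2):
        score = cosine(vectors[a], vectors[b])
        if score >= threshold:
            edges.append((a, b, round(score, 3)))
    return edges


if __name__ == "__main__":
    repertoires = {
        "informant_1": ["the witch rode the farmer's horse at night"],
        "informant_2": ["at night the witch rode the horse"],
        "informant_3": ["a ghost walked in the churchyard"],
    }
    print(repertoire_network(repertoires))  # [('informant_1', 'informant_2', 0.905)]
```

Common function words dominate raw counts, so in practice one would at least remove stopwords or weight terms (for example with TF-IDF), and lemmatize the texts first.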
Why use these techniques at all? Sheer volume. For one or two, or even ten, informants, one might be able to do some excellent close reading; doing the same for dozens, hundreds, or thousands requires a different set of tools.
Where does NLP come in? For Danish, morphological analysis is actually trivial, but for some of the cognate languages, such as New Norwegian and Icelandic, the problem is quite significant. If one wants to connect the Danish materials to the Icelandic, Norwegian, and Swedish materials (and perhaps to the English and Irish), one needs good NLP. Morphological analysis is one small part of that equation, but it would allow for greater efficiency and accuracy in the machine learning environment. Named-entity detection (NED) for inflected languages would also work much better with automated morphosyntactic markup. So the Icelandic work is focusing on the morphological side of the problem. Once that is running, (a) NED and automatic mapping from, say, the sagas (or the large database of Icelandic folklore) would be quite easy, (b) lemmatized searching in the corpus would be possible, and (c) cross-language searching would be more accurate.
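Why lemmatization matters for search in an inflected language can be shown with a toy lemma table: each inflected form maps to its dictionary form, so a query for the lemma `hestur` ('horse') also finds `hest` and `hesta`, which a plain string search for "hestur" would miss. The hand-written `LEMMAS` table is hypothetical and tiny (two nouns, indefinite forms only, no handling of ambiguous forms); a real system would replace it with the output of a morphological analyser of the kind the Icelandic work is building. The document IDs and sentences are illustrative.

```python
import re

# Toy lemma table (hypothetical; two Icelandic nouns, indefinite forms only).
# Keys are inflected surface forms, values are lemmas.
LEMMAS = {
    # hestur 'horse'
    **dict.fromkeys(["hestur", "hest", "hesti", "hests", "hestar", "hesta", "hestum"], "hestur"),
    # maður 'man, person'
    **dict.fromkeys(["maður", "mann", "manni", "manns", "menn", "mönnum", "manna"], "maður"),
}


def lemmatize(text):
    """Lower-case, tokenize on word characters, and map each token to its lemma if known."""
    return [LEMMAS.get(tok, tok) for tok in re.findall(r"\w+", text.lower())]


def lemma_search(corpus, lemma):
    """Return IDs of documents containing any inflected form of `lemma`."""
    return [doc_id for doc_id, text in corpus.items() if lemma in lemmatize(text)]


if __name__ == "__main__":
    corpus = {
        "doc1": "Hún sá hest.",           # 'She saw a horse' (accusative singular)
        "doc2": "Hann átti tvo hesta.",   # 'He owned two horses' (accusative plural)
        "doc3": "Þar bjó gamall maður.",  # 'There lived an old man'
    }
    print(lemma_search(corpus, "hestur"))  # ['doc1', 'doc2']
```

The same normalization step would feed named-entity detection: once inflected place and person names are reduced to a base form, they can be matched against a gazetteer and mapped.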
After college (AB in Folklore and Mythology) and graduate school (PhD in Scandinavian Languages and Literatures), I began my academic career in the Scandinavian Section at UCLA. My main areas of Scandinavian research are folklore, modern literature, film, and Old Norse literature, and my main Scandinavian language is Danish. Much of my research has focused on storytelling in late nineteenth- and early twentieth-century rural communities in Denmark. I have considered the role witchcraft accusations play in these communities and explored aspects of legends about the Black Death. I have also tried to answer the question, "When ghosts appear in the neighborhood, who ya' gonna call?" My first book, Interpreting Legend, provides a methodology for the study of storytelling in small communities, and I applied much of this approach in Talking Trauma, my recent study of storytelling among paramedics in a contemporary American city. I have held appointments at the University of Copenhagen, where I directed the folklore program, and at Harvard University. Because of my work in Korean folklore and popular culture, my UCLA appointment was officially shared with the East Asian Languages and Literatures Department in 1998. I recently completed a documentary on punk rock in South Korea.
While not selflessly slaving away to make my courses the most fulfilling of all possible experiences for my students, engrossed in my exciting (albeit incredibly dangerous) research (dangerous largely because of the chemicals needed to properly analyze a text in my newly minted "decomposition" school of textual criticism), or engaged in the challenging administrative tasks that are among the most rewarding aspects of the professorial enterprise, I can be found practicing my skeet shooting in the deserts east of the city. I also enjoy travel; I have visited New Jersey and Maryland, and once even made it to Delaware. When not shooting or traveling, I pursue my other passion: composing baroque sonatas for harpsichord and flugelhorn. In the evenings, I volunteer at a pancake cooperative in Agoura Hills, and on weekends my wife and I, when not teaching neighborhood children how to make batik, are usually tending our clam beds in the waters north of Malibu.