Crowdsourcing for Information Retrieval: From Statistics to Ethics
The first computers were people. While we have successfully automated many routine information processing tasks once performed by human computers, our best intelligent systems today still cannot match human performance on many other tasks, such as analyzing text or imagery. Such continuing demand for human labor, coupled with an increasingly digitally connected and well-educated global population, has driven the rise of crowdsourcing and a renaissance in research on human computation.
In this talk, I will highlight some of the ways crowdsourcing has specifically impacted Information Retrieval (IR), the science of search engines. In particular, as rapid information growth has led us to search ever more massive information repositories, manually judging the relevance of so many search results for evaluation purposes has become increasingly infeasible. While IR has made great strides in reducing the scale of human effort needed for evaluation, the fundamental bottleneck remains. Fortunately, crowdsourcing offers the potential for faster, cheaper, and easier data labeling than ever before. Unfortunately, it also raises a host of new challenges, spanning such diverse areas as human-computer interaction, psychology, machine learning, economics, and regulation, to name a few. Success seems to require tackling everything from statistical quality assurance to thornier ethical questions. After discussing recent work in my lab on statistical consensus, I will change gears to touch on some ethical issues in crowdsourcing that increasingly concern me. I will argue that those using crowdsourcing have an ethical obligation to wrestle with such considerations so that we may better assess how our technological “advances” are impacting the lives of the people powering our human computation systems.
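To make the idea of statistical consensus concrete: the simplest baseline for quality assurance in crowdsourced labeling is to collect redundant judgments per item and take a plurality vote. This is only an illustrative sketch of that baseline (the item and label names below are hypothetical), not the specific consensus methods developed in the speaker's lab.

```python
from collections import Counter

def majority_vote(labels_by_item):
    """Aggregate redundant crowd labels per item by simple plurality.

    labels_by_item: dict mapping item id -> list of worker labels.
    Returns a dict mapping item id -> consensus label. Ties are broken
    by first-seen label order (Counter.most_common is insertion-stable).
    """
    return {item: Counter(labels).most_common(1)[0][0]
            for item, labels in labels_by_item.items()}

# Hypothetical relevance judgments from three workers per document.
votes = {
    "doc1": ["relevant", "relevant", "not_relevant"],
    "doc2": ["not_relevant", "not_relevant", "relevant"],
}
print(majority_vote(votes))  # {'doc1': 'relevant', 'doc2': 'not_relevant'}
```

More sophisticated consensus models (e.g. weighting workers by estimated accuracy) refine this same aggregation step rather than replace it.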
Matthew Lease is an assistant professor in the School of Information at the University of Texas at Austin. He holds a Ph.D. in computer science from Brown University, and his research centers on information retrieval (IR), crowdsourcing, and their intersection. Lease has received early career awards from NSF, IMLS, and DARPA, as well as the Modeling Challenge Award at the 2012 International Conference on Social Computing, Behavioral-Cultural Modeling, & Prediction. He presented an invited keynote at IJCNLP 2011 and invited talks at Frontiers of Information Science and Technology 2012 and ID360 2013: The Global Forum on Identity. He has presented crowdsourcing tutorials at ACM SIGIR (2011-2012), ACM WSDM (2011), CrowdConf (2011), and SIAM Data Mining (2013). He is co-organizing the 3rd annual Crowdsourcing Track for the National Institute of Standards and Technology (NIST) Text REtrieval Conference (TREC), as well as the CrowdScale workshop and shared task at AAAI HCOMP 2013.