Data Mining Meets HCI: Making Sense of Large Graphs
We have entered the era of big data. Datasets surpassing terabytes now arise in science, government, and enterprises. Yet making sense of these data remains a fundamental challenge. Where do we start our analysis? Where do we go next? And how do we visualize our findings? My research takes a step towards answering these questions.
I work in data mining and human-computer interaction (HCI), and I combine the best from both worlds to create tools that help people make sense of graphs with billions of nodes and edges. I present my work in three interrelated topics.
Attention Routing: I introduce this idea, based on anomaly detection and machine inference, that automatically draws people’s attention to interesting parts of the graph. I describe two examples: the Polonium technology unearths malware from 37 billion machine-file relationships; the NetProbe system fingers bad guys who commit auction fraud.
Mixed-Initiative Graph Sensemaking: I describe the Apolo system that combines machine inference and visualization to guide the user to interactively explore large graphs. The user gives examples of relevant nodes, and Apolo recommends which areas the user may want to see next. In a user study, Apolo helped participants find significantly more relevant articles than Google Scholar.
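The example-driven loop described above can be illustrated with a minimal sketch. It is not Apolo's actual inference method: it substitutes random walk with restart, a simple graph proximity measure, and the function name `recommend`, its parameters, and the toy graph are all hypothetical. It assumes an undirected graph given as an edge list and that every example node appears in at least one edge.

```python
from collections import defaultdict


def recommend(edges, seeds, restart=0.15, iters=50, top_k=3):
    """Rank non-seed nodes by random-walk-with-restart proximity to the seeds.

    Hypothetical helper for illustration; not the Apolo algorithm.
    """
    # Build an undirected adjacency list from the edge list.
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    nodes = list(nbrs)
    seed_set = set(seeds)

    # Restart distribution: uniform over the user-supplied example nodes.
    prior = {n: (1.0 / len(seed_set) if n in seed_set else 0.0) for n in nodes}
    score = dict(prior)

    for _ in range(iters):
        # Each step: restart at a seed with probability `restart`,
        # otherwise move to a uniformly chosen neighbor.
        nxt = {n: restart * prior[n] for n in nodes}
        for u in nodes:
            share = (1.0 - restart) * score[u] / len(nbrs[u])
            for v in nbrs[u]:
                nxt[v] += share
        score = nxt

    # Suggest the highest-scoring nodes the user has not already marked.
    ranked = sorted(
        (n for n in nodes if n not in seed_set),
        key=lambda n: (-score[n], n),
    )
    return ranked[:top_k]


# Toy graph: two triangles (a, b, c) and (d, e, f) joined by the edge c-d.
toy_edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("c", "d"),
    ("d", "e"), ("e", "f"), ("d", "f"),
]
suggestions = recommend(toy_edges, seeds=["a"], top_k=2)
```

In this toy graph, marking "a" as relevant favors the other members of its triangle over nodes in the far triangle. In an interactive setting the user would mark more examples and the ranking would be recomputed.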
Scaling Up: I show how we may enable interactive analytics of large graphs with a hybrid architecture that harnesses parallel computation for expensive tasks, and local computation for fast machine inference, visualization, and interaction.
Duen Horng “Polo” Chau is an assistant professor at Georgia Tech, in the School of Computational Science & Engineering of the College of Computing. Polo received a Ph.D. in machine learning and a master’s in human-computer interaction (HCI), both from Carnegie Mellon.
Polo is working to bridge the fields of data mining and HCI to develop tools that combine the best of both worlds for making sense of billion-node graphs. His research interests span data mining, machine learning, information visualization, and HCI.
Polo solves large-scale, real-world problems that impact society. His NetProbe auction fraud detection system was featured in The Wall Street Journal, on CNN, and on TV and radio. His Polonium malware detection technology (with Symantec, patent-pending) protects 120 million people worldwide.
Polo is the only two-time Symantec fellow. He received a Yahoo! Key Scientific Challenges Award. He contributes to the PEGASUS peta-scale graph mining system, which won an Open Source Software World Challenge Silver Award. Polo is also an award-winning designer; he designed Carnegie Mellon's ID card.