What is Wrong with my Model? Detection and Analysis of Bugs in NLP Models
I will present two projects that deal with the evaluation and analysis of NLP models beyond cross-validation accuracy. First, I will talk about Errudite (ACL 2019), a tool and set of principles for model-agnostic error analysis that is scalable and reproducible. Instead of manually inspecting a small set of examples, we propose systematically grouping instances with filtering queries and, where possible, performing counterfactual analysis.
Then, I will talk about ongoing work in which we borrow insights from software engineering (e.g., unit testing) to propose a new testing methodology for NLP models. Our tests reveal a variety of critical failures across multiple tasks and models, and we show via a user study that the methodology can be used to easily detect previously unknown bugs.
Marco Tulio Ribeiro is a senior researcher at Microsoft Research in the Adaptive Systems and Interaction group. He is also an affiliate assistant professor at the University of Washington, where he received his Ph.D.
His research focuses on facilitating communication between humans and machine learning models, including interpretability, trust, debugging, feedback, and robustness.