MICS Capstone Project Spring 2026

Agent Watch

Before an agent acts, who is watching?

AI browsing agents can read your screen, fill out your forms, and act on your behalf online. Most Users have no way of knowing how safely they do it. 

Unlike traditional chatbots, browsing agents take autonomous actions on the open web, introducing a novel risk surface around unintended data disclosure, prompt injection, and over-permissioned behavior. Team AgentWatch audited five leading agents across five dimensions: data disclosure, prompt misinterpretation, hallucination, prompt injection, and browser sandbox isolation, using a custom scoring framework that adjusts for the stochastic nature of agent behavior.

The evaluation infrastructure and scenario library are open-source, designed to grow alongside the field.

 

Contribute to the Test Scenario Library here --> https://agentwatch.figma.site/

Last updated: June 7, 2026