“It Still Feels like I’m Doing the Work": Co-Designing for Mutual Human-AI Agent Communication
Large language models (LLMs) have quickly progressed from chat‑based advisors to agents that can perceive their surroundings and act within them. Recent “computer‑using” agents—such as Anthropic’s Computer‑Use and OpenAI’s Operator—work by clicking, typing, and scrolling through a graphical user interface (GUI) exactly as a human would (Wang et al., 2025). In principle, this transforms everyday digital work: instead of filling out web forms or comparison‑shopping by hand, people can express intent in natural language and let the agent execute the task end‑to‑end. Yet our findings reveal a paradox: although these agents are visibly doing more, users often feel they must watch, steer, and correct them so closely that “it still feels like I’m doing the work.”
A growing body of human‑AI interaction literature suggests that this friction stems less from raw capability than from communication breakdowns during collaboration. Prior guidelines urge systems to expose what they can do, show what they are doing, and invite corrective feedback (Amershi et al., 2019; Google, 2019; Horvitz, 1999). However, these guidelines were written for assistants that generate a single answer, not for agents that autonomously navigate third‑party websites, enter credit‑card data, or email a client on the user’s behalf. Early studies of GUI agents echo this gap: people struggle to anticipate an agent’s next move, to verify whether a silent step succeeded, or to convey nuanced preferences that shape downstream actions. Bansal et al.’s twelve Human‑Agent Communication Challenges catalogue these pain points but offer limited design tactics for resolving them (Bansal et al., 2024).
We argue that effective mutual communication—users articulating intent to the agent, and the agent revealing plans, progress, and rationale to the user—is now the primary bottleneck to productive delegation. To investigate how these challenges manifest in real tasks and how they might be overcome, we conducted a co‑design study with ten participants interacting with OpenAI Operator, one of the first widely‑released computer‑using agents. Participants attempted to reserve a birthday dinner, rated their experience on the twelve communication challenges, and then sketched “ideal‑world” design solutions to each breakdown.
Our research is guided by three overarching questions. First, we ask where, how, and why users encounter breakdowns as they attempt to build common ground with a computer‑using agent. Second, we explore how participants envision seamless, satisfying collaboration once those communication hurdles are removed. Third, we examine which concrete design patterns emerge from their proposals and how those patterns could advance current human–AI design guidance.