Agents

Computer Use

Computer Use is an AI capability, first released publicly by Anthropic in October 2024, that lets a language model operate a computer's graphical interface (moving the cursor, clicking buttons, typing text, and reading screenshots) to complete tasks the way a human operator would.

Computer Use refers to an AI system's ability to perceive and manipulate a desktop or virtual machine through the same graphical interface a human uses: it observes the screen via screenshots, issues mouse clicks and keyboard events, and iterates until it reaches a goal. Anthropic introduced the capability in public beta with the upgraded Claude 3.5 Sonnet in October 2024, describing it as the first frontier model to offer computer use to the public.

The technical loop works as follows. The model receives a screenshot of the current screen state and reasons about the next action needed. It then outputs a structured action command, such as a click at specific pixel coordinates, a typed string, or a key combination. A thin execution layer applies that action to the operating system or a sandboxed virtual machine, and the updated screenshot is fed back to the model. This perceive-plan-act cycle repeats until the task is complete or an error condition is detected. By default the model works from visual pixels rather than DOM or accessibility-tree data, so it can be applied to any GUI, including legacy applications, without application-specific integration work.
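The perceive-plan-act cycle can be sketched as a small control loop. This is a minimal illustration under stated assumptions, not any provider's actual API: `Action`, `FakeScreen`, `run_agent`, and `scripted_policy` are hypothetical names, the "screenshot" is a text placeholder instead of image bytes, and a hard-coded script stands in for the language model that would normally choose each action.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical action schema mirroring the structured commands described above:
# a click at pixel coordinates, a typed string, a key combination, or "done".
@dataclass
class Action:
    kind: str          # "click", "type", "key", or "done"
    x: int = 0
    y: int = 0
    text: str = ""

@dataclass
class FakeScreen:
    """Hypothetical stand-in for an OS or sandboxed VM; records applied actions."""
    log: list = field(default_factory=list)

    def screenshot(self) -> str:
        # A real harness would capture image bytes; a text summary suffices here.
        return f"screen after {len(self.log)} actions"

    def apply(self, action: Action) -> None:
        # Thin execution layer: turn a structured command into an input event.
        if action.kind == "click":
            self.log.append(f"click({action.x},{action.y})")
        elif action.kind == "type":
            self.log.append(f"type({action.text!r})")
        elif action.kind == "key":
            self.log.append(f"key({action.text})")
        else:
            raise ValueError(f"unknown action kind: {action.kind}")

# The policy plays the model's role: (observation, step index) -> next action.
Policy = Callable[[str, int], Action]

def run_agent(policy: Policy, screen: FakeScreen, max_steps: int = 20) -> bool:
    """Perceive-plan-act loop; True if the policy signals completion in budget."""
    for step in range(max_steps):
        observation = screen.screenshot()    # perceive
        action = policy(observation, step)   # plan
        if action.kind == "done":
            return True
        screen.apply(action)                 # act; next iteration re-observes
    return False                             # step budget exhausted: treat as error

def scripted_policy(observation: str, step: int) -> Action:
    # Hard-coded plan standing in for model output (hypothetical coordinates).
    plan = [Action("click", 120, 40), Action("type", text="expenses"),
            Action("key", text="Enter")]
    return plan[step] if step < len(plan) else Action("done")

screen = FakeScreen()
print(run_agent(scripted_policy, screen))  # True
print(screen.log)  # ['click(120,40)', "type('expenses')", 'key(Enter)']
```

The step budget in `run_agent` is one simple way to realize the "error condition" exit; a production harness would also need screenshot capture, real input injection, and safeguards around what the agent is allowed to do.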

Computer Use matters because it lets AI operate software for which no API exists and carry out multi-step workflows that span several applications. Earlier automation approaches, such as Selenium scripts and traditional RPA bots, depended on predefined element selectors that broke when the UI changed; a vision-based agent can adapt to layout changes the way a human operator does. Benchmark evaluations on OSWorld and WebArena show meaningful but imperfect performance: models in 2025 achieved roughly 20–40% success rates on complex multi-step tasks, with scores improving in each successive model generation.

By 2026, Computer Use-style capabilities are offered by multiple providers: Anthropic (Claude), OpenAI (Operator, launched January 2025), and Google (Project Mariner). Enterprise applications include automated QA testing, data entry into legacy ERP systems, and replacement of brittle RPA bots. Safety challenges remain an active research area, in particular preventing malicious web content from hijacking the agent through prompt injection embedded in visible page text.

Example

A business analyst instructs a Computer Use agent to open the company's legacy ERP, navigate to the monthly expense report section, export the data as CSV, and paste it into a spreadsheet template—completing a 15-minute manual task in under two minutes.
