AI-processed from MarkTechPost; edited by Hamidun News
xAI launched /goal in Grok Build: autonomous agent plans and verifies multi-step tasks
xAI has launched /goal mode in Grok Build, a tool for autonomously executing long multi-step tasks: it builds a plan on its own, works through a checklist, and verifies the result until the task is complete.
How /goal works
The workflow differs from the usual dialogue with an LLM. You formulate a single goal, for example "implement OAuth authentication," "write and test a JSON parser," or "migrate a component from class to hooks," and hand it to the agent. From there, /goal takes over.
The agent analyzes the task, builds a step-by-step plan, and breaks it down into specific actions. It carries out each action itself: it writes code, runs commands, and checks intermediate results. If something goes wrong, it corrects its approach without your involvement. The planning → execution → verification cycle repeats until the original goal is fully achieved.
In the usual mode, a developer carries on a dialogue with an LLM: they give a prompt, receive a response, correct it, clarify, and ask again. With /goal, you delegate not only the execution of the task but also the management of the entire process, which is a fundamentally different level of autonomy.
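The planning → execution → verification cycle described above can be illustrated with a small sketch. This is not xAI's implementation: the names `Step`, `GoalRun`, and `run_goal`, the retry limit, and the toy plan are all hypothetical, chosen only to show how a loop might retry a step until its check passes and stop rather than build on a bad result.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    """One planned action plus the check that decides whether it succeeded.

    Hypothetical structure for illustration; not part of Grok Build.
    """
    name: str
    execute: Callable[[int], str]   # receives the attempt number, returns a result
    verify: Callable[[str], bool]   # True when the result is acceptable


@dataclass
class GoalRun:
    log: List[str] = field(default_factory=list)
    achieved: bool = False


def run_goal(steps: List[Step], max_attempts: int = 3) -> GoalRun:
    """Execute each step, verify it, and retry until the check passes."""
    run = GoalRun()
    for step in steps:
        for attempt in range(1, max_attempts + 1):
            result = step.execute(attempt)
            if step.verify(result):
                run.log.append(f"{step.name}: ok on attempt {attempt}")
                break
            run.log.append(f"{step.name}: failed attempt {attempt}")
        else:
            # Attempts exhausted: stop instead of building on a bad result.
            run.log.append(f"{step.name}: giving up")
            return run
    run.achieved = True
    return run


# Toy plan (assumed for the example): the second step only
# succeeds on its second attempt, so the loop has to retry once.
plan = [
    Step("write parser", lambda attempt: "code", lambda r: r == "code"),
    Step("run tests",
         lambda attempt: "pass" if attempt >= 2 else "fail",
         lambda r: r == "pass"),
]
outcome = run_goal(plan)
print(outcome.achieved)
print("\n".join(outcome.log))
```

In a real agent the `execute` and `verify` callables would wrap model calls, shell commands, and test runs; here they are plain functions so the control flow is easy to follow.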
Built-in result verification
The key feature of the mode is built-in verification at each step. /goal does not execute steps mechanically in sequence: after each stage, the agent evaluates whether the intermediate result matches expectations, and only then moves forward.
For multi-step coding tasks, this is critical:
- Step 1: write the code (this is not the final result)
- Step 2: run the tests and make sure they pass
- Step 3: verify that the new code didn't break existing behavior
- Final verification: confirm that the goal is fully achieved
Lack of verification is one of the most common complaints about existing coding agents: the tool technically "completes" the task, but the result doesn't match what was needed. "Silent" errors, where an agent confidently heads down the wrong path, are among the hardest scenarios in autonomous systems. /goal tries to address this.
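The checklist above can be read as a series of ordered gates, where a later check only runs once the earlier ones have passed. The sketch below assumes that reading; `first_failed_gate`, `checklist`, and the toy `add` functions are hypothetical and say nothing about how /goal is actually built. The second implementation models a "silent" error: it passes the new test but breaks existing behavior.

```python
from typing import Callable, List, Optional, Tuple

Gate = Tuple[str, Callable[[], bool]]


def first_failed_gate(gates: List[Gate]) -> Optional[str]:
    """Run gates in order; return the name of the first that fails, else None."""
    for name, check in gates:
        if not check():
            return name
    return None


def checklist(fn: Callable[[int, int], int]) -> List[Gate]:
    """Build the four gates from the list above for a toy `add` function.

    The concrete checks are assumptions made for this example.
    """
    return [
        ("code written", lambda: callable(fn)),
        ("new tests pass", lambda: fn(2, 3) == 5),
        ("existing behavior intact", lambda: fn(0, 0) == 0),
        ("goal achieved", lambda: fn(10, -4) == 6),
    ]


def add(a: int, b: int) -> int:
    return a + b


def hardcoded_add(a: int, b: int) -> int:
    # Passes the new test by accident, but is wrong in general.
    return 5


print(first_failed_gate(checklist(add)))
print(first_failed_gate(checklist(hardcoded_add)))
```

Stopping at the first failed gate is the design choice that matters here: the hard-coded version would count as "done" if only the new test were checked.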
/goal in market context
xAI positions Grok Build as a full-fledged development environment where Grok participates in the code creation cycle, rather than simply answering questions. /goal is the next step in this strategy.
"You pass a single goal, the agent plans the approach, goes through the checklist, and verifies the result until completion," is how xAI's team describes the mode.
The market for developer agents is becoming saturated. GitHub Copilot Workspace offers multi-step planning sessions directly in the repository. Devin from Cognition positions itself as a fully autonomous developer agent. JetBrains IDEs, Cursor, and other editors are integrating agent capabilities. Google and Anthropic are developing agent modes in their platforms. Against this backdrop, /goal is a logical response from xAI: an autonomous mode in the place where its users already work with code.
Notably, autonomous agents are moving from experimental development to standard feature at remarkable speed. A year ago, such capabilities were limited to niche B2B tools. Today they are being rolled out directly into mass-market products for developers.
What this means
When a tool plans, executes, and verifies on its own, the developer shifts to setting tasks instead of micromanaging each step. For long coding projects, this changes the entire way of working. The question is no longer whether autonomous agents are needed, because they are already here. The question is how reliably they handle verification when the task is truly complex.