Cursor Revealed Lessons from a Year of Developing Cloud AI Agents
Cursor demonstrated three key lessons from a year of developing cloud AI agents: a complete development environment is critical for quality, long-running tasks
AI-processed from Cursor Blog; edited by Hamidun News
When Cursor launched cloud agents a year ago, they seemed like a straightforward extension of local agents. It's now clear: cloud agents operate in a different paradigm—they run on their own virtual machines, work in parallel, and solve tasks that stretch across hours and days. This requires a completely different approach to infrastructure.
Environment Is the Product
The year's key discovery: development environment quality is the primary factor in cloud agent productivity. On a local machine, an agent inherits your environment for free—the entire installation history, configurations, and variables. In the cloud, you need to recreate everything from scratch.
When something is wrong, the agent doesn't crash with an obvious error—it degrades silently. The output simply looks worse than before, and it's easy to blame the model. But in practice, the environment is the culprit: missing dependencies, incorrect paths, lack of verification tools.
A year ago, this wasn't noticeable, because models couldn't make effective use of the environment. Now that the GPT family has become more capable, the environment has become the determining factor in overall agent performance.
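One way to turn silent degradation into a loud failure is to check the environment before the agent starts working. The following is a minimal sketch, not Cursor's implementation: the `preflight` function, the `PreflightReport` type, and the example tool, variable, and path names are all hypothetical and only illustrate the three failure modes named above (missing dependencies, incorrect paths, missing verification tools).

```python
from __future__ import annotations

import os
import shutil
from dataclasses import dataclass, field


@dataclass
class PreflightReport:
    """Everything the environment is missing (hypothetical structure)."""
    missing_tools: list[str] = field(default_factory=list)
    missing_env_vars: list[str] = field(default_factory=list)
    missing_paths: list[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        # The environment passes only if nothing at all is missing.
        return not (self.missing_tools or self.missing_env_vars or self.missing_paths)


def preflight(tools, env_vars, paths) -> PreflightReport:
    """Check that required executables, variables, and paths exist."""
    report = PreflightReport()
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            report.missing_tools.append(tool)
    for name in env_vars:
        # Treat unset and empty variables the same way.
        if not os.environ.get(name):
            report.missing_env_vars.append(name)
    for path in paths:
        if not os.path.exists(path):
            report.missing_paths.append(path)
    return report


if __name__ == "__main__":
    # Example requirements; a real project would list its own.
    report = preflight(
        tools=["git", "python3"],
        env_vars=["HOME"],
        paths=["."],
    )
    print("environment ok" if report.ok else f"environment incomplete: {report}")
```

Running a check like this at VM start gives a concrete list of what is missing, so a weak result is not blamed on the model by default.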
A complete cloud environment requires a surprising amount of infrastructure:
- Tools for building and configuring agent environments
- Hibernation and fast VM resumption mechanisms between messages
- Pipelines for reliable checkpoint, restore, and VM image forking
- Tight integration with the harness and the client, so that the agent and the human see and act on the environment the same way
Additionally, cloud agents need controlled network access: to open PRs, pull dependencies, and conduct research. This spawned an entire direction—something like enterprise IT for agents, with secret redaction, network policies, and credential management.
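Two of the controls mentioned above, secret redaction and network policies, can be sketched in a few lines. This is an illustration under stated assumptions, not a description of Cursor's system: the function names, the placeholder format, and the contents of `ALLOWED_HOSTS` are hypothetical.

```python
from __future__ import annotations

from urllib.parse import urlparse

# Hypothetical allowlist; a real policy would be configured per workspace.
ALLOWED_HOSTS = {"github.com", "pypi.org"}


def redact(text: str, secrets: dict[str, str]) -> str:
    """Replace every known secret value in `text` with a named placeholder."""
    # Mask longer values first so a secret that contains a shorter one
    # is replaced as a whole.
    for name, value in sorted(secrets.items(), key=lambda kv: len(kv[1]), reverse=True):
        if value:  # an empty value would match everywhere, so skip it
            text = text.replace(value, f"[REDACTED:{name}]")
    return text


def is_allowed(url: str, allowed_hosts: set[str] = ALLOWED_HOSTS) -> bool:
    """Allow a URL only if its host is an allowed host or a subdomain of one."""
    host = urlparse(url).hostname or ""
    return any(host == h or host.endswith("." + h) for h in allowed_hosts)


if __name__ == "__main__":
    log_line = "pushing with token=abc123"
    print(redact(log_line, {"API_TOKEN": "abc123"}))
    print(is_allowed("https://api.github.com/repos"))
    print(is_allowed("https://example.com/github.com"))
```

Matching on the parsed hostname rather than on the raw URL string matters here: a substring test would accept any URL that merely mentions an allowed domain in its path.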
From One Nine to Two Nines
Cloud agents revealed a new class of reliability problems. Each agent runs on its own isolated VM, allowing them to run in parallel and handle multi-hour tasks. But this creates vulnerability to inference provider outages, pod replacements, and node failures.
Initially, Cursor built cloud agents with a work-stealing architecture: worker nodes picked up tasks and carried them to completion. This model worked locally but proved fragile in the cloud, where the early beta delivered only about one nine of reliability (90% uptime).
As agents matured, the team noticed they were reinventing durable execution primitives that Temporal already elegantly solved: retry mechanisms, job scheduling across machines, durability against node failures.
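To make the idea of a durable execution primitive concrete, here is a toy sketch of two of the ingredients named above: retries and durability across restarts. It is deliberately not Temporal's API. The `run_durably` function and its on-disk JSON journal are hypothetical stand-ins showing why completed steps should be recorded, so that a replacement worker resumes instead of starting over.

```python
import json
import os
import time


def run_durably(steps, journal_path, max_attempts=3, base_delay=0.5):
    """Run named steps in order, journaling results so a restart can resume.

    `steps` is a list of (name, callable) pairs. Results must be
    JSON-serializable because they are persisted to `journal_path`.
    """
    # Load results recorded by a previous, possibly interrupted, run.
    journal = {}
    if os.path.exists(journal_path):
        with open(journal_path) as f:
            journal = json.load(f)

    for name, fn in steps:
        if name in journal:
            continue  # finished before the restart; do not repeat it

        # Retry transient failures with exponential backoff.
        for attempt in range(1, max_attempts + 1):
            try:
                result = fn()
                break
            except Exception:
                if attempt == max_attempts:
                    raise  # out of attempts; surface the error
                time.sleep(base_delay * 2 ** (attempt - 1))

        journal[name] = result
        # Write to a temp file, then rename, so a crash mid-write
        # cannot leave a half-written journal behind.
        tmp_path = journal_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(journal, f)
        os.replace(tmp_path, journal_path)

    return journal
```

If the process dies after the first step, calling `run_durably` again with the same journal path skips that step and continues with the next one. A production system also has to handle scheduling across machines, timeouts, and non-idempotent steps, which is the kind of work the team chose to delegate rather than rebuild.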
They decided to migrate the entire agent loop to Temporal. As a result, reliability rose to two nines (99% uptime). Temporal now processes over 50 million actions per day across 7 million workflows.
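The jump from one nine to two is easier to appreciate as arithmetic. The short sketch below converts an availability figure into a count of nines and into expected downtime; the helper names and the 30-day period are illustrative choices, not figures from Cursor.

```python
import math


def nines(availability: float) -> float:
    """Number of nines: 0.9 gives about 1, 0.99 about 2, 0.999 about 3."""
    return -math.log10(1.0 - availability)


def downtime_hours(availability: float, period_hours: float = 24 * 30) -> float:
    """Expected hours of downtime over a period (default: a 30-day month)."""
    return (1.0 - availability) * period_hours


if __name__ == "__main__":
    for a in (0.90, 0.99):
        print(f"{a:.0%}: {nines(a):.1f} nines, "
              f"{downtime_hours(a):.1f} h down per 30 days")
```

At 90% the budget is about 72 hours of downtime in a 30-day month; at 99% it is about 7.2 hours. Each additional nine cuts the failure budget by a factor of ten.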
Internally, over 40% of all PRs at Cursor are generated by cloud agents, which is itself evidence that the system works.
What This Means
A year of operating cloud agents has shown that this isn't simply a matter of porting local code to the cloud. It means building an entire operational layer around the agent, with a complete developer environment, reliable task delivery, and controlled network access. As agents take on more work, the engineering challenge becomes clearer: give the agent exactly what a developer has, and guarantee that it won't break anything.