AI Agents in Code: Why a Beautiful PR Can Hide Production Problems
AI-processed from Vercel Blog; edited by Hamidun News
Coding agents generate code at unprecedented speed. In skilled hands, this is a powerful productivity multiplier, but without discipline it is just an efficient way to deliver wrong assumptions directly to production.
Why Green CI Doesn't Guarantee Safety
AI-generated code looks convincing. The PR description is clean, static analysis doesn't complain, the tests are green, and the code follows repository conventions. On the surface, it looks like the work of an experienced engineer.
But green CI is no longer proof of safety. In the AI era, it's just a reflection of the agent's ability to convince your system that a change is safe, even if it instantly degrades infrastructure at scale.
The agent doesn't know your production environment. It doesn't know your traffic patterns, failure modes, or implicit infrastructure constraints. It doesn't know that Redis is approaching capacity, that the database is tied to a specific region, or that a flag rollout will change the load on downstream services.
These risks aren't theoretical. A query passes tests but scans every row of a table in production. Retry logic looks correct but triggers a thundering herd on a dependent service. A cache without a TTL silently grows until Redis dies.
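Two of these failure modes have well-known mitigations that a reviewer can look for. The sketch below is illustrative only and is not taken from the original post: `backoff_delays` and `TTLCache` are hypothetical names, and the in-memory cache stands in for a real store. It shows capped exponential backoff with full jitter, which spreads retries out so that clients that failed together do not retry together, and a cache whose entries expire. In Redis itself the equivalent is to give keys an expiry, for example with `SET key value EX seconds`.

```python
import random
import time


def backoff_delays(attempts, base=0.1, cap=10.0, rng=random.random):
    """Return one sleep duration (seconds) per retry attempt.

    Capped exponential backoff with full jitter: the n-th delay is
    rng() * min(cap, base * 2**n), where rng() returns a value in [0, 1).
    """
    return [rng() * min(cap, base * (2 ** n)) for n in range(attempts)]


class TTLCache:
    """Hypothetical in-memory cache in which every entry has an expiry time."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._items = {}     # key -> (expires_at, value)

    def set(self, key, value):
        self._items[key] = (self._clock() + self._ttl, value)

    def get(self, key, default=None):
        entry = self._items.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # evict lazily when an expired key is read
            return default
        return value

    def purge(self):
        """Drop every expired entry, including keys that are never read again."""
        now = self._clock()
        expired = [k for k, (expires_at, _) in self._items.items() if now >= expires_at]
        for key in expired:
            del self._items[key]
        return len(expired)

    def __len__(self):
        return len(self._items)
```

Lazy eviction alone only removes keys that are read again, which is why the sketch also has `purge`; a real deployment would call it periodically or rely on the store's own expiry mechanism.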
The gap between 'looks right' and 'safe to ship' has always existed. Agents make it wider because they produce code that looks more flawless than ever while remaining completely blind to production reality.
Use, Don't Rely
There's a fundamental difference between relying on AI and using it.
Relying is when you assume that because the agent wrote it and the tests passed, it's ready to ship. The author never builds a mental model of the change. The result is massive PRs with implicit assumptions that can't be reviewed, because neither author nor reviewer sees the full picture.
Using is when the agent helps you iterate quickly, but all responsibility for the output remains with you. You know how the code behaves under load. You understand the risks. You're willing to take them. When you sign off on the PR, you're saying: 'I've read this and I understand what it does.'
If you need to re-read your own PR to explain its production impact, your engineering process has failed.
The criterion is simple: are you willing to own the production incident related to this PR?
How to Protect Production
The answer isn't to stop using agents. The productivity gains are undeniable, and models keep improving. AI assistants for code review and analysis are incredibly powerful tools that catch bugs and surface-level risks better than humans.
But relying solely on review is a losing battle against the scale of agent-generated code. We've hit an inflection point where code implementation is abundant. The scarce resource now isn't writing code—it's judgment about what's safe to ship.
All infrastructure must reflect this new reality. We need a closed system where agents operate with high autonomy because the environment is standardized, verification is trivial, and deployment is safe by design.
- Self-driving deployments: each change rolls out incrementally through gated pipelines
- Canary deployments with automatic rollback on degradation
- Verification steps that cannot be bypassed
- Production insights built into the process, not added at the end
The idea is simple: make the right decision the easiest one.
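As a rough illustration of the first two items in the list above, here is a minimal sketch of a gated, incremental rollout with automatic rollback. It is not Vercel's implementation. The function name `run_gated_rollout`, the `set_traffic` and `error_rate` callbacks, and the 1% error threshold are all assumptions made for the example; a real pipeline would wait for metrics to settle at each stage and check more than one signal.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class RolloutResult:
    completed: bool            # True only if every stage passed its gate
    stages_applied: List[int]  # traffic percentages that were actually reached
    rolled_back: bool          # True if a failed gate triggered a rollback


def run_gated_rollout(
    stages: Sequence[int],
    set_traffic: Callable[[int], None],
    error_rate: Callable[[], float],
    max_error_rate: float = 0.01,
) -> RolloutResult:
    """Shift traffic to the new version stage by stage.

    After each stage the gate reads the observed error rate. If it exceeds
    max_error_rate, traffic to the new version is set back to 0 and the
    rollout stops. The gate runs on every stage, with no flag to skip it.
    """
    applied: List[int] = []
    for percent in stages:
        set_traffic(percent)
        applied.append(percent)
        if error_rate() > max_error_rate:
            set_traffic(0)  # automatic rollback on degradation
            return RolloutResult(completed=False, stages_applied=applied, rolled_back=True)
    return RolloutResult(completed=True, stages_applied=applied, rolled_back=False)
```

The point of the design is that the safe path is also the default path: the caller chooses the stages, but the gate and the rollback are part of the rollout itself.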
What This Means
The world of development is transitioning from blind trust in tools to a controlled process. This isn't about extra approvals—it's about AI agents operating in conditions that guarantee safety.
For teams, this means preparing not for agent features, but for the discipline and structure around them.