AI Inference Theft: How Hackers Profit from Vercel via Residential Proxies
Attackers steal expensive AI calls (a single request to a frontier model can cost $1-2) and resell them as an OpenAI-compatible API at a markup. Vercel caught itself with 1,300…
AI-processed from Vercel Blog; edited by Hamidun News
Inference theft is the theft of expensive AI calls for resale. Attackers steal tokens from startups, wrap them in their own API, and resell them as cheap alternatives to OpenAI or Anthropic. Vercel published a detailed report on an attack against its AI endpoints, revealing the economics of theft and why standard web protections are completely ineffective.
Why AI Calls Are So Expensive
Typical HTTP traffic costs approximately $2 per million requests, which is nearly free. But a single request to a frontier model (GPT-5.5, Claude 3.5 Sonnet) can cost $1-2. That is 500,000 to a million times more expensive than a standard endpoint. For attackers, this is ideal theft economics: steal a call that costs the victim $2 and resell it for $1.50. Because the attacker pays nothing for inference, the whole resale price is profit.
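The arithmetic behind these figures can be checked with a short sketch. The dollar amounts are the ones quoted above; the function and constant names are illustrative, and the attacker's inference cost is assumed to be exactly zero.

```python
# Figures quoted in the article.
HTTP_COST_PER_MILLION_USD = 2.0        # ordinary HTTP traffic
MODEL_COST_RANGE_USD = (1.0, 2.0)      # one frontier-model request
RESALE_PRICE_USD = 1.50                # attacker's price for a $2 call


def cost_ratio(model_cost_usd: float) -> float:
    """How many ordinary HTTP requests cost as much as one model call."""
    return model_cost_usd * 1_000_000 / HTTP_COST_PER_MILLION_USD


def attacker_margin(resale_price_usd: float,
                    attacker_inference_cost_usd: float = 0.0) -> float:
    """Profit per resold call; the inference cost is assumed to be zero
    because the victim pays for it."""
    return resale_price_usd - attacker_inference_cost_usd


low, high = (cost_ratio(cost) for cost in MODEL_COST_RANGE_USD)
print(f"One model call costs as much as {low:,.0f} to {high:,.0f} HTTP requests")
print(f"Attacker profit per resold call: ${attacker_margin(RESALE_PRICE_USD):.2f}")
```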
How Theft Works—Attack Architecture
Attackers create an adapter: a software layer that exposes a stolen endpoint as an OpenAI-compatible API. The victim pays for the inference; the attacker pays nothing. The process looks like this:
- Register thousands of throwaway accounts with the victim
- Buy residential proxy IPs in bulk (thousands of addresses)
- Wrap the stolen API in an adapter
- Provide it to their own customer base or resell on the dark market
- Profit from the gap between the cost of the stolen calls (zero for the attacker) and the resale price
A real example is Chipotlai Max, a fork of a coding agent that turns Chipotle's support chatbot into an OpenAI-compatible endpoint. The project openly seeks developers to do the same with Home Depot, Lowe's, Target, and Starbucks.
Why Rate Limits and Auth Don't Work
Rate limits and authentication were designed to protect against password brute-force attacks and DDoS. There, the economics favor the defender: stealing a million passwords costs more than protecting them. With inference theft, the math is reversed. Attackers simply buy residential proxy IPs in bulk, hundreds or thousands of addresses, so no single address sends enough traffic to trip a limit. A check that runs once per session covers every stolen call made in that session, possibly a thousand of them, rather than each individual request. A realistic-looking account passes auth. By the time the request reaches your API, it has already crossed the boundary you intended to protect.
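A small simulation makes the dilution effect concrete. This is a minimal sketch, not a model of any real limiter: it assumes a fixed one-minute window, a limit of 10 requests per IP, and a pool of 1,000 proxy addresses. The 1,300 requests per minute figure is the peak reported in the Vercel incident; everything else is an illustrative assumption.

```python
from collections import Counter
from typing import Iterable


def allowed_requests(source_ips: Iterable[str], limit_per_ip: int) -> int:
    """Count how many requests a per-IP limiter lets through in one window."""
    seen: Counter = Counter()
    allowed = 0
    for ip in source_ips:
        if seen[ip] < limit_per_ip:
            seen[ip] += 1
            allowed += 1
    return allowed


LIMIT_PER_IP = 10          # assumed limit per IP per minute
REQUESTS_PER_MINUTE = 1_300
PROXY_POOL_SIZE = 1_000    # assumed number of residential proxy IPs

# All traffic from one address: the limiter does its job.
single_ip = ["203.0.113.7"] * REQUESTS_PER_MINUTE

# The same traffic rotated across the proxy pool: each IP sends 1-2 requests.
proxy_pool = [f"proxy-{i % PROXY_POOL_SIZE}" for i in range(REQUESTS_PER_MINUTE)]

print(allowed_requests(single_ip, LIMIT_PER_IP))   # requests allowed from one IP
print(allowed_requests(proxy_pool, LIMIT_PER_IP))  # requests allowed via the pool
```

Under these assumptions the limiter stops almost all single-source traffic, while every request routed through the pool passes, because no individual address ever approaches the limit.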
Real Attack on Vercel
On April 12, 2026, traffic to the AI chat in Vercel's documentation increased 10-fold. At peak, it reached 1,300 requests per minute to Claude Haiku 4.5, which corresponded to a run rate of $10,000 in losses per hour. Attackers used residential proxies and fresh accounts to dilute rate limits.
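Taking the two reported figures at face value, the implied average cost of a single abused request can be worked out. This is only a back-of-the-envelope check: it assumes the peak rate held for the full hour, which the report summarized here does not state.

```python
# Figures reported for the incident.
PEAK_REQUESTS_PER_MINUTE = 1_300
LOSS_RUN_RATE_USD_PER_HOUR = 10_000

# Assumption: the peak rate is sustained for a whole hour.
requests_per_hour = PEAK_REQUESTS_PER_MINUTE * 60
implied_cost_per_request = LOSS_RUN_RATE_USD_PER_HOUR / requests_per_hour

print(f"Requests per hour at peak: {requests_per_hour:,}")
print(f"Implied cost per request: ${implied_cost_per_request:.3f}")
```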
How Vercel Defends Itself
Vercel gates every AI request through BotID, a deep analysis that runs not once per session but on each individual request. Instead of a single check at the start, every request is verified before it is served. The same approach can be applied to your own endpoints: a few lines of code can block automated theft attempts.
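The difference between the two checking strategies can be sketched in a few lines. This is not BotID's API: `looks_automated` is a hypothetical stand-in for a real bot classifier, the `x-client` header is an invented signal, and `fake_model` replaces the paid inference call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class Request:
    session_id: str
    prompt: str
    headers: Dict[str, str] = field(default_factory=dict)


def looks_automated(request: Request) -> bool:
    # Hypothetical classifier; a real one inspects far richer signals.
    return request.headers.get("x-client") == "script"


def fake_model(prompt: str) -> str:
    # Stand-in for the expensive inference call.
    return f"answer to: {prompt}"


class SessionGate:
    """Checks a session once, then trusts every later request in it."""

    def __init__(self, is_bot: Callable[[Request], bool]) -> None:
        self.is_bot = is_bot
        self.trusted: Set[str] = set()

    def handle(self, request: Request) -> Tuple[int, str]:
        if request.session_id not in self.trusted:
            if self.is_bot(request):
                return 403, "blocked"
            self.trusted.add(request.session_id)
        return 200, fake_model(request.prompt)


class RequestGate:
    """Runs the check on every request before any inference happens."""

    def __init__(self, is_bot: Callable[[Request], bool]) -> None:
        self.is_bot = is_bot

    def handle(self, request: Request) -> Tuple[int, str]:
        if self.is_bot(request):
            return 403, "blocked"
        return 200, fake_model(request.prompt)


# One clean-looking request opens the session, then a script reuses it.
traffic: List[Request] = [Request("s1", "hello")] + [
    Request("s1", f"task {i}", {"x-client": "script"}) for i in range(3)
]

session_gate = SessionGate(looks_automated)
request_gate = RequestGate(looks_automated)
session_statuses = [session_gate.handle(r)[0] for r in traffic]
request_statuses = [request_gate.handle(r)[0] for r in traffic]
print(session_statuses)
print(request_statuses)
```

In this toy setup the session-level gate serves all four requests once the first one passes, while the request-level gate serves only the first and rejects the three scripted ones before any model call is made.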
What This Means
If you have a public AI endpoint (a playground, a support chat, a document-AI tool), rate limits and auth alone no longer protect it. Protection must run at the request level, not the session level. For startups with open access, this is critical: one serious attack can cost tens of thousands of dollars in losses.