Vercel reveals the top AI models in production: Anthropic leads in spending

Q: What is the source of this material?

The original was published on the Vercel Blog. Hamidun News processes and adapts materials with the help of AI.

Q: When was it published?

2026-05-21. Reading time: 3 min.

Vercel compiled data on real-world AI model usage in production. Anthropic accounts for the most spending (61% of the total), while Google processes the most tokens (38%).

Hamidun News Editorial

AI monitoring · Vercel Blog


Image source: Vercel Blog. Collage: Hamidun News.


Vercel analyzed seven months of traffic on its AI Gateway, which processes trillions of tokens through hundreds of models in real applications and agents. The results show what the production AI market actually looks like, in contrast to synthetic benchmarks that change weekly.

Who spends more, who processes more

By spending in April 2026, Anthropic leads: 61% of all spending goes to Claude, despite its higher price per token. Developers pay more because the result is worth more to them. Google takes 21% and OpenAI 12%, and the rest is split between xAI and open models.

By volume of tokens processed, the picture is reversed. Google comes first: 38% of all traffic goes through Gemini (primarily Flash, the fast and cheap version). Anthropic processes 26%, OpenAI 13%, and xAI and others 23%.

This gap looks odd at first, but the logic is simple. Different models compete at different layers:

Claude Opus takes complex, expensive tasks, where an error costs money
Gemini Flash absorbs volume, on tasks where speed matters more than accuracy
GPT-5.5 is spread evenly across both layers

In effect, there are two markets inside one. When developers choose a model, they are not thinking about reputation; they are weighing price against risk.
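The two sets of shares above can be combined into a rough price index per provider. The sketch below divides each provider's share of spending by its share of tokens; a value of 1.0 would mean the provider's blended price per token matches the gateway-wide average. It assumes that both sets of shares describe the same traffic and the same period, which the article implies but does not state outright, and the 6% "other" spending figure is derived here as the remainder rather than quoted.

```python
# Shares quoted in the article, in percent.
# Assumption: spend and token shares cover the same period and traffic.
# "other" spend (6) is derived as 100 - 61 - 21 - 12, not quoted directly.
spend_share = {"Anthropic": 61, "Google": 21, "OpenAI": 12, "other": 6}
token_share = {"Anthropic": 26, "Google": 38, "OpenAI": 13, "other": 23}


def relative_price_index(spend, tokens):
    """Return spend share divided by token share for each provider.

    1.0 means a blended price per token equal to the overall average,
    2.0 means twice the average, 0.5 means half of it.
    """
    return {name: spend[name] / tokens[name] for name in spend}


index = relative_price_index(spend_share, token_share)

# Print providers from the most to the least expensive per token.
for name, value in sorted(index.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<10} {value:.2f}x the average price per token")
```

On these figures, Anthropic's traffic costs about 2.35 times the average per token and Google's about 0.55 times, which is the same split between an expensive layer and a volume layer described above.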

Price of error determines model choice

Behind this pattern is a simple principle: the more an error costs, the more expensive the model developers are willing to pay for.
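One way to make this principle concrete is an expected-cost comparison: the price of a task plus the probability of an error times what that error costs. The sketch below is purely illustrative. The tier names, prices, and error rates are hypothetical and do not come from Vercel's data; they only show how the cheaper option stops being the cheaper choice once errors become expensive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str              # hypothetical tier label, not a real model ID
    price_per_task: float  # USD per task, illustrative value
    error_rate: float      # probability that a task goes wrong, illustrative


def expected_cost(option: ModelOption, cost_of_error: float) -> float:
    """Price paid up front plus the expected loss from errors."""
    return option.price_per_task + option.error_rate * cost_of_error


def cheapest_overall(options, cost_of_error):
    """Pick the option with the lowest expected cost per task."""
    return min(options, key=lambda o: expected_cost(o, cost_of_error))


options = [
    ModelOption("cheap-fast", price_per_task=0.002, error_rate=0.08),
    ModelOption("premium", price_per_task=0.050, error_rate=0.02),
]

# A chat-style task where a mistake is noticed and fixed almost for free.
low_stakes = cheapest_overall(options, cost_of_error=0.10)
# A task where a mistake costs real money.
high_stakes = cheapest_overall(options, cost_of_error=50.0)

print(low_stakes.name, high_stakes.name)
```

With these made-up numbers the two tiers break even at an error cost of $0.80: below that the cheap tier wins, above it the premium tier does.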

Personal assistants: 20% of spending on 40% of tokens. They can run on cheap models, because if the assistant makes a mistake, the user notices and fixes it quickly. The error stays local.

Coding agents: 22% of spending on 20% of tokens. An error in code costs developer time and debugging. That is more expensive than a chat error, but not critical.

Back-office systems: 6% of spending on 15% of tokens. Teams economize here because volumes are huge, but they still don't choose the cheapest option, since an error could affect finances or operations.

App generation: 7% of spending on 11% of tokens. Generated code goes through code review before use, so there is a safety net.
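Dividing spending share by token share for each of the four workloads gives a quick way to rank them by price paid per token. This is a back-of-the-envelope sketch on the percentages quoted above, assuming both columns are shares of the same totals; the four categories do not add up to 100%, so the remaining traffic is not represented.

```python
# (spending %, token %) per workload, as quoted in the article.
workloads = {
    "personal assistants": (20, 40),
    "coding agents": (22, 20),
    "back-office systems": (6, 15),
    "app generation": (7, 11),
}

# Spending share per unit of token share; above 1.0 means the workload
# pays more per token than the overall average.
ratios = {name: spend / tokens for name, (spend, tokens) in workloads.items()}

# Workloads ordered from the highest to the lowest price per token.
ranked = sorted(ratios, key=ratios.get, reverse=True)
for name in ranked:
    print(f"{name:<20} {ratios[name]:.2f}")
```

On the quoted figures, coding agents are the only one of the four categories that pays more than the average price per token.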

There is a larger pattern too: B2B applications spend roughly twice as much per token as B2C applications. In B2B, an error can lead to financial losses, lawsuits, or downtime. In B2C, an error costs less.

Who wins in what tasks

Sliced by type of work, the data shows a fragmented market.

Anthropic is noticeably ahead in software development: developers choose Claude for complex coding and code analysis. This reflects the model's reputation in ML and systems design.

Google dominates in consumer applications: Gemini Flash captured the mass segment thanks to low cost and acceptable quality. The strategy is cheap, good enough, and high volume.

OpenAI is the most evenly distributed across categories: GPT-5.5 is used everywhere, from mobile apps to enterprise systems.

xAI and open models pick up use cases in specialized niches, for example at companies that want to run without the cloud or need full customization.

Over half a year, this picture has changed quickly. The release of a new GPT version in April significantly increased OpenAI's share of spending. Gemini Flash had a much more modest share in March but quickly gained volume. The market responds sharply to quality and price, not to momentum.

What this means

The AI market in 2026 is not a search for a single best model. Developers choose models by task, not by prestige. Expensive models go to high-stakes scenarios, where an error is costly, and cheap ones go to low-stakes scenarios, where speed and volume matter. New versions gain share quickly if they solve real problems better and more cheaply than competitors. Each provider is winning in its own segment at the same time.

Hamidun News

AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.
