Tokens, Chinese Style: How to Cut API Bills by 50% After the Market Grew 300-Fold

Over the past 18 months, token consumption in China has soared 300-fold. While the giants compare model parameter counts, businesses are counting their losses from API bills. Start…

AI-processed from Jiqizhixin (机器之心); edited by Hamidun News
Source: Jiqizhixin (机器之心). Collage: Hamidun News.

While the world watches the next GPT update, China is going through a quiet but extremely expensive revolution. Over the past year and a half, token consumption in China has grown 300-fold. This is more than a statistic: the surge has exposed the industry's main problem, which is that artificial intelligence is still outrageously expensive to run.

If you thought cloud computing bills hurt, imagine the scale of the problem for companies trying to embed neural networks in every business process. The situation has reached a point where even tech giants are questioning whether such spending is justified.

Let's recall how we got here. A year and a half ago, the Chinese AI market was in its "war of a hundred models" stage. Every self-respecting tech giant felt obliged to release its own LLM. In the race for answer quality and accuracy, everyone forgot about the economics.

As a result, we got a market where demand for compute grows exponentially while business margins approach zero. The classic "money-burning" model that worked in the era of Uber and food delivery fails here because of the colossal cost of GPU hours and infrastructure maintenance.

Help came from a team out of Tsinghua University, the main talent forge of Chinese high tech. A new wave of startups specializing in AI infrastructure decided to attack the problem not at the model training stage but at the stage of running models in production. They claim they can cut API spending in half. It sounds like marketing hype, but there is serious engineering work behind it.

The work involves deep optimization of resource scheduling, intelligent caching, and what the industry calls software-hardware co-design. These teams don't just rent servers; they rebuild the way the model communicates with the hardware.

Why does this matter right now? Because the market is moving from amazement to pragmatism. Investors are no longer willing to write checks just because the letters "AI" appear in a pitch deck. They need ROI numbers.
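To make the caching idea and the ROI arithmetic concrete, here is a minimal sketch in Python. It is not the method used by the startups described in the article, which the source does not detail. It assumes the simplest possible technique, an exact-match response cache, and every name in it (`CachedClient`, `fake_model`, the price, the whitespace token count) is hypothetical and chosen only for illustration.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class CachedClient:
    """Wraps a paid model call with an exact-match response cache (illustrative only)."""
    call_model: Callable[[str], str]   # hypothetical stand-in for a paid API call
    price_per_1k_tokens: float         # hypothetical price, not a real tariff
    cache: Dict[str, str] = field(default_factory=dict)
    billed_tokens: int = 0

    @staticmethod
    def count_tokens(text: str) -> int:
        # Crude assumption: one whitespace-separated word equals one token.
        return len(text.split())

    def ask(self, prompt: str) -> str:
        # Normalize the prompt so trivial differences in case or spacing still hit the cache.
        key = hashlib.sha256(prompt.strip().lower().encode("utf-8")).hexdigest()
        if key in self.cache:
            return self.cache[key]     # cache hit: nothing is billed
        answer = self.call_model(prompt)
        self.billed_tokens += self.count_tokens(prompt) + self.count_tokens(answer)
        self.cache[key] = answer
        return answer

    def cost(self) -> float:
        return self.billed_tokens / 1000 * self.price_per_1k_tokens


def fake_model(prompt: str) -> str:
    # Hypothetical model: returns a deterministic stub instead of calling a real API.
    return "stub answer for: " + prompt


# Toy workload in which every question is asked exactly twice.
prompts = ["What is your refund policy?", "Where is my order?"] * 2

cached = CachedClient(fake_model, price_per_1k_tokens=0.002)
uncached_tokens = 0
for p in prompts:
    cached.ask(p)
    uncached_tokens += CachedClient.count_tokens(p) + CachedClient.count_tokens(fake_model(p))

savings = 1 - cached.billed_tokens / uncached_tokens
print(f"billed with cache: {cached.billed_tokens}, without: {uncached_tokens}, saved: {savings:.0%}")
```

The 50% saving here falls out of the toy workload, where each prompt repeats once. Real traffic repeats far less neatly, so actual savings depend on the hit rate, and production systems would likely combine caching with the scheduling and hardware-level work the article mentions.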

If a startup spends 80% of its revenue on API fees paid to OpenAI or to a domestic provider such as Baidu, it has no future. Infrastructure optimization is becoming the "secret sauce" that will let AI move out of labs and expensive toys for geeks and into the real economy, from manufacturing to retail.

Interestingly, this "frugal AI" trend was born in China precisely because of chip shortages and sanctions. When you don't have unlimited access to the latest H100s, you start thinking about how to squeeze the maximum out of what you have. In this sense, Chinese engineers are currently at the forefront of optimization. They are learning to do more with less, and this experience will soon be in demand around the world.

Ultimately, the winner won't be the one with a model that is 1% smarter, but the one who can deliver that intelligence at a price that won't bankrupt the customer.

The bottom line: the era of AI excess is coming to an end. The time of infrastructure engineers is coming, the people who will make neural networks truly cheap.

Will Western companies be able to compete on efficiency if token costs in China continue to fall at this rate?

Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.
