Google Reveals Scale: 3.2 Quadrillion AI Tokens Per Month, Sevenfold Growth

Q: What is the source?

Originally published on 3DNews AI. Hamidun News processes and adapts the material with AI.

Q: When was it published?

2026-05-21. Reading time: 3 min.

Hamidun News Editorial

At Google's I/O 2026 conference, CEO Sundar Pichai revealed the scale of the company's AI processing: Google handles 3.2 quadrillion tokens per month, seven times more than a year ago. This represents the largest AI infrastructure deployment in tech history.

Exponential Growth in Computing

The figure of 3.2 quadrillion tokens per month reflects explosive growth in AI demand. A sevenfold increase in one year implies roughly 0.46 quadrillion tokens per month a year earlier. Growth at that pace is not simply a matter of adding new servers; it requires a complete rethinking of architecture under demand pressure.
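To put these numbers in perspective, here is a rough back-of-the-envelope sketch in Python. Only the 3.2 quadrillion figure and the sevenfold factor come from the article; the 30-day month and the assumption of smooth, constant month-over-month growth are simplifications made for illustration, and the function names are hypothetical.

```python
# Figures reported in the article.
TOKENS_PER_MONTH = 3.2e15  # 3.2 quadrillion tokens per month
GROWTH_FACTOR = 7          # year-over-year multiple

# Assumption (not from the article): a 30-day month.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60


def tokens_per_second(tokens_per_month: float) -> float:
    """Average throughput if the load were spread evenly over the month."""
    return tokens_per_month / SECONDS_PER_MONTH


def volume_a_year_ago(tokens_per_month: float, growth: float = GROWTH_FACTOR) -> float:
    """Monthly volume one year earlier, implied by the growth multiple."""
    return tokens_per_month / growth


def implied_monthly_growth(growth: float = GROWTH_FACTOR, months: int = 12) -> float:
    """Constant month-over-month rate that compounds to `growth` over `months`.

    Assumes smooth growth, which the article does not claim.
    """
    return growth ** (1 / months) - 1


if __name__ == "__main__":
    print(f"Average tokens per second: {tokens_per_second(TOKENS_PER_MONTH):.3e}")
    print(f"Monthly volume a year ago: {volume_a_year_ago(TOKENS_PER_MONTH):.3e}")
    print(f"Implied monthly growth:    {implied_monthly_growth():.1%}")
```

Under these assumptions the average works out to a little over 1.2 billion tokens per second, and sevenfold annual growth corresponds to roughly 17.6% compounded each month.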

Google requires such volumes for:

Running Gemini in search results and on YouTube (hundreds of millions of users)
Embedding AI in Gmail, Docs, Maps, Photos, and other consumer services
Training and fine-tuning new models on internal data
Serving the cloud segment (Google Cloud) for clients
Experimenting with new formats for integrating AI into everyday products

At such a scale, even small improvements in model efficiency yield enormous savings in electricity and server hardware. Every percentage point of optimization translates to tens of millions of dollars per year.
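The claim about percentage points can be checked with simple arithmetic. The sketch below is illustrative only: the token volume is from the article, but the blended price of $0.10 per million tokens is a purely hypothetical placeholder, not a figure published by Google, and the function names are invented for this example.

```python
TOKENS_PER_MONTH = 3.2e15  # from the article

# Hypothetical blended serving cost, chosen only for illustration.
HYPOTHETICAL_USD_PER_MILLION_TOKENS = 0.10


def annual_serving_cost(tokens_per_month: float, usd_per_million_tokens: float) -> float:
    """Yearly cost of serving a constant monthly token volume."""
    millions_of_tokens_per_year = tokens_per_month * 12 / 1e6
    return millions_of_tokens_per_year * usd_per_million_tokens


def annual_savings(
    tokens_per_month: float,
    usd_per_million_tokens: float,
    efficiency_gain: float,
) -> float:
    """Yearly savings from cutting cost per token by `efficiency_gain` (0.01 = 1%)."""
    return annual_serving_cost(tokens_per_month, usd_per_million_tokens) * efficiency_gain


if __name__ == "__main__":
    cost = annual_serving_cost(TOKENS_PER_MONTH, HYPOTHETICAL_USD_PER_MILLION_TOKENS)
    saved = annual_savings(TOKENS_PER_MONTH, HYPOTHETICAL_USD_PER_MILLION_TOKENS, 0.01)
    print(f"Annual cost at the assumed price: ${cost:,.0f}")
    print(f"Savings from a 1% efficiency gain: ${saved:,.0f}")
```

With this placeholder price, a 1% efficiency gain is worth about $38 million per year, which is consistent with the "tens of millions" order of magnitude. The result scales linearly with whatever real cost per token applies.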

Gemini 3.5 Flash—Optimization for Load

This is why Google introduced Gemini 3.5 Flash—a lighter version of its flagship. The model is designed to handle most routine tasks with lower computational cost, but without critical losses in response quality. Flash takes the load off the main models, allowing the company to distribute computing resources more efficiently. The cost of processing a single token in Flash is several times lower compared to full versions. This is not a stripped-down model, but an engineering solution for real-world tasks that don't require maximum power.

The Race for Infrastructure Supremacy

Google is clearly signaling that it has invested more in scaling than its competitors. OpenAI does not disclose comparable figures, but it is also known to be expanding capacity for ChatGPT. Running models from Anthropic, Meta* (Llama), and other players likewise requires substantial compute. This is an arms race, not over the number of parameters in a model but over the infrastructure itself. Whoever scales faster and cheaper wins the war for the market.

What This Means

Public disclosure of these figures is a signal to investors about the real scale of Google's AI bets. The company doesn't hide that this is expensive, but the costs pay off: AI integrated into search, video, and cloud creates powerful monetization points. For developers and startups, the conclusion is simple: invest in optimization, not just scaling.

*Meta is recognized as an extremist organization and is banned in the Russian Federation.
