
Google DeepMind releases DiffusionGemma — a diffusion-based LLM 4x faster than other Gemma 4 models

AI-processed from @demishassabis; edited by Hamidun News
Source: @demishassabis. Collage: Hamidun News.

Google DeepMind has introduced DiffusionGemma, a new class of language model that generates text by diffusion rather than by the standard autoregressive approach. According to the team, the model runs 4 times faster than any other model in the Gemma 4 family while maintaining comparable quality.

How a Diffusion-Based LLM Works

Classical language models such as GPT, Llama, and Gemma generate text sequentially: token by token, left to right. Generating 500 tokens requires 500 consecutive inference steps. This is a fundamental architectural limitation: each token depends on all previous ones, so the steps cannot be run in parallel across output positions.
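The sequential dependency described above can be illustrated with a minimal sketch. Everything here is an assumption for illustration: `toy_next_token` is a hypothetical stand-in for a real model, and `generate_autoregressive` is not any library's API. The point is only that each call needs the full prefix, so the number of model calls equals the number of generated tokens.

```python
from typing import Callable, List, Tuple


def generate_autoregressive(
    next_token: Callable[[List[int]], int],
    prompt: List[int],
    num_new_tokens: int,
) -> Tuple[List[int], int]:
    """Generate tokens one at a time; returns (sequence, number of model calls)."""
    sequence = list(prompt)
    model_calls = 0
    for _ in range(num_new_tokens):
        # Each call sees every token produced so far, so step k
        # cannot start before step k-1 has finished.
        sequence.append(next_token(sequence))
        model_calls += 1
    return sequence, model_calls


def toy_next_token(sequence: List[int]) -> int:
    # Hypothetical stand-in for a model: the result depends on the whole prefix.
    return (sum(sequence) + len(sequence)) % 100


tokens, calls = generate_autoregressive(toy_next_token, prompt=[1, 2, 3], num_new_tokens=500)
print(calls)  # 500 model calls for 500 new tokens
```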

The diffusion approach works differently. The idea behind Stable Diffusion and DALL-E for images is now applied to text: the model learns to recover the original text from random noise, gradually refining the entire sequence as a whole. Generation proceeds not left to right but iteratively, across all positions in parallel.

  • Autoregression: 500 tokens require 500 consecutive steps
  • Diffusion: the whole sequence is refined in 10–50 steps, regardless of length
  • The step-count gap widens with length: the longer the text, the more pronounced the advantage
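The step counts in the list above can be made concrete with a toy refinement loop. This is a sketch under stated assumptions, not DiffusionGemma's actual algorithm, which the announcement does not describe: `toy_denoiser` is a hypothetical stand-in for a model, and the schedule that commits an even share of masked positions per step is chosen only for illustration.

```python
import math
import random
from typing import List, Optional, Tuple

MASK = None  # marks a position that has not been decided yet


def toy_denoiser(sequence: List[Optional[int]]) -> List[int]:
    """Hypothetical stand-in for a model: proposes a token for every position in one call."""
    return [i % 100 for i in range(len(sequence))]


def generate_by_refinement(
    length: int, num_steps: int, seed: int = 0
) -> Tuple[List[Optional[int]], int]:
    """Fill `length` positions in `num_steps` model calls; returns (sequence, model calls)."""
    rng = random.Random(seed)
    sequence: List[Optional[int]] = [MASK] * length
    model_calls = 0
    for step in range(num_steps):
        proposals = toy_denoiser(sequence)  # one call covers all positions
        model_calls += 1
        masked = [i for i, tok in enumerate(sequence) if tok is MASK]
        # Commit an even share of the still-masked positions at each step,
        # so the final step commits whatever remains.
        to_commit = math.ceil(len(masked) / (num_steps - step))
        for i in rng.sample(masked, to_commit):
            sequence[i] = proposals[i]
    return sequence, model_calls


sequence, calls = generate_by_refinement(length=500, num_steps=20)
print(calls)        # 20 model calls for 500 positions
print(500 / calls)  # 25.0, the ratio of sequential steps, not of wall-clock time
```

Note that each refinement call processes the whole sequence, so one such step is not equal in cost to one autoregressive step. The ratio of step counts is therefore an upper-level intuition rather than a measured speedup, which is consistent with the announced figure being 4x.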

Teams have been trying to make diffusion work for text generation since 2021. The main problem has been quality: diffusion-based text models long underperformed autoregressive ones, producing text that lost coherence and precision. Judging by Hassabis's statement, DiffusionGemma has overcome this barrier.

Demis Hassabis Announced It Personally

The CEO of Google DeepMind announced the result himself, which is unusual. Executives at this level typically promote entire products or strategic directions and rarely single out a specific architectural solution for its own announcement. Hassabis personally congratulated researcher Brian O'Donoghue and the entire team, calling the development 'lightning-fast'.

"An excellent innovation in text diffusion. DiffusionGemma is lightning-fast, 4 times faster than other Gemma 4 models. Can't wait to see what people will build with it!" (Demis Hassabis)

Important context: the comparison is not against outdated baselines but against the current Gemma 4 family, which is itself considered one of the most efficient among open models. A fourfold speedup over such a baseline is a significant architectural achievement.

The Economics of Inference Is Changing

Generation speed determines both the cost of APIs and the latency of the end product. If DiffusionGemma generates 4 times faster at comparable quality, this opens up a range of practical opportunities:

  • Reduced cost of inference — less GPU time per response
  • Long outputs without latency growing in proportion to length
  • Competitiveness in latency-sensitive scenarios: chatbots, autocompletion, agent pipelines
  • Potential for unification with diffusion-based image and audio generation

Multimodal synergy is particularly interesting: if text diffusion is combined with the already mature diffusion approaches for images and audio, the result could be a single architecture that handles all modalities on the same principle. Google is already moving in this direction with the Gemini series, and DiffusionGemma appears to be a first step toward full multimodal diffusion.

What This Means

Diffusion-based LLMs have ceased to be an academic experiment. When the CEO of one of the world's largest AI labs personally announces an architectural breakthrough, the market responds. If DiffusionGemma's speed metrics are confirmed in independent tests, this could reshape pricing in the LLM inference market and force competitors to accelerate their own diffusion research. For developers who haven't yet explored this architecture, now is the time.
