Sakana AI Introduces DiffusionBlocks: A Method for Block-by-Block Neural Network Training

Sakana AI has proposed DiffusionBlocks, a method that converts residual networks into independently trainable blocks. The idea is to interpret layer updates as steps of reverse denoising, which simplifies parallel training and reduces synchronization requirements.

Khamidun Zhemal

AI monitoring · MarkTechPost

May 31, 2026 · 3 min

AI-processed from MarkTechPost; edited by Hamidun News

Sakana AI Introduces DiffusionBlocks: A Method for Block-by-Block Neural Network Training — Source: MarkTechPost. Collage: Hamidun News.


Sakana AI has introduced DiffusionBlocks, a method that turns residual networks into independently trainable modules. The key idea is to interpret layer updates as reverse denoising steps, borrowing concepts from diffusion models.

Revolution in Training Architecture

Traditional deep network training is a tightly coupled process: backpropagation sends gradients through every layer, so each layer's update depends on a forward and backward pass through the rest of the network. In large models this coupling becomes a bottleneck. DiffusionBlocks takes a different approach. If each layer of a residual network is viewed as one step of a reverse diffusion process, then each block can be trained on its own denoising objective, which makes the blocks largely independent of one another. According to Sakana AI, this interpretation is practically viable as well as theoretically interesting: models trained with DiffusionBlocks reportedly maintain quality and even converge faster.
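The block-wise idea can be illustrated with a toy sketch. It assumes each block is assigned a single noise level and is fitted with a plain mean-squared-error denoising loss. The scalar affine residual block, the noise levels, and all function names here are illustrative assumptions, not Sakana AI's implementation.

```python
import random

def make_dataset(sigma, n=200, seed=0):
    """Pairs (noisy input z, clean target x) at one noise level sigma."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.choice([-1.0, 1.0])          # toy "clean" data
        z = x + sigma * rng.gauss(0.0, 1.0)  # corrupted input
        data.append((z, x))
    return data

def block_forward(z, w, c):
    """Residual update: the input plus a learned correction."""
    return z + (w * z + c)

def mse(data, w, c):
    """Mean squared denoising error of one block on its own data."""
    return sum((block_forward(z, w, c) - x) ** 2 for z, x in data) / len(data)

def train_block(sigma, steps=300, lr=0.02):
    """Fit one block with full-batch gradient descent on its own loss."""
    data = make_dataset(sigma)
    w, c = 0.0, 0.0
    loss_before = mse(data, w, c)
    for _ in range(steps):
        grad_w = sum(2 * (block_forward(z, w, c) - x) * z for z, x in data) / len(data)
        grad_c = sum(2 * (block_forward(z, w, c) - x) for z, x in data) / len(data)
        w -= lr * grad_w
        c -= lr * grad_c
    return loss_before, mse(data, w, c)

# Each block owns one noise level; no gradient crosses a block boundary,
# so these calls could run in any order or on separate workers.
NOISE_LEVELS = [2.0, 1.0, 0.5]
results = [train_block(sigma) for sigma in NOISE_LEVELS]
for sigma, (before, after) in zip(NOISE_LEVELS, results):
    print(f"sigma={sigma}: loss {before:.3f} -> {after:.3f}")
```

Because every call to `train_block` uses only its own parameters and data, nothing has to be exchanged between blocks during training. That independence is the property the article attributes to the method.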

Practical Advantages

Independent block training offers several tangible advantages:

Parallelism without synchronization: blocks train simultaneously without waiting for each other
Memory efficiency: each block stores only its own gradients, not the entire computational graph of the network
Architectural flexibility: individual layers can be stopped, replaced, or updated without full retraining
Scalability: the method is better suited for distributed systems and multi-node training
Reduced communication overhead: less data exchange between nodes in a cluster
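The memory point in the list above can be made concrete with a back-of-the-envelope estimate. It assumes the activations kept for backpropagation grow linearly with the number of layers in the graph being differentiated, which is a simplification. The layer count, batch size, and hidden size are made-up numbers, and `activation_memory_bytes` is a hypothetical helper.

```python
def activation_memory_bytes(num_layers, batch_size, hidden_dim, bytes_per_value=4):
    """Rough estimate: one (batch, hidden) activation tensor stored per layer."""
    return num_layers * batch_size * hidden_dim * bytes_per_value

NUM_LAYERS = 24   # illustrative network depth
NUM_BLOCKS = 4    # illustrative number of independently trained blocks

# End-to-end backpropagation keeps activations for all layers at once.
end_to_end = activation_memory_bytes(NUM_LAYERS, batch_size=32, hidden_dim=1024)

# Block-wise training only differentiates through the layers of one block.
per_block = activation_memory_bytes(NUM_LAYERS // NUM_BLOCKS, batch_size=32, hidden_dim=1024)

print(end_to_end, per_block, end_to_end // per_block)
```

Under these assumptions, a worker training a single block holds a quarter of the activations that end-to-end training would need. Real savings depend on the architecture and on optimizer state, which this estimate ignores.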

Tests showed that, on identical hardware, standard training and DiffusionBlocks run at roughly the same speed, but DiffusionBlocks requires less synchronization.

Why This Matters

Training large neural networks is one of the major engineering challenges in modern AI. Each new order of magnitude in parameter count (billions, then trillions) demands new infrastructure: specialized chips, optimized algorithms, distributed systems. DiffusionBlocks shows how a theoretical insight, here a diffusion-based reading of residual layers, can lead to practical improvements. If the method is widely adopted, it could reduce training costs and accelerate development, which matters most for startups and research groups with limited resources, and it could broaden access to training high-performance models.

What This Means

DiffusionBlocks is a striking example of idea transfer between areas of AI: a concept that originated in generative modeling (diffusion) is now being applied to a classical architecture (residual networks). If the method proves scalable in production settings, it could become an industry standard for training large models.

