FlashAttention-4: how Together AI accelerated attention on Blackwell GPUs

Q: What is the source of this material?

Originally published on the Together AI Blog. Hamidun News processes and adapts materials with the help of AI.

Q: When was it published?

2026-05-21. Reading time: 3 min.

Hamidun News Editorial

AI monitoring · Together AI Blog

2026-05-21 · 2 min

FlashAttention-4: how Together AI accelerated attention on Blackwell GPUs — Source: Together AI Blog. Collage: Hamidun News.

FlashAttention-4 rebuilds the attention kernel specifically for Blackwell. On this architecture the bottleneck is not matrix-multiplication throughput but the special function units (SFUs) that compute the softmax exponentials, along with memory. The speedup comes from using Blackwell's new tensor memory (TMEM) and the 2-CTA MMA mode to relieve those bottlenecks. Result: 1605 TFLOPs/s (71% utilization), 1.3× faster than cuDNN and 2.7× faster than Triton.
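To make these figures concrete, here is a minimal back-of-the-envelope sketch in Python. It rests on two assumptions that the article does not state: that "utilization" means achieved throughput divided by peak throughput, and that the forward pass is counted in the usual way, with one exponential per attention score and two matrix multiplications (Q·Kᵀ and P·V) per tile. The function names are hypothetical and chosen for illustration only.

```python
def implied_peak_tflops(achieved_tflops: float, utilization: float) -> float:
    """Peak throughput implied by an achieved figure and a utilization fraction.

    Assumes utilization = achieved / peak (an assumption, not stated in the article).
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return achieved_tflops / utilization


def matmul_flops_per_exp(head_dim: int) -> int:
    """Matmul FLOPs available per softmax exponential in the forward pass.

    For an M x N tile of scores, Q @ K^T and P @ V each cost 2*M*N*head_dim
    FLOPs, while softmax evaluates M*N exponentials, so the ratio is 4*head_dim
    regardless of tile size.
    """
    if head_dim <= 0:
        raise ValueError("head_dim must be positive")
    return 4 * head_dim


if __name__ == "__main__":
    # Figures quoted in the article: 1605 TFLOPs/s at 71% utilization.
    peak = implied_peak_tflops(1605.0, 0.71)
    print(f"implied peak: {peak:.0f} TFLOPs/s")

    # Illustrative head dimensions (assumed, not taken from the article).
    for d in (64, 128):
        print(f"head_dim={d}: {matmul_flops_per_exp(d)} matmul FLOPs per exp")
```

The second function hints at why the exponential units can become the limit: the amount of matmul work per exponential is fixed by the head dimension, so once the matrix units get faster, the time spent on exponentials does not shrink with them unless the kernel is restructured.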

Hamidun News

AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.
