Together AI: how kernel optimizations close the gap between models and GPUs

Together AI’s team adapted its CUDA kernels to NVIDIA’s new Blackwell GPUs in one week, work that had taken NVIDIA a year. That speed was possible thanks to FlashAttention (2022) and ThunderKittens. Kernel work of this kind closes the gap between the mathematics of a model and the performance the hardware can actually deliver.
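To illustrate what "closing the gap" can mean in practice, here is FlashAttention's central idea. It computes attention block by block instead of materializing the full score matrix. As each new block raises the running maximum, an "online" softmax rescales the partial sums it has accumulated. The sketch below shows that idea in plain Python for a single query vector. It is a hypothetical teaching example, not Together AI's or FlashAttention's CUDA code. The names `naive_attention_row` and `tiled_attention_row` and the `block_size` parameter are our own. It also ignores everything that makes real kernels fast, such as on-chip memory, tensor cores, and parallelism across queries.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def naive_attention_row(q, keys, values):
    """Reference: softmax(q . K^T / sqrt(d)) V, materializing every score at once."""
    d = len(q)
    scores = [dot(q, k) / math.sqrt(d) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]


def tiled_attention_row(q, keys, values, block_size=2):
    """Same result, but keys/values are visited one block at a time (online softmax).

    Only a running max, a running sum of exponentials, and an output accumulator
    are kept; the full score vector is never stored.
    """
    d = len(q)
    dim = len(values[0])
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * dim
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [dot(q, k) / math.sqrt(d) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale earlier partial results so they are relative to the new max.
        # On the first block running_max is -inf, so the scale is exp(-inf) = 0.0.
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            for j in range(dim):
                acc[j] += w * v[j]
        running_max = new_max
    return [a / running_sum for a in acc]


if __name__ == "__main__":
    q = [1.0, 0.5]
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5], [0.3, -0.2]]
    values = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0], [0.5, 0.5], [2.0, 0.0]]
    print(naive_attention_row(q, keys, values))
    print(tiled_attention_row(q, keys, values, block_size=2))
```

Both functions should return the same vector up to floating-point rounding for any `block_size`. The tiled version only needs to hold one block of keys and values at a time, and that is the property a GPU kernel exploits to keep work in fast on-chip memory.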