ThunderKittens from Together AI: a new language for efficient GPU kernels

Together AI has released ThunderKittens, a compact programming language for writing optimized GPU kernels. On the H100 GPU, kernels written in it run noticeably faster than standard FlashAttention2. The interface resembles PyTorch, so ML engineers can pick it up quickly. The authors openly describe it as an experimental project. The code is fully open source and is already integrated with NanoGPT, so developers can try it in training.