Latest publications

Together AI Unveiled ATLAS: A Speculator That Accelerates LLM Inference by up to 4x
Together AI's new ATLAS adaptive-learning speculator accelerates LLM inference by up to 4x without manual tuning, automatically adapting to user workloads.

Together AI Launched Self-Service Instant Clusters on NVIDIA H100 and B200
Together AI officially launched Instant Clusters — self-service GPU clusters based on NVIDIA H100 and B200 that deploy in minutes and are ready for production without lengthy approvals.

Together AI Raised Batch Inference API Limits 3,000x and Cut Prices by 50%
The Batch Inference API now handles up to 30 billion tokens (up from 10 million) and costs half as much as the real-time API. It supports all 40+ models on the platform.

Together AI Expanded Its Platform: Fine-tuning Models with 100B+ Parameters
Together AI's fine-tuning platform now trains the most powerful open models — DeepSeek-R1, Qwen3-235B, and Llama 4 — with support for extended contexts and Hugging Face integration.

FlashAttention-3 Accelerates Transformers up to 2x at 75% H100 Utilization
Together AI released FlashAttention-3, an attention algorithm that accelerates transformers 1.5-2x and uses 75% of the H100 GPU's peak performance, with support for low-precision FP8 computation.

Together AI Achieves 90% Faster Training on NVIDIA Blackwell
Together AI announced access to NVIDIA Blackwell GPU clusters with its own optimizations, achieving 90% faster Llama 70B training and 15,264 tokens per second per GPU.

ThunderKittens by Together AI: A New Language for Efficient GPU Kernels
Together AI has unveiled ThunderKittens, a GPU kernel programming language that reads like PyTorch but runs like pure CUDA. On the H100, its code runs even faster than the classic FlashAttention-2.

DSGym: A Framework for Training Data Science Agents with 90+ Scientific Tasks
Together AI released DSGym — a unified framework for evaluating and training LLM agents on data science tasks. It includes 90+ bioinformatics tasks and 92 Kaggle competitions, with a 4B model trained on synthetic data ac

Together AI Explains Why Cloud for AI Is a Completely Different Architecture
AI startups like Cursor iterate weekly and consume GPUs the way web apps consumed cloud resources in 2012. Together AI lays out what cloud infrastructure should look like to keep pace with AI-native companies.

Together AI: How Kernel Optimizations Close the Gap Between Models and GPUs
Together AI's kernel optimization team built technology that improves GPU performance by 2-3x. Within a week, they adapted kernels for the new Blackwell GPUs, work that took NVIDIA a year.

FlashAttention-4: How Together AI Accelerated Attention on Blackwell GPUs
Together AI unveiled FlashAttention-4, an attention algorithm optimized for Blackwell GPUs that achieves 1,605 TFLOP/s and runs 2.7× faster than the Triton implementation.