Together AI raised Batch Inference API limits 3,000x and cut prices by 50%

Together AI updated Batch Inference API to process massive data volumes without bottlenecks. Limits increased 3,000x to 30 billion tokens per job. The price fell by half compared with the real-time API. A new web interface simplified creating and tracking batch jobs.

Hamidun News Editorial

AI monitoring · Together AI Blog

May 21, 2026· 2 min

AI-processed from Together AI Blog; edited by Hamidun News

Together AI raised Batch Inference API limits 3,000x and cut prices by 50% — Source: Together AI Blog. Collage: Hamidun News.

◐ Listen to article

Together AI updated its Batch Inference API — a service for processing large volumes of requests to LLM models with delayed execution. The company announced three major improvements: a 3000x increase in limits, 50% price reduction, and a redesigned interface for simplified workflow.

Scaling Without Bottlenecks

The main change affected the limits. Previously, the ceiling was 10 million tokens per user per model; now it's 30 billion. This is not just a number — it's a solution to an architectural problem faced by companies processing massive datasets.

Previously, teams with large workloads would handle it this way: split the dataset into chunks, create many small batch tasks, track each separately, coordinate results. This was inconvenient, time-consuming, and expensive. Now you can upload an entire dataset in one operation and get results within a 24-hour SLA — often much faster.

Pricing was updated in parallel. Batch processing now costs approximately twice as much less than real-time API for the same compute volume. When dealing with billions of tokens, the price difference becomes significant for project budgets.

Any Model, Simple UI

The API now works with all 40+ models on the Together platform, including private deployments. Previously, the selection was limited to a few models, which created problems for teams wanting to experiment and test different models in batch mode. The interface was completely redesigned. Previously, you had to write API calls, understand documentation, debug code. Now everything is done through a web application: task creation, progress monitoring, result downloads. A few clicks — and you're done. This lowers the barrier to entry for teams that don't want to be distracted by writing code for every batch request.

Who Needs This

Sentiment analysis and text classification on millions of documents
Detecting fraudulent transactions — scanning millions of payments and operations
Synthetic data generation for training new models
Vectorizing large text corpora (embedding generation)
Content moderation on social networks and UGC platforms
Benchmark testing for evaluating and comparing model quality

A concrete example: Inception Labs is already using the batch API as the foundation of its production workflow. According to co-founder Vladimir Kuleshov:

"We rely on the Batch Inference API to process very large volumes of requests.

High limits allow us to run massive experiments without bottlenecks. Tasks complete significantly faster than the 24-hour SLA, often within hours."

What This Means for the Industry

Batch Inference is moving out of the niche of specialists into the category of mass-market tools. Previously, high costs and technical complexity were serious barriers. Only large research labs, government projects, and large corporations could afford to use batch processing. Now startups and mid-size teams have access to the same tooling. The 50% price reduction and 3000x limit increase eliminate the main obstacles to mass adoption. In 2025, we expect a surge in batch inference use in production applications — from content moderation at scale to synthesizing large volumes of training data for fine-tuning your own models.

Hamidun News

AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.

Telegram channel RSS hamidun.com

Want to stop reading about AI and start using it?

AI News is a curated feed of AI/tech news. Hamidun Academy teaches you to use AI systematically in your work.

🎓 Academy — 7 days free Free consultation