Together AI Blog→ original

Together AI raised Batch Inference API limits 3,000x and cut prices by 50%

Together AI updated Batch Inference API to process massive data volumes without bottlenecks. Limits increased 3,000x to 30 billion tokens per job. The price fel

Together AI raised Batch Inference API limits 3,000x and cut prices by 50%
Source: Together AI Blog. Collage: Hamidun News.
◐ Listen to article

Together AI updated its Batch Inference API — a service for processing large volumes of requests to LLM models with delayed execution. The company announced three major improvements: a 3000x increase in limits, 50% price reduction, and a redesigned interface for simplified workflow.

Scaling Without Bottlenecks

The main change affected the limits. Previously, the ceiling was 10 million tokens per user per model; now it's 30 billion. This is not just a number — it's a solution to an architectural problem faced by companies processing massive datasets.

Previously, teams with large workloads would handle it this way: split the dataset into chunks, create many small batch tasks, track each separately, coordinate results. This was inconvenient, time-consuming, and expensive. Now you can upload an entire dataset in one operation and get results within a 24-hour SLA — often much faster.

Pricing was updated in parallel. Batch processing now costs approximately twice as much less than real-time API for the same compute volume. When dealing with billions of tokens, the price difference becomes significant for project budgets.

Any Model, Simple UI

The API now works with all 40+ models on the Together platform, including private deployments. Previously, the selection was limited to a few models, which created problems for teams wanting to experiment and test different models in batch mode. The interface was completely redesigned. Previously, you had to write API calls, understand documentation, debug code. Now everything is done through a web application: task creation, progress monitoring, result downloads. A few clicks — and you're done. This lowers the barrier to entry for teams that don't want to be distracted by writing code for every batch request.

Who Needs This

  • Sentiment analysis and text classification on millions of documents
  • Detecting fraudulent transactions — scanning millions of payments and operations
  • Synthetic data generation for training new models
  • Vectorizing large text corpora (embedding generation)
  • Content moderation on social networks and UGC platforms
  • Benchmark testing for evaluating and comparing model quality

A concrete example: Inception Labs is already using the batch API as the foundation of its production workflow. According to co-founder Vladimir Kuleshov:

"We rely on the Batch Inference API to process very large volumes of requests.

High limits allow us to run massive experiments without bottlenecks. Tasks complete significantly faster than the 24-hour SLA, often within hours."

What This Means for the Industry

Batch Inference is moving out of the niche of specialists into the category of mass-market tools. Previously, high costs and technical complexity were serious barriers. Only large research labs, government projects, and large corporations could afford to use batch processing. Now startups and mid-size teams have access to the same tooling. The 50% price reduction and 3000x limit increase eliminate the main obstacles to mass adoption. In 2025, we expect a surge in batch inference use in production applications — from content moderation at scale to synthesizing large volumes of training data for fine-tuning your own models.

ZK
Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.
What do you think?
Loading comments…