
Together AI Expanded Its Platform: Fine-tuning Models with 100B+ Parameters

Together AI has expanded its fine-tuning platform. It now supports 100B+ parameter models: DeepSeek-R1, Qwen3-235B, and Llama 4.

AI-processed from Together AI Blog; edited by Hamidun News
Source: Together AI Blog. Collage: Hamidun News.

Together AI has released a significant update to its fine-tuning platform. Developers can now fine-tune the largest open models, those with hundreds of billions of parameters.

Giant Models in Training

Many models with more than 100 billion parameters were released in 2025. DeepSeek-R1, Qwen3-235B, and Llama 4 Maverick deliver results comparable to the best proprietary models on certain tasks. Fine-tuning adapts these giants to a company's specific needs, but until now it was complex, expensive, and required deep ML engineering expertise. Together AI says it has optimized its platform architecture to make training large models simpler and more cost-effective.

The company has added support for the latest versions of leading models:

  • DeepSeek: V3, R1, and their base versions
  • Qwen: Qwen3-235B and Qwen3-Coder-480B with context up to 32K tokens
  • Meta Llama: Llama 4 Scout and Llama 4 Maverick
  • OpenAI: gpt-oss-120b as a pilot

By default, the platform supports a 16K-token context for SFT (Supervised Fine-Tuning) and an 8K-token context for DPO (Direct Preference Optimization); some models get larger contexts. After training completes, developers can deploy the model to a Dedicated Endpoint for inference or download intermediate checkpoints for analysis.
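
These limits can be turned into a quick pre-flight check before a job is submitted. The sketch below is illustrative only: it assumes "16K", "8K", and "32K" mean 16,384, 8,192, and 32,768 tokens, and the names `context_limit`, `fits_context`, and `MODEL_OVERRIDES` are hypothetical helpers, not part of any Together AI API. Token counts are expected to come from the caller, for example from the model's own tokenizer.

```python
# Default training context limits quoted in the article, in tokens.
# Assumption: "16K" = 16 * 1024 and "8K" = 8 * 1024.
DEFAULT_LIMITS = {"sft": 16 * 1024, "dpo": 8 * 1024}

# Hypothetical override table; the article mentions 32K for Qwen3-235B SFT.
MODEL_OVERRIDES = {("Qwen3-235B", "sft"): 32 * 1024}


def context_limit(model: str, method: str) -> int:
    """Return the training context limit (in tokens) for a model and method."""
    method = method.lower()
    if method not in DEFAULT_LIMITS:
        raise ValueError(f"unknown training method: {method}")
    return MODEL_OVERRIDES.get((model, method), DEFAULT_LIMITS[method])


def fits_context(token_count: int, model: str, method: str) -> bool:
    """True if an example with `token_count` tokens fits within the limit."""
    return token_count <= context_limit(model, method)
```

Calling `fits_context(20_000, "Qwen3-235B", "sft")` returns `True` under these assumptions, while the same example would not fit a model that only has the default 16K SFT limit.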

Extended Contexts for Training

Long documents, large codebases, and the reasoning chains of AI agents all require a model that understands extended contexts. The problem: if training examples are shorter than the inputs the model sees in real-world tasks, the model may struggle in production. Together AI has added support for large contexts directly in the training process, which removes this mismatch between training and deployment. For example, Qwen3-235B can now train with a context of up to 32K tokens for SFT tasks. This is particularly useful for training models to edit large files, write documentation, and analyze long conversations.
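
One rough way to see the mismatch is to measure how many real-world inputs would not fit under a given training context limit. The sketch below is an assumption-laden illustration: `truncated_fraction` is a hypothetical helper, the token counts are made-up sample values rather than measurements, and "16K" and "32K" are taken to mean 16,384 and 32,768 tokens.

```python
def truncated_fraction(token_counts, limit):
    """Fraction of examples longer than `limit` tokens."""
    if not token_counts:
        return 0.0
    over = sum(1 for n in token_counts if n > limit)
    return over / len(token_counts)


# Hypothetical token counts for six production-like documents.
production_lengths = [2_000, 9_500, 14_000, 20_000, 27_000, 31_000]

at_16k = truncated_fraction(production_lengths, 16 * 1024)
at_32k = truncated_fraction(production_lengths, 32 * 1024)
print(f"over 16K: {at_16k:.0%}, over 32K: {at_32k:.0%}")
```

With these sample values, half of the documents exceed a 16K limit and none exceed 32K, which is the kind of gap a longer training context is meant to close.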

Integration and New Training Methods

The platform has improved its integration with Hugging Face Hub, the largest repository of open models and datasets. Developers can now load models from the Hub with one click, run training, and upload the results back, which shortens the path from an idea to a trained model. New options have also been added for DPO, a training method that makes models more responsive to human preferences. According to the article, DPO requires less data than older approaches and often delivers better results on real-world tasks.
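
For readers unfamiliar with DPO, the sketch below computes the standard DPO objective for a single preference pair, as it is usually stated in the literature. It is not taken from Together AI's implementation. The function name `dpo_loss`, the argument names, and the default `beta=0.1` are illustrative assumptions; the inputs are the summed log-probabilities that the trained policy and a frozen reference model assign to the preferred ("chosen") and dispreferred ("rejected") responses.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one preference pair of sequence log-probabilities."""
    # How much more the policy favors each response than the reference does.
    chosen_gain = policy_chosen - ref_chosen
    rejected_gain = policy_rejected - ref_rejected
    margin = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy matches the reference exactly, the margin is zero and the loss equals log 2. The loss falls as the policy raises the chosen response relative to the rejected one, and rises if it does the opposite.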

What This Means

Training large models is shifting from an exclusive, expensive undertaking to a mainstream tool. Startups, research labs, and mid-sized companies can now adapt DeepSeek, Qwen, or Llama to their needs without multimillion-dollar budgets. This accelerates AI adoption and reduces dependence on closed models.

*Meta has been recognized as an extremist organization and banned in the Russian Federation.

