Together AI Launched Self-Service Instant Clusters on NVIDIA H100 and B200

Q: What is the source?

Originally published on Together AI Blog. Hamidun News processes and adapts the material with AI.

Q: When was it published?

2026-05-21. Reading time: 3 min.


Hamidun News Editorial




Together AI officially launched Instant Clusters — self-service GPU clusters that deploy in minutes and are ready for production without lengthy approvals and manual configuration.

What It Is

Instant Clusters are GPU clusters built on NVIDIA H100 and B200 and offered as a cloud service. You create a cluster through the web console, the CLI, or the API, and within minutes it is ready to handle workloads.

You can start with a compact configuration of 8 GPUs on a single node and scale to hundreds of GPUs across multiple nodes without changing application code. For orchestration, clusters support Kubernetes for containerized workloads and Slurm for traditional HPC. You can pin the NVIDIA driver and CUDA versions for each cluster, which helps keep runs and experiments reproducible. Integration with infrastructure-as-code tools (Terraform, SkyPilot) lets you make deployment part of your CI/CD pipeline.
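
As an illustration of the options described above, here is a minimal Python sketch of a cluster request kept as data. It is not Together AI's API: the `ClusterSpec` class, its field names, and the version strings are hypothetical, and the rule that clusters grow in whole nodes of 8 GPUs is an assumption drawn from the 8-GPU starting configuration.

```python
from dataclasses import dataclass, replace

# Assumption: clusters grow in whole nodes of 8 GPUs. The article only states
# that the smallest configuration is 8 GPUs on a single node.
GPUS_PER_NODE = 8
GPU_TYPES = {"H100", "B200"}             # GPU types named in the article
ORCHESTRATORS = {"kubernetes", "slurm"}  # orchestrators named in the article


@dataclass(frozen=True)
class ClusterSpec:
    """Hypothetical cluster request; field names are illustrative only."""
    gpu_type: str
    gpu_count: int
    orchestrator: str
    driver_version: str  # pinned NVIDIA driver version (placeholder value)
    cuda_version: str    # pinned CUDA version (placeholder value)

    def validate(self) -> None:
        """Raise ValueError if the request falls outside the described options."""
        if self.gpu_type not in GPU_TYPES:
            raise ValueError(f"unsupported GPU type: {self.gpu_type}")
        if self.orchestrator not in ORCHESTRATORS:
            raise ValueError(f"unsupported orchestrator: {self.orchestrator}")
        if self.gpu_count < GPUS_PER_NODE or self.gpu_count % GPUS_PER_NODE:
            raise ValueError("gpu_count must be a positive multiple of 8")
        if not self.driver_version or not self.cuda_version:
            raise ValueError("driver and CUDA versions must be pinned")

    @property
    def node_count(self) -> int:
        return self.gpu_count // GPUS_PER_NODE

    def scaled_to(self, gpu_count: int) -> "ClusterSpec":
        """Return a validated copy with a new GPU count; pins stay unchanged."""
        scaled = replace(self, gpu_count=gpu_count)
        scaled.validate()
        return scaled


small = ClusterSpec("H100", 8, "kubernetes", "driver-X.Y", "cuda-X.Y")
small.validate()
large = small.scaled_to(256)
print(small.node_count, large.node_count)  # prints "1 32"
```

Keeping the request as an immutable value means a scaled-up cluster carries the same pinned driver and CUDA versions as the small one, which is the reproducibility property the paragraph describes.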

Full Stack Included

Building a GPU cluster typically requires days of engineering work: installing drivers on each node, configuring network fabrics, setting up HTTPS certificates, organizing storage and resource management. Instant Clusters solve this problem: all critical components are already built into the image and ready to run.

What's included:

GPU Operator: automatic installation and management of NVIDIA drivers, including the CUDA and container runtimes
Ingress Controller: routing of incoming traffic to the cluster, with load balancing and failover support
NVIDIA Network Operator: management of high-speed networking (NVIDIA Quantum InfiniBand and Spectrum-X Ethernet with RoCE)
Cert Manager: automatic issuance and rotation of TLS certificates for HTTPS endpoints
Storage: high-performance parallel storage located near the compute nodes for fast access

Result: clusters are production-ready out of the box, without weeks of post-deployment configuration.

Optimized for Large-Scale Training

Clusters are designed for distributed model training. Inter-node communication runs over NVIDIA Quantum-2 InfiniBand, which provides low latency and high bandwidth between nodes. Within each node, GPUs are connected via NVLink and NVLink Switch for fast GPU-to-GPU communication.
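
To see why interconnect latency and bandwidth matter for distributed training, consider the textbook cost model for a ring all-reduce, the collective commonly used to sum gradients across GPUs. The sketch below is a rough estimate under stated assumptions, not a measurement of Instant Clusters: the function name and all numbers are hypothetical, and real communication libraries may choose other algorithms.

```python
def ring_allreduce_seconds(payload_bytes: float, world_size: int,
                           link_bytes_per_s: float,
                           latency_s: float = 0.0) -> float:
    """Estimate ring all-reduce time with the standard cost model.

    The ring runs 2 * (N - 1) steps (reduce-scatter, then all-gather),
    and each step moves payload / N bytes over one link.
    """
    if world_size < 1:
        raise ValueError("world_size must be at least 1")
    if link_bytes_per_s <= 0:
        raise ValueError("link_bytes_per_s must be positive")
    if world_size == 1:
        return 0.0  # a single worker has nothing to exchange
    steps = 2 * (world_size - 1)
    chunk_bytes = payload_bytes / world_size
    return steps * (latency_s + chunk_bytes / link_bytes_per_s)


# Hypothetical numbers: 10 GB of gradients, 64 GPUs, 5 microseconds per step.
for gigabytes_per_s in (5, 50):
    t = ring_allreduce_seconds(10e9, 64, gigabytes_per_s * 1e9, latency_s=5e-6)
    print(f"{gigabytes_per_s} GB/s per link: about {t:.2f} s per all-reduce")
```

In this model the bandwidth term dominates for large gradient payloads, while the per-step latency term grows with the number of GPUs, so both properties of the fabric affect how well training scales.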

This architecture is critical for reinforcement learning, large model pre-training, and multi-phase training schedules. A concrete example: Latent Health trains models that reason like clinicians, analyzing multimodal clinical data. Models must account for complex preferences (e.g., how to resolve conflicting diagnoses) and requirements from different insurers. With Instant Clusters, they can run large-scale reinforcement learning on full clinical datasets, experiment quickly, then distill results into small, efficient models that often outperform much larger foundation models.

"With Instant Clusters, we can start a full-scale experiment in hours instead of weeks of infrastructure preparation."

What It Means

GPU infrastructure finally feels like a modern cloud service: API-first, self-service, with predictable scaling. Historically, GPU clusters were built manually, a long and complex process. Now they are available as a managed cloud service. For startups, this means a fast path to first inference without infrastructure engineering costs. For enterprises, it means responding quickly to demand: an unexpected spike in inference traffic or a new research project requires only an API call, not lengthy procurement.

