
AWS launched hourly GPU reservations — for ML testing and release preparation


Source: AWS Machine Learning Blog. Collage: Hamidun News.

AWS introduced EC2 Capacity Blocks for ML, integrated with SageMaker training plans, as a way to reserve GPU capacity for short periods. It addresses two pain points for ML engineers: an acute shortage of available GPUs, and the need to sign long contracts even when compute is needed for only a few hours. You can now reserve as many GPUs as you need for exactly the window in which you need them.

When Short GPU Time Is Needed

Such scenarios are more common in practice than they might seem. Load testing before a feature release requires the full infrastructure, but only for a day or two; once testing is done, keeping it running is wasted money. Model validation, that is, checking a new prompt or a fine-tuned model against real data, often takes 4-8 hours. Team workshops where engineers learn frameworks such as PyTorch or TensorFlow need GPUs for the duration of the session, not permanently. Before a major release, the inference infrastructure has to be prepared: spin up servers, warm the cache, run smoke tests. Finally, temporary traffic spikes call for extra compute during peak hours and none once demand drops.

  • Load testing before feature releases
  • Model validation after fine-tuning
  • Team training and workshops
  • Preparing inference capacity before release
  • Handling temporary traffic spikes

How Capacity Blocks Works

The logic is straightforward: instead of committing to a Reserved Instance (a one- or three-year term) or running On-Demand (expensive for continuous use, with no guarantee that capacity is available), you reserve a block of GPU capacity for a specific window, from hours to several days. AWS guarantees that the capacity will be held and available for the period you choose. This gives engineers predictability: you know the GPUs will be ready when scheduled.

The service is integrated with SageMaker training plans, so you can launch a training job without worrying that GPU capacity will run out partway through. EC2 Capacity Blocks works with various GPU types: NVIDIA H100 (for LLMs), A100 (a general-purpose choice), and L4 (compact, for inference). You pick the configuration that matches your workload.

Everything is managed through the familiar AWS interface, with integration into SageMaker, CloudFormation, and other tools.
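
The reservation flow described above can be sketched in Python. This is a minimal illustration, not AWS's reference code: `build_offering_query` and `cheapest_offering` are hypothetical helper names, and the request and response field names follow the boto3 EC2 call `describe_capacity_block_offerings` as we understand it, so check them against the current boto3 documentation before relying on them. The client is passed in as an argument, which keeps the sketch runnable without AWS credentials.

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal


def build_offering_query(instance_type, instance_count, duration_hours,
                         earliest_start, search_window_days=7):
    """Build keyword arguments for a Capacity Block offerings search.

    Assumption: the keys mirror boto3's describe_capacity_block_offerings
    parameters; verify them against the boto3 docs.
    """
    return {
        "InstanceType": instance_type,
        "InstanceCount": instance_count,
        "CapacityDurationHours": duration_hours,
        # Earliest acceptable start and latest acceptable end of the block.
        "StartDateRange": earliest_start,
        "EndDateRange": earliest_start + timedelta(days=search_window_days),
    }


def cheapest_offering(ec2_client, query):
    """Return the offering with the lowest upfront fee, or None if none match.

    `ec2_client` is anything exposing describe_capacity_block_offerings,
    for example boto3.client("ec2").
    """
    response = ec2_client.describe_capacity_block_offerings(**query)
    offerings = response.get("CapacityBlockOfferings", [])
    if not offerings:
        return None
    # Assumption: UpfrontFee is returned as a string, so parse it before
    # comparing; comparing the raw strings would sort "120.50" before "99.00".
    return min(offerings, key=lambda o: Decimal(o["UpfrontFee"]))


if __name__ == "__main__":
    # Illustrative values only: one p5.48xlarge (H100) instance for 24 hours,
    # starting no earlier than tomorrow.
    start = datetime.now(timezone.utc) + timedelta(days=1)
    print(build_offering_query("p5.48xlarge", 1, 24, start))
```

With a real client, the chosen offering's `CapacityBlockOfferingId` would then be passed to the EC2 purchase call (`purchase_capacity_block` in boto3, again to be verified against the documentation). The purchase step is left out here because it incurs a charge.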

Pricing and Flexibility

Previously, the choice was grim. A Reserved Instance, committed a year or more ahead, was cheap but cost you flexibility. On-Demand, billed by the hour, was flexible but could cost 3-4 times more. Capacity Blocks sits between the two extremes: cheaper than On-Demand, but with no long-term contract. Most importantly, you pay only for the window you reserve, not for months of idle capacity. For the business, this means more accurate infrastructure budgeting and less overpayment. Engineers no longer have a reason to request GPUs "just in case" and inflate costs. DevOps teams can scale infrastructure ahead of critical moments (releases, conferences, marketing campaigns) knowing the exact hourly price.
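
The budgeting argument comes down to simple arithmetic, sketched below. The rate and instance counts are placeholders chosen for illustration, not AWS prices, and the helper names `window_cost` and `savings_vs_always_on` are hypothetical. For simplicity the sketch applies the same hourly rate to both cases.

```python
from decimal import Decimal


def window_cost(rate_per_instance_hour, instances, hours):
    """Cost of holding `instances` machines for `hours` at a flat hourly rate."""
    return Decimal(str(rate_per_instance_hour)) * instances * hours


def savings_vs_always_on(rate_per_instance_hour, instances,
                         needed_hours, period_hours):
    """Money not spent when capacity is held only for the hours needed."""
    always_on = window_cost(rate_per_instance_hour, instances, period_hours)
    reserved = window_cost(rate_per_instance_hour, instances, needed_hours)
    return always_on - reserved


# Hypothetical inputs: 2 instances at a placeholder rate of 10.00 per
# instance-hour, needed for one 8-hour validation run in a 30-day month.
validation_run = window_cost("10.00", 2, 8)
saved = savings_vs_always_on("10.00", 2, 8, 24 * 30)
print(validation_run, saved)
```

Plugging in real quotes for each purchasing option shows how much of a monthly GPU bill is idle time rather than useful work.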

What This Means

Cloud services are increasingly adapting to how ML work is actually done. The era of ordering GPUs far in advance and paying for idle time is fading. Instead, you pay for capacity only for the window in which you need it, which is more economical and cuts waste in infrastructure projects.

Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.