New GPUs Will Lower Inference Costs, But Not Prices for Users
Inference (AI model deployment) costs are rising due to growing infrastructure demands. A new generation of GPUs and specialized accelerators promises to…
AI-processed from 3DNews AI; edited by Hamidun News
Every day, AI services grow more expensive due to rising infrastructure demands. Companies spend increasingly on servers and GPUs for inference—the phase when a trained model runs and responds to user requests. Consumers feel this in their API and subscription bills.
Why Inference Is So Expensive
Inference isn't model training. A model is trained once, then run thousands of times daily across thousands of servers. Each user request requires a GPU calculation. When millions of people simultaneously use ChatGPT, it creates enormous load. Developers have two options: buy more GPUs or accept queues.
NVIDIA sells H100s and B100s for hundreds of thousands of dollars each. OpenAI, Google, and Meta buy them by the thousands. On top of that, they pay for electricity (several kilowatts per chip) and cooling (specialized liquid cooling systems). That's why Claude Pro costs $20 a month—it's simply infrastructure engineering.
Salvation from New Hardware
Chip manufacturers see the problem and are releasing specialized hardware for inference. NVIDIA is preparing its Blackwell series for AI, Intel is developing Gaudi, and AMD is improving MI300X. The new generation promises:
- Lower power consumption (30–40% cheaper per year on electricity)
- Higher performance per watt (one new chip replaces two old ones)
- Optimization for typical models (less memory, faster calculations)
- Scalability (easier to build farms with thousands of chips)
In theory, this could reduce inference operating costs by 25–50%.
But Prices for Users Won't Drop
The Register rightly reminds us: when equipment becomes cheaper, it rarely leads to lower prices for end consumers. Here's why:
First, developers still pay for electricity, server racks, cooling, and depreciation on old GPUs (which don't disappear overnight). Second, companies use the savings to develop new features and expand model parameters—expensive work requiring more GPU resources. Third, the market is young. OpenAI, Google, and Anthropic still set prices without aggressive price competition. They compete on quality and capabilities. When there are 20 comparable services on the market, prices will drop—but not today.
What This Means
New hardware is a gift for companies, not consumers. Cheaper GPUs will let AI services remain profitable even as demand grows. The savings will likely go toward training new models, geographic expansion, and service improvement—not subscriber discounts. AI services will stay expensive as long as the model works.
*Meta is recognized as an extremist organization and banned in Russia.
Want to stop reading about AI and start using it?
AI News is a curated feed of AI/tech news. Hamidun Academy teaches you to use AI systematically in your work.