NVIDIA at GTC 2026 Shifts Focus From Chips to Token Factories and Agent-as-a-Service
NVIDIA at GTC 2026 demonstrated a shift from competing on individual GPUs to inference economics. Key themes: 20 years of CUDA as the ecosystem foundation…
AI-processed from Habr AI; edited by Hamidun News
NVIDIA at GTC 2026 demonstrated that the next phase of the AI market will be built not around individual GPUs, but around inference factories, where tokens and agent actions become the primary product. The keynote's central thesis: the company is no longer selling just accelerators, but a complete infrastructure for industrial-scale AI output — from CUDA libraries to server racks, networks, and enterprise software layers.
The twenty-year trajectory of CUDA served as the starting point for this pivot. It was the commitment to a software platform that once transformed NVIDIA's graphics cards from niche hardware into a universal computing tool for machine learning. At GTC, this path was presented as a sequential evolution: first, an ecosystem of libraries and frameworks; then DGX systems; and now, ready-made modular blocks for large AI clusters.
The logic is straightforward: even the most powerful chip means little without software, optimizations, and the ability to quickly deploy practical scenarios in production. This leads to NVIDIA's second thesis: the market is shifting from SaaS to Agent-as-a-Service. Whereas companies once paid for access to a tool and employees extracted the results, businesses now pay for executed AI actions. An agent must not simply generate text; it must complete the task: process a request, conduct analysis, prepare a document, or make decisions within defined rules. The measure of efficiency is therefore no longer abstract performance in FLOPS, but the cost of a useful token and the final price of a meaningful action.
In this logic, inference becomes a separate economy, and data centers become production facilities for intellectual work. This is where NVIDIA is advancing the concept of the Token Factory. The company proposes viewing modern AI data centers not as data storage locations, but as factories where electricity and infrastructure go in, and a stream of tokens for applications, assistants, and autonomous agents comes out.
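The "electricity in, tokens out" framing can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only: the `RackProfile` fields, the function names, and every number are assumptions made up for the example, not figures or formulas from the keynote.

```python
from dataclasses import dataclass


@dataclass
class RackProfile:
    """Hypothetical description of one inference rack (all fields are assumptions)."""
    power_kw: float           # average electrical draw, in kilowatts
    tokens_per_second: float  # sustained output throughput
    hourly_capex_usd: float   # amortized hardware cost per hour of operation
    useful_fraction: float    # share of tokens that end up in accepted results


def cost_per_million_useful_tokens(rack: RackProfile, usd_per_kwh: float) -> float:
    """Hourly cost (energy plus amortized hardware) divided by useful tokens per hour."""
    if not 0 < rack.useful_fraction <= 1:
        raise ValueError("useful_fraction must be in (0, 1]")
    cost_per_hour = rack.power_kw * usd_per_kwh + rack.hourly_capex_usd
    useful_tokens_per_hour = rack.tokens_per_second * 3600 * rack.useful_fraction
    return cost_per_hour / useful_tokens_per_hour * 1_000_000


def cost_per_action(rack: RackProfile, usd_per_kwh: float, useful_tokens_per_action: float) -> float:
    """Price of one completed agent action that needs the given number of useful tokens."""
    per_million = cost_per_million_useful_tokens(rack, usd_per_kwh)
    return per_million * useful_tokens_per_action / 1_000_000


# Placeholder numbers chosen only to make the arithmetic easy to follow.
example_rack = RackProfile(
    power_kw=80.0,
    tokens_per_second=10_000.0,
    hourly_capex_usd=10.0,
    useful_fraction=0.5,
)
print(cost_per_million_useful_tokens(example_rack, usd_per_kwh=0.10))
print(cost_per_action(example_rack, usd_per_kwh=0.10, useful_tokens_per_action=20_000))
```

In this simplified model, raising throughput, cutting power draw, or increasing the share of useful tokens all lower the price of an action, which is why the metric looks at the whole facility rather than a single chip.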
An estimate shared at GTC suggests that by 2027, global spending on building and upgrading such capacity could approach $1 trillion. Demand for these facilities is fueled not only by enterprise AI but also by the growth of open models, which have come close to the state of the art in quality and make launching proprietary services accessible to a broader range of companies.
The hardware foundation of this strategy is the Vera Rubin architecture. NVIDIA describes it not as another incremental performance gain over the previous generation, but as an attempt to repackage the entire stack for inference. What matters now is not a single card or even a single server, but the rack as a whole: compute, CPU, memory, storage, networking, security, and optical interconnects between modules.
This approach is needed to increase throughput and system responsiveness at the same time without a sharp rise in electricity costs. Special emphasis was placed on modularity: configurations can be assembled for different workload types, from fast, high-volume responses to expensive real-time reasoning. This also defines a new market segmentation: cheap responses for mass consumption and premium inference for complex agent scenarios.
Another important signal from GTC: AI agents are increasingly being viewed as part of corporate infrastructure, not as an experimental layer on top of chatbots. Alongside hardware, NVIDIA is therefore advancing reference software architectures for deploying agents in large enterprises. The idea is for agents to operate within security policies, access only authorized interfaces, and integrate predictably into a company's existing IT landscape. For enterprises, this may be even more important than the chips themselves: without control, audit, and manageability, no autonomous agent will reach production.
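The requirement that agents "access only authorized interfaces" under audit can be illustrated with a small allowlist gateway. This is a generic sketch, not NVIDIA's reference architecture: `AgentPolicy`, `ToolGateway`, and the tool names are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Set


@dataclass
class AgentPolicy:
    """Hypothetical policy: which tools a given agent is authorized to call."""
    agent_id: str
    allowed_tools: Set[str]


@dataclass
class ToolGateway:
    """Single choke point between agents and tools; records every attempt."""
    tools: Dict[str, Callable[..., Any]]
    audit_log: List[Dict[str, Any]] = field(default_factory=list)

    def call(self, policy: AgentPolicy, tool_name: str, **kwargs: Any) -> Any:
        # A call is allowed only if the tool exists and the policy lists it.
        allowed = tool_name in self.tools and tool_name in policy.allowed_tools
        # Log before executing, so denied attempts are audited too.
        self.audit_log.append(
            {"agent": policy.agent_id, "tool": tool_name, "args": kwargs, "allowed": allowed}
        )
        if not allowed:
            raise PermissionError(f"{policy.agent_id} may not call {tool_name}")
        return self.tools[tool_name](**kwargs)


# Illustrative tools and policy (all names are made up).
gateway = ToolGateway(
    tools={
        "lookup_order": lambda order_id: f"order {order_id}: shipped",
        "issue_refund": lambda order_id: f"order {order_id}: refunded",
    }
)
support_policy = AgentPolicy(agent_id="support-agent", allowed_tools={"lookup_order"})

print(gateway.call(support_policy, "lookup_order", order_id=7))
try:
    gateway.call(support_policy, "issue_refund", order_id=7)
except PermissionError as err:
    print("denied:", err)
```

Routing every tool call through one gateway is a common design choice here: the policy check and the audit record live in a single place, so an agent cannot reach an interface without leaving a trace.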
The main conclusion from GTC 2026 is that NVIDIA wants to be more than an accelerator vendor: it is positioning itself as the foundational provider of the inference economy. Whereas competition once centered on transistor count and leadership in model training, the center of gravity is now shifting toward the cost of a useful action, the resilience of AI services, and the deployment speed of agent systems. For the market, this signals a transition from discussions about "the most powerful GPU" to the question of who can deliver intelligence as a service more cheaply and reliably.