Negócios

Compute

Compute, in AI contexts, refers to the computational resources — primarily GPU or TPU processing power, memory, and supporting infrastructure — required to train and run AI models. It is the primary limiting resource and cost driver in frontier AI development.

In AI research and industry, "compute" is shorthand for the aggregate hardware resources consumed to execute the mathematical operations — principally matrix multiplications — at the core of neural network training and inference. It is most often quantified as total floating-point operations (FLOPs) for a training run, or as GPU-hours or chip-hours consumed. Unlike data or algorithmic design, the other two canonical inputs to AI progress, compute is directly purchased with capital and scales in a relatively predictable way with investment, making it both an economic lever and a strategic asset.

Training compute depends on three factors: model parameter count, number of training tokens processed, and the efficiency of the hardware and software stack. DeepMind's Chinchilla scaling paper (2022) established principled guidelines for allocating a fixed compute budget between model size and dataset volume. A frontier language model training run processes trillions of tokens over weeks to months on clusters of thousands of accelerators. Inference compute — the cost of serving a model per query — is smaller per request but can aggregate to totals comparable to or exceeding training compute once a widely-used model is deployed at scale.

Compute availability has become a geopolitical variable. Advanced AI accelerators — primarily NVIDIA's H100, H200, and Blackwell-series GPUs — are manufactured by a small number of companies using fabrication processes concentrated at TSMC in Taiwan. U.S. export controls enacted in October 2022 and expanded in subsequent rounds restrict the sale of high-performance AI chips to China and certain other countries, explicitly targeting compute access. These controls have directly affected which organizations can pursue frontier training runs and have accelerated domestic accelerator development programs in China and the European Union.

By 2026, the largest AI training clusters contain tens of thousands of NVIDIA B200 or H200 GPUs or equivalent custom accelerators, representing capital expenditure ranging from hundreds of millions to over one billion dollars per cluster. Microsoft, Google, Meta, Amazon, and xAI have each announced or deployed data centers at this scale for AI workloads. Cloud providers offer compute on-demand or as reserved capacity, with spot markets that fluctuate with demand. Algorithmic efficiency improvements — mixture-of-experts architectures, better data curation, and optimized training recipes — have partially moderated compute requirements per unit of model quality, but total industry compute consumption continues to grow year over year.

Exemplo

A startup planning to fine-tune a large open-weight model estimates the job requires approximately 2,000 A100 GPU-hours and evaluates cloud providers to find the most cost-effective mix of reserved and spot compute before committing to the training run.

Termos relacionados

Últimas notícias sobre o tema

← Glossário