Hardware

GPU (Graphics Processing Unit)

A GPU is a processor originally designed for computer graphics rendering, built around thousands of small parallel compute cores. Since the early 2010s, GPUs have become the dominant hardware for AI model training and inference because neural network computations map naturally onto this parallel architecture.

A GPU contains thousands to tens of thousands of small compute cores (NVIDIA's H100 SXM5, for example, has 16,896 CUDA cores) optimized for executing the same arithmetic operation on many data elements simultaneously. This is a form of Single Instruction, Multiple Data (SIMD) parallelism; NVIDIA calls its variant SIMT (Single Instruction, Multiple Threads). A CPU, by contrast, has a small number of powerful cores optimized for complex sequential logic. Neural network training reduces largely to repeated matrix multiplications and element-wise nonlinear transformations. These operations map naturally onto GPU parallelism and typically give GPUs a 10–100× throughput advantage over CPUs for these workloads.
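The following minimal Python sketch (plain CPU code, not GPU code) shows why matrix multiplication suits a GPU. Every output element is an independent dot product, so each one could be handed to its own GPU thread. The names `matmul_element`, `matmul_grid`, and `relu` are hypothetical and for illustration only. A real implementation would be a CUDA kernel or a call into a library such as cuBLAS.

```python
# CPU-side sketch of how a GPU parallelizes matrix multiplication.
# Each (row, col) output element is an independent dot product, so on a GPU
# each one could be assigned to its own thread. Here we loop over the "grid"
# sequentially only to show that no element depends on any other.
from typing import List

Matrix = List[List[float]]


def matmul_element(a: Matrix, b: Matrix, row: int, col: int) -> float:
    """Work done by one hypothetical GPU thread: a single dot product."""
    return sum(a[row][k] * b[k][col] for k in range(len(b)))


def matmul_grid(a: Matrix, b: Matrix) -> Matrix:
    """Run one 'thread' per output element over a 2-D index grid."""
    rows, cols = len(a), len(b[0])
    # Every (row, col) pair executes the same instruction stream on different
    # data, which is the SIMD/SIMT pattern GPUs are built to exploit.
    return [[matmul_element(a, b, r, c) for c in range(cols)] for r in range(rows)]


def relu(x: Matrix) -> Matrix:
    """Element-wise nonlinearity: again the same operation on every element."""
    return [[max(0.0, v) for v in row] for row in x]


if __name__ == "__main__":
    a = [[1.0, -2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    print(relu(matmul_grid(a, b)))  # prints [[0.0, 0.0], [43.0, 50.0]]
```

On a GPU the nested loops disappear. The hardware schedules all output elements at once, limited only by core count and memory bandwidth.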

During AI training, a GPU holds model weights, gradients, optimizer state, and input batches in High Bandwidth Memory (HBM). HBM consists of stacks of DRAM placed on the same package as the GPU die, not on the chip itself. The GPU performs forward and backward passes through the model in a highly pipelined, parallel fashion. Multiple GPUs are connected via fast interconnects (NVLink within a single server, InfiniBand or RoCE across servers) to distribute training jobs that exceed a single GPU's memory or compute capacity. NVIDIA's CUDA programming platform, released in 2007, became the de facto software layer that lets AI researchers write general-purpose parallel programs for GPUs without graphics expertise, and it remains the dominant platform as of 2026.
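One common way to split a model that is too large for a single GPU's memory is tensor (column) parallelism. The minimal sketch below applies it to a single linear layer `y = x @ W`. The weight matrix is split column-wise so that each device stores only part of it, and the partial outputs are concatenated at the end. The "devices" are simulated as list entries, and the helper names (`split_columns`, `column_parallel_linear`) are hypothetical. This sketch is not any framework's API.

```python
# Sketch of tensor (column) parallelism for one linear layer y = x @ W.
# W (in_dim x out_dim) is split into column blocks, one per hypothetical
# device, so each device stores only a fraction of the weights. In a real
# cluster each shard lives on a separate GPU, and the final concatenation is
# a collective (all-gather) carried over NVLink or InfiniBand.
from typing import List

Matrix = List[List[float]]


def split_columns(w: Matrix, num_devices: int) -> List[Matrix]:
    """Split W into `num_devices` equal column blocks."""
    out_dim = len(w[0])
    assert out_dim % num_devices == 0, "out_dim must divide evenly across devices"
    step = out_dim // num_devices
    return [[row[d * step:(d + 1) * step] for row in w] for d in range(num_devices)]


def matmul(x: Matrix, w: Matrix) -> Matrix:
    """Plain matrix multiply, used as each device's local computation."""
    return [[sum(xi[k] * w[k][j] for k in range(len(w))) for j in range(len(w[0]))]
            for xi in x]


def column_parallel_linear(x: Matrix, shards: List[Matrix]) -> Matrix:
    # Each device multiplies the full input by its own weight shard...
    partials = [matmul(x, shard) for shard in shards]
    # ...then the partial outputs are concatenated along the column axis.
    return [sum((p[i] for p in partials), []) for i in range(len(x))]


if __name__ == "__main__":
    x = [[1.0, 2.0]]
    w = [[1.0, 0.0, 2.0, -1.0],
         [0.0, 1.0, 1.0, 3.0]]
    shards = split_columns(w, num_devices=2)
    print(column_parallel_linear(x, shards))  # prints [[1.0, 2.0, 4.0, 5.0]]
```

Each simulated device holds half the weights, yet the concatenated result matches the unsplit multiplication. The price is the communication step at the end, which is why interconnect bandwidth matters so much in multi-GPU training.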

Access to GPU compute is a primary bottleneck for AI development. Training a frontier large language model requires running thousands of GPUs in parallel for weeks to months, and production-scale inference similarly requires large GPU fleets operating continuously. GPU scarcity therefore largely determines which organizations can train frontier models and on what timescale. Supply of NVIDIA's most advanced AI chips is constrained by TSMC's leading-edge manufacturing capacity. Since October 2022, it has also been constrained by U.S. Department of Commerce export controls restricting shipment of advanced AI accelerators to certain countries; these rules were tightened further in 2023 and 2024.
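A widely used rule of thumb gives a sense of the scale involved: training a dense transformer takes roughly 6 × N × D floating-point operations, where N is the parameter count and D the number of training tokens. The sketch below converts that into wall-clock days. Several inputs are illustrative assumptions, not measurements:

* a per-GPU peak of about 989 TFLOPS, NVIDIA's published dense BF16 figure for the H100 SXM
* 40% utilization
* 1 trillion training tokens
* perfect scaling across GPUs

```python
# Back-of-envelope training-time estimate using the common approximation
# that training a dense transformer costs about 6 * N * D FLOPs
# (N = parameters, D = training tokens). Peak throughput and utilization
# are assumptions supplied by the caller, not measured values.
SECONDS_PER_DAY = 86_400


def training_flops(params: float, tokens: float) -> float:
    """Approximate total training FLOPs for a dense transformer: 6 * N * D."""
    return 6.0 * params * tokens


def training_days(params: float, tokens: float, num_gpus: int,
                  peak_flops_per_gpu: float, utilization: float) -> float:
    """Wall-clock days, assuming perfect scaling at the given utilization."""
    effective_flops_per_second = num_gpus * peak_flops_per_gpu * utilization
    seconds = training_flops(params, tokens) / effective_flops_per_second
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical run: 7B parameters, 1T tokens, 64 GPUs,
    # ~989 TFLOPS assumed dense BF16 peak per GPU, 40% assumed utilization.
    days = training_days(7e9, 1e12, 64, 989e12, 0.40)
    print(f"{days:.1f} days")  # prints 19.2 days
```

Under these assumptions a hypothetical 7-billion-parameter run takes roughly 19 days on 64 GPUs. Frontier models have far larger N and D, and the same arithmetic pushes them into thousands of GPUs running for weeks or months.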

As of 2026, NVIDIA holds a dominant share of the AI accelerator market, with its H100 (2022), H200 (2024), and Blackwell-architecture B100/B200 (2024–2025) as the primary platforms for frontier model training. AMD's MI300X and Intel's Gaudi 3 are competitive alternatives used in some hyperscaler deployments. Several large technology companies, including Google (TPU v5), Amazon (Trainium 2), Meta (MTIA), and Microsoft (Maia), have deployed custom AI accelerators to reduce dependency on NVIDIA. Even so, NVIDIA's CUDA ecosystem and software maturity keep it the default platform for most AI research and commercial deployment.

Example

A startup training a 7-billion-parameter language model rents a cluster of 64 NVIDIA H100 GPUs from a cloud provider for two weeks, using model parallelism to split the model across GPUs and gradient checkpointing to fit the training run within available memory.
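A rough memory calculation shows why the startup in this example needs to split the model. The sketch assumes mixed-precision training with the Adam optimizer at about 16 bytes of model state per parameter, a commonly cited estimate, and an 80 GB H100. It ignores activations, which gradient checkpointing reduces by recomputing them during the backward pass instead of storing them. The even split across 64 GPUs only approximates fully sharded model state, and real frameworks add overhead.

```python
# Rough per-GPU memory estimate for model state in mixed-precision Adam
# training, using the commonly cited ~16 bytes per parameter:
#   2 (BF16 weights) + 2 (BF16 gradients) + 4 (FP32 master weights)
#   + 4 + 4 (FP32 Adam first and second moments).
# Activations are excluded; they are what gradient checkpointing shrinks.
BYTES_PER_PARAM = 2 + 2 + 4 + 4 + 4  # = 16; an assumption about the training setup
GB = 1e9  # decimal gigabytes


def model_state_gb(params: float, shards: int = 1) -> float:
    """Model-state memory per GPU if state is split evenly across `shards` GPUs."""
    return params * BYTES_PER_PARAM / shards / GB


if __name__ == "__main__":
    gpu_memory_gb = 80  # assumed 80 GB H100 variant
    unsharded = model_state_gb(7e9)            # all state on one GPU
    sharded = model_state_gb(7e9, shards=64)   # state spread across 64 GPUs
    print(f"unsharded: {unsharded:.0f} GB, fits on one GPU: {unsharded < gpu_memory_gb}")
    print(f"sharded:   {sharded:.2f} GB per GPU")
```

About 112 GB of model state exceeds a single 80 GB GPU. Spread across 64 GPUs it drops to under 2 GB each, leaving most of each GPU's memory for activations and input batches.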
