Hardware

TPU (Tensor Processing Unit)

A TPU is a custom ASIC designed by Google specifically to accelerate tensor operations—the matrix multiplications at the core of neural network training and inference—offering higher throughput per watt than general-purpose GPUs for those workloads.

A Tensor Processing Unit (TPU) is an application-specific integrated circuit developed by Google to accelerate machine learning workloads, particularly the matrix multiply-accumulate operations that dominate neural network computation. Unlike GPUs, which are designed for general parallel computation, TPUs are optimized for dense linear algebra, the core mathematical workload in both training and running large models.
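To make "multiply-accumulate" concrete, the following minimal sketch computes one dense layer with explicit multiply-accumulate (MAC) steps and counts them. The function name `dense_layer` and the toy values are illustrative assumptions, not part of any TPU API; real frameworks dispatch this work to optimized matrix kernels.

```python
def dense_layer(x, w, bias):
    """Compute y = x @ w + bias one multiply-accumulate at a time.

    x is a list of n_in inputs, w is an n_in x n_out weight matrix
    (list of rows), and bias is a list of n_out values.
    Returns the outputs and the number of MACs performed.
    """
    n_in, n_out = len(w), len(w[0])
    y = list(bias)          # each accumulator starts at its bias
    macs = 0
    for j in range(n_out):
        for i in range(n_in):
            y[j] += x[i] * w[i][j]   # one multiply-accumulate
            macs += 1
    return y, macs


# Hypothetical toy layer: 3 inputs, 2 outputs, so 3 * 2 = 6 MACs.
outputs, mac_count = dense_layer(
    [1.0, 2.0, 3.0],
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [0.5, -0.5],
)
```

The MAC count is always n_in × n_out per input vector, which is why layers with thousands of inputs and outputs are dominated by this single operation.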

TPUs are built around a systolic array architecture: a grid of multiply-accumulate units that pass data between neighbors without storing intermediate results in main memory, which sharply reduces memory bandwidth pressure. Google's TPU v1, deployed internally in 2015 and disclosed at ISCA 2017, delivered 92 tera-operations per second (TOPS) on 8-bit integers. Successive generations added high-bandwidth memory (HBM), bfloat16 support, and multi-chip interconnects. TPU v5e and v5p, available through Google Cloud as of 2024–2025, scale to pods of up to 256 chips (v5e) and 8,960 chips (v5p), linked by custom interconnects for large-scale distributed training.
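The neighbor-to-neighbor data flow can be sketched as a small cycle-by-cycle simulation. This is a simplified illustration only: it assumes an output-stationary dataflow (each cell keeps its own running sum), chosen for brevity, and is not a model of the dataflow used by any specific TPU generation. The function name `systolic_matmul` is hypothetical.

```python
def systolic_matmul(a, b):
    """Simulate C = A @ B on an n x m grid of multiply-accumulate cells.

    Rows of A enter from the left edge and move one cell right per cycle;
    columns of B enter from the top edge and move one cell down per cycle.
    Inputs are skewed in time so matching operands meet in the same cell.
    Returns the result matrix and the number of cycles simulated.
    """
    n, k, m = len(a), len(b), len(b[0])
    acc = [[0] * m for _ in range(n)]     # per-cell accumulators
    a_reg = [[0] * m for _ in range(n)]   # operand travelling rightwards
    b_reg = [[0] * m for _ in range(n)]   # operand travelling downwards
    cycles = n + m + k - 2                # time for the last operands to meet

    for t in range(cycles):
        # Shift A operands one cell to the right, then feed the left edge.
        for i in range(n):
            for j in range(m - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
            s = t - i                     # row i is delayed by i cycles
            a_reg[i][0] = a[i][s] if 0 <= s < k else 0
        # Shift B operands one cell down, then feed the top edge.
        for j in range(m):
            for i in range(n - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
            s = t - j                     # column j is delayed by j cycles
            b_reg[0][j] = b[s][j] if 0 <= s < k else 0
        # Every cell performs one multiply-accumulate on its local operands.
        for i in range(n):
            for j in range(m):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc, cycles


result, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
# result is [[19, 22], [43, 50]] after 4 cycles
```

Note that each operand is read from outside the grid once, at an edge, and is then reused by every cell it passes through. That reuse is the source of the reduced memory traffic described above.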

TPUs matter because they reduce the cost and time of training and serving large AI models. Google credits them with making it economically feasible to train AlphaGo, BERT, and the PaLM and Gemini model families at scale. For inference at production volume—billions of queries per day across Google Search, Translate, and Gemini—TPU clusters provide latency and cost advantages over GPU fleets, particularly for transformer workloads with predictable memory access patterns.

As of 2026, Google offers TPU v5p and v5e through Google Cloud, alongside the sixth-generation Trillium (v6e), while the seventh generation (Ironwood, announced at Google Cloud Next 2025) targets exaflop-scale AI workloads with an emphasis on inference. Other major cloud providers and AI labs have introduced competing custom accelerators, but Google's TPUs remain the most widely deployed first-party AI ASICs in production, having processed the majority of Google's internal ML workloads since 2017.

Example

A research team fine-tuning a 70-billion-parameter language model on Google Cloud provisions a full TPU v5e pod of 256 chips and shards the model across it, completing the run in hours rather than the days it would take on a handful of accelerators.
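A rough memory estimate shows why a model of this size has to be spread across many chips. The sketch below is back-of-the-envelope arithmetic under stated assumptions: 16 GB of HBM per v5e chip, 2 bytes per parameter for bfloat16 weights, and about 16 bytes per parameter for full fine-tuning with an Adam-style optimizer in mixed precision (bfloat16 weights and gradients plus float32 master weights and two optimizer moments). Activations and framework overhead are ignored, and the helper name `min_chips_for_state` is hypothetical.

```python
import math


def min_chips_for_state(params, bytes_per_param, hbm_gb_per_chip):
    """Return (total state in GB, minimum chips needed to hold it)."""
    total_gb = params * bytes_per_param / 1e9
    return total_gb, math.ceil(total_gb / hbm_gb_per_chip)


PARAMS = 70_000_000_000   # 70-billion-parameter model from the example
HBM_GB = 16               # assumed HBM per v5e chip
POD_CHIPS = 256           # pod size from the example

# Weights only, stored in bfloat16 (2 bytes per parameter).
weights_gb, chips_for_weights = min_chips_for_state(PARAMS, 2, HBM_GB)

# Assumed full fine-tuning state: about 16 bytes per parameter.
train_gb, chips_for_training = min_chips_for_state(PARAMS, 16, HBM_GB)

pod_hbm_gb = POD_CHIPS * HBM_GB   # aggregate HBM across the pod
```

Under these assumptions the weights alone occupy 140 GB (at least 9 chips) and the training state about 1,120 GB (at least 70 chips), against 4,096 GB of aggregate HBM in the pod, which leaves headroom for activations.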

Related terms
