NPU (Neural Processing Unit)
An NPU is a dedicated hardware block integrated into a system-on-chip (SoC) that accelerates neural network inference locally on a device, enabling AI features—like voice recognition or image enhancement—without requiring a cloud connection.
A Neural Processing Unit (NPU) is a specialized processor core embedded within a system-on-chip, designed to execute the matrix multiplications and activation functions of neural network inference with far greater energy efficiency than a general-purpose CPU or GPU in the same SoC. NPUs are fixed-function or near-fixed-function accelerators tuned for the data types and operation patterns found in modern deep learning models, particularly convolutional and transformer architectures.
NPUs typically implement a small systolic array or vector engine alongside dedicated on-chip SRAM that holds model weights and intermediate activations close to the compute units. This minimizes the energy cost of data movement, which dominates total power in memory-bound workloads. Most NPUs are optimized for low-precision arithmetic, chiefly 8-bit integers (INT8) and increasingly 4-bit integers (INT4), often alongside 16-bit floating point, exploiting the tolerance of neural network inference for quantized arithmetic. Apple's Neural Engine, introduced in the A11 Bionic in 2017, and Qualcomm's Hexagon NPU are early commercial examples; by 2025, virtually every flagship mobile SoC from Apple, Qualcomm, MediaTek, and Samsung includes a dedicated NPU rated at tens of TOPS (trillions of operations per second).
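The quantized arithmetic described above can be illustrated with a minimal pure-Python sketch. It assumes symmetric per-tensor quantization, which is only one of several schemes in use; the function names (quantize_symmetric, int_dot, quantized_dot) are hypothetical and do not correspond to any vendor's NPU API. The point is that the inner loop works entirely on small integers, and floating point reappears only when the accumulated result is rescaled.

```python
def quantize_symmetric(values, num_bits=8):
    """Map floats to signed integers using one shared scale.

    Assumption: symmetric, per-tensor quantization. Real toolchains may
    use per-channel scales or zero points instead.
    """
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8, 7 for INT4
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the signed range.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def int_dot(q_a, q_b):
    """Integer multiply-accumulate, the basic operation of a MAC array."""
    acc = 0  # hardware typically uses a wider accumulator than the inputs
    for a, b in zip(q_a, q_b):
        acc += a * b
    return acc


def quantized_dot(weights, activations, num_bits=8):
    """Approximate a float dot product using integer arithmetic."""
    q_w, s_w = quantize_symmetric(weights, num_bits)
    q_x, s_x = quantize_symmetric(activations, num_bits)
    # One floating-point rescale after the integer accumulation.
    return int_dot(q_w, q_x) * s_w * s_x


if __name__ == "__main__":
    w = [0.5, -1.0, 0.25]
    x = [2.0, 1.0, -4.0]
    exact = sum(a * b for a, b in zip(w, x))
    print("float result:", exact)
    print("INT8 result: ", quantized_dot(w, x, num_bits=8))
    print("INT4 result: ", quantized_dot(w, x, num_bits=4))
```

Running the script prints the floating-point result next to the INT8 and INT4 approximations, which gives a rough feel for how the error grows as the bit width shrinks.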
NPUs matter primarily for on-device AI: running models locally preserves user privacy, eliminates network latency, and enables AI features in offline environments. iOS and Android capabilities such as real-time photo segmentation, wake-word detection, live speech transcription, and on-device small language model inference typically rely on NPU acceleration to stay within the power and thermal constraints of a smartphone or laptop.
By 2026, NPUs have expanded beyond mobile into laptops: Intel's Meteor Lake and Lunar Lake, AMD's Ryzen AI, and Qualcomm's Snapdragon X Elite all include NPU blocks with a vendor-rated TOPS figure. Microsoft's Copilot+ PC certification requires at least 40 TOPS of NPU performance, pushing OEMs to select SoCs with capable NPU silicon. The trajectory points toward running billion-parameter models locally, with NPUs as the primary execution engine.
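To make the TOPS ratings and the billion-parameter target more concrete, here is a back-of-the-envelope sketch. It assumes the common convention of counting each multiply-accumulate as two operations and considers weight storage only, ignoring activations and any cache. The MAC count, clock speed, and model size are hypothetical illustration values, not specifications of any real chip; only the 40 TOPS threshold comes from the text above.

```python
def peak_tops(mac_units, clock_hz):
    """Peak throughput in TOPS, counting one multiply-accumulate as 2 ops."""
    return 2 * mac_units * clock_hz / 1e12


def weight_footprint_gb(num_params, bits_per_weight):
    """Approximate weight storage in gigabytes (10**9 bytes).

    Ignores activations, caches, and any packing or metadata overhead.
    """
    return num_params * bits_per_weight / 8 / 1e9


def meets_threshold(tops, required_tops=40.0):
    """Compare a rated figure against a minimum such as the 40 TOPS bar."""
    return tops >= required_tops


if __name__ == "__main__":
    # Hypothetical NPU: 16,384 MAC units clocked at 1.25 GHz.
    tops = peak_tops(mac_units=16_384, clock_hz=1.25e9)
    print("peak TOPS:", tops, "meets 40 TOPS:", meets_threshold(tops))

    # Hypothetical 3-billion-parameter model at different precisions.
    for bits in (16, 8, 4):
        print(bits, "bit weights:", weight_footprint_gb(3e9, bits), "GB")
```

Peak figures like this are upper bounds: sustained throughput depends on how well a given model keeps the MAC units fed from memory.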