NPUs in laptops: new requirements for corporate IT
NPUs (neural processors) are becoming part of laptop SoCs such as AMD Ryzen AI 300 and Intel Core Ultra, and Microsoft requires them for Copilot+ PCs. Corporate IT faces new requirements for managing AI on end-user devices.

The cloud is slowly losing its monopoly on AI computing. Microsoft has made a neural processor a mandatory requirement for Copilot+ PCs, AMD and Intel are embedding NPUs directly into their SoCs, and corporate IT faces a choice that didn't exist before.
Three Processors in One Chip
Modern SoCs like AMD Ryzen AI 300 or Intel Core Ultra contain three compute engines: a classical CPU, a GPU, and an NPU. In theory it sounds simple, but a neural processor is a completely different beast. The NPU is optimized for matrix operations, so neural network inference can run faster and more efficiently there than on a general-purpose GPU, but only if the model fits in the memory the NPU can use.
AMD and Intel give the NPU just 16 GB of memory (or less), while the GPU shares the laptop's main system memory. This compromise is visible in benchmarks. The NPU in AMD Ryzen AI 300 generated an image in 70 seconds, while the built-in GPU of the same chip handled it in 30. The specialized processor lost to the general-purpose one by more than a factor of two, on the very task it was designed for. The culprit is memory: when the model is larger than what the NPU can hold, the neural processor becomes a brake instead of an accelerator.
Hybrid Architecture for Corporate IT
The main scenario promoted by Microsoft and partners:
- Light AI tasks (classification, search, short text generation) run on the NPU locally
- Complex models (large LLMs, video processing) run in the cloud
- Data partly stays on the device and is partly synchronized to the cloud
- Offload logic is built into the application (a minimal routing sketch follows this list)
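To make the offload idea concrete, here is a minimal routing sketch in Python. It is an illustration only, not any vendor's SDK: the Task structure, the route function, and the 16 GB budget are assumptions based on the memory limits discussed above.

```python
from dataclasses import dataclass

@dataclass
class Task:
    """A single inference request with a rough resource estimate."""
    name: str
    model_size_gb: float       # memory the model needs at inference time
    needs_large_context: bool  # e.g. video processing or very long documents

# Assumption for illustration: the per-device NPU memory budget discussed above.
NPU_MEMORY_BUDGET_GB = 16.0

def route(task: Task, cloud_available: bool) -> str:
    """Decide where a task runs: on the local NPU or offloaded to the cloud."""
    fits_locally = task.model_size_gb <= NPU_MEMORY_BUDGET_GB
    if fits_locally and not task.needs_large_context:
        return "npu"        # light task: keep data and compute on the device
    if cloud_available:
        return "cloud"      # heavy model or heavy context: offload
    return "npu" if fits_locally else "reject"  # degraded mode when offline

# Light classification stays local; a large LLM goes to the cloud.
print(route(Task("email-classify", 2.5, False), cloud_available=True))  # npu
print(route(Task("llm-70b-chat", 140.0, True), cloud_available=True))   # cloud
```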
For corporate IT, this means new challenges. Before, you could simply roll out a cloud service and forget about it. Now you need to:
Manage models on devices. Each laptop receives a set of ONNX or TensorFlow models. Versions can diverge, and updates are pulled over the internet. For a fleet of 50,000 corporate laptops, this becomes a logistics challenge.
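One way to keep fleet-wide versions in check is a per-device-class manifest that the management server publishes and an endpoint agent compares against what is installed. The sketch below is hypothetical; the schema, model ids, and version strings are invented for illustration.

```python
import json

# Hypothetical manifest published by the management server for one device class.
MANIFEST = json.loads("""
{
  "device_class": "copilot-plus-16gb",
  "models": [
    {"id": "phi-3.5-mini", "format": "onnx", "version": "2024.09.1"},
    {"id": "doc-classifier", "format": "onnx", "version": "1.4.0"}
  ]
}
""")

def outdated(installed: dict[str, str]) -> list[str]:
    """Return ids of models whose installed version differs from the manifest."""
    return [
        m["id"] for m in MANIFEST["models"]
        if installed.get(m["id"]) != m["version"]
    ]

# Example: this laptop is behind on the classifier and must pull an update.
print(outdated({"phi-3.5-mini": "2024.09.1", "doc-classifier": "1.3.2"}))
# -> ['doc-classifier']
```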
Control memory. If the model doesn't fit in the NPU's 16 GB, the application crashes. Before, a cloud engineer solved this once for everyone. Now you need client-side logic that selects a model variant based on the device's configuration.
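A minimal sketch of such client-side selection, assuming a catalog of quantized variants of the same model ordered from largest to smallest; the names and memory figures are illustrative only.

```python
# Hypothetical catalog of quantized variants of the same model, largest first.
MODEL_VARIANTS = [
    {"name": "assistant-7b-int4", "required_gb": 4.0},
    {"name": "assistant-3b-int4", "required_gb": 2.0},
    {"name": "assistant-1b-int8", "required_gb": 1.0},
]

def pick_model(npu_memory_gb: float) -> str | None:
    """Return the largest variant that fits the device's NPU memory budget."""
    for variant in MODEL_VARIANTS:
        if variant["required_gb"] <= npu_memory_gb:
            return variant["name"]
    return None  # nothing fits: fall back to the cloud or disable the feature

print(pick_model(16.0))  # assistant-7b-int4
print(pick_model(1.5))   # assistant-1b-int8
```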
Monitor battery. The NPU draws less power than the GPU, but the real-world gain is smaller than it appears at first glance.
What Actually Works Right Now
Small LLMs (like Phi-3.5-mini with 3.8B parameters) fit entirely on a laptop, and inference runs on the NPU in acceptable time. But this only works for reading and classification tasks; text generation with a full Llama-2-7B is still slow.
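In practice, local inference usually goes through a runtime such as ONNX Runtime, which exposes vendor accelerators as execution providers. Below is a minimal sketch of provider selection with a CPU fallback; which providers are actually available depends on the onnxruntime build and vendor drivers installed on the device, the NPU mappings in the comments are my assumption, and "model.onnx" is a placeholder.

```python
import onnxruntime as ort

# Preference order: vendor NPU providers first, CPU as a guaranteed fallback.
preferred = [
    "QNNExecutionProvider",       # Qualcomm NPUs
    "VitisAIExecutionProvider",   # AMD Ryzen AI NPUs
    "OpenVINOExecutionProvider",  # Intel accelerators
    "CPUExecutionProvider",       # always available
]

available = ort.get_available_providers()
providers = [p for p in preferred if p in available]

# "model.onnx" is a placeholder for a locally deployed small model.
session = ort.InferenceSession("model.onnx", providers=providers)
print("Running on:", session.get_providers()[0])
```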
Copilot for Microsoft 365 is the main beneficiary. Document search, email summaries, meeting scheduling: all of this can run locally. Microsoft promises this will improve privacy (data doesn't go to the cloud) and speed (no network delays).
What This Means
Laptop makers have won in niche markets. Corporate IT has gained a tool it still needs to learn how to use. The boldest are already building model delivery platforms; the rest are waiting for things to get simpler. No one is leaving the cloud: a hybrid future, by definition, contains both parts.