
Apple bets on local AI in M-series chips, not giant models

Apple is increasingly called a loser in the AI race, but the company might have a different bet — not on training giant models, but on cheap local inference…

AI-processed from Habr AI; edited by Hamidun News

Apple rarely makes headlines in AI discussions as loudly as OpenAI, Google, or NVIDIA. But the thesis that the company lost the race may be too narrow: Apple is betting not on the largest models, but on making inference work locally, efficiently, and without constant dependence on the cloud.

Not the Right Metric

When the market discusses AI, the conversation almost always comes down to the same set of status symbols: how many GPUs a company has, how large its data centers are, how much its latest training run cost, and whether its model can outperform competitors on benchmarks. Against this backdrop, Apple does look strange. Siri has long been a convenient target for jokes, Apple's own large model doesn't dominate the news, and its integrations with external systems are perceived mostly as a catch-up move.

But that is precisely the main thesis of this piece: Apple may be competing not in the race for the largest model, but in the race for the most practical way to use AI. If you look at inference rather than training, the picture changes. What matters is not the size of the cluster, but how quickly, cheaply, and reliably a model can be run next to the user: on a laptop, workstation, or other local device.

Betting on Chips

The key argument here is Apple Silicon's architecture. In a typical system with a discrete GPU, the CPU and GPU work with separate pools of memory, and data has to be shuttled between them constantly. This adds latency, wastes energy, and runs into bus bandwidth limits. In the M-series chips, Apple uses unified memory: the CPU, GPU, and Neural Engine work with a single shared memory space. This reduces copying and makes the system better suited to inference tasks.

  • CPU, GPU, and Neural Engine access shared memory without constant data copying
  • Less overhead when moving data between compute blocks, and lower energy consumption
  • Running models locally becomes more realistic for everyday tasks
  • AI workloads can be moved closer to the user, not just to the cloud
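
The cost of shuttling data between separate memory pools can be estimated with simple arithmetic. The sketch below is a back-of-envelope model, not a benchmark: the function names and every number in it (payload size, bus bandwidth, number of copies) are hypothetical values chosen for illustration, and it ignores latency, caching, and any overlap of transfer with compute.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def copy_time_seconds(payload_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Time to move one payload across a bus at a sustained bandwidth."""
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return payload_bytes / bandwidth_bytes_per_s


def total_copy_overhead(payload_bytes: float,
                        bandwidth_bytes_per_s: float,
                        copies: int) -> float:
    """Total time spent copying when the payload crosses the bus `copies` times."""
    return copies * copy_time_seconds(payload_bytes, bandwidth_bytes_per_s)


# Hypothetical workload: 4 GiB of model data over a bus sustaining 16 GiB/s.
payload = 4 * GIB
bandwidth = 16 * GIB

# Separate pools: assume the data crosses the bus twice (an assumption).
split_pools = total_copy_overhead(payload, bandwidth, copies=2)

# Shared pool: idealized as no bus copies at all (also an assumption).
shared_pool = total_copy_overhead(payload, bandwidth, copies=0)

print(f"separate pools: {split_pools:.2f} s spent copying")
print(f"shared pool:    {shared_pool:.2f} s spent copying")
```

Under these assumed numbers the separate-pool setup spends half a second on copying alone. Real systems land somewhere between the two cases, which fits the claim above that unified memory reduces copying rather than abolishing every transfer.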

The author separately highlights the Neural Engine, a specialized block designed for the tensor operations that modern AI relies on. The logic is simple: if inference is not general-purpose computing but mostly repetitive matrix operations, it is more efficient to give those operations dedicated hardware than to throw CPU or GPU power at everything. By this logic, Apple isn't copying NVIDIA's data-center approach but building a more compact and practical infrastructure on the device side.
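
To make the "repetitive matrix operations" point concrete, here is a minimal pure-Python sketch of one dense layer computed as a matrix-vector product. It is only an illustration of the arithmetic pattern: the names `dense_layer` and `mac_count` are hypothetical, the weights are made up, and nothing here touches the Neural Engine or any Apple API.

```python
def dense_layer(weights, inputs):
    """Matrix-vector product: each output is a sum of weight * input terms."""
    outputs = []
    for row in weights:
        if len(row) != len(inputs):
            raise ValueError("row length must match input length")
        acc = 0.0
        for w, x in zip(row, inputs):
            acc += w * x  # one multiply-accumulate (MAC) step
        outputs.append(acc)
    return outputs


def mac_count(rows: int, cols: int) -> int:
    """Number of multiply-accumulate steps for a rows x cols weight matrix."""
    return rows * cols


# Made-up weights for a layer with 3 inputs and 2 outputs.
weights = [
    [1.0, 2.0, 3.0],
    [0.5, -1.0, 0.0],
]
inputs = [2.0, 1.0, 0.0]

result = dense_layer(weights, inputs)
print(result)           # [4.0, 0.0]
print(mac_count(2, 3))  # 6
```

The inner loop is the same multiply-accumulate step repeated `rows * cols` times with no branching on the data. That regularity is what makes the workload a good fit for dedicated hardware.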

Where This Is Useful

The practical value of this approach is especially evident where cost, latency, and energy consumption matter. Examples include edge deployment, back-office automation, local data processing, scenarios with privacy requirements, and workflows where it makes no sense to send every request to the cloud. Yes, a cloud H100 cluster will deliver higher peak performance. But for many real-world tasks, businesses don't need a record; they need predictable economics and the ability to keep the system on hand.
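
The "predictable economics" argument can be framed as a break-even calculation: a local device has an upfront cost but a lower cost per request, so it pays off after enough requests. The sketch below is a toy model under stated assumptions. The function name and all prices are hypothetical and in abstract cost units, and it ignores maintenance, depreciation, and utilization.

```python
import math
from typing import Optional


def break_even_requests(device_cost: float,
                        local_cost_per_request: float,
                        cloud_cost_per_request: float) -> Optional[int]:
    """Requests needed before a local device becomes cheaper than the cloud.

    Returns None when local inference never pays off, i.e. when the
    per-request cost is not lower locally than in the cloud.
    """
    saving_per_request = cloud_cost_per_request - local_cost_per_request
    if saving_per_request <= 0:
        return None
    return math.ceil(device_cost / saving_per_request)


# Hypothetical prices in abstract cost units, not real quotes.
requests_needed = break_even_requests(
    device_cost=2000.0,
    local_cost_per_request=0.25,
    cloud_cost_per_request=0.75,
)
print(requests_needed)  # 4000
```

With these made-up inputs the device pays for itself after 4,000 requests. The point is not the specific number but that the answer depends on workload volume rather than on peak performance.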

This approach has its limits. Apple Silicon doesn't eliminate data centers and doesn't make large-model training on massive GPU clusters unnecessary. If you need to train frontier models or serve millions of concurrent users, cloud infrastructure isn't going away. The point is different: a significant portion of the AI market is not in training but in applying already trained models, and it's precisely here that local inference could become Apple's strength.

"This is not losing the race. This is participating in a completely different race."

What This Means

The main idea is simple: Apple doesn't have to beat NVIDIA or OpenAI by their rules to occupy an important place in the AI ecosystem. If the market truly shifts from demonstrating power to profitable deployment, those who can run models closer to the user, more cheaply, and with lower overhead will have the advantage. For developers and companies, this looks not like hype but like working infrastructure. And in this version of the race, Apple truly does have a strong position.

Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.
