Local AI on the M1: How Apple's Magic Ran Into Harsh Reality
Running local language models through Ollama on an M1 chip seemed like a great idea for anyone who values privacy. However, reality proved harsher than the marketing…
AI-processed from ZDNet AI; edited by Hamidun News
Remember how it felt when Apple introduced the M1 chip? Overnight, Windows laptops suddenly looked like antiques. We got used to our MacBooks handling everything, from 4K video editing to hundreds of Chrome tabs. Then the era of large language models arrived, and it turned out this magic had a very tangible limit. Trying to turn an M1 into a personal AI hub with Ollama was a cold shower for anyone who believed the first generation of Apple Silicon would stay young forever.
The context is simple: these days every other tech blogger urges you to cancel your ChatGPT subscription in favor of local models. The arguments are solid: privacy, no censorship, and the ability to work offline. Tools like Ollama have made installation so simple that even your grandmother could manage it. You download the app, type a command in the terminal, and there it is: Llama 3 or Mistral, living right on your SSD. It feels like victory until you press Enter and start waiting.
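In the terminal, that command is typically a single `ollama run llama3`. Ollama also serves a local HTTP API, by default at `http://localhost:11434`, which makes the waiting easy to measure. Below is a minimal Python sketch that uses only the standard library. It assumes a default install with the server running and `llama3` already pulled. It sends one non-streaming request to the `/api/generate` endpoint and converts the reply's `eval_count` (generated tokens) and `eval_duration` (nanoseconds) fields into tokens per second. The helper names `build_request`, `generate`, and `tokens_per_second` are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama's default local address and its /api/generate endpoint.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for a single prompt."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 600.0) -> dict:
    """Send the prompt and return the parsed JSON reply (needs a running server)."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())


def tokens_per_second(reply: dict) -> float:
    """eval_count tokens were generated in eval_duration nanoseconds."""
    return reply["eval_count"] / reply["eval_duration"] * 1e9
```

With the server up, `reply = generate("llama3", "Why is the sky blue?")` puts the generated text in `reply["response"]`, and `tokens_per_second(reply)` reports the speed. A result well below one token per second is the kind of pace that makes local chat painful.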
The main obstacle these dreams crash into is RAM. Apple has long insisted that 8 GB of unified memory on the M1 is as good as 16 GB on a regular PC. For web browsing that may be true, but neural networks don't read marketing brochures. A model that weighs 4 to 8 gigabytes has to be loaded into memory in its entirety, which leaves almost nothing for macOS and your other apps. As soon as you run anything more serious than the smallest chatbot, the system starts swapping frantically to disk, and generation speed drops to something like one word every three seconds. Reading such a response is like watching a sloth try to type a dissertation.
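A back-of-the-envelope check shows why. The weights alone take about (parameters × bits per weight ÷ 8) bytes. An 8-billion-parameter model therefore needs roughly 4 GB at 4-bit quantization and 8 GB at 8-bit, which matches the 4 to 8 GB range above. The Python sketch below adds two explicitly hypothetical allowances: 3 GB for macOS and open apps, and 1 GB for the context cache and runtime buffers. It then reports the remaining headroom. A negative number means the system will have to swap.

```python
def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights: billions of parameters x bytes per parameter = GB."""
    return params_billions * bits_per_weight / 8


def headroom_gb(
    params_billions: float,
    bits_per_weight: float,
    ram_gb: float,
    reserved_gb: float = 3.0,  # hypothetical: memory held by macOS and open apps
    overhead_gb: float = 1.0,  # hypothetical: context (KV) cache and runtime buffers
) -> float:
    """RAM left after loading the model; negative means the system must swap."""
    needed = weights_gb(params_billions, bits_per_weight) + overhead_gb
    return ram_gb - reserved_gb - needed


# An 8B-parameter model at 4-bit and 8-bit quantization on 8 GB and 16 GB machines.
for ram in (8, 16):
    for bits in (4, 8):
        print(f"{ram} GB RAM, {bits}-bit: {headroom_gb(8, bits, ram):+.1f} GB headroom")
```

Under these assumptions, the 4-bit model only just fits on an 8 GB M1 with zero headroom, and the 8-bit version overshoots by about 4 GB. The exact figures depend on the model, the quantization format, and what else is running.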
The second unpleasant surprise is heat. We're used to the M1 staying cool and silent. But a local model keeps the GPU cores fully loaded for as long as it is generating text. After ten minutes of active conversation, the chassis feels like a frying pan, and the system starts throttling, which slows generation even further. The result is a funny paradox: you are holding an incredibly smart machine that seemingly knows the answer to every question humanity has, but it's too busy trying not to melt to answer you quickly.
Why does this experience matter? It highlights a critical shift in the industry. Apple has long been stingy with RAM in the base versions of its devices, and now that strategy is backfiring. If the company truly wants to roll out Apple Intelligence at scale, it will have to admit that 8 GB is no longer the "gold standard" but technical debt. Even the unified memory architecture can't help when the model weights simply don't fit in physical memory.
For the industry, this marks the start of a new arms race in which clock speeds matter less than memory bandwidth and capacity. We're entering a phase where local AI stops being a software gimmick and becomes the main driver of new hardware sales. If you were planning to keep using your M1 for text work for another couple of years, I have bad news: neural networks will push you to upgrade much sooner than you planned.
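A common rule of thumb makes the bandwidth point concrete. When generating text, the runtime has to read essentially every weight of a dense model from memory for each new token. Peak memory bandwidth divided by model size therefore gives a rough upper bound on tokens per second. The sketch below applies this rule to the commonly cited peak bandwidth figures for the M1 family. It ignores compute limits, context-cache reads, and throttling, so real speeds will be lower.

```python
# Commonly cited peak unified-memory bandwidth, in GB/s.
BANDWIDTH_GBPS = {
    "M1": 68.25,
    "M1 Pro": 200.0,
    "M1 Max": 400.0,
    "M1 Ultra": 800.0,
}


def max_tokens_per_second(bandwidth_gbps: float, model_gb: float) -> float:
    """Upper bound: each generated token streams all model weights from memory once."""
    return bandwidth_gbps / model_gb


# Ceiling for a ~4 GB model (e.g. 8B parameters at 4-bit quantization).
for chip, bandwidth in BANDWIDTH_GBPS.items():
    print(f"{chip}: at most ~{max_tokens_per_second(bandwidth, 4.0):.0f} tokens/s")
```

Even as a ceiling, the base M1 lands at around 17 tokens per second for a 4 GB model. That is a fraction of what the Max and Ultra chips allow, and it is measured before swapping or heat slow things down further.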
Ultimately, the Ollama experiment on old hardware isn't a software failure but an honest diagnosis. Local AI today is a luxury for owners of Max and Ultra chips with large amounts of RAM. For everyone else, cloud services like ChatGPT or Claude remain the only way to get reasonable performance without risking burned knees.
The bottom line: Apple will either need to radically increase memory in base MacBook Airs or admit that its "most popular laptops" aren't ready for the future it announced itself.