
DeepMind and activation function 'mining': why ReLU should retire

DeepMind has introduced a new approach to searching for activation functions, using methods reminiscent of mining. Instead of deriving formulas by hand, the researchers…

AI-processed from Jiqizhixin (机器之心); edited by Hamidun News
Source: Jiqizhixin (机器之心). Collage: Hamidun News.

For more than a decade, we lived in a world where the ReLU activation function was an unshakeable standard. It was as simple as a brick, and just effective enough to stay out of the way while neural networks learned. But let's be honest: ReLU (Rectified Linear Unit) became popular not because it was ideal, but because in 2012 we didn't have the resources to try something more complex.

Now DeepMind has decided it's time to stop guessing and has turned the search for mathematical formulas into industrial-scale mining. The research team built what it calls a "computational gold mine." The idea is simple and, at the same time, insane: if we don't know which mathematical function works best for deep learning, let's just try them all.

This is the classic brute-force method, taken to its absolute limit. Instead of having mathematicians spend years deriving elegant proofs, DeepMind threw thousands of graphics processors at the task to "excavate" the perfect function.

Why is this happening right now? The LLM market has hit an efficiency ceiling. We keep scaling up the number of parameters, but the basic building blocks of models barely change. DeepMind realized that even a tiny efficiency gain at the activation function level, when scaled to a model the size of GPT-4 or Gemini, saves millions of dollars on electricity and weeks of training time. This isn't just academic interest; it's pure economics.

In the course of this "mining," the system tested millions of combinations of mathematical operators. The researchers were looking for functions that not only showed high accuracy on paper, but also "played well" with modern hardware.
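To make "combinations of mathematical operators" concrete, here is a minimal sketch of what such a search space can look like. Everything in it is an assumption for illustration: the article does not describe DeepMind's actual operator set, scoring, or code, so the names `UNARY`, `BINARY`, `candidates` and `is_usable` are hypothetical. The sketch only enumerates functions of the form binary(unary(x), unary(x)) and throws out the obviously broken ones.

```python
import itertools
import math

# Hypothetical building blocks; the article does not list the real operator set.
UNARY = {
    "x": lambda x: x,
    "zero": lambda x: 0.0,
    "neg": lambda x: -x,
    "tanh": math.tanh,
    "sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)),
    "exp": math.exp,
}
BINARY = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "max": max,
    "min": min,
}

# Sample points in [-10, 10] used for the sanity check below.
GRID = [i / 4.0 for i in range(-40, 41)]


def candidates():
    """Yield (name, function) for every binary(unary(x), unary(x)) composition."""
    for (bn, b), (un1, u1), (un2, u2) in itertools.product(
        BINARY.items(), UNARY.items(), UNARY.items()
    ):
        name = f"{bn}({un1}, {un2})"
        # Default arguments freeze the current operators inside the lambda.
        yield name, (lambda x, b=b, u1=u1, u2=u2: b(u1(x), u2(x)))


def is_usable(f):
    """Cheap filter: finite everywhere on the grid and not a constant."""
    try:
        ys = [f(x) for x in GRID]
    except OverflowError:
        return False
    return all(math.isfinite(y) for y in ys) and max(ys) - min(ys) > 1e-6


survivors = {name: f for name, f in candidates() if is_usable(f)}
print(len(survivors), "candidates survive the sanity check")
```

Even this toy space already contains familiar faces: `max(x, zero)` is ReLU and `mul(x, sigmoid)` is Swish. A real search would go much further, training a network with each surviving candidate and ranking them by accuracy and speed, which is where the thousands of GPUs come in.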

It turned out that many theoretically strong functions are too expensive to compute on GPUs, which makes them useless in real production. DeepMind sought a middle ground between computational simplicity and mathematical flexibility.

The results are impressive. The discovered functions outperform not just old-fashioned ReLU, but also more modern alternatives such as Swish and GELU.

The most interesting thing here is the paradigm shift. We're moving from the era of "smart people inventing algorithms" to the era of "smart systems growing algorithms." This is real AutoML, the thing we've been dreaming about for five years, and now it has reached the very foundation of neural networks.

What does this mean for the industry? Most likely, the next generation of large language models will use architectures that seem strange to us. They'll rely on functions that no sane person would ever derive on a blackboard, because they don't look "beautiful" from the perspective of classical mathematical analysis. But they will work, and work faster than anything we've ever seen.
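For comparison, the hand-designed baselines named above are exactly the kind of formula that does fit on a blackboard. The sketch below writes out their standard textbook definitions in plain Python; it says nothing about the newly discovered functions, which the article does not spell out. The function names and the `beta` parameter are choices made for this illustration.

```python
import math


def relu(x: float) -> float:
    """ReLU: pass positive inputs through, clamp negatives to zero."""
    return max(0.0, x)


def swish(x: float, beta: float = 1.0) -> float:
    """Swish: x * sigmoid(beta * x); beta = 1 is also known as SiLU."""
    return x / (1.0 + math.exp(-beta * x))


def gelu(x: float) -> float:
    """GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


# Print the three functions side by side on a few sample inputs.
for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"x={x:+.1f}  relu={relu(x):.4f}  swish={swish(x):.4f}  gelu={gelu(x):.4f}")
```

ReLU needs a single comparison per element, while Swish and GELU each add an exponential or an error function on top. That difference in per-element cost is the sort of trade-off between simplicity and flexibility the search has to weigh.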

The key takeaway: DeepMind has clearly shown that the "gold rush" in AI is shifting from the realm of giant datasets to the realm of reinventing basic mathematics. If you thought the foundations of deep learning were already cemented, get ready: they're currently being torn down with a jackhammer.
