Billions of Parameters: How We Measure Intelligence in Numbers
In the AI world, it is customary to brag about size: 175 billion for GPT-3, trillions for GPT-4. But what does that actually mean? Parameters are the variables that a model adjusts during…
AI-processed from KDnuggets; edited by Hamidun News
Every time a major AI model is released, the first question in the comments is the same: "How many parameters does it have?" The number has become a kind of measure of sophistication, the equivalent of horsepower in the world of internal combustion engines. We've grown accustomed to figures like 7, 70, or even 175 billion, but we rarely ask what exactly they mean.
If we strip away the marketing hype, a parameter is simply a number. But it is from these numbers that the fabric of modern machine learning is woven. To get the idea, imagine a giant control panel with billions of knobs and switches.
Each such switch is a parameter. When a model is first created, all these knobs are set randomly. If you ask such an "empty" model about the meaning of life, it will output a random string of characters.
The training process is the painstaking tuning of each of the billions of parameters until meaningful text or images emerge at the output.
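To make the "control panel" picture concrete, here is a minimal sketch that builds a tiny fully connected network and counts its knobs. The layer sizes (784 inputs, 128 hidden units, 10 outputs) and the helper names are assumptions chosen for illustration, roughly the scale of an early handwritten-digit recognizer, not something taken from a specific model.

```python
import random

def init_layer(n_inputs, n_outputs, rng):
    """Create one fully connected layer: random weights, zero biases."""
    # One weight per (input, output) pair, drawn from a small Gaussian.
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_inputs)]
               for _ in range(n_outputs)]
    # Starting biases at zero is a common convention (an assumption here).
    biases = [0.0] * n_outputs
    return weights, biases

def count_parameters(layers):
    """Every individual weight and every individual bias is one parameter."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)

rng = random.Random(0)        # fixed seed so the sketch is repeatable
layer_sizes = [784, 128, 10]  # hypothetical sizes, for illustration only
network = [init_layer(n_in, n_out, rng)
           for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

print(f"{count_parameters(network):,} parameters")  # 101,770 parameters
```

Even this toy network has about a hundred thousand numbers to tune. A model with 7 billion parameters is the same idea repeated at a vastly larger scale.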
Historically, we've followed the path of enlargement. Early neural networks got by with thousands of parameters and could do little more than recognize handwritten digits. Then came the era of deep learning, and the count went into the millions. A real breakthrough came with the arrival of the Transformer architecture, which made it possible to scale models to previously unthinkable sizes. When OpenAI released GPT-3 with 175 billion parameters, the world trembled. It seemed we had found a universal recipe: just add more parameters and data, and the model becomes smarter. This relationship became known as scaling laws. However, in this race for size, we ran into the law of diminishing returns. Massive models require colossal computational power, consume electricity like small cities, and are slow to run.
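Scaling laws are usually written as power laws, and a power law is also a compact way to see diminishing returns. The sketch below is a toy curve only: the exponent `ALPHA`, the reference size `N_REF`, and the function name `toy_loss` are hypothetical values picked for illustration, not fitted constants from any real model.

```python
# Hypothetical constants, chosen only to illustrate the shape of the curve.
ALPHA = 0.1    # assumed exponent
N_REF = 1e9    # assumed reference size at which the toy loss equals 1.0

def toy_loss(n_params, alpha=ALPHA, n_ref=N_REF):
    """Toy power law: loss falls as (n_ref / n_params) ** alpha."""
    return (n_ref / n_params) ** alpha

sizes = [1e9, 1e10, 1e11, 1e12]
losses = [toy_loss(n) for n in sizes]

# Absolute improvement bought by each successive 10x increase in size.
gains = [losses[i] - losses[i + 1] for i in range(len(losses) - 1)]

for n, loss in zip(sizes, losses):
    print(f"{n:.0e} parameters -> toy loss {loss:.3f}")
print("gain per 10x step:", [round(g, 3) for g in gains])
```

Under these assumptions, every tenfold increase in size multiplies the loss by the same factor, so each step costs ten times more parameters while delivering a smaller absolute improvement than the one before.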
What do these parameters actually do inside the model? In technical terms, they are divided into weights and biases. Weights determine the strength of connections between neurons; in a language model, this is ultimately what decides how strongly one word in the context should influence the choice of the next word.
Biases are constant offsets added to each neuron's weighted sum, shifting its output regardless of the input. During training, backpropagation works out how much each of the billions of "knobs" contributed to the error, and an optimizer then turns each one slightly in the direction that makes the next answer a little more accurate. This process repeats over and over across vast datasets drawn from the internet, books, and code.
As a result, parameters crystallize human knowledge within them, becoming a kind of compressed database that not only knows how to store facts but also how to combine them.
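The tuning loop described above can be sketched on the smallest possible "model": a single neuron with one weight and one bias, trained by plain gradient descent. The data, learning rate, step count, and function name are all assumptions made for illustration. For one neuron the gradient can be written by hand; backpropagation generalizes the same calculation to many layers.

```python
def train_neuron(data, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0  # arbitrary starting position of the two "knobs"
    n = len(data)
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in data:
            error = (w * x + b) - y        # prediction minus target
            grad_w += 2 * error * x / n    # d(MSE)/dw
            grad_b += 2 * error / n        # d(MSE)/db
        # Turn each knob a small step against its gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Toy data generated from y = 2x + 1 (values chosen for illustration).
data = [(x, 2.0 * x + 1.0) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]
w, b = train_neuron(data)
print(f"learned weight = {w:.3f}, learned bias = {b:.3f}")
```

The weight ends up close to 2 and the bias close to 1, the values used to generate the data. A large model runs the same kind of update, only across billions of parameters at once.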
However, the coin has a flip side: overfitting. If you have too many parameters but not enough quality data, the model can simply "memorize" the training set. It becomes a brilliant top student on exams with familiar questions, but fails completely in real life when faced with an unfamiliar task. This is one of the main challenges in modern development: how to balance a model's power with its ability to generalize. Moreover, we increasingly see that architectural tricks, such as Mixture of Experts (MoE), allow the use of trillions of parameters without activating them all at once. This makes models more efficient, although their nominal size continues to grow.
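The Mixture of Experts point comes down to simple arithmetic: the headline parameter count includes every expert, while each token is processed by only a few of them. The sketch below is a deliberately simplified accounting, and all of its numbers and names (`shared_params`, `params_per_expert`, `num_experts`, `top_k`) are hypothetical; real MoE models place experts inside individual layers and differ in detail.

```python
def moe_parameter_counts(shared_params, params_per_expert, num_experts, top_k):
    """Return (total, active per token) parameter counts for a simplified MoE."""
    if not 1 <= top_k <= num_experts:
        raise ValueError("top_k must be between 1 and num_experts")
    # Total: everything that must be stored.
    total = shared_params + num_experts * params_per_expert
    # Active: shared parts plus only the top_k experts picked for a token.
    active = shared_params + top_k * params_per_expert
    return total, active

# Hypothetical configuration, for illustration only.
total, active = moe_parameter_counts(
    shared_params=2_000_000_000,
    params_per_expert=5_000_000_000,
    num_experts=8,
    top_k=2,
)
print(f"total: {total / 1e9:.0f}B, active per token: {active / 1e9:.0f}B")
```

In this made-up configuration, the model would be advertised at 42 billion parameters while spending the compute of a 12 billion parameter model on each token.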
Today, the industry is gradually moving away from the cult of "gigantism." We see the emergence of small language models (SLMs), which at 7 billion parameters can show better results than older giants with 100 billion. This happens thanks to higher-quality data cleaning and smarter training methods. Parameters have stopped being just a number in a press release; they've become a resource that needs to be spent wisely. Ultimately, what matters is not how many "knobs" you have on your control panel, but how precisely they are tuned. We are entering an era where architectural efficiency and the density of knowledge in each parameter matter far more than the total count.
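One way to see why parameters are a resource to be spent wisely is to estimate how much memory the weights alone occupy. This is a rough back-of-the-envelope sketch: the function name and the model labels are hypothetical, and the estimate deliberately ignores activations, caches, and anything needed for training.

```python
# Bytes needed to store one parameter in a few common numeric formats.
BYTES_PER_PARAMETER = {"float32": 4, "float16": 2, "int8": 1}

def weight_memory_gb(n_params, dtype="float16"):
    """Approximate size of the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAMETER[dtype] / 1e9

# Hypothetical model sizes matching the figures discussed above.
models = {"small (7B)": 7_000_000_000, "giant (100B)": 100_000_000_000}

for name, n_params in models.items():
    sizes = ", ".join(
        f"{dtype}: {weight_memory_gb(n_params, dtype):.0f} GB"
        for dtype in BYTES_PER_PARAMETER
    )
    print(f"{name} -> {sizes}")
```

At 16 bits per parameter, the 7 billion parameter model needs about 14 GB for its weights, while the 100 billion parameter model needs about 200 GB, which is the difference between a single accelerator and a cluster of them.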
The bottom line: the race for the quantity of parameters is being replaced by a race for their quality. Could a model with 1 billion parameters ever match the human brain through perfect tuning?