
Neural networks are still weak at multiplication: why AI writes code but gets arithmetic wrong


AI-processed from Habr AI; edited by Hamidun News

Large language models can write code, translate texts, and maintain long conversations, yet multiplication remains a systematic weakness. Most neural networks do not "calculate" numbers step by step; they predict the most probable sequence of symbols, and in arithmetic this shows up quickly.

Why This Happens

For humans, multiplication is an algorithm: break the numbers into digits, multiply the parts, carry the tens, and add the intermediate results. For a language model, an expression like 37 × 48 is first and foremost a text pattern, similar to millions of other sequences it saw during training. By default it does not run a built-in "calculator"; it continues the string in a statistically plausible way. On short, frequently seen examples this sometimes produces the right answer, but that is not the same as reliable computation.
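The paper procedure described above can be written out as a short Python sketch. The function name `long_multiply` is our own, not from the source, and the sketch handles only non-negative integers. It returns each shifted partial product so the carries and intermediate sums stay visible: this is the explicit step-by-step computation a model does not perform by default when it predicts digits.

```python
def long_multiply(a: int, b: int) -> tuple[int, list[int]]:
    """Multiply two non-negative integers digit by digit, as on paper.

    Returns the product and the shifted partial products (one per digit of b).
    """
    if a < 0 or b < 0:
        raise ValueError("this sketch handles non-negative integers only")
    a_digits = [int(d) for d in reversed(str(a))]  # least significant digit first
    partials = []
    for shift, ch in enumerate(reversed(str(b))):
        db = int(ch)
        carry = 0
        row = []
        for da in a_digits:
            prod = da * db + carry
            row.append(prod % 10)  # digit that stays in this column
            carry = prod // 10     # tens carried into the next column
        if carry:
            row.append(carry)      # final carry becomes the leading digit
        row_value = int("".join(str(d) for d in reversed(row)))
        partials.append(row_value * 10 ** shift)  # shift by the digit's position
    return sum(partials), partials  # add the intermediate results


print(long_multiply(37, 48))  # → (1776, [296, 1480])
```

For 37 × 48 the two rows are 37 × 8 = 296 and 37 × 4 = 148 shifted one place (1480), and their sum is 1776.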

"They don't count in the way we understand it, but rather remember and approximate answers."

Because of this, a model can appear very intelligent on tasks where some variation in formulation is acceptable, but stumble where a single precise result is needed. Text, code, and even article summaries often forgive small deviations: meaning can be conveyed in different ways. In arithmetic, there's no such luxury. An error in a single digit turns a correct answer into an incorrect one, and a beautiful explanation won't help. This is precisely why the contrast between "writes poems" and "gets confused by multiplication tables" seems so stark.

Where Models Fail

This is easiest to see in tasks that require strictly following a procedure step by step rather than recognizing a pattern. If an example appeared many times in the training data, the model can reproduce the answer almost flawlessly. But the longer the numbers and the more carries between digits, the higher the chance that it starts improvising. Add some extra text to the problem, an unusual format, or several operations in a row, and the probability of failure rises noticeably.

  • Multiplication of multi-digit numbers with multiple carries
  • Rare combinations that were almost absent from training data
  • Tasks where numbers are mixed with text, units of measurement, or conditions
  • Chains of calculations where an early error breaks the entire subsequent answer
  • Verifying its own result without an external tool
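To make "more carries" concrete, the hypothetical helper below counts how many digit products pass on a non-zero carry while the partial products of a schoolbook multiplication are formed. Carries from adding the rows together are deliberately not counted, so the real figure is higher. This is an illustration under that simplified definition, not a measurement taken from the article.

```python
import random


def count_carries(a: int, b: int) -> int:
    """Count digit products that pass a non-zero carry while forming the
    partial products of a * b (carries from adding the rows are not counted).
    Assumes non-negative integers."""
    a_digits = [int(d) for d in reversed(str(a))]  # least significant first
    carries = 0
    for ch in reversed(str(b)):
        db = int(ch)
        carry = 0
        for da in a_digits:
            carry = (da * db + carry) // 10
            if carry:
                carries += 1
    return carries


def average_carries(n_digits: int, trials: int = 1000, seed: int = 0) -> float:
    """Average carry count over random pairs of n-digit numbers."""
    rng = random.Random(seed)  # fixed seed for a repeatable run
    lo, hi = 10 ** (n_digits - 1), 10 ** n_digits - 1
    total = sum(
        count_carries(rng.randint(lo, hi), rng.randint(lo, hi))
        for _ in range(trials)
    )
    return total / trials


for n in (2, 4, 8):
    print(n, "digits:", average_carries(n))
```

With n-digit operands there are n × n digit products, so the average number of carries grows roughly with the square of the length. Each of those carries is one more place where a plausible-looking continuation can slip.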

The paradox is that writing code is often easier for the model than doing arithmetic itself. In programming, it relies on a huge corpus of repetitive structures: syntax, typical functions, known libraries, solution templates. If asked not to calculate itself but to write a short program to perform the calculation, the result is often more reliable. In other words, the model can successfully describe a procedure or generate a tool that solves the problem, but doesn't always reliably execute that procedure in its own "head."
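As an illustration of the kind of short program a model might be asked to write instead of answering from memory, here is a sketch for a hypothetical word problem (the quantities and the price are invented for the example). Python integers are exact at any size, and the standard `decimal.Decimal` type avoids binary floating-point rounding for money.

```python
from decimal import Decimal

# Hypothetical task: 37 crates hold 48 bottles each, 12 bottles broke,
# and each bottle is worth 1.10. How many bottles remain, and what are they worth?
crates, per_crate, broken = 37, 48, 12
price = Decimal("1.10")  # built from a string, so no binary rounding error

bottles = crates * per_crate - broken  # exact integer arithmetic
value = bottles * price                # int * Decimal stays a Decimal

print(bottles, value)  # → 1764 1940.40

# The same one-line multiplication stays exact for much longer operands.
big = 123456789012345678901234567890 * 987654321098765432109876543210
print(big)
```

The model only has to translate the text into three lines of arithmetic; the interpreter does the carrying.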

How This is Overcome

This is precisely why practical AI systems are increasingly supplemented with external tools. If a product needs accurate math, the model shouldn't guess the answer from memory: it's better to direct it to a calculator, Python interpreter, SQL engine, or specialized computation module. This approach has already become standard in agent systems and corporate scenarios where the cost of error is too high.
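A minimal sketch of such an external tool, assuming the model has been prompted to emit a plain arithmetic expression instead of a final answer. The name `calculator_tool` and the routing around it are hypothetical. The sketch uses Python's standard `ast` module to accept only integer literals and the four basic operations, rather than passing model output to `eval()`, and uses `fractions.Fraction` so that division is exact as well.

```python
import ast
import operator
from fractions import Fraction

# Only these operations are allowed; any other syntax is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: lambda x, y: Fraction(x) / y,  # exact rational division
    ast.USub: operator.neg,
}


def calculator_tool(expression: str):
    """Evaluate a plain arithmetic expression exactly, without eval()."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported expression: {expression!r}")

    return walk(ast.parse(expression, mode="eval"))


print(calculator_tool("37 * 48"))  # → 1776
```

In a real system the model's reply would be checked for a tool call, the expression passed to a function like this, and the exact result returned to the model or the user, so the model never has to produce the digits itself.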

There are also deeper attempts to solve the problem at the architecture level. Researchers are experimenting with models that handle symbolic rules better, retain intermediate states, or are trained specifically to execute step-by-step operations. Techniques such as chain-of-thought, where the model lays out the solution step by step, also help, but they are not magic: if the underlying mechanism is still token prediction, a long and careful-looking chain of reasoning can still arrive at the wrong number. Reliability comes not from a beautiful explanation but from a verifiable computational loop.
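One way to read "a verifiable computational loop": recompute each intermediate step the model writes out, independently of the model. The sketch below assumes, hypothetically, that the steps have already been extracted as `left op right` strings with claimed values. The example chain for 37 × 48 contains a deliberate one-digit slip in the final addition.

```python
import operator

# Operations the checker understands; steps must look like "int op int".
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

# Hypothetical chain-of-thought for 37 × 48 as (expression, claimed value).
steps = [
    ("37 * 8", 296),
    ("37 * 40", 1480),
    ("296 + 1480", 1766),  # deliberate one-digit slip
]


def first_wrong_step(steps):
    """Return (index, correct value) for the first wrong step, or None."""
    for i, (expr, claimed) in enumerate(steps):
        left, op, right = expr.split()  # expects spaces around the operator
        actual = OPS[op](int(left), int(right))
        if actual != claimed:
            return i, actual
    return None


print(first_wrong_step(steps))  # → (2, 1776)
```

The first two steps are correct and the explanation around them may look convincing, but only the recomputation catches that 296 + 1480 is 1776, not 1766.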

What This Means

The main conclusion is simple: impressive language does not equal accurate calculation. The more AI moves from the role of conversational partner to that of working tool, the more important it becomes to separate tasks that call for "generate a plausible answer" from tasks that call for "get a guaranteed correct result", and to use dedicated computation and verification mechanisms for the second class.

Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.
