Program Verification in the AI Era: Why Hallucinations Make Code Verification More Important
AI assistants change how code gets written, but they do not make verification any easier; quite the opposite. Researchers created a formally verified conference management system…
AI-processed from Habr AI; edited by Hamidun News
Generative AI has revived a long-held dream of programmers: state a task, and the machine writes the code itself. A new academic study tempers this optimism without denying real progress. Coding accounts for only a small part of software engineering, and the hardest parts (requirements, architecture, and verification) still remain with engineers.
Why AI Doesn't Solve the Main Problem
Software engineering is not just about writing code. Industry surveys show that developers spend only 20 to 30% of their working time on coding itself. The rest goes to requirements gathering, architecture design, reviews, testing, and bug fixing. AI assistants can generate functions and explain unfamiliar code, but they cannot prove that the system as a whole is correct. That is a fundamentally different task.
The main problem is hallucinations. AI confidently offers code that looks plausible, compiles, and passes basic tests, but contains subtle logical errors. "Almost correct" components stacked together form an unreliable system. The authors collected documented incidents with AI-generated code in production, ranging from incorrect legal citations in chatbots to errors in medical recommendation algorithms. The pattern is always the same: the system worked correctly in most cases, until it hit an edge case.
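To make the "almost correct" pattern concrete, here is a minimal Python sketch. It is an illustration only and does not come from the study: the function names and the pagination task are hypothetical. The first version looks reasonable and passes the obvious tests, yet it is wrong whenever the item count is an exact multiple of the page size.

```python
def pages_needed_plausible(n_items: int, page_size: int) -> int:
    """Hypothetical 'plausible' answer: passes basic tests, wrong on a boundary."""
    return n_items // page_size + 1


def pages_needed(n_items: int, page_size: int) -> int:
    """Ceiling division: the smallest number of pages that holds all items."""
    return (n_items + page_size - 1) // page_size


# "Basic tests" that the plausible version passes.
assert pages_needed_plausible(5, 2) == 3
assert pages_needed_plausible(1, 10) == 1

# Edge case: the item count is an exact multiple of the page size.
print(pages_needed_plausible(10, 5))  # 3, one page too many
print(pages_needed(10, 5))            # 2
```

Both versions agree on the two basic tests, so a test suite made only of such examples would not tell them apart.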
Experiment with AutoProof and Eiffel
The researchers did not stop at theory and ran a concrete experiment. They built a conference management system using an AI assistant together with AutoProof, a formal verifier for the Eiffel language. The key difference from testing is that formal verification proves the code meets its specification for all possible inputs, not just for a handful of hand-picked examples.
The process required strict iterative discipline:
- Formulate a small fragment of requirements as formal pre- and postconditions
- Ask the AI assistant to implement the corresponding code
- Run AutoProof and get a verification error or confirmation of correctness
- Fix the specification or implementation — and repeat the cycle
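The cycle above can be sketched in Python. This is not AutoProof and not Eiffel: the contract functions, the paper-selection task, and `find_counterexample` are hypothetical, and the exhaustive search over a small bounded domain is only a stand-in for a prover. A real verifier proves the contract for all inputs, while this check covers only the inputs it enumerates.

```python
from itertools import product


def precondition(scores, capacity):
    """Requirement fragment: capacity must not be negative."""
    return capacity >= 0


def postcondition(scores, capacity, accepted):
    """Accepted set has the right size, and no rejected paper outscores an accepted one."""
    everyone = set(range(len(scores)))
    rejected = everyone - accepted
    right_size = len(accepted) == min(capacity, len(scores))
    ranked_ok = all(scores[a] >= scores[r] for a in accepted for r in rejected)
    return accepted <= everyone and right_size and ranked_ok


def select_buggy(scores, capacity):
    """First attempt: sorts ascending by mistake, so it picks the worst papers."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    return set(ranked[:capacity])


def select_fixed(scores, capacity):
    """Second attempt after reading the counterexample: sort descending."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:capacity])


def find_counterexample(impl, max_len=4, max_score=3):
    """Bounded stand-in for a prover: return the first violating input, or None."""
    for n in range(max_len + 1):
        for scores in product(range(max_score + 1), repeat=n):
            for capacity in range(max_len + 2):
                if not precondition(scores, capacity):
                    continue
                if not postcondition(scores, capacity, impl(scores, capacity)):
                    return scores, capacity
    return None


print(find_counterexample(select_buggy))  # ((0, 1), 1)
print(find_counterexample(select_fixed))  # None
```

The first call plays the role of a verification error that sends the developer back to the implementation; the second plays the role of a confirmation, within the bounds of the search.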
Key observation: the AI assistant did not verify the code itself. It helped formulate specifications and write the implementation, while an independent formal prover checked correctness. This division of roles is fundamental, and according to the authors, this is how next-generation toolchains will be organized.
Federation of Agents as a New Paradigm
The authors propose a new perspective on development: not a single AI assistant, but a federation of interacting agents with clear areas of responsibility. One generates code, a second writes tests, a third runs formal verification, a fourth analyzes production incidents. No single agent bears full responsibility for correctness; correctness is a property of the system as a whole.
"Hallucination as a type of failure makes correctness guarantees more, not less, important," the authors argue.
This approach requires developers to understand classical software engineering: formal specifications, invariants, contracts. AI does not make this knowledge obsolete; it makes applying it in practice more accessible. A developer who can state requirements precisely and read verification results has a clear advantage.
What This Means
The authors' conclusion is cautiously optimistic. For everyday development, AI is a leveling technology: an inexperienced developer gets closer to an experienced one, routine code is written faster, and obvious errors are found sooner. For critical and business systems, AI becomes an amplifying technology: an experienced engineer gets a powerful tool, but the classical lessons of software engineering apply more than ever. Verification becomes more important, not less, precisely because AI can make mistakes confidently and invisibly.