ZDNet: BlueOptima showed that AI for code in production is noticeably weaker than promised

Q: What is the source?

Originally published on ZDNet AI. Hamidun News processes and adapts the material with AI.

Q: When was it published?

May 2, 2026. Reading time: 3 min.

Hamidun News Editorial

AI is often sold as a button for instant efficiency rather than as a complex engineering project. ZDNet highlights research from BlueOptima and warnings from analyst David Linthicum: without proper preparation, measurement, and expertise, implementation can deliver a far different result than what is promised in presentations.

Production Testing

The article's main argument against AI hype is a practical one: judge AI not by demos and benchmarks but by real work in production. BlueOptima's BARE study ran 57 large language models through refactoring tasks aimed at code maintainability. The test set comprised 4,276 real files in nine languages, ranging from C and C++ to Python, PHP, and TypeScript, for a total of 243,732 model-file pairs. On this material, even the best models succeeded in fewer than 23% of cases.
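The reported pair count is consistent with every model being run against every file. A quick check of that arithmetic, using only the figures quoted above (the variable names are ours, and applying the 23% ceiling to the full file set is an assumption, not something the article states):

```python
# Figures quoted in the article; variable names are illustrative.
MODELS = 57
FILES = 4_276

# If each model is evaluated on each file, pairs = models x files.
pairs = MODELS * FILES
print(pairs)  # 243732

# Assumption: the "fewer than 23%" figure applies to all files.
# Under that reading, the best model fixed at most this many files.
best_model_ceiling = int(0.23 * FILES)
print(best_model_ceiling)  # 983
```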

Even more painful is the gap between impressive lab numbers and real-world use. On popular benchmarks many models scored above 85%, but on tasks that required improving the maintainability of production code, the average result was around 17%. Success was measured strictly: the code had to compile, run correctly, preserve the original behavior, and actually improve maintainability rather than just look neater. The difference across languages is also significant: around 32% success in JavaScript versus roughly 4% in C, and on complex architectural tasks the rate fell to 1.5%.
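A strict criterion of this kind can be sketched as an all-or-nothing gate. The example below is a minimal illustration, not BlueOptima's actual harness: the `RefactorResult` schema and the sample data are hypothetical. It shows why a combined pass rate can sit well below the pass rate of any single check.

```python
from dataclasses import dataclass


@dataclass
class RefactorResult:
    """Outcome of one model-file attempt (hypothetical schema)."""
    compiles: bool
    runs_correctly: bool
    preserves_behavior: bool
    improves_maintainability: bool

    def is_success(self) -> bool:
        # Strict gate: a single failed check fails the whole attempt.
        return all((self.compiles, self.runs_correctly,
                    self.preserves_behavior, self.improves_maintainability))


def success_rate(results: list[RefactorResult]) -> float:
    """Share of attempts that pass every check."""
    if not results:
        return 0.0
    return sum(r.is_success() for r in results) / len(results)


# Made-up sample, not study data: each individual check passes
# 3 times out of 4, yet only 1 attempt in 4 passes all of them.
sample = [
    RefactorResult(True, True, True, True),
    RefactorResult(True, True, True, False),
    RefactorResult(True, True, False, True),
    RefactorResult(False, False, True, True),
]
print(success_rate(sample))  # 0.25
```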

Where the Hype Comes From

According to ZDNet, the problem is not that AI is useless. The problem is that it is often sold as a ready-made solution, hiding the volume of work behind the scenes. For a model to truly deliver value, you need integrations, clean data, a review process, regression control, security, observability, and people who understand the tool's limitations. Without these, a company gets not acceleration but an expensive experiment that looks convincing only on slides for management.

If technology sounds too good to be true, it probably is.

David Linthicum adds another layer to the problem: the market rewards not the most competent but the most confident. AI has become a convenient label for anything "smart" and "modern," so the topic has attracted a fast-growing crowd of consultants, evangelists, and managers who have learned the vocabulary but do not understand how it all works in a business context. As a result, investment and strategy decisions can rest on superficial expertise. Linthicum warns that AI systems sometimes cost 10–20 times more than traditional alternatives, and that a wrong strategic direction easily turns into wasted spending.

How to Resist

Resisting hype does not mean rejecting AI. It means no longer buying the promise of "magic" and managing the technology like any other complex system. Evaluation should start from a specific task, not from a trendy label. If the goal can be met through ordinary automation, rules, or process improvement, that is also a valid outcome. AI makes sense where its advantages can be measured in real scenarios, not guessed from a vendor's presentation.

Define the business task and a baseline metric before implementation.
Test models on your own data, code, and workflows, not on vendor demos.
Calculate the full cost: licenses, infrastructure, review, security, and support.
Assign responsibility to people who understand both AI's strengths and its limits.
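The measurable part of this checklist can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `PilotResult` class, its fields, the decision rule, and every number below are hypothetical and do not come from ZDNet or BlueOptima.

```python
from dataclasses import dataclass


@dataclass
class PilotResult:
    """Measurements from an in-house pilot (all fields hypothetical)."""
    baseline_success: float      # success rate of the current process
    ai_success: float            # success rate measured on your own tasks
    baseline_cost: float         # yearly cost of the current process
    ai_costs: dict[str, float]   # licenses, infrastructure, review, ...

    def total_ai_cost(self) -> float:
        # Full cost, not just the license line.
        return sum(self.ai_costs.values())

    def cost_ratio(self) -> float:
        return self.total_ai_cost() / self.baseline_cost

    def worth_adopting(self, max_cost_ratio: float = 1.0) -> bool:
        # Adopt only if quality beats the baseline and cost stays in budget.
        return (self.ai_success > self.baseline_success
                and self.cost_ratio() <= max_cost_ratio)


# Made-up numbers for illustration only.
pilot = PilotResult(
    baseline_success=0.60,
    ai_success=0.65,
    baseline_cost=100_000.0,
    ai_costs={
        "licenses": 120_000.0,
        "infrastructure": 60_000.0,
        "review_and_security": 50_000.0,
        "support": 20_000.0,
    },
)
print(pilot.cost_ratio())                        # 2.5
print(pilot.worth_adopting())                    # False
print(pilot.worth_adopting(max_cost_ratio=3.0))  # True
```

The point of the sketch is the order of operations: the baseline and the cost ceiling are fixed before the pilot, so the result is judged against them rather than against a vendor's demo.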

This approach sobers expectations without ruling out useful cases: AI can speed up draft work, help with search, suggest refactoring options, and save the team time. But where complex architecture, critical changes, or autonomous solutions without human review are involved, the cost of error is still too high. That is why mature teams look at reproducible results rather than loud promises and do not confuse a lucky suggestion with a mature product.

What It Means

Today AI disappoints less because it lacks potential than because the market sells it faster than companies can grasp the technology's real limits. The winners will be those who measure impact in production, filter out the noise, and buy solid expertise rather than hype.
