
Doubletapp explained why poor datasets prevent AI from improving NPS, CTR, and conversion


AI-processed from Habr AI; edited by Hamidun News
Source: Habr AI. Collage: Hamidun News.

Doubletapp has published an interview explaining why AI projects fail more often because of data than because of model choice. Ilnur Fayziev, head of the Data LLM unit, described how dataset quality directly affects support NPS, catalog CTR, and purchase conversion.

Where metrics get lost

The main idea of the interview is simple: a business is not buying a model as such, but an improvement in a specific metric. In support, that means issue resolution speed and customer satisfaction; in online retail, product card click-through and order share; in knowledge base search, answer accuracy. If a dataset is noisy, poorly annotated, or disconnected from real scenarios, the model starts making mistakes exactly where each error costs the business money. The conversation about data is therefore not academic; it is a direct conversation about revenue, costs, and service quality.

The piece also gives less obvious use cases. For an industrial company, AI can search internal regulations for answers and reduce the number of production errors. In computer vision tasks, it can assess steel quality from process parameters and help keep results stable. The logic is the same in every case: the business metric sits at the top, the quality of the ML system sits below it, and beneath both lies the dataset, which either strengthens the model or quietly drags it down.

  • NPS and response time in support
  • CTR and conversion in e-commerce
  • Search accuracy in an internal knowledge base
  • Reduction of errors in production processes
  • Recognition quality in computer vision systems
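The first two metrics in the list can be computed in a few lines. The sketch below uses the standard definitions: NPS is the percentage of promoters (scores 9 to 10) minus the percentage of detractors (scores 0 to 6), while CTR and conversion are simple ratios. The function names and sample numbers are illustrative assumptions and do not come from the interview.

```python
from typing import Sequence


def nps(scores: Sequence[int]) -> float:
    """Net Promoter Score on the usual 0-10 survey scale.

    Promoters score 9-10, detractors score 0-6; 7-8 are passives.
    """
    if not scores:
        raise ValueError("at least one survey score is required")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)


def rate(events: int, opportunities: int) -> float:
    """Generic ratio, usable for CTR (clicks / impressions)
    or conversion (orders / sessions)."""
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    return events / opportunities


# Hypothetical sample data, for illustration only.
survey_scores = [10, 9, 8, 7, 6, 3, 10, 9, 5, 10]
support_nps = nps(survey_scores)           # 5 promoters, 3 detractors -> 20.0
catalog_ctr = rate(30, 1000)               # 30 clicks on 1000 impressions
purchase_conversion = rate(12, 400)        # 12 orders from 400 sessions
```

Tracking these values before and after a dataset change is one way to tie data work to the business metric at the top of the hierarchy.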

When a dataset is mandatory

According to Fayziev, a high-quality dataset is needed in two typical situations. The first is when a company is still comparing AI with manual labor and wants to understand whether the solution can go to production. The second is when the system is already running but its metrics are no longer satisfactory: answers are irrelevant, recommendations do not lead to purchases, or speed and accuracy have hit a ceiling. In both cases, without a measurement of current quality and a clear target metric, work on the data turns into guesswork.
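A minimal sketch of what "measurable current quality and a clear target metric" could look like in practice: score the system on a fixed evaluation set, then compare the result with the target. The EvalExample structure, the exact-match scoring rule, and the 0.9 target are assumptions made for illustration; the interview does not describe a specific evaluation procedure.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class EvalExample:
    """One labeled example: the expected answer and what the system returned."""
    question: str
    expected: str
    predicted: str


def accuracy(examples: Sequence[EvalExample]) -> float:
    """Share of examples where the prediction matches the label
    (case-insensitive exact match, an assumed scoring rule)."""
    if not examples:
        raise ValueError("evaluation set is empty")
    correct = sum(
        1
        for ex in examples
        if ex.predicted.strip().lower() == ex.expected.strip().lower()
    )
    return correct / len(examples)


def gap_to_target(current: float, target: float) -> float:
    """How far the current quality is from the target (0.0 if already met)."""
    return max(0.0, target - current)


# Hypothetical evaluation set for a support assistant.
eval_set = [
    EvalExample("How do I reset my password?", "reset_password", "reset_password"),
    EvalExample("Where is my order?", "order_status", "order_status"),
    EvalExample("I want a refund", "refund", "order_status"),
    EvalExample("Change delivery address", "update_address", "Update_Address"),
]

baseline = accuracy(eval_set)              # 3 of 4 correct
remaining = gap_to_target(baseline, 0.9)   # distance to an assumed 0.9 target
```

With a baseline and a gap in hand, a team can decide whether new data is worth buying and check afterwards whether it actually closed the gap.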

"Datasets are needed at two stages of product development."

Special emphasis was placed on economics. A dataset is not open-ended custom development but a finite artifact that can be prepared, checked, and loaded into a training or fine-tuning pipeline. Model audits do need to be repeated regularly, but data collection and annotation are usually better outsourced to teams that specialize in them. If everything is kept in-house, engineers spend weeks selecting examples, setting up the environment, running quality control, and managing annotators. For the business, this is often more expensive than it looks at the start.

Why crowdsourcing weakens

The interview also captures a market shift. Mass crowdsourcing worked well in the era of simple tasks like "cat or dog." Models now handle such scenarios confidently on their own, so human annotation is moving into expert domains.

For a code assistant in a rare language, complex industrial validation, or a domain-specific knowledge base, a large pool of annotators is not enough; you need people who really understand the task context and can spot subtle errors. A combined approach is still possible: the simple part of the pipeline goes to mass annotation and the complex part to an expert team. But the business then takes on a new burden: decomposing the task, finding different contractors, transferring context between them, and adding quality control at the handoffs.
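The combined approach described above can be sketched as a simple routing step that splits a batch between a crowd queue and an expert queue. Everything here is hypothetical: the AnnotationTask fields, the list of expert domains, and the 0.8 confidence threshold are assumptions for illustration, not a process described by Doubletapp.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class AnnotationTask:
    task_id: str
    domain: str               # e.g. "image_label", "rare_language_code"
    model_confidence: float   # assumed pre-labeling confidence in [0, 1]


# Assumed configuration: domains that always require expert annotators.
EXPERT_DOMAINS = {"rare_language_code", "industrial_validation"}
CONFIDENCE_THRESHOLD = 0.8


def route(task: AnnotationTask) -> str:
    """Send expert-domain or low-confidence tasks to experts, the rest to the crowd."""
    if task.domain in EXPERT_DOMAINS or task.model_confidence < CONFIDENCE_THRESHOLD:
        return "expert"
    return "crowd"


def split_batch(tasks: Iterable[AnnotationTask]) -> Dict[str, List[str]]:
    """Group task ids by destination queue, preserving input order."""
    queues: Dict[str, List[str]] = {"crowd": [], "expert": []}
    for task in tasks:
        queues[route(task)].append(task.task_id)
    return queues


batch = [
    AnnotationTask("t1", "image_label", 0.95),
    AnnotationTask("t2", "rare_language_code", 0.95),
    AnnotationTask("t3", "image_label", 0.40),
]
queues = split_batch(batch)  # t1 goes to the crowd; t2 and t3 go to experts
```

The routing rule itself is the easy part. The overhead the article mentions sits around it: writing instructions for two contractor types, passing context between them, and checking quality where the two queues merge back into one dataset.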

This is why the market, in Doubletapp's assessment, remains relatively narrow and revolves around large LLM companies and projects where metric improvement converts easily into money.

What this means

For the market, this is a signal that competitive advantage in AI is shifting from picking the most hyped model to the quality of the data applied. Large players still need large datasets, but the next wave of demand may come from small teams with niche AI products. They will first test an MVP on ready-made data and, once they see the economics, start buying targeted datasets for their weak spots. That is where real metric growth will appear.

Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.
