RAG in Enterprise: Why 80% of the Problems Are in the Data, Not the Model

Source: Habr AI. Collage: Hamidun News.

A RAG prototype can be built in a week and demoed. It looks magical: the model answers questions about your documents instead of hallucinating from general knowledge. But within a couple of weeks in production, the system starts confusing document versions, losing context, and confidently citing sources that do not exist. This is the typical path for most enterprise RAG systems, and the culprit is not the transformer but the data and the architecture around it.

Where RAG Breaks

Moving from prototype to production exposes an unpleasant truth: 70-80% of RAG problems come from data management, indexing, and search, not from the capabilities of the language model itself. No matter how good GPT-4 or Claude is, a poorly indexed corpus produces incorrect answers. This becomes especially apparent as the corpus grows from 100 to 10,000 documents.

Here are the real reasons why RAG fails in enterprise:

  • Document version confusion: old versions remain in the index, and the system cannot tell which document is current. Users get answers based on regulations from two years ago.
  • Context loss: the system does not remember earlier chat messages, so it repeats or contradicts itself.
  • Poor chunking: documents are split by size rather than by meaning. The logic is cut apart at chunk boundaries, and the system misses connections.
  • Lack of reranking: without a reranker, BM25 pulls in a lot of noise, and the system cannot distinguish relevant documents from random keyword matches.
  • Low-quality embeddings: vectors are trained on a general corpus and do not capture your domain-specific terminology.
  • No feedback loop: no one tracks incorrect answers, so the system does not learn from its mistakes.
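
To make the chunking failure concrete, here is a minimal Python sketch that contrasts size-based splitting with splitting on document structure. It is an illustration under stated assumptions, not the implementation described in the article: the function names chunk_by_size and chunk_by_headings are hypothetical, the sample policy text is invented, and headings are assumed to be lines starting with '#'. Real PDF or DOCX sources would need a format-specific parser.

```python
import re


def chunk_by_size(text: str, size: int) -> list[str]:
    """Naive baseline: cut every `size` characters, ignoring structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def chunk_by_headings(text: str) -> list[dict]:
    """Split on '#'-style headings so each chunk carries its section title.

    Assumption: a heading is any line that starts with one or more '#'
    followed by whitespace. Text before the first heading is kept under
    the placeholder title '(preamble)'.
    """
    chunks: list[dict] = []
    title, body = "(preamble)", []

    def flush() -> None:
        # Emit the current section only if it has non-blank content.
        if any(line.strip() for line in body):
            chunks.append({"title": title, "text": "\n".join(body).strip()})

    for line in text.splitlines():
        if re.match(r"#+\s", line):
            flush()
            title, body = line.lstrip("#").strip(), []
        else:
            body.append(line)
    flush()
    return chunks


# Invented sample document, for illustration only.
doc = """# Vacation policy
Employees get 28 days of paid leave.
Requests go to the manager.

# Sick leave
Notify HR on the first day of absence.
"""

size_chunks = chunk_by_size(doc, 60)       # boundaries fall wherever the count lands
heading_chunks = chunk_by_headings(doc)    # one chunk per section, title attached

for chunk in heading_chunks:
    print(chunk["title"], "->", repr(chunk["text"]))
```

With the size-based splitter, a chunk boundary can land in the middle of a rule, so a retrieved chunk may contain half of one section and half of another. The heading-based splitter keeps each rule next to the title that gives it meaning.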

How to Build RAG Right

At AlpinaGPT, we worked backwards: first we gathered requirements for an ideal system, then identified the specific problems blocking them. The result is an architecture that has passed real-world testing with corporate clients. Here are the key components:

  • Semantic chunking: we split by document structure, headings, and semantic blocks rather than by size, so context does not fragment across chunks.
  • Data versioning: each document version is indexed separately with a date stamp, so the system knows which version is current.
  • Two-stage search: first fast BM25 (keywords), then neural reranking (semantics). This is cheaper than searching everything through embeddings.
  • Context between messages: the system keeps the entire chat history and does not repeat what it has already explained to the user.
  • Feedback: we track incorrect answers and retrain the ranker on these examples.
  • Separate indexes by type: regulations, instructions, FAQs, and code are indexed differently.
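
The versioning and two-stage search components can be sketched together in a few lines of Python. This is a rough illustration, not AlpinaGPT's code: the names Chunk, latest_versions, bm25_scores, overlap_rerank, and search are hypothetical, the sample documents are invented, "current" is assumed to mean "the newest date stamp per document", and the BM25 parameters k1=1.5 and b=0.75 are common defaults rather than values from the article. The reranker here is a toy token-overlap score standing in for a neural model.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Chunk:
    doc_id: str        # stable identifier shared by all versions of a document
    valid_from: date   # date stamp of this version
    text: str


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def latest_versions(chunks: list[Chunk]) -> list[Chunk]:
    """Keep only chunks that belong to the newest version of each document."""
    newest: dict[str, date] = {}
    for c in chunks:
        if c.doc_id not in newest or c.valid_from > newest[c.doc_id]:
            newest[c.doc_id] = c.valid_from
    return [c for c in chunks if c.valid_from == newest[c.doc_id]]


def bm25_scores(query: str, chunks: list[Chunk], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Stage 1: plain BM25 keyword scoring over the candidate chunks."""
    docs = [tokenize(c.text) for c in chunks]
    n = len(docs)
    avg_len = (sum(len(d) for d in docs) / n) or 1.0
    df = Counter(term for d in docs for term in set(d))  # document frequency
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def overlap_rerank(query: str, text: str) -> float:
    """Toy stand-in for a neural reranker: Jaccard overlap of token sets."""
    q, t = set(tokenize(query)), set(tokenize(text))
    return len(q & t) / len(q | t) if q | t else 0.0


def search(query: str, chunks: list[Chunk], rerank=overlap_rerank,
           first_stage_k: int = 20, final_k: int = 3) -> list[Chunk]:
    current = latest_versions(chunks)          # drop outdated versions first
    if not current:
        return []
    ranked = sorted(zip(bm25_scores(query, current), current),
                    key=lambda pair: pair[0], reverse=True)
    candidates = [c for score, c in ranked if score > 0][:first_stage_k]
    # Stage 2: rerank only the shortlist, which keeps the expensive model cheap.
    return sorted(candidates, key=lambda c: rerank(query, c.text), reverse=True)[:final_k]


# Invented sample data, for illustration only.
chunks = [
    Chunk("travel-policy", date(2022, 1, 1), "Travel expenses are reimbursed within 60 days."),
    Chunk("travel-policy", date(2024, 1, 1), "Travel expenses are reimbursed within 30 days."),
    Chunk("faq", date(2023, 5, 1), "The office is open from 9 to 18."),
]
results = search("travel expenses reimbursed", chunks)
for r in results:
    print(r.doc_id, r.valid_from, r.text)
```

In a real system the rerank argument would wrap a cross-encoder or similar model. The point of the design is that the expensive model only sees the BM25 shortlist, and outdated versions never reach either stage.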

What This Means

The RAG of the future is not just embeddings and vector search. It is document version management, proper semantic chunking, multi-stage search, and continuous feedback. If half of your RAG answers are wrong, the problem is almost certainly not GPT-4 or Claude. The problem is how you prepare your data, how you chunk it, how you index it, and how you collect feedback. Rethink the entire pipeline, and quality jumps. This is what we learned from AlpinaGPT.

Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.