
10 RAG approaches that actually work in production: from basic to GraphRAG


AI-processed from Habr AI; edited by Hamidun News

A developer on Habr compiled a practical list of RAG approaches that are actually used in production, drawing on personal experience and on other teams' case studies from the past year of rapid growth in the LLM stack.

Where Everyone Starts

Naive RAG is the starting point for most projects. The scheme is simple: documents are split into chunks, the chunks are embedded and indexed, and at query time the chunks closest to the query by cosine similarity are retrieved and passed to the LLM as context. This works on small knowledge bases with simple questions and homogeneous documents. Problems begin at scale: long documents do not split cleanly into fixed-size chunks, complex questions need several fragments at once, and users' wording often does not match the style of the documents. This is where advanced approaches come in.
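The naive pipeline is small enough to sketch in full. The example below is a minimal illustration and not the author's code: the `embed` function is a toy bag-of-words stand-in for a real embedding model, and all function names are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def chunk(text: str, size: int = 8) -> list[str]:
    """Split text into fixed-size chunks of `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble the text that would be sent to the LLM."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{joined}\n\nQuestion: {query}"
```

In a real system `embed` would call an embedding model and the sorted scan would be replaced by a vector index, but the control flow stays the same.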

Hybrid Search and Reranking

Hybrid search is the first upgrade that almost always pays off. According to the author, combining dense vectors (semantic search) with BM25 (keyword search) consistently improves accuracy by 15–30% over embedding search alone. Dense vectors capture semantic similarity; sparse ones capture exact matches on terms, abbreviations, and names. Adding a cross-encoder reranker over the top 20 results raises quality by a further 10–15%. The reranker is a heavier model, but it only scores the final set of candidates, so latency stays acceptable for production.
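The article does not say how the dense and BM25 result lists are merged. One common choice is reciprocal rank fusion (RRF), used here as an assumption. The sketch below fuses two ranked lists of document IDs and then reranks the top candidates with a pluggable scoring function; `score_fn` is a hypothetical stand-in for a cross-encoder, and `k=60` is only the conventional RRF constant.

```python
from collections import defaultdict
from typing import Callable


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists: each document earns 1 / (k + rank) per list."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def rerank(
    query: str,
    candidates: list[str],
    score_fn: Callable[[str, str], float],
    top_n: int = 20,
    keep: int = 5,
) -> list[str]:
    """Score only the first `top_n` candidates with the heavier model, keep the best."""
    pool = candidates[:top_n]
    return sorted(pool, key=lambda d: score_fn(query, d), reverse=True)[:keep]
```

Because the reranker sees at most `top_n` candidates, its cost is bounded regardless of corpus size, which is the latency argument made above.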

Query-Level Techniques

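Some RAG problems are more efficiently solved before the search, through query reformulation or expansion. The list below also includes two techniques that act on the retrieved chunks rather than on the query: Parent Document Retriever and contextual compression.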

  • HyDE: the LLM generates a hypothetical answer document, and that document's vector is used for the search. Particularly helpful when the style of the question differs significantly from the style of the documents.
  • Multi-query: 3–5 paraphrases are generated from one question, and the search runs in parallel across all of them. Reduces dependence on the user's exact phrasing.
  • Step-back prompting: before the search, the LLM generalizes the query to a higher level of abstraction. Useful when the specific question is too niche for good retrieval.
  • Parent Document Retriever: small chunks are indexed (high precision), but the whole parent document is passed into context. A good balance between precision and coverage.
  • Contextual compression: the LLM extracts only the relevant part of each retrieved chunk. Saves tokens and reduces noise in the context.
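Of the techniques in this list, the Parent Document Retriever is the easiest to show without an LLM call. The sketch below is an illustration under stated assumptions: the `overlap` function is a toy word-overlap score standing in for vector search, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    parent_id: str  # ID of the document this chunk was cut from


def index_children(docs: dict[str, str], size: int = 6) -> list[Chunk]:
    """Cut every document into small chunks that remember their parent."""
    chunks: list[Chunk] = []
    for doc_id, text in docs.items():
        words = text.split()
        for i in range(0, len(words), size):
            chunks.append(Chunk(" ".join(words[i:i + size]), doc_id))
    return chunks


def overlap(query: str, text: str) -> int:
    """Toy relevance score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve_parents(
    query: str, chunks: list[Chunk], docs: dict[str, str], k: int = 3
) -> list[str]:
    """Search over small chunks, but return the full parent documents."""
    top = sorted(chunks, key=lambda c: overlap(query, c.text), reverse=True)[:k]
    parent_ids: list[str] = []
    for c in top:
        if c.parent_id not in parent_ids:  # deduplicate, keep ranking order
            parent_ids.append(c.parent_id)
    return [docs[p] for p in parent_ids]
```

The deduplication step matters: several top chunks often come from the same parent, and passing that parent twice would waste context.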

Heavy Artillery

When simple techniques aren't enough, architecturally more complex approaches are engaged.

RAPTOR builds a hierarchical tree over the documents: it clusters chunks, summarizes each cluster, then clusters the summaries again. At query time, the search runs at the level of abstraction the question needs. It works well on long documents with varying levels of detail, such as technical manuals, financial reports, and books.
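The recursive build can be sketched in a few lines. This is a simplified illustration, not the RAPTOR implementation: it groups neighbouring nodes in fixed-size batches instead of clustering by embeddings, and `summarize` is a hypothetical callable standing in for an LLM summarization call.

```python
from typing import Callable


def build_tree(
    chunks: list[str],
    summarize: Callable[[list[str]], str],
    group_size: int = 2,
) -> list[list[str]]:
    """Return the tree as levels: levels[0] holds raw chunks, the last level the root."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2 so each level shrinks")
    levels = [list(chunks)]
    while len(levels[-1]) > 1:
        current = levels[-1]
        # Assumption: adjacent nodes are grouped; RAPTOR clusters by similarity.
        groups = [current[i:i + group_size] for i in range(0, len(current), group_size)]
        levels.append([summarize(group) for group in groups])
    return levels


def all_nodes(levels: list[list[str]]) -> list[str]:
    """Flatten every level into one pool so a retriever can match any level of detail."""
    return [node for level in levels for node in level]
```

Searching the flattened pool lets a broad question match a high-level summary while a narrow one still matches a raw chunk.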

GraphRAG from Microsoft builds a knowledge graph: it extracts entities and relationships from the text and creates community summaries for the thematic clusters it finds. According to the author, it consistently outperforms standard RAG on analytical and comparative questions ("how is X related to Y", "what changed since point A") and on tasks that require synthesis across the entire corpus.
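The graph side of this idea can be illustrated with a toy example. The sketch below is not the GraphRAG library's API: it assumes the entity and relationship triples have already been extracted (in practice by an LLM), and it uses plain connected components as a stand-in for the community detection step.

```python
from collections import defaultdict


def build_graph(triples: list[tuple[str, str, str]]) -> dict[str, set[str]]:
    """Build an undirected entity graph from (head, relation, tail) triples."""
    graph: dict[str, set[str]] = defaultdict(set)
    for head, _relation, tail in triples:
        graph[head].add(tail)
        graph[tail].add(head)
    return graph


def communities(graph: dict[str, set[str]]) -> list[set[str]]:
    """Group entities into connected components (stand-in for community detection)."""
    seen: set[str] = set()
    result: list[set[str]] = []
    for start in sorted(graph):
        if start in seen:
            continue
        component: set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        result.append(component)
    return result
```

Each community would then be summarized once at indexing time, so corpus-wide questions can be answered from the summaries rather than from individual chunks.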

Self-RAG and Corrective RAG switch the system to agent mode: the model itself decides whether a search is needed, evaluates the relevance of what was found, and reformulates the query if necessary. This adds latency and complexity, but noticeably improves quality on multi-step and ambiguous tasks.
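The control flow described here fits in one loop. The sketch below shows only that loop and is not the method from either paper: `needs_search`, `retrieve`, `is_relevant`, and `rewrite` are hypothetical callables that would each be backed by a model or an index in practice.

```python
from typing import Callable


def corrective_retrieve(
    question: str,
    needs_search: Callable[[str], bool],
    retrieve: Callable[[str], list[str]],
    is_relevant: Callable[[str, str], bool],
    rewrite: Callable[[str], str],
    max_rounds: int = 3,
) -> list[str]:
    """Retrieve, grade the results, and rewrite the query until something relevant is found."""
    if not needs_search(question):
        return []  # the model decided it can answer without retrieval
    query = question
    for _ in range(max_rounds):
        # Relevance is always judged against the original question, not the rewrite.
        hits = [h for h in retrieve(query) if is_relevant(question, h)]
        if hits:
            return hits
        query = rewrite(query)
    return []  # give up after max_rounds; the caller decides how to respond
```

The `max_rounds` cap is what keeps the added latency bounded: every extra round costs one retrieval, one grading pass, and one rewrite.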

What This Means

Practical path: start with hybrid search plus reranking, which covers most problems at minimal cost. Then add multi-query or HyDE if the queries are diverse. Bring in GraphRAG and Self-RAG only when simpler techniques fail, since they carry significant development and maintenance costs. For most B2B products, the first two steps are sufficient.

