RAG (Retrieval-Augmented Generation)
RAG (retrieval-augmented generation) is a technique that lets a language model pull relevant documents from an external knowledge base before answering and ground its response in them. It reduces hallucinations and keeps answers current without retraining the model.
RAG combines two components: a retriever and a generator. When a question comes in, the retriever converts it into an embedding, searches a vector database of your documents for semantically similar passages, and hands the best matches to the language model as context. The model then writes its answer grounded in those passages instead of relying only on what it memorized during training.
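The retrieve-then-generate flow described above can be sketched in a few lines. This is a minimal illustration, not a production design. The bag-of-words `embed` function is a toy stand-in for a real embedding model, the in-memory `PASSAGES` list stands in for a vector database, and the helper names (`embed`, `cosine`, `retrieve`, `build_prompt`) and sample passages are all hypothetical. The final call to a language model is left out; the sketch stops at the prompt that would be sent to it.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the question."""
    q = embed(question)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble the text handed to the generator, with numbered passages."""
    joined = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(context))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical document base (stands in for an indexed knowledge base).
PASSAGES = [
    "Refunds are issued within 14 days of a return request.",
    "The office is closed on public holidays.",
    "Support tickets are answered within one business day.",
]

question = "How long do refunds take?"
top = retrieve(question, PASSAGES, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

In a real system the only parts that change are the internals of `embed` and `retrieve`: the first becomes a call to an embedding model, the second a query against a vector index. The prompt-assembly step stays essentially the same.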
The approach addresses three chronic LLM problems at once. Knowledge freshness: you update the document base, not the model. Hallucinations: the model can draw on and cite retrieved text instead of inventing facts, which reduces fabrication without eliminating it. Private data: corporate wikis, contracts and support tickets never need to enter model training; they stay in the index and are fetched at query time.
RAG is not a silver bullet: answer quality is capped by retrieval quality. If the search step returns irrelevant chunks, the model will confidently summarize the wrong material. Production systems therefore invest in chunking strategy, hybrid (keyword + vector) search, reranking, and evaluating the retrieval step separately from the generation step.
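Two of the practices just listed, hybrid search and evaluating retrieval on its own, can be illustrated with a short sketch. The text does not say how keyword and vector results should be combined or which retrieval metric to use. Reciprocal rank fusion (RRF) and recall@k are chosen here as common options, and that choice is an assumption. The document IDs, the two input rankings and the set of relevant documents are made up for the example.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists: each document scores sum(1 / (k + rank))."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


# Hypothetical rankings for one query from two separate retrievers.
keyword_ranking = ["d3", "d1", "d7"]  # e.g. from a keyword (BM25-style) search
vector_ranking = ["d1", "d4", "d3"]   # e.g. from an embedding search

fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])

# Hypothetical ground truth: documents a human judged relevant to the query.
relevant = {"d1", "d4"}

print(fused)                              # ['d1', 'd3', 'd4', 'd7']
print(recall_at_k(fused, relevant, k=2))  # 0.5
print(recall_at_k(fused, relevant, k=3))  # 1.0
```

Because recall@k looks only at which documents were fetched, it can be tracked over a labeled query set without calling the language model at all. That is what makes it possible to tell a retrieval failure from a generation failure.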