Sber: Yago knowledge graph barely helped search, while LightRAG added 12 percentage points of accuracy
Sber explained why knowledge graphs alone don't fix search. The first approach, with the ready-made Yago graph, yielded only +3 percentage points in isolation and…
AI-processed from Habr AI; edited by Hamidun News
Sber shared how they attempted to improve the quality of internal search using knowledge graphs and hybrid RAG. The first attempt with a ready-made universal graph had almost no effect, but switching to LightRAG and their own document corpus noticeably increased the accuracy of answers.
Why the graph didn't take off
Sber's services already relied on vector and hybrid search, but the team ran into the typical limitations of that approach. A document has to be compressed into a single vector, so nuances get lost; semantic similarity does not always mean the document actually answers the question; and ordinary vector search handles multi-hop queries, which require traversing several entities and documents, poorly. This led to a hypothesis: adding a knowledge graph as a separate source of context would make answers more accurate and robust.
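The first limitation is easy to see with toy numbers. A minimal sketch, assuming a document vector is the mean of its chunk embeddings (one common choice; the article does not say how Sber's services build theirs) and using made-up 3-dimensional vectors:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def mean_pool(vectors: list[list[float]]) -> list[float]:
    """Collapse several chunk vectors into one document vector by averaging."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

# Toy "embeddings" (hypothetical values): only the first chunk is about the
# topic the query asks for; the other two cover unrelated topics.
chunks = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
query = [1.0, 0.0, 0.0]

best_chunk_score = max(cosine(query, c) for c in chunks)
doc_score = cosine(query, mean_pool(chunks))
print(f"best chunk: {best_chunk_score:.3f}, whole document: {doc_score:.3f}")
# best chunk: 1.000, whole document: 0.577
```

The query matches one chunk perfectly, but the averaged document vector scores only about 0.58 because the unrelated chunks dilute it.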
To test the hypothesis, the team took a Russian translation of OpenAI's SimpleQA benchmark (4,326 factual questions) and used LLM-as-a-judge for automatic evaluation. The first prototype was built on Yago 4.5, one of the largest open knowledge graphs: they loaded it into Apache Jena Fuseki and built an API and an agent on top of the data.
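The article gives no details of the judge model or prompt. The sketch below assumes a binary correct/incorrect verdict per question (SimpleQA's own grading also has a "not attempted" class, dropped here for simplicity), uses a hypothetical string-matching `judge` in place of an LLM, and shows how wide the statistical margin of error is on a benchmark of this size:

```python
import math

def judge(question: str, reference: str, answer: str) -> bool:
    """Hypothetical stand-in for an LLM judge. A real judge would prompt a model
    to grade `answer` against `reference`; this toy version compares strings."""
    return answer.strip().lower() == reference.strip().lower()

def accuracy_with_ci(verdicts: list[bool], z: float = 1.96) -> tuple[float, float]:
    """Accuracy plus the half-width of an approximate 95% confidence interval,
    from the normal approximation to the binomial: z * sqrt(p * (1 - p) / n)."""
    n = len(verdicts)
    p = sum(verdicts) / n
    return p, z * math.sqrt(p * (1 - p) / n)

# Toy run on made-up (question, reference, answer) triples.
samples = [("Capital of France?", "Paris", "paris"),
           ("Boiling point of water at sea level, in degrees C?", "100", "90")]
verdicts = [judge(q, ref, ans) for q, ref, ans in samples]
print(accuracy_with_ci(verdicts)[0])  # 0.5

# Worst case (p = 0.5) on a benchmark of 4,326 questions.
p, half_width = accuracy_with_ci([True] * 2163 + [False] * 2163)
print(f"accuracy {p:.1%} +/- {half_width * 100:.1f} pp")
```

For 4,326 questions the half-width is at most about 1.5 percentage points, which gives a sense of scale for the "margin of error" in the experiments. A rigorous comparison of two systems on the same questions would use a paired test such as McNemar's test rather than two separate intervals.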
The pipeline was classic: extract entities from the query, run a template-based query against the graph database, rank the nodes and edges found, then have an LLM summarize the answer. On paper everything looked convincing, but the gains turned out to be weak.
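The template-based query step could look roughly like the sketch below. It assumes entities are matched by their English `rdfs:label` and that the graph is served by Fuseki over the standard SPARQL 1.1 protocol; the template, the dataset name `yago`, and the helper names are hypothetical, since the article does not publish Sber's queries.

```python
import json
import urllib.parse
import urllib.request
from string import Template

# Hypothetical template: all triples whose subject has the given English label.
ENTITY_TEMPLATE = Template("""\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?s ?p ?o WHERE {
  ?s rdfs:label "$label"@en .
  ?s ?p ?o .
}
LIMIT $limit""")

def escape_literal(text: str) -> str:
    """Escape characters that would break a double-quoted SPARQL literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

def build_entity_query(label: str, limit: int = 50) -> str:
    """Fill the template for one entity extracted from the user query."""
    return ENTITY_TEMPLATE.substitute(label=escape_literal(label), limit=int(limit))

def run_query(endpoint: str, query: str) -> list[dict]:
    """Send a SELECT query via HTTP GET (SPARQL 1.1 protocol) and return the
    JSON result bindings. Fuseki typically serves http://<host>:3030/<dataset>/sparql."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["results"]["bindings"]

query = build_entity_query("Yuri Gagarin")
print(query)
# With a running Fuseki instance (hypothetical dataset name "yago"):
# rows = run_query("http://localhost:3030/yago/sparql", query)
```

`string.Template` is used instead of `str.format` so the curly braces of the SPARQL `WHERE` block need no escaping.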
13 experiments in a row
After the initial measurements, the team set up a separate test bench and ran 13 experiments comprising 184 measurements. First they tested the graph on its own, then combined it with ordinary search through a reranker that merged both sources into a single top set of answer candidates. The main conclusion was unpleasant: on ruSimpleQA the graph alone gave only +3 percentage points, and combined with the existing search the gain stayed within the margin of error.
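The article does not name the reranker or the merging rule. A minimal sketch, assuming candidates from both retrievers are pooled, deduplicated by text, and rescored by one reranker; `toy_rerank_score` is a hypothetical token-overlap stand-in for a real model such as a cross-encoder:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    text: str
    source: str  # which retriever produced it: "graph" or "search"

def toy_rerank_score(query: str, text: str) -> float:
    """Hypothetical reranker stand-in: share of query tokens found in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0

def merge_and_rerank(query, graph_hits, search_hits, k=3, score=toy_rerank_score):
    """Pool candidates from both retrievers, drop duplicate texts, rescore all of
    them with one reranker, and return a single top-k list."""
    pooled = {}
    for cand in [*graph_hits, *search_hits]:
        pooled.setdefault(cand.text, cand)  # keep the first copy of a duplicate
    # sorted() stays stable with reverse=True, so ties keep their pooling order.
    return sorted(pooled.values(), key=lambda c: score(query, c.text), reverse=True)[:k]

graph_hits = [Candidate("paris is the capital of france", "graph"),
              Candidate("france borders spain", "graph")]
search_hits = [Candidate("paris is the capital of france", "search"),
               Candidate("lyon is a city in france", "search")]
top = merge_and_rerank("capital of france", graph_hits, search_hits, k=2)
print([(c.source, c.text) for c in top])
```

In a setup like this, the graph's contribution shows up only through the candidates that survive into the shared top-k, so a graph whose hits duplicate or lose to the search hits adds little.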
Along the way, the team varied the setup in several ways:
- Added more sources, including IMDB, though without complex data aggregation at the database level
- Tried ranking entities by relevance, for example by a node's number of connections
- Tuned the ranking limits to balance context completeness against context size
- Tested smart graph traversal up to three levels deep and breadth-first search limited to one or two hops
- Added vector search over node embeddings, plus graph algorithms such as pathfinding between entities
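Several of these graph-side techniques map onto standard algorithms. A minimal sketch on a toy adjacency list (hypothetical data; the article does not describe the heuristics behind the "smart" traversal) showing degree-based ranking, breadth-first search with a hop limit, and an unweighted shortest path between two entities:

```python
from collections import deque

# Toy undirected entity graph (hypothetical data) as an adjacency list.
GRAPH = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A", "D", "E"},
    "D": {"B", "C"},
    "E": {"C"},
}

def rank_by_degree(graph, nodes):
    """Rank entities by number of connections, highest first (ties by name)."""
    return sorted(nodes, key=lambda n: (-len(graph.get(n, ())), n))

def neighbors_within(graph, start, max_hops):
    """Breadth-first search returning {node: hop distance} up to max_hops."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the hop limit
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def shortest_path(graph, source, target):
    """Unweighted shortest path via BFS; returns the node list or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(graph.get(node, ())):  # sorted for deterministic output
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

print(rank_by_degree(GRAPH, GRAPH))
print(neighbors_within(GRAPH, "A", max_hops=2))
print(shortest_path(GRAPH, "B", "E"))
```

The `max_hops` parameter is what bounds the context: one or two hops keep it small, while three levels can pull in a large share of a densely connected graph.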
The problems lay not only in the infrastructure but also in the nature of the graph itself. The embeddings had to be built from short, sparse entity descriptions, which made vector search over the graph unstable. Yago also turned out to be too universal: it covers the world well overall but poorly reflects the specific domains and relationships that matter for real user queries. On top of that, each step in the agent chain, from entity extraction to final summarization, introduced its own errors.
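Why a multi-step agent chain accumulates errors can be shown with simple arithmetic. Assuming, for illustration only, that each step succeeds independently and that a single failure spoils the answer, end-to-end accuracy is the product of the per-step accuracies. The per-step numbers below are hypothetical, not from the article.

```python
import math

def chain_accuracy(step_accuracies: list[float]) -> float:
    """End-to-end success probability of a sequential chain, assuming steps
    fail independently and any single failure spoils the final answer."""
    return math.prod(step_accuracies)

# Hypothetical per-step accuracies: entity extraction, graph query,
# ranking, summarization.
steps = [0.9, 0.9, 0.9, 0.9]
print(f"{chain_accuracy(steps):.1%}")  # 65.6%
```

Four steps that are each 90% reliable yield only about 66% end to end. In practice errors are correlated and some steps can recover from earlier mistakes, so this is a rough intuition rather than a model of Sber's agent.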
The turn to LightRAG
After this, the team changed strategy: instead of a universal world graph, they decided to build a graph directly from their own documents. They chose LightRAG, a GraphRAG framework with dual-level retrieval that combines local relationships between entities with a broader thematic overview. The system first extracts nodes and edges from the text, then generates descriptions for them, embeds those descriptions, and stores the graph together with the embeddings. This approach helps preserve context across chunks and spares the LLM from blindly stitching together random fragments of different documents.
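The indexing and dual-level retrieval idea can be illustrated with a heavily simplified sketch. This is not LightRAG's API or code: LLM-based extraction is replaced by hand-written entities and relations, the embedding model by a hypothetical hashed bag of words (`embed`), and the raw query is used at both levels, whereas LightRAG itself has an LLM derive separate low-level and high-level keywords. Roughly mirroring LightRAG, low-level search targets entities and high-level search targets relations.

```python
import math
import re
import zlib
from dataclasses import dataclass, field

def embed(text: str, dim: int = 256) -> list[float]:
    """Hypothetical stand-in for an embedding model: an L2-normalized hashed
    bag of words (crc32 keeps bucket assignment stable across runs)."""
    vec = [0.0] * dim
    for token in re.findall(r"\w+", text.lower()):
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

@dataclass
class ToyGraphIndex:
    # entity name -> (description, embedding of name + description)
    entities: dict = field(default_factory=dict)
    # (source, target, relation keywords, embedding of keywords)
    relations: list = field(default_factory=list)

    def add_entity(self, name: str, description: str) -> None:
        self.entities[name] = (description, embed(f"{name} {description}"))

    def add_relation(self, src: str, dst: str, keywords: str) -> None:
        self.relations.append((src, dst, keywords, embed(keywords)))

    def local_search(self, query: str, k: int = 2):
        """Low-level retrieval: the k most similar entities plus their edges."""
        q = embed(query)
        top = sorted(self.entities, key=lambda n: dot(q, self.entities[n][1]),
                     reverse=True)[:k]
        edges = [(s, d) for s, d, _, _ in self.relations if s in top or d in top]
        return top, edges

    def global_search(self, query: str, k: int = 2):
        """High-level retrieval: the k relations whose keywords best match."""
        q = embed(query)
        ranked = sorted(self.relations, key=lambda r: dot(q, r[3]), reverse=True)
        return [(s, d, kw) for s, d, kw, _ in ranked[:k]]

# Hand-written entities and relations stand in for LLM-based extraction.
idx = ToyGraphIndex()
idx.add_entity("Yago", "open knowledge graph")
idx.add_entity("Fuseki", "SPARQL server")
idx.add_entity("LightRAG", "graph-based retrieval framework")
idx.add_relation("Yago", "Fuseki", "knowledge graph loaded into sparql server")
idx.add_relation("LightRAG", "Yago", "alternative retrieval approaches")

print(idx.local_search("sparql server", k=1))
print(idx.global_search("knowledge graph server", k=1))
```

Because relations are stored alongside their entities, a local hit on one entity also brings back its edges, which is the mechanism that keeps context connected across chunks.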
"Garbage in will, with high probability, give you garbage out."
Sber indexed its document corpus through LightRAG, selecting documents that addressed questions the product search could not answer, and reran the benchmarks. The effect was noticeable: LightRAG correctly answered 74% of several hundred previously uncovered questions and added 12 percentage points of accuracy on the full set of 4,326 queries. Efficiency is another plus: according to the article, LightRAG is roughly 30–40 times cheaper than Microsoft GraphRAG at the indexing stage, with comparable quality. The next steps are testing on production traffic and speeding up indexing, which currently tops out at around 200 documents per hour even on an H100.
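For scale, the reported figures translate into absolute numbers with simple arithmetic. In the sketch below, the 10,000-document corpus is a hypothetical size, not a figure from the article.

```python
def pp_to_questions(delta_pp: float, total: int) -> float:
    """Convert an accuracy gain in percentage points into a count of questions."""
    return delta_pp / 100 * total

def indexing_hours(num_docs: int, docs_per_hour: float = 200) -> float:
    """Wall-clock indexing time at a fixed throughput."""
    return num_docs / docs_per_hour

print(round(pp_to_questions(12, 4326)))  # 519
print(indexing_hours(10_000))            # 50.0 (hypothetical corpus size)
```

So +12 percentage points corresponds to roughly 519 more correctly answered questions out of 4,326, and at 200 documents per hour even a modest 10,000-document corpus would take about two days to index, which is why indexing speed is a priority.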
What this means
Sber's story demonstrates a simple thing: a large knowledge graph by itself doesn't make search smarter. What matters much more is how tied the graph is to your domain, how it's connected to vector search, and on which real gaps you measure it. For teams building RAG search, this is a good signal not to chase the pretty demo with a public graph, but to invest in quality corpus, hybrid retrieval, and honest evaluation on real scenarios.