Microsoft GraphRAG and Ollama: How Graph-Based RAG Performed on Local Models
AI-processed from Habr AI; edited by Hamidun News
A detailed write-up on running Microsoft GraphRAG with Ollama and local LLMs has been published. The author tested whether graph-based RAG can be built without expensive infrastructure and ran the system on William Gibson's "Johnny Mnemonic," using familiar cyberpunk material to judge answer quality.
How the test was set up
The experiment centered on a practical question: can the graph-based approach really replace conventional vector search in corporate RAG systems? To find out, the author chose Microsoft GraphRAG, ran it locally through Ollama, and fed it a text of roughly 38,000 words. The result was not just a search index but a full knowledge graph with entities, relationships, and communities.
A Gephi visualization showed that the system can build a fairly rich structure from a single work of fiction. Notably, GraphRAG captures not only literal connections between objects but also thematic clusters: in the community reports, communities formed around the Yakuza, Johnny, Molly Millions, and other key plot elements.
Typical limitations showed up as well. Entities are not always merged when their names appear in different forms, so some duplicates have to be handled separately. To get answers in Russian, the author adjusted the system prompts, but still recommends asking questions in English, because accuracy drops otherwise.
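One way to handle the duplicate-entity problem outside GraphRAG is to group entity titles by a normalized key and review or merge the groups afterwards. The sketch below is a hypothetical post-processing helper, not part of GraphRAG: `normalize_name` and `group_duplicates` are illustrative names, and the normalization rules (case folding, accent stripping, English possessives, a leading "the") are assumptions. It does not handle Russian inflection, which would need a lemmatizer.

```python
import re
import unicodedata
from collections import defaultdict


def normalize_name(name: str) -> str:
    """Hypothetical key for spotting duplicate entity titles."""
    # Decompose accented characters and drop the combining marks.
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.casefold()
    # Drop English possessives ("Molly's" -> "molly").
    text = re.sub(r"['\u2019]s\b", "", text)
    # Replace remaining punctuation with spaces and collapse whitespace.
    text = re.sub(r"[^\w\s]", " ", text)
    text = re.sub(r"\s+", " ", text).strip()
    # Ignore a leading English article.
    if text.startswith("the "):
        text = text[4:]
    return text


def group_duplicates(names: list[str]) -> dict[str, list[str]]:
    """Return only the normalized keys that more than one raw name maps to."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        groups[normalize_name(name)].append(name)
    return {key: members for key, members in groups.items() if len(members) > 1}


if __name__ == "__main__":
    names = ["Molly Millions", "MOLLY MILLIONS", "Molly's", "Molly", "The Yakuza", "Yakuza", "Jones"]
    print(group_duplicates(names))
    # -> {'molly millions': ['Molly Millions', 'MOLLY MILLIONS'], 'molly': ["Molly's", 'Molly'], 'yakuza': ['The Yakuza', 'Yakuza']}
```

Grouping rather than auto-merging keeps a human in the loop: "Molly" and "Molly Millions" still land in separate groups, which is the kind of case that needs manual review.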
"In short, it works even on 4B models, albeit imperfectly."
How the system responds
The test compared several modes. Global search works over community descriptions using MapReduce-style logic and suits questions about the corpus as a whole. Local search combines relationships from the graph with fragments of the original text and is more useful for analyzing a specific character, object, or episode. There is also BASIC, ordinary chunk-based vector search, and DRIFT, a more compute-intensive mode that resembles query expansion and tries to broaden the context.
- Global search collected the main cyberpunk themes of the story: fusion of technology and biology, dystopian city, corporate conflicts, and technological inequality.
- Local search provided a more detailed answer about the character Jones and his connections with Johnny, Molly, and the Yakuza.
- DRIFT search on the same question took about forty minutes and did not yield a noticeable quality jump compared to local mode.
- BASIC remains a useful baseline, since plain vector search is still available inside GraphRAG.
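The MapReduce logic behind global search can be illustrated with a small sketch: each community report is queried independently (map), the scored points are pooled and ranked (reduce), and the top points become the context for the final answer. This is a simplified illustration, not GraphRAG's code. `map_step`, `reduce_context`, and the keyword-counting `keyword_llm` stand-in are hypothetical; in GraphRAG the map calls are LLM requests, and the reduce step sends the ranked points to the model once more to write the answer, which this sketch stops short of.

```python
from dataclasses import dataclass
from typing import Callable

# A "map LLM" takes (question, community summary) and returns (point, score) pairs.
MapLLM = Callable[[str, str], list[tuple[str, int]]]


@dataclass
class Point:
    description: str
    score: int
    report_id: int


def map_step(question: str, reports: dict[int, str], llm: MapLLM) -> list[Point]:
    """Ask the model about each community report independently; keep useful points."""
    points: list[Point] = []
    for report_id, summary in reports.items():
        for description, score in llm(question, summary):
            if score > 0:  # this sketch treats a score of 0 as "report is irrelevant"
                points.append(Point(description, score, report_id))
    return points


def reduce_context(points: list[Point], top_k: int) -> str:
    """Rank points across all reports and keep the top_k for the final answer prompt."""
    ranked = sorted(points, key=lambda p: p.score, reverse=True)[:top_k]
    return "\n".join(f"{p.description} [Data: Reports ({p.report_id})]" for p in ranked)


def keyword_llm(question: str, summary: str) -> list[tuple[str, int]]:
    """Deterministic stand-in for an LLM: score = count of question words (len > 3) in the summary."""
    terms = {w.strip("?.,").casefold() for w in question.split()}
    terms = {t for t in terms if len(t) > 3}
    hits = sum(1 for w in summary.casefold().split() if w.strip("?.,") in terms)
    return [(summary, hits)]


if __name__ == "__main__":
    reports = {
        1: "The Yakuza hunt Johnny for the stolen data.",
        2: "Molly Millions protects Johnny.",
        4: "Yakuza assassins and Yakuza money drive the plot.",
    }
    points = map_step("Which themes involve the Yakuza?", reports, keyword_llm)
    print(reduce_context(points, top_k=5))
```

The map calls are independent, which is why a real implementation can run them concurrently; the sequential loop here only keeps the example short.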
From this the author draws an important practical conclusion: a real product would need a separate agent or router that selects the search mode based on how the question is phrased and on the request history; otherwise the modes have to be switched manually. Another detail: GraphRAG answers cite human_readable_id values from the parquet output files, so for a user interface these references have to be resolved and processed further. This turns GraphRAG from a simple search tool into a component that has to be wrapped and adapted to real user scenarios.
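Both pieces of glue code can be sketched in a few lines. The sketch below assumes answers carry citation blocks in the form GraphRAG's prompts ask for, such as `[Data: Entities (5, 12, +more); Relationships (3)]`, and that a lookup from human_readable_id to a display title has already been built from the parquet files (for example from the `human_readable_id` and `title` columns of the entities table, as we understand the output schema). `unwrap_citations` and `route` are hypothetical helpers; the keyword router is only a placeholder for the agent the author describes, which would more plausibly be an LLM classifier that also sees request history.

```python
import re

# Citation blocks such as "[Data: Entities (5, 12, +more); Relationships (3)]".
CITATION = re.compile(r"\[Data: ([^\]]+)\]")
# One "<dataset> (<ids>)" group inside a citation block.
GROUP = re.compile(r"(\w+) \(([^)]*)\)")


def unwrap_citations(answer: str, tables: dict[str, dict[int, str]]) -> str:
    """Replace each citation block with human-readable titles looked up by id."""

    def replace(match: re.Match) -> str:
        labels = []
        for dataset, ids in GROUP.findall(match.group(1)):
            for raw in ids.split(","):
                raw = raw.strip()
                if not raw.isdigit():  # skip markers such as "+more"
                    continue
                title = tables.get(dataset, {}).get(int(raw))
                if title is not None:
                    labels.append(title)
        return f"(sources: {'; '.join(labels)})" if labels else ""

    return CITATION.sub(replace, answer)


def route(question: str) -> str:
    """Hypothetical keyword router choosing a GraphRAG search mode."""
    q = question.casefold()
    if any(k in q for k in ("main themes", "overall", "whole", "summar")):
        return "global"  # corpus-wide questions
    if any(k in q for k in ("exact quote", "which passage", "where in the text")):
        return "basic"  # plain chunk retrieval
    return "local"  # character-, object-, and episode-level questions


if __name__ == "__main__":
    tables = {"Entities": {5: "JONES", 12: "JOHNNY"}, "Relationships": {3: "JONES -> JOHNNY"}}
    answer = "Jones is a dolphin [Data: Entities (5, 12, +more); Relationships (3)]."
    print(unwrap_citations(answer, tables))
    print(route("What are the main themes of the story?"))
```

The router deliberately never returns DRIFT: given the author's observation that it was far slower without a clear quality gain, this sketch leaves it as a manual option.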
Where problems occurred
With local models, the picture was uneven. Mistral 7B, which appeared in the examples the author found, could not handle global search because of problems with structured JSON output: the map queries simply fail. Gemma 3 in its 4B and 12B versions kept the main entities but simplified the graph and distorted facts in places, to the point of turning Jones from a dolphin into a human.
According to the author, Qwen3 14B was the most workable option. For embeddings, the user-bge-m3 model was used, which performs well in both Russian and English. There are also plenty of infrastructure nuances.
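When a small model breaks the structured output of map queries, one mitigation in your own wrapper is to parse the response leniently and treat unparseable output as "no points" rather than a fatal error. The sketch assumes the map step expects JSON shaped like `{"points": [{"description": ..., "score": ...}]}`, which is our understanding of GraphRAG's global-search map prompt; `parse_map_response` is a hypothetical helper, not a GraphRAG setting, and whether it helps depends on where in the pipeline parsing happens.

```python
import json


def parse_map_response(text: str) -> list[dict]:
    """Leniently extract {"points": [{"description": ..., "score": ...}]} from raw model output."""
    # Skip chatty preambles and ```json fences by taking the outermost {...} span.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return []  # no JSON object at all: treat as "no points" instead of crashing
    try:
        payload = json.loads(text[start : end + 1])
    except json.JSONDecodeError:
        return []
    raw_points = payload.get("points", []) if isinstance(payload, dict) else []
    if not isinstance(raw_points, list):
        return []
    points = []
    for item in raw_points:
        if not isinstance(item, dict) or not isinstance(item.get("description"), str):
            continue
        try:
            score = int(item.get("score", 0))
        except (TypeError, ValueError):
            continue  # drop points with unusable scores such as "high"
        points.append({"description": item["description"], "score": score})
    return points


if __name__ == "__main__":
    raw = 'Sure! Here is the JSON:\n```json\n{"points": [{"description": "Yakuza hunt Johnny", "score": 80}]}\n```'
    print(parse_map_response(raw))
```

Returning an empty list keeps one bad map call from failing the whole global query, at the cost of silently losing that community's contribution, so logging such cases would be sensible.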
GraphRAG relies on LiteLLM, and the author specifically warns against upgrading it beyond version 1.82.6, because versions 1.82.7 and 1.82.8 were compromised. In combination with Ollama, a spurious 404 error occurs when requesting model parameters, and long calls can hit queue timeouts. Embeddings behave even worse: bge-m3 served through Ollama sometimes crashes while serializing Inf and NaN values, so embedding had to be moved to a separate HuggingFace proxy.
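A proxy like the one the author moved embeddings to can guard against the Inf/NaN problem before anything is serialized. The sketch below is hypothetical: `sanitize_embedding` and `embeddings_response` are illustrative names, the OpenAI-style response shape is an assumption about what the client expects, and zeroing non-finite values and re-normalizing (assuming unit-length vectors are expected) is one design choice; rejecting and retrying the request is an equally valid alternative. The 1024 dimension matches the vector size the article sets for bge-m3.

```python
import json
import math

EXPECTED_DIM = 1024  # bge-m3 dense vectors; should match the configured vector size


def sanitize_embedding(vector: list[float], expected_dim: int = EXPECTED_DIM) -> list[float]:
    """Reject wrong-sized vectors; if any value is Inf/NaN, zero it and re-normalize to unit length."""
    if len(vector) != expected_dim:
        raise ValueError(f"expected {expected_dim} dimensions, got {len(vector)}")
    if all(math.isfinite(x) for x in vector):
        return vector
    cleaned = [x if math.isfinite(x) else 0.0 for x in vector]
    norm = math.sqrt(sum(x * x for x in cleaned))
    if norm == 0.0:
        raise ValueError("embedding has no usable non-zero values")
    return [x / norm for x in cleaned]


def embeddings_response(vectors: list[list[float]], expected_dim: int = EXPECTED_DIM) -> str:
    """Serialize an OpenAI-style embeddings payload; allow_nan=False fails loudly on Inf/NaN."""
    data = [
        {"object": "embedding", "index": i, "embedding": sanitize_embedding(v, expected_dim)}
        for i, v in enumerate(vectors)
    ]
    return json.dumps({"object": "list", "data": data}, allow_nan=False)


if __name__ == "__main__":
    print(embeddings_response([[3.0, float("nan"), 4.0]], expected_dim=3))
```

Passing `allow_nan=False` matters because Python's `json.dumps` otherwise emits the non-standard tokens `NaN` and `Infinity`, which strict JSON parsers on the client side reject.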
On top of that, settings.yaml has to be edited by hand: set api_base, set the vector size to 1024, and enable graphml output for visualization. Even on a machine with 16 GB of GPU memory, indexing a text of this size takes more than an hour.
What this means
The main takeaway of the article is that Microsoft GraphRAG is not a drop-in replacement for conventional vector RAG. It is more useful where the depth of semantic relationships matters more than response speed: in analytics, expert systems, and complex document collections. The approach already has an API, a test application, and a clear path to an MVP. But the price of more relevant answers is longer indexing, a fragile pipeline, and noticeably more complex configuration than a regular vector database.