Knowledge Graph
A knowledge graph is a structured database representing real-world entities as nodes and their semantic relationships as labeled edges, enabling machines to traverse factual networks and support multi-hop reasoning queries.
A knowledge graph is a graph-structured database that encodes real-world entities as nodes and their semantic relationships as directed, labeled edges, typically stored as subject-predicate-object triples (for example, "Marie Curie, won, Nobel Prize in Physics"). Unlike relational databases, which enforce rigid schemas, knowledge graphs accommodate heterogeneous, densely interconnected information and support multi-hop reasoning: traversing several relationship edges to answer questions that no single record contains.
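The triple representation and multi-hop traversal described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not how any particular graph database stores data; the triples, the predicate names (`born_in`, `capital_of`, `spouse_of`), and the helper functions `build_index` and `follow` are assumptions made for the example.

```python
from collections import defaultdict

# Illustrative triples (subject, predicate, object); predicate names are assumed.
TRIPLES = [
    ("Marie Curie", "won", "Nobel Prize in Physics"),
    ("Marie Curie", "born_in", "Warsaw"),
    ("Warsaw", "capital_of", "Poland"),
    ("Pierre Curie", "spouse_of", "Marie Curie"),
]


def build_index(triples):
    """Index directed, labeled edges by (subject, predicate)."""
    index = defaultdict(set)
    for s, p, o in triples:
        index[(s, p)].add(o)
    return index


def follow(index, start, predicates):
    """Follow a chain of predicates from `start`; return the entities reached."""
    frontier = {start}
    for p in predicates:
        # Expand every node in the frontier along edges labeled `p`.
        frontier = {o for node in frontier for o in index.get((node, p), ())}
    return frontier


index = build_index(TRIPLES)
# Two hops: no single triple links Marie Curie to a country.
country = follow(index, "Marie Curie", ["born_in", "capital_of"])
print(country)
```

The two-hop query combines two separate facts. Edges are directed, so following `born_in` from "Warsaw" reaches nothing.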
Knowledge graphs are built by extracting entities and relations from text corpora, integrating structured data sources, and applying ontologies that define entity types and relationship constraints. Large-scale public examples include Wikidata (over 100 million items as of 2024), which underpins Wikipedia's structured data layer, and the Google Knowledge Graph, which powers information panels in Google Search. Specialized domain graphs include SNOMED CT and UMLS in healthcare. In finance, large institutions maintain graphs linking companies, executives, transactions, and regulatory filings.
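To make the role of an ontology concrete, the sketch below checks extracted triples against type constraints before they enter the graph. It is a simplified illustration: the `ONTOLOGY` and `ENTITY_TYPES` tables, the type names, and the `validate` function are hypothetical, and real ontology languages express far richer constraints than a single subject type and object type per predicate.

```python
# Hypothetical ontology: predicate -> (required subject type, required object type).
ONTOLOGY = {
    "won": ("Person", "Award"),
    "works_for": ("Person", "Company"),
}

# Hypothetical entity typing, e.g. produced by an entity-linking step.
ENTITY_TYPES = {
    "Marie Curie": "Person",
    "Nobel Prize in Physics": "Award",
    "Warsaw": "City",
}


def validate(triple, ontology, entity_types):
    """Return a list of constraint violations for one triple (empty if valid)."""
    s, p, o = triple
    if p not in ontology:
        return [f"unknown predicate: {p}"]
    subject_type, object_type = ontology[p]
    errors = []
    if entity_types.get(s) != subject_type:
        errors.append(f"subject {s!r} is not a {subject_type}")
    if entity_types.get(o) != object_type:
        errors.append(f"object {o!r} is not a {object_type}")
    return errors


ok = validate(("Marie Curie", "won", "Nobel Prize in Physics"), ONTOLOGY, ENTITY_TYPES)
bad = validate(("Warsaw", "won", "Nobel Prize in Physics"), ONTOLOGY, ENTITY_TYPES)
print(ok)
print(bad)
```

Rejecting or flagging triples like the second one at ingestion time is one way an ontology keeps extraction errors from propagating into downstream queries.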
In AI pipelines, knowledge graphs serve as a grounding layer for factual queries, provide structured context for question answering, and enable explainable reasoning paths in which the chain of traversed relationships is auditable. Microsoft's GraphRAG framework (2024) reported that pre-building a knowledge graph from a document corpus and querying subgraphs at retrieval time can outperform standard vector RAG on questions that require synthesizing information across many documents, particularly community-level and thematic questions.
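An auditable reasoning path can be illustrated with a breadth-first search that returns the chain of triples connecting two entities. This is a generic sketch and is not GraphRAG's algorithm; the entities in `EDGES` are fictional and the function `find_path` is a name chosen for the example.

```python
from collections import deque

# Fictional edges, used only to illustrate path finding.
EDGES = [
    ("Acme Corp", "acquired", "Beta Labs"),
    ("Beta Labs", "founded_by", "Dana Lee"),
    ("Dana Lee", "board_member_of", "Gamma Bank"),
    ("Acme Corp", "headquartered_in", "Berlin"),
]


def find_path(edges, source, target):
    """Breadth-first search over directed edges.

    Returns the shortest chain of triples from source to target,
    or None if the target is unreachable.
    """
    adjacency = {}
    for s, p, o in edges:
        adjacency.setdefault(s, []).append((p, o))

    queue = deque([(source, [])])
    visited = {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for p, o in adjacency.get(node, []):
            if o not in visited:
                visited.add(o)
                queue.append((o, path + [(node, p, o)]))
    return None


path = find_path(EDGES, "Acme Corp", "Gamma Bank")
for triple in path:
    print(triple)
```

Each triple in the returned path is a fact that can be inspected individually, which is what makes the answer traceable rather than opaque.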
As of 2026, combining knowledge graphs with LLMs is an active product and research area. LLMs are used to construct graphs from unstructured text and to translate natural language questions into graph query languages such as Cypher (Neo4j) or SPARQL. Graph neural networks further enable embedding-based reasoning over graph structure. Production infrastructure is provided by graph databases such as Neo4j, Amazon Neptune, and NebulaGraph, while open-source projects such as Apache TinkerPop, with its Gremlin traversal language and in-memory TinkerGraph implementation, support lighter-weight deployments.
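The target of natural-language-to-query translation can be sketched as follows. The example assumes an upstream step (for instance an LLM) has already reduced a question such as "In which country was Marie Curie born?" to a start label and a chain of relationship types; the function `path_to_cypher`, the label `Person`, the relationship types, and the `name` property are all hypothetical and would depend on the actual graph schema.

```python
def path_to_cypher(start_label, relationship_types, return_property="name"):
    """Render a chain of relationship types as a parameterized Cypher MATCH query.

    The start node's name is left as the query parameter $name so that
    user-supplied values are never interpolated into the query text.
    """
    pattern = f"(n0:{start_label} {{name: $name}})"
    for i, rel in enumerate(relationship_types, start=1):
        pattern += f"-[:{rel}]->(n{i})"
    last = f"n{len(relationship_types)}"
    return f"MATCH {pattern} RETURN {last}.{return_property}"


query = path_to_cypher("Person", ["BORN_IN", "CAPITAL_OF"])
print(query)
```

Labels and relationship types are inserted directly into the query string here, so in practice they should be checked against the graph's known schema before use; only the entity name is passed as a parameter.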