Essential GraphRAG

Knowledge Graph-Enhanced RAG

Tomaž Bratanič and Oskar Hane
Foreword by Paco Nathan

July 2025
ISBN 9781633436268
176 pages

Included with a Manning Online subscription

printed in black & white

available in Russian, Simplified Chinese


Table of contents

1 Improving LLM accuracy

1.1 Introduction to LLMs

1.2 Limitations of LLMs

1.2.1 Knowledge cutoff problem

1.2.2 Outdated information

1.2.3 Pure hallucinations

1.2.4 Lack of private information

1.3 Overcoming the limitations of LLMs

1.3.1 Supervised finetuning

1.3.2 Retrieval-augmented generation

1.4 Knowledge graphs as the data storage for RAG applications

2 Vector similarity search and hybrid search

2.1 Components of a RAG architecture

2.1.1 The retriever

2.1.2 The generator

2.2 RAG using vector similarity search

2.2.1 Application data setup

2.2.2 The text corpus

2.2.3 Text chunking

2.2.4 Embedding model

2.2.5 Database with vector similarity search function

2.2.6 Performing vector search

2.2.7 Generating an answer using an LLM

2.3 Adding full-text search to the RAG application to enable hybrid search

2.3.1 Full-text search index

2.3.2 Performing hybrid search

2.4 Concluding thoughts

3 Advanced vector retrieval strategies

3.1 Step-back prompting

3.2 Parent document retriever

3.2.1 Retrieving parent document strategy data

3.3 Complete RAG pipeline

4 Generating Cypher queries from natural language questions

4.1 The basics of query language generation

4.2 Where query language generation fits in the RAG pipeline

4.3 Useful practices for query language generation

4.3.1 Using few-shot examples for in-context learning

4.3.2 Using database schema in the prompt to show the LLM the structure of the knowledge graph

4.3.3 Adding terminology mapping to semantically map the user question to the schema

4.3.4 Format instructions

4.4 Implementing a text2cypher generator using a base model

4.5 Specialized (finetuned) LLMs for text2cypher

4.6 What we’ve learned and what text2cypher enables

5 Agentic RAG

5.1 What is agentic RAG?

5.1.1 Retriever agents

5.1.2 The retriever router

5.1.3 Answer critic

5.2 Why do we need agentic RAG?

5.3 How to implement agentic RAG

5.3.1 Implementing retriever tools

5.3.2 Implementing the retriever router

5.3.3 Implementing the answer critic

5.3.4 Tying it all together

6 Constructing knowledge graphs with LLMs

6.1 Extracting structured data from text

6.1.1 Structured Outputs model definition

6.1.2 Structured Outputs extraction request

6.1.3 CUAD dataset

6.2 Constructing the graph

6.2.1 Data import

6.2.2 Entity resolution

6.2.3 Adding unstructured data to the graph

7 Microsoft’s GraphRAG implementation

7.1 Dataset selection

7.2 Graph indexing

7.2.1 Chunking

7.2.2 Entity and relationship extraction

7.2.3 Entity and relationship summarization

7.2.4 Community detection and summarization

7.3 Graph retrievers

7.3.1 Global search

7.3.2 Local search

8 RAG application evaluation

8.1 Designing the benchmark dataset

8.1.1 Coming up with test examples

8.2 Evaluation

8.2.1 Context recall

8.2.2 Faithfulness

8.2.3 Answer correctness

8.2.4 Loading the dataset

8.2.5 Running evaluation

8.2.6 Observations

8.3 Next steps

Appendix

Appendix A: The Neo4j environment

A.1 Cypher query language

A.2 Neo4j installation

A.2.1 Neo4j Desktop installation

A.2.2 Neo4j Docker installation

A.2.3 Neo4j Aura

A.3 Neo4j Browser configuration

A.4 Movies dataset

A.4.1 Loading via the Neo4j Query Guide

A.4.2 Trying the online version

A.4.3 Loading via Cypher

Appendix B: References

Overview

1 Improving LLM accuracy

Large language models have transformed how we generate and interact with text, combining broad world knowledge with strong instruction-following to handle varied tasks. Yet they remain constrained by knowledge cutoffs, can give outdated answers, and can produce confident but incorrect hallucinations. They also lack access to proprietary data and do not store facts explicitly, instead relying on learned statistical patterns. This chapter frames these accuracy gaps as the central challenge for building dependable, up-to-date AI assistants and sets the stage for practical remedies aimed at precision, reliability, and explainability.

Two strategies to inject domain knowledge are examined: supervised finetuning and retrieval-augmented generation (RAG). While finetuning can improve behavior, it is resource-intensive, operationally complex, and shows mixed results for reliably teaching new facts at production scale. RAG, by contrast, retrieves trusted information at runtime and feeds it to the model in context, shifting the model’s role from open-ended recall to task-oriented synthesis grounded in evidence. In practice, the system handles retrieval behind the scenes, enriching prompts with relevant passages or records so the model can generate answers that are more current, specific, and verifiable, with fewer hallucinations.

The chapter argues that knowledge graphs are an ideal backbone for RAG because they semantically connect structured and unstructured data. Graphs enable precise operations—filtering, counting, aggregations—while linking entity mentions in text to authoritative records for rich, contextual answers. This fusion supports both crisp, data-driven queries and open-ended explanations, improving traceability and trust. Readers are guided toward building GraphRAG pipelines—augmenting existing systems or creating new ones—gaining practical skills in data modeling, graph construction, retrieval workflows, and evaluation to deliver robust, accurate, and explainable results across domains.

LLMs are trained to predict the next word

Writing a haiku with ChatGPT

Retrieving factual information from ChatGPT

Neural network trained to predict the next word based on the input sequence of words

Example of a knowledge cutoff date prompt

Sometimes ChatGPT responds with outdated information

ChatGPT can produce responses with incorrect information

ChatGPT didn’t have access to some private or confidential information during training

Sample record of a supervised finetuning dataset

Providing relevant information to the LLM as part of the input

Providing relevant information to the answer as part of the prompt

Populating the relevant data from the user and knowledge base into the prompt template and then passing it to an LLM to generate the final answer

ChatGPT uses Web Search to find relevant information to generate an up-to-date answer

A knowledge graph can store complex structured and unstructured data in a single database system

Summary

- Large Language Models (LLMs), such as ChatGPT, are built on the transformer architecture, enabling them to process and generate text efficiently by learning patterns from extensive textual data.
- While LLMs exhibit remarkable abilities in natural language understanding and generation, they have inherent limitations, such as a knowledge cutoff, the potential to generate outdated or hallucinated information, and an inability to access private or domain-specific knowledge.
- Continuous finetuning of LLMs to enhance their internal knowledge base is not practical due to resource constraints and the complexity of updating the models regularly.
- Retrieval-Augmented Generation (RAG) addresses LLM limitations by combining them with external knowledge bases, providing accurate, context-rich responses by injecting relevant facts directly into the input prompt.
- RAG implementations have traditionally focused on unstructured data sources, limiting their scope and effectiveness for tasks requiring structured, precise, and interconnected information.
- Knowledge Graphs (KGs) use nodes and relationships to represent and connect entities and concepts, integrating structured and unstructured data to provide a holistic data representation.
- Integrating Knowledge Graphs into RAG workflows enhances their capability to retrieve and organize contextually relevant data, allowing LLMs to generate accurate, reliable, and explainable responses.

FAQ

What problem is this chapter trying to solve?

This chapter focuses on improving the factual accuracy and timeliness of Large Language Model (LLM) answers. It explains why LLMs alone can be inaccurate or outdated and introduces Retrieval-Augmented Generation (RAG) and Knowledge Graphs to reduce hallucinations, incorporate up-to-date and private data, and produce reliable, explainable results.

How do Large Language Models (LLMs) work at a high level?

LLMs are transformer-based neural networks trained to predict the next token in text. They learn patterns, grammar, and some reasoning from vast corpora and can follow instructions to generate useful outputs. They do not store explicit facts; instead, they encode statistical associations in learned weights and generate answers from those representations.
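
To make "predict the next token" concrete, here is a toy sketch in Python. It is not how a transformer works internally: it swaps in a simple bigram counter as a stand-in, and the corpus and function names are made up for illustration. What it does share with an LLM is the point made above: the model holds statistical associations between tokens rather than explicit facts.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional


def train_bigram_model(corpus: str) -> Dict[str, Counter]:
    """Count, for every token, which tokens follow it in the training text."""
    tokens = corpus.lower().split()
    follow_counts: Dict[str, Counter] = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        follow_counts[current][following] += 1
    return follow_counts


def predict_next(model: Dict[str, Counter], token: str) -> Optional[str]:
    """Return the most frequent follower of `token`, or None if it was never seen."""
    followers = model.get(token.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical training text, whitespace-tokenized for simplicity.
corpus = "the cat sat on the mat . the cat ate the fish ."
model = train_bigram_model(corpus)

print(predict_next(model, "the"))  # "cat" follows "the" most often in this corpus
print(predict_next(model, "dog"))  # None: "dog" never appeared during training
```

The second call hints at the knowledge cutoff problem in miniature: anything absent from the training text simply cannot be predicted.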

What are the key limitations of LLMs covered here?

- Knowledge cutoff: models don’t know about events after their last training date (e.g., post–Oct 2023).
- Outdated information: they may confidently state facts that have since changed.
- Hallucinations: they can produce plausible-sounding but false details (e.g., bogus IDs or citations).
- Lack of private data: they can’t answer about proprietary/internal information not seen during training.

Why do LLMs hallucinate?

LLMs are probabilistic language models, not fact databases or reasoning engines. They generate the most likely next tokens based on patterns, which can yield confident but incorrect specifics (such as URLs, IDs, or citations), even for pre-cutoff topics.

Why isn’t continuous finetuning a complete fix for LLM limitations?

Pretraining is too costly to run continuously, and supervised finetuning (SFT) shows mixed results for reliably teaching new facts. LLM training typically includes: (1) Pretraining, (2) Supervised Finetuning, (3) Reward Modeling, and (4) Reinforcement Learning. While SFT can help, deploying a consistently accurate, finetuned model for factual updates remains challenging in production.

What is Retrieval-Augmented Generation (RAG) and how does it work?

RAG pairs an LLM with an external knowledge base so the model can ground answers in retrieved facts at runtime. It has two stages: (1) Retrieval: fetch relevant documents/data; (2) Augmented Generation: combine the retrieved context with the user’s question in the prompt so the LLM produces a context-grounded answer. RAG reduces (but does not eliminate) hallucinations.
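
The two stages can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the book's implementation: the knowledge base is a hypothetical list of strings, retrieval is plain word overlap rather than vector or graph search, and the final LLM call is left out, so the sketch stops at the augmented prompt that would be sent to a model.

```python
import re
from typing import List, Set

# Hypothetical knowledge base; a real system would query a database or index.
KNOWLEDGE_BASE = [
    "Order 1001 was shipped on Monday and is expected to arrive on Friday.",
    "The return policy allows refunds within 30 days of delivery.",
    "Support is available by email on weekdays.",
]


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: List[str], top_k: int = 2) -> List[str]:
    """Stage 1 (retrieval): rank documents by word overlap with the question."""
    question_words = tokenize(question)
    scored = [(len(question_words & tokenize(doc)), doc) for doc in documents]
    scored = [pair for pair in scored if pair[0] > 0]  # drop documents with no overlap
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def build_prompt(question: str, context_docs: List[str]) -> str:
    """Stage 2 (augmentation): combine retrieved context with the user question."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


question = "When will order 1001 arrive?"
docs = retrieve(question, KNOWLEDGE_BASE, top_k=1)
prompt = build_prompt(question, docs)
print(prompt)  # this string would be passed to an LLM to generate the answer
```

Because the knowledge lives outside the model, updating an answer means updating the knowledge base, with no retraining involved.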

When should I prefer RAG over finetuning?

Use RAG when you need current information, access to private or domain-specific data, explainability, or quick updates without retraining. Finetuning can improve style or task adherence, but RAG is the simpler, more reliable way to inject fresh, verifiable knowledge into answers.

What are Knowledge Graphs and why add them to RAG?

Knowledge Graphs (KGs) are structured representations of entities, attributes, and relationships. In RAG, they: (1) bridge structured and unstructured data, (2) enable precise, explainable retrieval, and (3) connect text passages to concrete entities and relations—for example, linking a customer’s query to their exact order, or a drug question to clinical guidelines and patient history.

What kinds of questions need structured data vs. unstructured text?

- Structured data (ideal for precise queries): filtering, counting, aggregations (e.g., “How many tasks are completed?”, “Who reports to whom?”, “Who is the CEO?”).
- Unstructured text (ideal for open-ended context): explanations, narratives, and nuanced descriptions (e.g., article content about a new model).
Combining both in a KG lets you answer a broader range of questions accurately and with context.
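
As a rough illustration of that combination, the Python sketch below keeps a tiny graph in memory. Everything in it is hypothetical: the node identifiers, labels, relationship types, and sample data are invented for this example, and a real system would store the graph in a database such as Neo4j and query it with Cypher. The structured questions use the examples from the list above; the last function shows unstructured text linked to an entity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    label: str
    properties: Dict[str, str] = field(default_factory=dict)


# Hypothetical sample data: people and tasks (structured) plus an article (unstructured).
nodes: Dict[str, Node] = {
    "alice": Node("Person", {"name": "Alice", "title": "CEO"}),
    "bob": Node("Person", {"name": "Bob", "title": "Engineer"}),
    "t1": Node("Task", {"name": "Write docs", "status": "completed"}),
    "t2": Node("Task", {"name": "Fix bug", "status": "open"}),
    "t3": Node("Task", {"name": "Ship release", "status": "completed"}),
    "a1": Node("Article", {"text": "Bob explains how the new release process works."}),
}

# Relationships as (source id, type, target id) triples.
relationships: List[Tuple[str, str, str]] = [
    ("bob", "REPORTS_TO", "alice"),
    ("bob", "ASSIGNED_TO", "t1"),
    ("a1", "MENTIONS", "bob"),
]


def count_completed_tasks() -> int:
    """Structured query (counting): how many tasks are completed?"""
    return sum(
        1 for n in nodes.values()
        if n.label == "Task" and n.properties.get("status") == "completed"
    )


def find_by_title(title: str) -> List[str]:
    """Structured query (filtering): who holds a given title, e.g. CEO?"""
    return [
        n.properties["name"] for n in nodes.values()
        if n.label == "Person" and n.properties.get("title") == title
    ]


def reporting_lines() -> List[Tuple[str, str]]:
    """Structured query (relationships): who reports to whom?"""
    return [
        (nodes[src].properties["name"], nodes[dst].properties["name"])
        for src, rel, dst in relationships if rel == "REPORTS_TO"
    ]


def articles_mentioning(person_id: str) -> List[str]:
    """Unstructured context: article text linked to an entity in the graph."""
    return [
        nodes[src].properties["text"]
        for src, rel, dst in relationships
        if rel == "MENTIONS" and dst == person_id
    ]


print(count_completed_tasks())
print(find_by_title("CEO"))
print(reporting_lines())
print(articles_mentioning("bob"))
```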

Who is this book for and what will I learn?

It’s for developers, researchers, and data practitioners building robust, explainable, and capable RAG systems. You’ll learn to augment existing RAG with Knowledge Graphs and build GraphRAG pipelines from scratch, covering data modeling, graph construction, retrieval workflows, and system evaluation to deliver accurate, reliable, and explainable results.

pro $24.99 per month

access to all Manning books, MEAPs, liveVideos, liveProjects, and audiobooks!
choose one free eBook per month to keep
exclusive 50% discount on all purchases
renews monthly, pause or cancel renewal anytime

lite $19.99 per month

access to all Manning books, including MEAPs!

team

5, 10 or 20 seats+ for your team - learn more

eBook

pdf, ePub, online

$39.99 $19.99

you save $20.00 (50%)

include audio $19.99 $9.99
