Knowledge Graphs and LLMs in Action

Build AI systems using connected data

Alessandro Negro with Vlastimil Kus, Giuseppe Futia and Fabio Montagna
Forewords by Maxime Labonne, Khalifeh AlJadda

October 2025
ISBN 9781633439894
472 pages


printed in black & white

available in Russian, Simplified Chinese



Table of contents

Part 1 Foundations of hybrid intelligent systems

1 Knowledge graphs and LLMs: A killer combination

1.1 Knowledge graphs

1.2 Large language models

1.3 KGs and LLMs: Stronger together

1.4 The paradigm shift in data-driven applications

1.4.1 The four pillars of knowledge graphs

1.5 Building data-driven applications using KGs and LLMs

1.5.1 Example use case: Drug discovery and development

1.5.2 Example use case: Conversational AI for customer support

1.5.3 Deciding whether to use a KG

1.6 Knowledge graph technologies

1.6.1 Taxonomies and ontologies

1.7 How do we teach KGs and LLMs?

2 Intelligent systems: A hybrid approach

2.1 What is intelligence?

2.2 Designing an intelligent system

2.2.1 What is an intelligent system?

2.2.2 Categories of intelligent systems

2.2.3 Characteristics of an intelligent system

2.3 Knowledge acquisition and representation

2.4 Reasoning

2.5 Reasoning engines

2.5.1 Limitations of a pure deductive reasoning engine

2.5.2 Using inductive reasoning and ML

2.5.3 The role of LLMs in the reasoning engine

2.6 A KG approach to IASs

Part 2 Building knowledge graphs from structured data sources

3 Create your first knowledge graph from ontologies

3.1 Knowledge graph building: Warmup

3.1.1 Business and domain understanding

3.1.2 Data understanding

3.2 Understanding knowledge graph technologies

3.2.1 RDF or LPG? A goal-driven discussion

3.2.2 Representing edge properties with RDF and LPG

3.3 Building a knowledge graph

3.3.1 Ontology ingestion and processing with neosemantics

3.3.2 Annotation ingestion and processing

3.4 Querying the data

3.5 Reasoning over the KG

4 From simple networks to multisource integration

4.1 Biomedical knowledge graphs and applications

4.2 Multi-omic applications of KGs

4.2.1 Creating a KG from the PPI and protein-disease networks

4.2.2 High-level analysis of the resulting KGs

4.2.3 Domain-specific analysis of the PPI and disease KG

4.3 Pharmaceutical applications of KGs

4.3.1 Deep analysis of the Hetionet knowledge graph

4.3.2 LLM-assisted interpretation of pathway analysis results

4.4 Clinical applications of KGs

4.4.1 LLM-guided clinical decision support analysis

Part 3 Building knowledge graphs from text

5 Extracting domain-specific knowledge from unstructured data

5.1 The archives challenge

5.2 Key concepts of knowledge extraction

5.2.1 Recognizing named entities

5.2.2 Extracting relations

5.3 Building KGs with large language models

5.3.1 Using LLMs

5.3.2 Prompt engineering examples

5.3.3 Prompt engineering guidelines

5.3.4 KG building: Traditional NLP or LLMs?

6 Building knowledge graphs with large language models

6.1 Transforming an archive to a KG

6.1.1 Graph modeling

6.1.2 Creating a metagraph

6.1.3 Normalization and cleansing

6.1.4 Graph-based entity resolution

6.2 Intellectual network analysis: The value of graphs

6.3 Next steps in the Rockefeller Archive Center project

6.4 The value of knowledge graphs in the LLM era

7 Named entity disambiguation

7.1 From recognition to disambiguation

7.2 Understanding named entity disambiguation

7.3 Domain-based NED and LLMs

7.4 Business and domain understanding

7.4.1 Context

7.4.2 Use case definition

7.5 Understanding the data

7.5.1 Unstructured data

7.5.2 Domain ontologies

7.6 Building a SoHO knowledge graph

7.6.1 Defining the schema

7.6.2 Processing and ingesting documents

7.6.3 Disambiguating and ingesting medical entities

7.6.4 Processing, loading, and mapping ontologies

7.6.5 Generating entity co-occurrences

7.7 KG-based use cases

7.7.1 Conceptual search

7.7.2 Structured knowledge-based search

7.7.3 KG-based interpretability and discovery

7.7.4 Uncovering new knowledge

8 NED with open LLMs and domain ontologies

8.1 Understanding limitations of traditional NED systems

8.2 Ingesting the domain ontology

8.3 Setting up the model with Ollama and Llama 3.1 8B

8.4 End-to-end NED process

8.4.1 Named entity recognition

8.4.2 Candidate selection

8.4.3 Candidate disambiguation

8.5 Conclusions

Part 4 Machine learning on knowledge graphs

9 Machine learning on knowledge graphs: A primer approach

9.1 Machine learning on graphs: Why?

9.2 Machine learning on graphs: What?

9.2.1 Node classification

9.2.2 Link prediction (a.k.a. relationship prediction)

9.2.3 Clustering and community detection

9.2.4 Graph classification

9.3 Machine learning on graphs: How?

9.3.1 Node classification and link prediction

9.3.2 Graph classification

9.3.3 Graph clustering

10 Graph feature engineering: Manual and semiautomated approaches

10.1 Manual node features

10.1.1 Degree

10.1.2 Triangles

10.1.3 Density

10.1.4 Geodesic (or shortest) path

10.1.5 Closeness

10.1.6 Betweenness

10.1.7 PageRank

10.1.8 Prediction

10.2 Manual relationship features

10.2.1 Node-based representation

10.2.2 Path-based features

10.3 Semiautomated feature extraction

10.3.1 Performing ReFeX manually

10.3.2 Performing ReFeX automatically with code

11 Graph representation learning and graph neural networks

11.1 Embeddings in graph representation learning

11.1.1 Understanding graph embeddings: From discrete to continuous

11.1.2 Real-world applications and examples

11.2 The encoder–decoder model

11.2.1 The encoder: Converting graph structure to vectors

11.2.2 The decoder: Reconstructing graph properties

11.2.3 The power of the framework

11.2.4 Node2Vec: An example of an encoder–decoder framework

11.3 Shallow embeddings: A first approach to graph representation

11.3.1 Understanding shallow embeddings

11.3.2 Limitations of shallow embeddings

11.4 Embeddings in knowledge graphs

11.4.1 Loss function

11.4.2 Multirelationship decoder

11.5 Message passing and graph neural networks

11.5.1 The message-passing framework: A neural conversation

11.5.2 Motivation and intuition: Why message passing works

11.5.3 The basic GNN model

11.5.4 Message passing with self-loops

11.6 Generalized aggregation and update methods

11.6.1 Neighborhood normalization

11.6.2 Neighborhood attention

11.6.3 Multihead attention and transformer connections

11.6.4 Generalized update methods

11.7 The synergy of GNNs and LLMs

12 Node classification and link prediction with GNNs

12.1 Node classification for anti-money laundering applications

12.1.1 Input data

12.1.2 Graph processor: Data preparation

12.1.3 Graph processor: Homogeneous PyG graph

12.1.4 Encoder–decoder architecture

12.1.5 Evaluation and analysis

12.2 Link prediction for movie recommendations

12.2.1 Input data

12.2.2 Graph processor: Data preparation

12.2.3 Graph processor: Heterogeneous PyG graph

12.2.4 Encoder–decoder architecture

12.2.5 Evaluation and analysis

Part 5 Information retrieval with knowledge graphs and LLMs

13 Knowledge graph–powered retrieval-augmented generation

13.1 AI agents

13.2 Chatting with the LLM

13.3 Challenges in the production environment

13.4 Chatting with the AI about private data

13.4.1 Retrieval-augmented generation

13.4.2 Vector-based RAG limitations

13.4.3 Graph RAG

13.4.4 Reasoning agents

13.4.5 Let’s chat with our KG

14 Asking a KG questions with natural language

14.1 Querying a knowledge graph in the policing domain

14.1.1 Enabling domain experts with knowledge graphs

14.2 RAG for KG querying: Capabilities and challenges

14.2.1 RAG effectiveness with complete context

14.2.2 RAG fragility with incomplete retrieval

14.3 Schema-based approach for querying KGs

14.3.1 Understanding and using graph schemas

14.4 Think like an expert: Using metadata for enhanced querying

14.5 Intent detection: Understanding user expectations

14.5.1 Classifying by visualization type

14.5.2 Is it data, documentation, or just complaining?

14.6 From schema to LLM-ready context

14.6.1 Schema extraction and representation

14.6.2 Enriching schemas with descriptive annotations

14.6.3 A practical approach to schema representation

14.7 It’s time to think: Understanding LLM reasoning

14.7.1 The order matters: Answer first vs. reasoning first

14.7.2 Thinking in queries: From text to Cypher

14.7.3 Structuring output for reliable query generation

14.8 Response summarization: From results to insights

15 Building a QA agent with LangGraph

15.1 Building the LangGraph pipeline

15.1.1 System architecture overview

15.1.2 Configuring pipeline components

15.1.3 Schema translation service

15.1.4 State management design

15.1.5 Pipeline agent implementation

15.1.6 Pipeline integration layer

15.2 Streamlit application

15.2.1 Application overview

15.2.2 LangGraph integration

15.3 Expert-emulating investigation

15.3.1 Identifying the initial case

15.3.2 Spatial analysis of surveillance coverage

15.3.3 Vehicle pattern detection

15.3.4 Context-aware request refinement

15.3.5 Historical record analysis

15.4 Future directions and enhancements

15.4.1 Learning from use

15.4.2 Enhancing core capabilities

15.4.3 Advanced evolution paths

Appendixes

Appendix A: Introduction to graphs

A.1 What is a graph?

A.2 Graphs as models of networks

A.3 Representing graphs

Appendix B: Neo4j

B.1 Introduction to Neo4j

B.2 Installing Neo4j

B.2.1 Installing a Neo4j server

B.2.2 Neo4j Desktop installation

B.3 Cypher

B.4 Installing plugins

B.4.1 Installing APOC Core

B.4.2 GDS installation

B.5 Cleaning

Appendix C: Building knowledge graphs from structured sources

C.1 MicroRNA–disease association: Warmup

C.1.1 Key concepts

C.1.2 Business understanding

C.1.3 Data understanding

C.2 Building the miRNA knowledge graph

C.2.1 Importing known miRNA–disease connections

C.2.2 Importing the disease ontology

C.2.3 Importing miRNA information

C.3 Exploring and analyzing the miRNA KG

Appendix D: References

Overview

15 Building a QA agent with LangGraph

This chapter presents a practical, expert‑emulated question answering system over knowledge graphs that pairs large language models with LangGraph for orchestration and Streamlit for an interactive front end. The approach mirrors how human experts work: understanding schema context, planning steps, and constructing queries, while remaining observable at every stage. Users interact through a chat-like interface and receive real-time feedback as the pipeline progresses, with results rendered in the most suitable form—graphs, maps, tables—and complemented by concise, context-aware summaries.

The solution is structured as a modular, state-driven pipeline in LangGraph, where each node is a specialized agent that reads and writes to a shared state. A Configuration Provider centralizes prompts, examples, and domain notes, while a Schema Provider converts Neo4j’s technical schema into an LLM-friendly conceptual view via filtering and enrichment. The agents implement the end-to-end flow: intent detection, schema extraction, text-to-Cypher generation (enriched by annotations and the user’s current selection), query execution with robust error handling, dynamic routing for retries and summarization, and final summary generation. An integration layer exposes pipeline execution as a typed event stream, enabling frontends to track progress and outcomes cleanly; Streamlit’s MessageHistory manages a persistent conversation while placeholders surface live updates.

A hands-on investigation illustrates the system’s capabilities: starting from an active crime, the user locates nearby ANPR cameras, retrieves vehicles matching color and partial plate constraints on the incident date, and—with added investigative context—receives deeper summaries that spotlight suspicious temporal patterns. The analysis culminates by linking vehicles to owners with relevant criminal histories, demonstrating how spatial, temporal, and historical signals converge into actionable insights. Looking ahead, the chapter outlines evolution paths powered by observability—mining success and pain points to refine prompts and examples, enriching and layering schemas for scale, sharpening intent detection, and exploring fine‑tuned, knowledge‑graph‑aware components to boost accuracy and efficiency while preserving the transparent, expert‑emulated design.

Overview of the system architecture introduced in the previous chapter. We'll implement this using Streamlit to handle user input (questions and user selection) and output (visualization and summaries), while LangGraph will orchestrate the core pipeline.

State-based communication between agent functions in LangGraph. The diagram illustrates how agents remain decoupled while communicating through an evolving state object. Each agent function receives and updates the global state independently.

LangGraph implementation of the knowledge graph querying pipeline. The solid arrows show the main flow from intent detection through schema extraction and query execution, while dashed arrows indicate conditional paths based on query execution outcomes. This directed graph structure directly maps each component of our expert-emulated approach to a LangGraph agent function.

Backend architecture showing how the LangGraph pipeline integrates with supporting components. The Configuration Provider manages prompts and settings, while the Schema Provider handles database schema access. The Question Processing Interface bridges the core pipeline with frontend applications through an event-based API.

System architecture diagram highlighting the Configuration Provider component. The provider manages system configuration and prompt templates needed by LangGraph agents to process user questions.

System architecture diagram emphasizing the Schema Provider component, which connects to the graph database to extract and transform technical schema information into LLM-friendly formats.

LangGraph implementation of the expert-emulated knowledge graph querying pipeline.

Post-query execution routing logic (highlighted) in the QA pipeline, showing decision paths for retry, summarization, and direct completion.

Pipeline integration architecture showing the Question Processing Interface mediating between LangGraph state updates and frontend interactions.

Application interface layout demonstrating a question answering system with selection capabilities, interactive graph visualization, and real-time response tracking.

Focused schema visualization showing how Crime, ANPRCamera, CameraEvent, Vehicle, and Person nodes interconnect for investigative queries.

Initial query response showing a crime node currently under investigation. The interface displays the current selection, the node's detailed properties in the selection panel (left), the node visualization in the canvas (center), and the query processing details in the chat interface (right).

Spatial query response showing an ANPR camera near the crime location. The system automatically chose a map visualization to display the spatial relationship between the crime and the nearby ANPR camera.

Vehicle query results showing matching vehicles and their detection events. Each path represents a complete vehicle detection record, with timestamps visible on the event nodes. The system's response includes both the graph visualization and a detailed summary of each vehicle's properties.

Enhanced analysis showing the same vehicle data with investigative context. The system augments its response with an analysis section that identifies patterns of interest, demonstrating how additional context leads to more insightful summarization of the same underlying data.

Final investigative insight revealing criminal history. The graph expands to show that a vehicle owner has connections to multiple prior crimes, including a previous criminal trespass. The summary provides a detailed breakdown of the prior offenses, demonstrating the system's ability to integrate temporal, spatial, and historical evidence into a cohesive investigative narrative.


FAQ

What is LangGraph and why is it a good fit for building a knowledge-graph QA pipeline?

LangGraph is a framework for stateful, multi-actor applications powered by LLMs. It uses a shared state (a global “whiteboard”) for agent communication and a directed graph of nodes (agents) with dynamic edges for flow control. This makes it ideal for orchestrating complex, reasoning-heavy pipelines like intent detection, schema extraction, query generation, and summarization.

How does the expert-emulated approach map to a LangGraph workflow?

Each expert step becomes a node (agent function) that reads and updates a shared AgentState. Edges define the standard sequence (intent → schema → text-to-Cypher → execution), while conditional edges route based on runtime outcomes (e.g., retry on error, summarize for graph/map, end for tables).
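
The node-and-edge mapping can be illustrated without the LangGraph library itself. The following is a minimal sketch in plain Python, assuming hypothetical node names and placeholder agent bodies; it only shows how each step returns a partial update that is merged into one shared state, and it is not the book's implementation.

```python
from typing import Callable, Dict

State = Dict[str, object]

# Placeholder agents: each one reads the shared state and returns a partial update.
def detect_intent(state: State) -> State:
    return {"output_type": "graph"}  # a real agent would ask the LLM

def extract_schema(state: State) -> State:
    return {"schema": "Crime, ANPRCamera, CameraEvent, Vehicle, Person"}

def text_to_cypher(state: State) -> State:
    return {"cypher": "MATCH (c:Crime) RETURN c LIMIT 1"}  # illustrative query

def execute_query(state: State) -> State:
    return {"results": [{"c": "crime-1"}], "results_error": None}  # fake result

# Hypothetical node names; the edges encode the standard sequence.
NODES: Dict[str, Callable[[State], State]] = {
    "intent": detect_intent,
    "schema": extract_schema,
    "text_to_cypher": text_to_cypher,
    "execute": execute_query,
}
EDGES = {"intent": "schema", "schema": "text_to_cypher",
         "text_to_cypher": "execute", "execute": "END"}

def run(question: str) -> State:
    state: State = {"question": question}
    current = "intent"
    while current != "END":
        state.update(NODES[current](state))  # merge the node's partial update
        current = EDGES[current]
    return state

final_state = run("Which crime is under investigation?")
```

In this sketch the agents never call each other; they only see the state, which is what keeps them decoupled.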

What information is stored in the AgentState and how is it used?

AgentState tracks the full pipeline context: question, output_type and reasoning, LLM-friendly schema, generated Cypher and its reasoning, execution errors, retries and error info, plus summary text and analysis flags. Agents read and write these fields to coordinate work and enable routing decisions.
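
A state object of this kind is often written as a typed dictionary. The sketch below is one possible shape, assuming field names chosen for illustration (only output_type and results_error are named in this FAQ); the actual AgentState in the chapter may differ.

```python
from typing import Any, List, Optional, TypedDict

class AgentState(TypedDict, total=False):
    # Field names are assumptions made for this sketch.
    question: str
    output_type: str                 # "table", "graph", or "map"
    output_type_reasoning: str
    schema: str                      # LLM-friendly schema text
    cypher: str
    cypher_reasoning: str
    results: List[Any]
    results_error: Optional[str]
    error_info: Optional[str]
    retries: int
    summary: str
    needs_analysis: bool

def new_state(question: str) -> AgentState:
    """Create the initial state for one user question."""
    return AgentState(question=question, retries=0)

def apply_update(state: AgentState, update: AgentState) -> AgentState:
    """Return a copy of the state with an agent's partial update merged in."""
    return AgentState(**{**state, **update})

initial = new_state("Which cameras are near the selected crime?")
after_intent = apply_update(initial, {"output_type": "map"})
```

Returning a merged copy rather than mutating in place makes each agent's contribution easy to inspect when debugging.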

What do the Configuration Provider and Schema Provider do?

The Configuration Provider centralizes prompts, notes, and examples (via Jinja2 templates) to keep logic clean and tunable. The Schema Provider extracts the technical schema (apoc.meta.schema), filters non-business elements via a skip list, and enriches what remains with business descriptions to produce an LLM-friendly conceptual schema.
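
The filter-and-enrich step of the Schema Provider can be sketched as follows. The input is a simplified dictionary loosely modeled on the kind of map that apoc.meta.schema returns; its exact shape, the _Bookmark label, the property names, and the description text are all assumptions made for illustration.

```python
from typing import Dict, Set

def to_conceptual_schema(raw: Dict[str, dict], skip: Set[str],
                         descriptions: Dict[str, str]) -> str:
    """Drop skipped labels and attach business descriptions to the rest."""
    lines = []
    for name in sorted(raw):
        entry = raw[name]
        if name in skip or entry.get("type") != "node":
            continue  # filter non-business elements
        props = ", ".join(sorted(entry.get("properties", {})))
        desc = descriptions.get(name, "no description available")
        lines.append(f"{name} ({props}): {desc}")
    return "\n".join(lines)

# Hypothetical technical schema with one internal label to filter out.
raw_schema = {
    "Crime": {"type": "node", "properties": {"date": {}, "type": {}}},
    "Vehicle": {"type": "node", "properties": {"color": {}, "plate": {}}},
    "_Bookmark": {"type": "node", "properties": {"id": {}}},
}
schema_text = to_conceptual_schema(
    raw_schema,
    skip={"_Bookmark"},
    descriptions={"Crime": "A reported offense."},
)
```

The resulting text lists only the business-relevant labels, each with its properties and a short description the LLM can use when writing queries.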

How does the intent detection agent work?

It takes the user’s question, runs an intent detection prompt, and returns the output_type (table, graph, or map) plus the reasoning. This informs both visualization choices and downstream handling (e.g., whether summarization is needed).
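
A rough sketch of such an agent is shown below. It uses string.Template from the standard library in place of Jinja2, a stub function in place of a real LLM call, and a JSON reply format; the prompt wording, the field names, and the fallback to "table" are assumptions, not the book's code.

```python
import json
from string import Template
from typing import Callable, Dict

INTENT_PROMPT = Template(
    "Classify the best output for this question as table, graph, or map.\n"
    'Answer as JSON with keys "reasoning" and "output_type".\n'
    "Question: $question"
)
ALLOWED = {"table", "graph", "map"}

def detect_intent(state: Dict[str, str], llm: Callable[[str], str]) -> Dict[str, str]:
    """Ask the LLM for an output type and return a partial state update."""
    reply = json.loads(llm(INTENT_PROMPT.substitute(question=state["question"])))
    output_type = reply.get("output_type")
    if output_type not in ALLOWED:
        output_type = "table"  # hypothetical fallback for unexpected answers
    return {"output_type": output_type,
            "output_type_reasoning": reply.get("reasoning", "")}

def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call so the sketch runs offline.
    return json.dumps({"reasoning": "The question asks about locations.",
                       "output_type": "map"})

intent = detect_intent({"question": "Where are the cameras near the crime?"}, fake_llm)
fallback = detect_intent({"question": "x"}, lambda p: '{"output_type": "chart"}')
```

Validating the returned label against a fixed set keeps an unexpected model answer from breaking the routing decisions made later in the pipeline.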

How is natural language converted to Cypher with context awareness?

The text-to-Cypher agent merges the current state with configuration annotations and the user’s current selection from the graph UI. It prompts the LLM to generate Cypher and returns the query, the reasoning, and the raw response for debugging—allowing users to reference selected nodes naturally (e.g., “the selected crime”).
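
The context merging might look roughly like the sketch below, which assembles a prompt from the schema, configuration notes, and the current selection. The prompt layout, the selection fields (label, id), and the inclusion of the previous error on a retry are assumptions made for illustration.

```python
from typing import Dict, List, Optional

def build_cypher_prompt(state: Dict[str, object], annotations: List[str],
                        selection: Optional[Dict[str, object]]) -> str:
    """Combine schema, notes, selection, and question into one prompt."""
    parts = ["Schema:", str(state["schema"]), "Notes:"]
    parts += [f"- {note}" for note in annotations]
    if selection:
        # Lets the user say "the selected crime" without naming an id.
        parts.append(f"Current selection: {selection['label']} with id {selection['id']}")
    if state.get("results_error"):
        # Assumption: a retry includes the earlier failure as extra context.
        parts.append(f"Previous attempt failed: {state['results_error']}")
    parts.append(f"Question: {state['question']}")
    return "\n".join(parts)

prompt = build_cypher_prompt(
    {"schema": "Crime, ANPRCamera", "question": "Which cameras are near the selected crime?"},
    annotations=["Dates are stored as ISO strings."],  # hypothetical domain note
    selection={"label": "Crime", "id": "crime-1"},     # hypothetical selection
)
```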

How are queries executed and errors handled?

The execution agent runs the Cypher, formats results based on output_type (list of records for graph/map, DataFrame for table), and captures any errors. It records errors and a detailed “information” message, increments a retry counter, and preserves successful results for visualization or summarization.
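
The error-capturing behavior can be sketched with an injected query runner so that no database is needed. In this sketch a column-to-values dictionary stands in for the DataFrame, the retry counter is incremented only on failure, and the field names are assumptions.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]

def execute_query(state: Dict[str, Any],
                  run_cypher: Callable[[str], List[Record]]) -> Dict[str, Any]:
    """Run the query and return results or captured error details."""
    retries = int(state.get("retries", 0))
    try:
        records = run_cypher(state["cypher"])
    except Exception as exc:  # capture any failure instead of crashing the pipeline
        return {"results": None, "results_error": type(exc).__name__,
                "error_info": str(exc), "retries": retries + 1}
    if state.get("output_type") == "table":
        # Stand-in for a DataFrame: one list of values per column.
        columns = list(records[0]) if records else []
        results: Any = {col: [r.get(col) for r in records] for col in columns}
    else:
        results = records  # list of records for graph and map outputs
    return {"results": results, "results_error": None,
            "error_info": None, "retries": retries}

def failing_runner(query: str) -> List[Record]:
    raise ValueError("Invalid input 'MTCH'")  # simulated syntax error

failed = execute_query({"cypher": "MTCH (n) RETURN n", "output_type": "graph"},
                       failing_runner)
table = execute_query(
    {"cypher": "MATCH (v:Vehicle) RETURN v.plate AS plate", "output_type": "table"},
    lambda q: [{"plate": "AB12"}, {"plate": "CD34"}],  # fake rows
)
```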

What is the post-execution routing logic?

It is a dynamic edge: if results_error exists and retries are below 3, it routes back to text-to-Cypher for retry; otherwise it ends. If successful, graph/map outputs route to summarization, while table outputs end directly.
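
The routing rule described above fits in a single function. The sketch below assumes the state is a plain dictionary and uses hypothetical destination names (text_to_cypher, summarize, end); in LangGraph a function like this would typically be attached as a conditional edge.

```python
from typing import Any, Dict

MAX_RETRIES = 3  # retry limit stated in the answer above

def route_after_execution(state: Dict[str, Any]) -> str:
    """Pick the next step from the outcome of query execution."""
    if state.get("results_error"):
        # Failed query: retry while under the limit, otherwise stop.
        return "text_to_cypher" if state.get("retries", 0) < MAX_RETRIES else "end"
    if state.get("output_type") in ("graph", "map"):
        return "summarize"  # graph and map outputs get a summary
    return "end"            # table outputs end directly

retry_route = route_after_execution({"results_error": "SyntaxError", "retries": 1})
give_up_route = route_after_execution({"results_error": "SyntaxError", "retries": 3})
graph_route = route_after_execution({"results_error": None, "output_type": "graph"})
table_route = route_after_execution({"results_error": None, "output_type": "table"})
```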

How does the Streamlit integration provide real-time feedback?

A Question Processing Interface exposes the pipeline as a generator that streams events: “update” (progress), “result” (reasoning/errors/summary), and visualization payloads (graph/map/table). Streamlit placeholders show live updates, while a MessageHistory object persists the final conversation state when an END event arrives.
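
The generator-based event stream can be sketched without Streamlit. Below, a hypothetical process_question generator yields dictionary events, and a consumer collects progress messages and stores the final result once an end event arrives; the event shapes and messages are assumptions, and a plain list stands in for MessageHistory.

```python
from typing import Dict, Iterable, Iterator, List

Event = Dict[str, object]

def process_question(question: str) -> Iterator[Event]:
    """Yield progress and result events for one question (placeholder pipeline)."""
    yield {"type": "update", "message": "Detecting intent"}
    yield {"type": "update", "message": "Generating Cypher"}
    yield {"type": "result", "summary": f"Placeholder answer for: {question}"}
    yield {"type": "end"}

def consume(events: Iterable[Event], history: List[Event]) -> List[str]:
    """Collect live updates and persist the last result when the stream ends."""
    live_updates: List[str] = []
    pending: Event = {}
    for event in events:
        if event["type"] == "update":
            live_updates.append(str(event["message"]))  # a UI would refresh a placeholder here
        elif event["type"] == "result":
            pending = event
        elif event["type"] == "end":
            history.append(pending)  # persist only once the run is complete
    return live_updates

history: List[Event] = []
updates = consume(process_question("Which vehicles passed the camera?"), history)
```

Because the pipeline is exposed as a generator, the frontend decides how to render each event and the pipeline code stays independent of any UI framework.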

What future enhancements are suggested?

Improvements include learning from usage (collecting pain points and successful examples), schema enrichment and multi-layer schema views, more nuanced intent detection, and potentially fine-tuned, KG-aware agents to scale beyond in-context prompts and improve performance.

