Knowledge Graphs and LLMs in Action

Build AI systems using connected data

Alessandro Negro with Vlastimil Kus, Giuseppe Futia and Fabio Montagna
Forewords by Maxime Labonne, Khalifeh AlJadda

October 2025
ISBN 9781633439894
472 pages

Included with a Manning Online subscription

printed in black & white

available in Russian, Simplified Chinese


table of contents

Part 1 Foundations of hybrid intelligent systems

1 Knowledge graphs and LLMs: A killer combination

1.1 Knowledge graphs

1.2 Large language models

1.3 KGs and LLMs: Stronger together

1.4 The paradigm shift in data-driven applications

1.4.1 The four pillars of knowledge graphs

1.5 Building data-driven applications using KGs and LLMs

1.5.1 Example use case: Drug discovery and development

1.5.2 Example use case: Conversational AI for customer support

1.5.3 Deciding whether to use a KG

1.6 Knowledge graph technologies

1.6.1 Taxonomies and ontologies

1.7 How do we teach KGs and LLMs?

2 Intelligent systems: A hybrid approach

2.1 What is intelligence?

2.2 Designing an intelligent system

2.2.1 What is an intelligent system?

2.2.2 Categories of intelligent systems

2.2.3 Characteristics of an intelligent system

2.3 Knowledge acquisition and representation

2.4 Reasoning

2.5 Reasoning engines

2.5.1 Limitations of a pure deductive reasoning engine

2.5.2 Using inductive reasoning and ML

2.5.3 The role of LLMs in the reasoning engine

2.6 A KG approach to IASs

Part 2 Building knowledge graphs from structured data sources

3 Create your first knowledge graph from ontologies

3.1 Knowledge graph building: Warmup

3.1.1 Business and domain understanding

3.1.2 Data understanding

3.2 Understanding knowledge graph technologies

3.2.1 RDF or LPG? A goal-driven discussion

3.2.2 Representing edge properties with RDF and LPG

3.3 Building a knowledge graph

3.3.1 Ontology ingestion and processing with neosemantics

3.3.2 Annotation ingestion and processing

3.4 Querying the data

3.5 Reasoning over the KG

4 From simple networks to multisource integration

4.1 Biomedical knowledge graphs and applications

4.2 Multi-omic applications of KGs

4.2.1 Creating a KG from the PPI and protein-disease networks

4.2.2 High-level analysis of the resulting KGs

4.2.3 Domain-specific analysis of the PPI and disease KG

4.3 Pharmaceutical applications of KGs

4.3.1 Deep analysis of the Hetionet knowledge graph

4.3.2 LLM-assisted interpretation of pathway analysis results

4.4 Clinical applications of KGs

4.4.1 LLM-guided clinical decision support analysis

Part 3 Building knowledge graphs from text

5 Extracting domain-specific knowledge from unstructured data

5.1 The archives challenge

5.2 Key concepts of knowledge extraction

5.2.1 Recognizing named entities

5.2.2 Extracting relations

5.3 Building KGs with large language models

5.3.1 Using LLMs

5.3.2 Prompt engineering examples

5.3.3 Prompt engineering guidelines

5.3.4 KG building: Traditional NLP or LLMs?

6 Building knowledge graphs with large language models

6.1 Transforming an archive to a KG

6.1.1 Graph modeling

6.1.2 Creating a metagraph

6.1.3 Normalization and cleansing

6.1.4 Graph-based entity resolution

6.2 Intellectual network analysis: The value of graphs

6.3 Next steps in the Rockefeller Archive Center project

6.4 The value of knowledge graphs in the LLM era

7 Named entity disambiguation

7.1 From recognition to disambiguation

7.2 Understanding named entity disambiguation

7.3 Domain-based NED and LLMs

7.4 Business and domain understanding

7.4.1 Context

7.4.2 Use case definition

7.5 Understanding the data

7.5.1 Unstructured data

7.5.2 Domain ontologies

7.6 Building a SoHO knowledge graph

7.6.1 Defining the schema

7.6.2 Processing and ingesting documents

7.6.3 Disambiguating and ingesting medical entities

7.6.4 Processing, loading, and mapping ontologies

7.6.5 Generating entity co-occurrences

7.7 KG-based use cases

7.7.1 Conceptual search

7.7.2 Structured knowledge-based search

7.7.3 KG-based interpretability and discovery

7.7.4 Uncovering new knowledge

8 NED with open LLMs and domain ontologies

8.1 Understanding limitations of traditional NED systems

8.2 Ingesting the domain ontology

8.3 Setting up the model with Ollama and Llama 3.1 8B

8.4 End-to-end NED process

8.4.1 Named entity recognition

8.4.2 Candidate selection

8.4.3 Candidate disambiguation

8.5 Conclusions

Part 4 Machine learning on knowledge graphs

9 Machine learning on knowledge graphs: A primer approach

9.1 Machine learning on graphs: Why?

9.2 Machine learning on graphs: What?

9.2.1 Node classification

9.2.2 Link prediction (a.k.a. relationship prediction)

9.2.3 Clustering and community detection

9.2.4 Graph classification

9.3 Machine learning on graphs: How?

9.3.1 Node classification and link prediction

9.3.2 Graph classification

9.3.3 Graph clustering

10 Graph feature engineering: Manual and semiautomated approaches

10.1 Manual node features

10.1.1 Degree

10.1.2 Triangles

10.1.3 Density

10.1.4 Geodesic (or shortest) path

10.1.5 Closeness

10.1.6 Betweenness

10.1.7 PageRank

10.1.8 Prediction

10.2 Manual relationship features

10.2.1 Node-based representation

10.2.2 Path-based features

10.3 Semiautomated feature extraction

10.3.1 Performing ReFeX manually

10.3.2 Performing ReFeX automatically with code

11 Graph representation learning and graph neural networks

11.1 Embeddings in graph representation learning

11.1.1 Understanding graph embeddings: From discrete to continuous

11.1.2 Real-world applications and examples

11.2 The encoder–decoder model

11.2.1 The encoder: Converting graph structure to vectors

11.2.2 The decoder: Reconstructing graph properties

11.2.3 The power of the framework

11.2.4 Node2Vec: An example of an encoder–decoder framework

11.3 Shallow embeddings: A first approach to graph representation

11.3.1 Understanding shallow embeddings

11.3.2 Limitations of shallow embeddings

11.4 Embeddings in knowledge graphs

11.4.1 Loss function

11.4.2 Multirelationship decoder

11.5 Message passing and graph neural networks

11.5.1 The message-passing framework: A neural conversation

11.5.2 Motivation and intuition: Why message passing works

11.5.3 The basic GNN model

11.5.4 Message passing with self-loops

11.6 Generalized aggregation and update methods

11.6.1 Neighborhood normalization

11.6.2 Neighborhood attention

11.6.3 Multihead attention and transformer connections

11.6.4 Generalized update methods

11.7 The synergy of GNNs and LLMs

12 Node classification and link prediction with GNNs

12.1 Node classification for anti-money laundering applications

12.1.1 Input data

12.1.2 Graph processor: Data preparation

12.1.3 Graph processor: Homogeneous PyG graph

12.1.4 Encoder–decoder architecture

12.1.5 Evaluation and analysis

12.2 Link prediction for movie recommendations

12.2.1 Input data

12.2.2 Graph processor: Data preparation

12.2.3 Graph processor: Heterogeneous PyG graph

12.2.4 Encoder–decoder architecture

12.2.5 Evaluation and analysis

Part 5 Information retrieval with knowledge graphs and LLMs

13 Knowledge graph–powered retrieval-augmented generation

13.1 AI agents

13.2 Chatting with the LLM

13.3 Challenges in the production environment

13.4 Chatting with the AI about private data

13.4.1 Retrieval-augmented generation

13.4.2 Vector-based RAG limitations

13.4.3 Graph RAG

13.4.4 Reasoning agents

13.4.5 Let’s chat with our KG

14 Asking a KG questions with natural language

14.1 Querying a knowledge graph in the policing domain

14.1.1 Enabling domain experts with knowledge graphs

14.2 RAG for KG querying: Capabilities and challenges

14.2.1 RAG effectiveness with complete context

14.2.2 RAG fragility with incomplete retrieval

14.3 Schema-based approach for querying KGs

14.3.1 Understanding and using graph schemas

14.4 Think like an expert: Using metadata for enhanced querying

14.5 Intent detection: Understanding user expectations

14.5.1 Classifying by visualization type

14.5.2 Is it data, documentation, or just complaining?

14.6 From schema to LLM-ready context

14.6.1 Schema extraction and representation

14.6.2 Enriching schemas with descriptive annotations

14.6.3 A practical approach to schema representation

14.7 It’s time to think: Understanding LLM reasoning

14.7.1 The order matters: Answer first vs. reasoning first

14.7.2 Thinking in queries: From text to Cypher

14.7.3 Structuring output for reliable query generation

14.8 Response summarization: From results to insights

15 Building a QA agent with LangGraph

15.1 Building the LangGraph pipeline

15.1.1 System architecture overview

15.1.2 Configuring pipeline components

15.1.3 Schema translation service

15.1.4 State management design

15.1.5 Pipeline agent implementation

15.1.6 Pipeline integration layer

15.2 Streamlit application

15.2.1 Application overview

15.2.2 LangGraph integration

15.3 Expert-emulating investigation

15.3.1 Identifying the initial case

15.3.2 Spatial analysis of surveillance coverage

15.3.3 Vehicle pattern detection

15.3.4 Context-aware request refinement

15.3.5 Historical record analysis

15.4 Future directions and enhancements

15.4.1 Learning from use

15.4.2 Enhancing core capabilities

15.4.3 Advanced evolution paths

Appendixes

Appendix A: Introduction to graphs

A.1 What is a graph?

A.2 Graphs as models of networks

A.3 Representing graphs

Appendix B: Neo4j

B.1 Introduction to Neo4j

B.2 Installing Neo4j

B.2.1 Installing a Neo4j server

B.2.2 Neo4j Desktop installation

B.3 Cypher

B.4 Installing plugins

B.4.1 Installing APOC Core

B.4.2 GDS installation

B.5 Cleaning

Appendix C: Building knowledge graphs from structured sources

C.1 MicroRNA–disease association: Warmup

C.1.1 Key concepts

C.1.2 Business understanding

C.1.3 Data understanding

C.2 Building the miRNA knowledge graph

C.2.1 Importing known miRNA–disease connections

C.2.2 Importing the disease ontology

C.2.3 Importing miRNA information

C.3 Exploring and analyzing the miRNA KG

Appendix D: References

Overview

14 Asking a KG questions with natural language

This chapter presents a practical blueprint for querying a knowledge graph in natural language by emulating expert behavior rather than relying solely on retrieval-augmented generation (RAG). Using a law enforcement scenario, it explains why domain experts need direct, intuitive access to KGs and sets out a framework with four pillars: detecting and routing user intent, extracting and enriching schema and metadata for LLM use, applying expert-like reasoning to construct precise queries, and transforming raw results into concise, actionable summaries. The approach is integration-first: it is designed to feed a front end that can render graphs, tables, charts, or maps, and to deliver clear textual explanations alongside the visual output.

The chapter explains where RAG breaks down in complex, high-stakes settings: answers hinge on retrieval completeness and granularity, and missing or fragmented context can flip conclusions. It proposes a shift from “generating an answer” to “asking the right question,” translating user intent into schema-aware, constrained traversals that mirror how experts query graphs (for example, mapping natural language to specific nodes, relationships, and filters). To do this reliably, the system classifies intent both by desired presentation (graph/table/chart/map) and by request type (data-related, system documentation, schema questions, or feedback), using simple, debuggable prompts that can evolve from a single broad classifier to multi-stage routing as needs grow.

To enable expert emulation, the system converts a technical database schema into a concise, conceptual, LLM-ready representation and enriches it with annotations (terminology, codes, relationship semantics) managed via a YAML configuration for skipping noise and adding descriptions. Query generation is then guided by a structured, reasoning-first prompt: the model lists intended relationships, writes out its plan, and produces a Cypher statement that follows the annotated schema, optionally incorporating current user selections and execution error feedback to iterate. Finally, a summarization component turns graph results into focused narratives (and light analysis when requested), filtering out incidental data and complementing the visual interface. The result is a robust, domain-aware question-answering workflow that scales beyond policing to any field where expert reasoning over complex graph structures is essential.
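The error-feedback iteration mentioned above can be sketched as a small retry loop. This is a minimal illustration under stated assumptions, not the chapter's implementation: `generate_with_retries`, `fake_generate`, and `fake_run` are hypothetical names, and the two callables stand in for the LLM call and the database driver.

```python
from typing import Callable, List, Optional, Tuple


def generate_with_retries(
    question: str,
    generate_query: Callable[[str, Optional[str]], str],
    run_query: Callable[[str], List[dict]],
    max_attempts: int = 3,
) -> Tuple[str, List[dict]]:
    """Generate a query, execute it, and pass any error text to the next attempt."""
    last_error: Optional[str] = None
    for _ in range(max_attempts):
        query = generate_query(question, last_error)
        try:
            return query, run_query(query)
        except Exception as exc:  # a real system would catch the driver's own error types
            last_error = str(exc)
    raise RuntimeError(f"No valid query after {max_attempts} attempts: {last_error}")


def fake_generate(question: str, error: Optional[str]) -> str:
    # Hypothetical stand-in for the LLM: it only "fixes" the query after seeing an error.
    return "MATCH (v:Vehicle) RETURN v" if error else "MATCH (v:Vehicle RETURN v"


def fake_run(query: str) -> List[dict]:
    # Hypothetical stand-in for query execution: rejects the malformed pattern.
    if "(v:Vehicle)" not in query:
        raise ValueError("Invalid input 'RETURN': expected ')'")
    return [{"v": "ABC123"}]


query, rows = generate_with_retries("Which vehicles are known?", fake_generate, fake_run)
```

Capping the number of attempts keeps a persistently failing request from looping forever, and the last error message is surfaced so it can be logged or shown to the user.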

RAG System Limitations. The diagram illustrates how a RAG system can produce incorrect answers when the retriever fails to identify critical documents.

Translation Process from Natural Language to Cypher Query. This diagram illustrates the step-by-step process of translating a natural language query ("Red Camaros spotted in this area at this time") into a formal Cypher query. The translation occurs through three main stages: (1) parsing the natural language request into semantic components, (2) mapping these components to schema elements (Vehicle, CameraEvent, and ANPRCamera nodes with their relationships), and (3) constructing a formal Cypher query with the appropriate traversal patterns and constraints. The diagram shows how domain concepts are systematically transformed into graph database operations, demonstrating the bridge between user intent and executable queries.
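To make the three stages concrete, here is a hedged sketch of what the resulting query might look like when held as a parameterized string in Python. Only the node labels (Vehicle, CameraEvent, ANPRCamera) come from the caption above; the relationship types `DETECTED_IN` and `CAPTURED_BY`, the property names, and the `"RED"` color code are assumptions made for illustration.

```python
from datetime import datetime

# Assumed relationship types and property names; the real schema may differ.
RED_CAMARO_QUERY = """
MATCH (v:Vehicle)-[:DETECTED_IN]->(e:CameraEvent)-[:CAPTURED_BY]->(c:ANPRCamera)
WHERE v.color = $color
  AND v.model = $model
  AND c.area = $area
  AND $start <= e.timestamp <= $end
RETURN v, e, c
"""


def build_params(color_code: str, model: str, area: str,
                 start: datetime, end: datetime) -> dict:
    """Map the parsed pieces of the request onto the query's parameters."""
    return {"color": color_code, "model": model, "area": area,
            "start": start, "end": end}


params = build_params(
    "RED", "Camaro", "Downtown",
    datetime(2024, 5, 1, 20, 0), datetime(2024, 5, 1, 23, 0),
)
```

Passing values as parameters rather than formatting them into the query text keeps user-supplied values out of the Cypher string itself.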

Overview of the system highlighting the main components: Intent Detection, Schema Extraction, Query Generation, Query Execution, Visualization and Summary generation. The system processes user questions and selection inputs, while supporting error feedback during query execution.

The Intent Detection component analyzes user inputs to determine how to appropriately handle and classify the user's question, representing the first critical step in the request processing.

Intent detection system architecture for data visualization requests, showing how user requests are mapped to appropriate visualization formats (graph, chart, table, or map).

Classification of system-related questions in the intent detection system, showing how requests are routed to either documentation (for system functionality and feature questions) or schema (for knowledge graph structure queries). The system also identifies feedback and issues as a separate category for user complaints or enhancement requests.

Schema processing pipeline, highlighting the transformation of database schema into LLM-compatible formats. The diagram illustrates how raw schema structures are processed through Schema Extraction to create representations that LLMs can effectively process.

The technical schema obtained through an APOC call is filtered down to the conceptual graph schema representation. The conceptual schema is then mapped into a textual format that LLMs can process effectively.

Query Generation stage of the system architecture, highlighting its central role in converting user inputs and schema information into formal database queries.

Comparison of two prompt structures for the same path-finding task. Left: The Answer-First approach encourages quick, potentially biased responses with post-hoc justification. Right: The Reasoning-First approach promotes systematic analysis before reaching a conclusion. Note how the JSON structure in each prompt guides the LLM's thinking process.

Query Generation Pipeline. The diagram illustrates the complete flow of transforming a natural language question into a Cypher query, showing how three inputs are processed through structured prompt components. The process is organized into three main stages (Input Processing, Context Building, and Final Guidelines) that culminate in a structured JSON.

Output generation pipeline, highlighting the system's final stages of data presentation and analysis. The diagram shows how processed query results are transformed into both visual presentations and analytical summaries, demonstrating the dual output approach of visualization and summarization of query results.

Summary

Expert emulation provides a systematic framework for building, improving and extending knowledge graph systems. When facing any challenge - whether implementing new features, fixing issues, or enhancing existing capabilities - we can find solutions by asking "what would an expert do?" and breaking down their approach into implementable steps.
A well-structured intent detection system requires two layers of classification. The first layer handles broader query categories (data-related, system-related, feedback) while the second identifies the visualization needs.
Converting technical database schemas into LLM-friendly formats involves more than just reformatting - it requires carefully filtering unnecessary elements, adding contextual annotations, and structuring information in ways that align with how LLMs process and understand data.
Prompt engineering for LLMs requires giving them "time to think." This means structuring prompts to encourage reasoning before answers and using techniques like chain-of-thought prompting to improve response quality and reliability.
Query generation prompts need several key elements to work effectively: comprehensive schema context, current user selection state, intent-specific requirements, and carefully chosen examples that demonstrate desired patterns.
Result summarization works best as a complement to visualization. Rather than repeating what's visible in the graph, effective summaries highlight insights and patterns that might not be immediately apparent visually.

Neo4j APOC Library: apoc.meta.schema https://neo4j.com/docs/apoc/current/overview/apoc.meta/apoc.meta.schema/
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., & Zhou, D. (2023). "Chain of Thought Prompting Elicits Reasoning in Large Language Models." arXiv:2201.11903
Nye, M., Andreassen, A. J., Gur-Ari, G., Michalewski, H., Austin, J., Bieber, D., Dohan, D., Lewkowycz, A., Bosma, M., Luan, D., Sutton, C., & Odena, A. (2021). "Show Your Work: Scratchpads for Intermediate Computation with Language Models." arXiv:2112.00114
Raj, H., Gupta, V., Rosati, D., & Majumdar, S. (2023). "Semantic Consistency for Assuring Reliability of Large Language Models." arXiv:2308.09138

FAQ

What are the main limitations of RAG when answering complex questions on a knowledge graph?

RAG is only as good as its retriever. If key passages are missed, or the retrieved passages are too coarse or too fine-grained, the LLM receives fragmented context and can produce confident but wrong answers. Answers often span multiple documents, some relevant documents may not contain the explicit answer, and omitting even a single critical piece (such as one witness statement) can flip the conclusion. These issues are amplified in KG tasks that require understanding relationships and constraints.

How does the expert-emulation approach differ from traditional RAG?

Instead of “generating an answer,” it focuses on “asking the right question.” It translates natural language into precise graph traversals by:

Understanding the graph schema and mapping domain concepts to entities, relationships, and constraints
Mimicking expert reasoning patterns for query construction
Leveraging metadata/annotations to disambiguate terms
Producing formal Cypher queries and presenting results meaningfully via the UI

Why start with intent detection, and how are intents classified?

Intent detection routes requests to the right pipeline and output format, improving relevance and visualization. Two layers are common:

Visualization type: graph (default), table (aggregations/stats), chart (distributions), map (locations)
Broader categories: Data-Related; System-Related (Documentation-Related or Schema-Related); Feedback/Complaints

Including a “reason” field in JSON aids debugging and prompt refinement.
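A minimal sketch of how the classifier's JSON reply might be validated before routing. The key names (`category`, `visualization`, `reason`) and the category values are assumptions; the source only states that a reason field is included and that graph is the default visualization. Defaulting an unknown category to `data` is also an assumption.

```python
import json

VISUALIZATIONS = {"graph", "table", "chart", "map"}
CATEGORIES = {"data", "documentation", "schema", "feedback"}  # assumed value names


def parse_intent(raw: str) -> dict:
    """Validate the classifier's reply and fall back to defaults on bad input."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    category = data.get("category")
    if category not in CATEGORIES:
        category = "data"  # assumed fallback
    visualization = data.get("visualization")
    if visualization not in VISUALIZATIONS:
        visualization = "graph"  # graph is named as the default in the text above
    return {
        "category": category,
        "visualization": visualization,
        "reason": str(data.get("reason", "")),
    }


intent = parse_intent(
    '{"category": "data", "visualization": "map", "reason": "asks where events occurred"}'
)
```

Keeping the reason string next to the decision makes it easy to log misclassifications and adjust the prompt.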

What is a conceptual schema, and why prefer it over the raw technical schema (e.g., apoc.meta.schema)?

A conceptual schema filters out helper/admin nodes, internal/unused properties and relationships, and redundant labels to keep only domain-relevant entities and connections. Benefits:

Aligns with human/LLM reasoning
Reduces cognitive load and hallucinations
Minimizes query errors from implementation details
Improves interpretability and mapping from NL to Cypher

How should a schema be represented so an LLM can use it effectively?

Use a concise, consistent, LLM-friendly text format:

List nodes with key properties and types
List relationships with direction and property types
Add inline comments describing semantics, codes, and constraints
Optionally drive generation via a YAML config to skip irrelevant elements and inject descriptions

This yields a clean, annotated schema the LLM can reliably map to queries.
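The filtering and annotation steps could look roughly like the sketch below. It assumes a simplified dictionary shaped like the map returned by `apoc.meta.schema` (entries with `type`, `properties`, and `relationships`), and a plain Python dict standing in for the YAML configuration. The config keys `skip` and `descriptions`, the `DETECTED_IN` relationship, and the `ImportLog` helper node are hypothetical. Relationship properties are omitted to keep the example short.

```python
def render_schema(schema: dict, config: dict) -> str:
    """Render a filtered, annotated text schema from an apoc.meta.schema-like dict."""
    skip = set(config.get("skip", []))
    notes = config.get("descriptions", {})

    def with_note(line: str, key: str) -> str:
        return f"{line}  // {notes[key]}" if key in notes else line

    node_lines, rel_lines = [], []
    for label in sorted(schema):
        meta = schema[label]
        if meta.get("type") != "node" or label in skip:
            continue
        # Keep only properties that are not explicitly skipped.
        props = {
            name: info["type"]
            for name, info in sorted(meta.get("properties", {}).items())
            if f"{label}.{name}" not in skip
        }
        prop_text = ", ".join(f"{name}: {ptype}" for name, ptype in props.items())
        node_lines.append(with_note(f"(:{label} {{{prop_text}}})", label))
        for name in props:
            key = f"{label}.{name}"
            if key in notes:
                node_lines.append(f"  // {key}: {notes[key]}")
        # Emit each relationship once, from its outgoing side.
        for rel, info in sorted(meta.get("relationships", {}).items()):
            if info.get("direction") != "out" or rel in skip:
                continue
            for target in info.get("labels", []):
                if target not in skip:
                    rel_lines.append(with_note(f"(:{label})-[:{rel}]->(:{target})", rel))
    return "\n".join(["Nodes:", *node_lines, "Relationships:", *rel_lines])


raw_schema = {
    "Vehicle": {
        "type": "node",
        "properties": {"color": {"type": "STRING"}, "plate": {"type": "STRING"},
                       "_import_id": {"type": "INTEGER"}},
        "relationships": {"DETECTED_IN": {"direction": "out", "labels": ["CameraEvent"]}},
    },
    "CameraEvent": {
        "type": "node",
        "properties": {"timestamp": {"type": "DATE_TIME"}},
        "relationships": {"DETECTED_IN": {"direction": "in", "labels": ["Vehicle"]}},
    },
    "ImportLog": {"type": "node", "properties": {}, "relationships": {}},
    "DETECTED_IN": {"type": "relationship", "properties": {}},
}
config = {
    "skip": ["ImportLog", "Vehicle._import_id"],
    "descriptions": {"Vehicle.color": "coded values such as BLK, GRY, SIL, WHI"},
}
schema_text = render_schema(raw_schema, config)
```

Sorting labels and properties makes the rendered text stable between runs, which helps when comparing prompts during debugging.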

Which annotations/metadata help prevent query errors?

Descriptive annotations clarify how data is encoded and how relationships should be used, e.g.:

Value encodings (e.g., Vehicle.color uses BLK, GRY, SIL, WHI)
Relationship semantics (COMMITTED vs CO_OFFENDS_WITH)
Property meanings and formats (dates, IDs, geospatial)

These “cheat sheet” notes guide the LLM away from naive assumptions.

How do you structure the prompt to generate reliable Cypher queries?

Use a structured, reusable prompt that includes:

Task definition and the user question
LLM-friendly, annotated schema
Intent-dependent requirements (graph/table/chart/map)
Few-shot examples aligned to the output format
Optional current selection from the UI
Graph-specific notes/annotations
Reminder of the question (and prior error messages if retrying)
Strict JSON output schema: relationships (to traverse), reasoning (scratchpad), query (Cypher), success (bool)
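The elements listed above can be assembled in order with a small helper, and the strict JSON reply checked before the query is run. This is a sketch under assumptions: the section wording, the function names `build_prompt` and `parse_reply`, and the parameter names are illustrative; only the four output keys come from the list above.

```python
import json
from typing import List, Optional

REQUIRED_KEYS = ("relationships", "reasoning", "query", "success")


def build_prompt(question: str, schema_text: str, requirements: str,
                 examples: List[str], selection: Optional[str] = None,
                 notes: str = "", last_error: Optional[str] = None) -> str:
    """Assemble the prompt sections in a fixed, reusable order."""
    sections = [
        "Task: translate the question into a Cypher query for the schema below.",
        f"Question: {question}",
        f"Schema:\n{schema_text}",
        f"Requirements:\n{requirements}",
        "Examples:\n" + "\n".join(examples),
    ]
    if selection:
        sections.append(f"Current selection: {selection}")
    if notes:
        sections.append(f"Notes:\n{notes}")
    sections.append(f"Reminder, the question is: {question}")
    if last_error:
        sections.append(f"The previous query failed with: {last_error}")
    sections.append(
        'Reply with JSON only, with keys in this order: "relationships" (list), '
        '"reasoning" (string), "query" (string), "success" (boolean).'
    )
    return "\n\n".join(sections)


def parse_reply(raw: str) -> dict:
    """Parse the model's reply and reject anything that breaks the output contract."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("Reply must be a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"Missing keys: {missing}")
    if not isinstance(data["relationships"], list) or not isinstance(data["success"], bool):
        raise ValueError("relationships must be a list and success a boolean")
    return data


prompt = build_prompt(
    "Which vehicles were seen near the station?",
    "(:Vehicle {plate: STRING})",
    "Return nodes and relationships for a graph view.",
    ["Q: example question -> MATCH (v:Vehicle) RETURN v"],
    last_error="Unknown label `Car`",
)
```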

Why ask for reasoning before the final answer, and what techniques help?

LLMs tend to justify an early answer after the fact, because they stay semantically consistent with what they have already written. Requesting step-by-step reasoning first (chain-of-thought or scratchpad prompting) encourages deliberation and reduces biased or rushed conclusions. Listing the relationships to traverse before generating the query further curbs hallucinations and aligns the final Cypher with the schema.
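Because a model emits its reply left to right, the order of the JSON keys determines whether the reasoning is written before or after the query. One way to check that a reply respected the reasoning-first order is sketched below; the function name is hypothetical, and the check relies on Python dictionaries preserving the order in which `json.loads` encountered the keys.

```python
import json


def reasoning_precedes_query(raw: str) -> bool:
    """Return True if relationships and reasoning were emitted before the query."""
    keys = list(json.loads(raw))  # key order matches the order in the JSON text
    expected = ["relationships", "reasoning", "query"]
    positions = [keys.index(key) for key in expected if key in keys]
    return len(positions) == len(expected) and positions == sorted(positions)


reasoning_first = (
    '{"relationships": ["DETECTED_IN"], "reasoning": "Find events, then vehicles.", '
    '"query": "MATCH (n) RETURN n", "success": true}'
)
answer_first = (
    '{"query": "MATCH (n) RETURN n", "reasoning": "Because it works.", '
    '"relationships": [], "success": true}'
)
```

A reply that fails this check can be logged or regenerated, since a query written before its reasoning gains nothing from that reasoning.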

How does the system integrate with a front-end and choose visualizations?

Intent detection selects the presentation:

Graph: return nodes and relationships (avoid anonymous relationships; aggregate traversals)
Table: select and rename properties for tabular display
Chart: support distributions/aggregations
Map: include locations/geometry for GIS views

The prompt nudges the Cypher to match the chosen visualization’s data needs.

What is the purpose of the summarization step, and how is it prompted?

Summarization turns raw query results into actionable insights that complement the visual graph. The prompt:

Provides the original question, executed query, results, and (optionally) current selection
Instructs the model to filter irrelevant path details and keep factual, question-relevant points
Asks whether analysis is requested; if yes, include brief insights
Returns strict JSON with results_analysis (bool), reasoning, and a concise summary

This produces clear, useful takeaways alongside the visualization.
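A compact sketch of the summarization step, assuming the reply keys are named `results_analysis`, `reasoning`, and `summary` (the last name is an assumption) and that query results arrive as a list of dictionaries. The prompt wording and function names are illustrative.

```python
import json
from typing import List, Optional


def build_summary_prompt(question: str, query: str, rows: List[dict],
                         selection: Optional[str] = None) -> str:
    """Give the model the question, the executed query, and the serialized results."""
    parts = [
        f"Question: {question}",
        f"Executed Cypher query:\n{query}",
        # default=str keeps non-JSON values such as dates from raising an error
        "Results (JSON):\n" + json.dumps(rows, default=str, indent=2),
    ]
    if selection:
        parts.append(f"Current selection: {selection}")
    parts.append(
        "Keep only facts that answer the question and ignore incidental path details. "
        "Decide whether the user asked for analysis; if so, add brief insights. "
        'Reply with JSON only: {"results_analysis": <bool>, "reasoning": <string>, '
        '"summary": <string>}'
    )
    return "\n\n".join(parts)


def parse_summary(raw: str) -> dict:
    """Validate the summary reply against the expected three fields."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("Reply must be a JSON object")
    if not isinstance(data.get("results_analysis"), bool):
        raise ValueError("results_analysis must be a boolean")
    for key in ("reasoning", "summary"):
        if not isinstance(data.get(key), str):
            raise ValueError(f"{key} must be a string")
    return data


summary_prompt = build_summary_prompt(
    "Which vehicles were seen near the station?",
    "MATCH (v:Vehicle) RETURN v.plate AS plate",
    [{"plate": "ABC123"}],
)
```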

pro $24.99 per month

access to all Manning books, MEAPs, liveVideos, liveProjects, and audiobooks!
choose one free eBook per month to keep
exclusive 50% discount on all purchases
renews monthly, pause or cancel renewal anytime

lite $19.99 per month

access to all Manning books, including MEAPs!

team

5, 10 or 20 seats+ for your team - learn more

eBook

pdf, ePub, online

$64.99 $38.99

you save $26.00 (40%)

include audio $24.99 $17.49
