Overview

1 Knowledge graphs and LLMs: a killer combination

Artificial intelligence—especially generative models—has changed how people interact with technology, yet mission-critical domains still demand accuracy, transparency, and fresh, domain-grounded knowledge that generic LLMs struggle to provide on their own. This chapter introduces knowledge graphs (KGs) as structured, contextual, and explainable representations of entities and their relationships, and shows why pairing them with LLMs creates a compelling foundation for advanced applications. The two technologies are presented as complementary: KGs bring verifiable, updatable knowledge and provenance, while LLMs contribute powerful natural language understanding and generation. The result is a “killer combination” aimed at sectors like healthcare, finance, and law enforcement, and at practitioners who need both expressive knowledge management and intuitive user experiences.

The chapter details how LLMs strengthen KGs by extracting entities and relations from unstructured text, accelerating graph construction, easing complex querying through natural language, and summarizing multi-hop results into clear answers. In the opposite direction, KGs mitigate LLM limitations by grounding responses to trusted data, reducing hallucinations, improving explainability, and keeping knowledge current without retraining an entire model. Together they enable natural language access to deeply connected organizational knowledge, harmonize heterogeneous sources, and support sophisticated reasoning and analysis. This synergy sets up a positive flywheel in which better graphs yield better model behavior, which in turn enriches the graphs. Concrete scenarios—such as drug discovery and conversational customer support—illustrate the practical value of the approach.
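
To make the construction direction concrete, here is a minimal Python sketch of LLM-driven triple extraction. Everything in it is an assumption for illustration rather than the book's code: `call_llm` is a hypothetical stand-in for a real model client (it returns a canned response so the example runs end to end), and the prompt wording and JSON contract are invented. A production pipeline would validate extracted triples against the graph schema before writing them.

```python
import json

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM client (e.g., a hosted or local model).
    Returns a canned response so the sketch runs without any API access."""
    return json.dumps([
        {"head": "aspirin", "relation": "TREATS", "tail": "headache"},
        {"head": "aspirin", "relation": "MAY_CAUSE", "tail": "stomach irritation"},
    ])

EXTRACTION_PROMPT = """Extract (head, relation, tail) triples from the text below.
Respond with a JSON list of objects with keys "head", "relation", "tail".

Text: {text}"""

def extract_triples(text: str) -> list[dict]:
    # Ask the model for structured output, then parse it into candidate edges.
    raw = call_llm(EXTRACTION_PROMPT.format(text=text))
    return json.loads(raw)

triples = extract_triples("Aspirin treats headache but may cause stomach irritation.")
for t in triples:
    # Each triple becomes a typed edge in the graph after validation.
    print(f'({t["head"]}) -[{t["relation"]}]-> ({t["tail"]})')
```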

From an implementation view, the chapter frames a paradigm shift from rigid, siloed databases to graph-centered knowledge substrates built on four pillars: evolution (flexible, ever-growing structures), semantics (typed entities and meaningful relationships), integration (uniting structured and unstructured sources), and learning (reasoning and analytics for humans and machines). It advocates pragmatic, “just enough” semantics via taxonomies and ontologies, avoiding unnecessary rigidity while preserving interpretability and inference. The book is technology-agnostic, drawing on both RDF/SPARQL and labeled property graphs with openCypher/Gremlin, and explains their complementary strengths. Readers are guided to model schemas, ingest and validate data, apply modern ML (including GNNs), and use LLMs for extraction, query generation, and summarization—always anchored in business goals and demonstrated through real-world, end-to-end examples.
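
As a concrete taste of the two graph models, the sketch below (an illustration, not the book's code) writes one fact as RDF triples using the open source rdflib library, queries it with SPARQL, and shows the same fact as an openCypher statement for an LPG store. The example.org namespace and the metformin fact are invented for the demo; the Cypher string is only printed, since executing it would require a graph database.

```python
# Requires: pip install rdflib  (an assumed dependency, not from the book)
from rdflib import Graph, Namespace, RDF

EX = Namespace("http://example.org/")
g = Graph()
g.bind("ex", EX)

# RDF view: knowledge as subject-predicate-object triples.
g.add((EX.metformin, RDF.type, EX.Drug))
g.add((EX.metformin, EX.treats, EX.type2_diabetes))
print(g.serialize(format="turtle"))

# SPARQL query over the triples.
for row in g.query("SELECT ?d WHERE { ?d <http://example.org/treats> ?x }"):
    print("RDF answer:", row.d)

# LPG view: the same fact as property-rich nodes and a typed edge,
# expressed as an openCypher string (shown for comparison, not executed).
cypher = """
CREATE (d:Drug {name: 'metformin'})-[:TREATS {source: 'label'}]->
       (c:Disease {name: 'type 2 diabetes'})
"""
print(cypher)
```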

Figures in this chapter:

  • Example of a KG in the healthcare domain. Nodes (circles) represent entities such as people, diseases, and anatomical parts; edges represent meaningful connections among entities. Both nodes and edges have properties describing relevant details. (A minimal code sketch follows this list.)
  • Transfer learning, high-level principles. In transfer learning, a model (or part of it) trained on a specific task (e.g., predicting randomly masked tokens) is copied to become part of the training and prediction for another task (e.g., relation extraction).
  • LLM building blocks and differentiating characteristics.
  • Knowledge graph building with and without LLMs, and LLM support for querying and retrieval.
  • Summary of how LLMs and KGs can complement each other. Inspired by [11].
  • The four pillars of KGs: evolution, semantics, integration, and learning.
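
To ground the healthcare figure's vocabulary (typed nodes, named edges, properties, multi-hop queries), here is a minimal in-memory sketch in plain Python. The entities and relationship names are invented for illustration; a real system would use a graph database rather than dictionaries and lists.

```python
# A tiny in-memory KG in the spirit of the healthcare figure: nodes with
# properties, typed edges, and a two-hop question answered by traversal.
nodes = {
    "p1": {"type": "Person", "name": "Alice"},
    "d1": {"type": "Disease", "name": "migraine"},
    "a1": {"type": "AnatomicalPart", "name": "brain"},
}
edges = [
    ("p1", "SUFFERS_FROM", "d1", {"since": 2021}),
    ("d1", "AFFECTS", "a1", {}),
]

def neighbors(node_id: str, rel: str) -> list[str]:
    """Follow outgoing edges of a given relationship type."""
    return [t for (s, r, t, _) in edges if s == node_id and r == rel]

# Multi-hop query: which anatomical parts are affected by Alice's diseases?
for disease in neighbors("p1", "SUFFERS_FROM"):
    for part in neighbors(disease, "AFFECTS"):
        print(nodes["p1"]["name"], "->", nodes[disease]["name"],
              "->", nodes[part]["name"])
```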

Summary

  • LLMs and KGs empower each other, overcoming the limitations each technology has when used in isolation.
  • LLMs support KG creation from unstructured data and simplify the querying phase.
  • KGs provide grounding knowledge that lets LLMs answer domain-specific questions using up-to-date and private data.
  • Data-driven systems with contextualized knowledge are strategic for high-impact applications like recruitment tools and medical predictions.
  • KGs represent a core abstraction for incorporating human knowledge into machines, while LLMs provide natural language understanding capabilities.
  • KG and LLM adoption represents a paradigm shift in which intelligent behavior is encoded once in a single source of truth, which then powers data representation for different applications and diverse tasks.
  • KGs are ever-evolving graph data structures containing typed entities, their attributes, and meaningful relationships. They are built for specific domains from structured and unstructured data to craft knowledge for both humans and machines.
  • KGs have four pillars: evolution, semantics, integration, and learning.
  • KGs and LLMs support critical domains with data-driven decisions across multiple applications. These include Customer 360 for banking, drug discovery, retail recommendations, and conversational systems.
  • KGs are implemented with two key technologies: the Resource Description Framework (RDF) and labeled property graphs (LPG).
  • Taxonomies and ontologies play fundamental roles by incorporating semantic metadata that makes traditional graphs smarter.
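
As a minimal illustration of that last point, the sketch below models a toy drug taxonomy as parent links and answers transitive “is-a” questions, the simplest form of the semantic inference that taxonomies add. The categories are invented, and real ontologies (e.g., in RDFS/OWL) express far richer constraints.

```python
# A toy taxonomy: each category points to its parent. Transitive "is-a"
# checks let a plain graph answer broader questions than its literal edges.
parent = {
    "nsaid": "analgesic",
    "analgesic": "drug",
    "antibiotic": "drug",
}

def is_a(category: str, ancestor: str) -> bool:
    """Walk up the taxonomy to test a transitive is-a relationship."""
    while category is not None:
        if category == ancestor:
            return True
        category = parent.get(category)
    return False

print(is_a("nsaid", "drug"))        # True: nsaid -> analgesic -> drug
print(is_a("antibiotic", "nsaid"))  # False
```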

References

  1. J. Launchbury, “A DARPA Perspective on Artificial Intelligence” (2020). Accessed: May 5, 2022. [Online].
  2. J. Yosinski et al., “How transferable are features in deep neural networks?,” in Proceedings of the 27th International Conference on Neural Information Processing Systems, Volume 2, pp. 3320–3328. Cambridge, MA, USA: MIT Press. Accessed: July 26, 2024. [Online]. Available: https://arxiv.org/abs/1411.1792
  3. A. Vaswani et al., “Attention is all you need,” in Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS’17), pp. 6000–6010. Red Hook, NY, USA: Curran Associates Inc., 2017. Accessed: July 26, 2024. [Online]. Available: https://arxiv.org/abs/1706.03762
  4. J. Devlin et al., “BERT: Pre-training of deep bidirectional transformers for language understanding,” 2018. Accessed: July 26, 2024. [Online]. Available: https://arxiv.org/abs/1810.04805
  5. A. Radford et al., “Language models are unsupervised multitask learners,” 2019. Accessed: July 26, 2024. [Online]. Available: https://api.semanticscholar.org/CorpusID:160025533
  6. OpenAI, “GPT-4 technical report,” 2023. Accessed: July 26, 2024. [Online]. Available: https://arxiv.org/abs/2303.08774
  7. J. W. Rae et al., “Scaling language models: Methods, analysis & insights from training Gopher,” arXiv preprint arXiv:2112.11446, 2021. Accessed: July 26, 2024. [Online]. Available: https://arxiv.org/abs/2112.11446
  8. A. Chowdhery et al., “PaLM: Scaling language modeling with pathways,” The Journal of Machine Learning Research, vol. 24, no. 1, article 240, pp. 11324–11436, 2023. Accessed: July 26, 2024. [Online]. Available: https://dl.acm.org/doi/10.5555/3648699.3648939
  9. J. Kaplan et al., “Scaling laws for neural language models,” arXiv:2001.08361 [cs.LG], 2020. Accessed: July 26, 2024. [Online]. Available: https://arxiv.org/abs/2001.08361
  10. J. Z. Pan et al., “Large language models and knowledge graphs: Opportunities and challenges,” Transactions on Graph Data and Knowledge, vol. 1, pp. 2:1–2:38, 2023, doi: 10.4230/TGDK.1.1.2. Accessed: July 26, 2024. [Online]. Available: https://arxiv.org/abs/2308.06374
  11. S. Pan et al., “Unifying large language models and knowledge graphs: A roadmap,” IEEE Transactions on Knowledge and Data Engineering, vol. 36, pp. 3580–3599, 2024. Accessed: July 26, 2024. [Online]. Available: https://ieeexplore.ieee.org/document/10387715
  12. D. Grande et al., “Reducing data costs without jeopardizing growth,” McKinsey Digital, 2020. Accessed: May 5, 2022. [Online]. Available: https://www.mckinsey.com/business-functions/mckinsey-digital/our-insights/reducing-data-costs-without-jeopardizing-growth
  13. F. Lecue, “On the role of knowledge graphs in explainable AI,” Semantic Web, vol. 11, no. 1, pp. 41–51, 2020. Accessed: May 5, 2022. [Online]. Available: http://semantic-web-journal.org/system/files/swj2259.pdf
  14. H. Zhang et al., “Grounded conversation generation as guided traverses in commonsense knowledge graphs,” arXiv preprint arXiv:1911.02707, 2020. Accessed: May 5, 2022. [Online]. Available: https://arxiv.org/abs/1911.02707
  15. J. Barrasa, A. E. Hodler, and J. Webber, Knowledge Graphs. Sebastopol, CA, USA: O’Reilly Media, Inc., 2021. Accessed: May 5, 2022. [Online]. Available: https://www.oreilly.com/library/view/knowledge-graphs/9781098104863/

FAQ

What is a knowledge graph (KG) and how is it structured?
A knowledge graph is an ever-evolving graph where nodes represent real-world entities, edges represent meaningful named relationships, and properties add context. It encodes domain knowledge in a structured, explainable form that supports reasoning and complex queries across connected data.

What are large language models (LLMs) and why are they effective?
LLMs are large neural networks trained on vast text corpora. Built on transfer learning and transformer architectures, they generalize across many NLP tasks, enabling natural language understanding and generation. With sufficient scale and high-quality data, performance improves markedly, making prompt-driven use practical.

Why are KGs and LLMs a “killer combination”?
LLMs extract entities and relations from unstructured text, help generate queries, and summarize results; KGs ground LLM responses in reliable, up-to-date domain knowledge, reducing hallucinations and improving explainability. Together they deliver accurate, contextual, and user-friendly intelligent systems.

What key challenges have historically limited KG adoption?
High costs to build and maintain, complex multi-hop access patterns, and results dispersed across many nodes and edges. Extracting knowledge from unstructured text (multiple languages, typos, pronouns, styles, domain jargon) further increases complexity without the aid of modern NLP and LLMs.

How do LLMs help build and query knowledge graphs?
LLMs accelerate entity and relationship extraction from unstructured sources (papers, reports), assist schema-aware querying via natural language, and produce clear textual summaries of graph results. They reduce the need for many task-specific NLP models and make graph access more accessible.

How do KGs mitigate common LLM limitations like hallucinations and stale knowledge?
KGs provide trusted, structured facts that can ground generation, detect inconsistencies, and explain answers via explicit relationships. They also externalize knowledge, so updates don’t require retraining the LLM, improving freshness and auditability for mission-critical domains (a minimal grounding sketch follows this FAQ).

What is the paradigm shift in data-driven applications described in the chapter?
A move from rigid, purpose-built systems over siloed data to flexible, semantics-rich graphs as a single source of truth, with relationships as first-class citizens. LLMs add natural language interfaces and unstructured-data ingestion, enabling adaptable, explainable, enterprise-scale applications.

What are the four pillars of modern knowledge graphs?
Evolution (a flexible, ever-extensible structure), semantics (typed entities and meaningful relationships), integration (unifying structured and unstructured, multi-source data), and learning (supporting human analysis, reasoning, and ML over graph structure).

How do RDF and labeled property graphs (LPG) compare?
RDF encodes knowledge as statements (triples) with strong semantic interoperability and SPARQL querying; LPG models nodes and edges with properties, favoring traversal and pathfinding via languages like openCypher and Gremlin. RDF suits ontology-driven consistency; LPG excels at rich property-based analysis. They can be complementary, depending on the use case.

How do I decide whether I need KGs, LLMs, or both?
Use KGs if you must harmonize silos, connect structured and unstructured data, evolve schemas, ensure provenance, and power graph-centric search, recommendation, visualization, or ML. Use LLMs for entity and relation extraction from text, interpreting complex queries, conversational interfaces, and summarization. Most real-world systems benefit from both.
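
As promised above, here is a minimal sketch of KG-grounded generation. It is an illustration under stated assumptions, not the book's code: `call_llm` is a hypothetical model client returning a canned answer so the script runs, and the facts list stands in for results of a real SPARQL or Cypher query generated from the user's question.

```python
def call_llm(prompt: str) -> str:
    # Hypothetical model client; canned output keeps the sketch runnable.
    return "Metformin is a first-line treatment for type 2 diabetes."

# Trusted facts retrieved from the KG for the user's question
# (in practice, via a SPARQL/Cypher query generated from the question).
facts = [
    ("metformin", "TREATS", "type 2 diabetes"),
    ("metformin", "DRUG_CLASS", "biguanide"),
]

question = "What does metformin treat?"
context = "\n".join(f"{s} {p} {o}" for s, p, o in facts)

# Grounding: the model is instructed to answer only from the KG facts,
# which curbs hallucinations and keeps answers auditable.
prompt = (
    "Answer using ONLY the facts below. If the facts are insufficient, say so.\n"
    f"Facts:\n{context}\n\nQuestion: {question}"
)
print(call_llm(prompt))
```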
