Graph Databases in Action
Examples in Gremlin
Dave Bechberger, Josh Perryman
  • MEAP began January 2019
  • Publication in Fall 2020 (estimated)
  • ISBN 9781617296376
  • 350 pages (estimated)
  • printed in black & white

A great introduction to Graph databases in general and will help you get up to speed quickly.

Douglas Duncan
Relationships in data often look far more like a web than an orderly set of rows and columns. Graph databases shine when it comes to revealing valuable insights within complex, interconnected data such as demographics, financial records, or computer networks. In Graph Databases in Action, experts Dave Bechberger and Josh Perryman illuminate the design and implementation of graph databases in real-world applications. You’ll learn how to choose the right database solutions for your tasks, and how to use your new knowledge to build agile, flexible, and high-performing graph-powered applications!

About the Technology

Graph databases store interconnected data in a more natural form, making them superior tools for representing data with rich relationships. Unlike relational database management systems (RDBMS), where a more rigid view of data connections can result in the loss of valuable insights, graph databases treat data connections as the first priority. Graph databases store data and its connections, unlocking the potential to store, process, and query with incredible efficiency. With giants like NASA, eBay, Walmart, and Fortune 500 financial service providers leveraging this hot technology, graph database skills are priceless! Cloud databases like AWS Neptune and Microsoft CosmosDB now add to the growing list of graph databases powered by the Gremlin graph traversal language presented in this book.

About the book

Graph Databases in Action teaches you everything you need to know to begin building and running applications powered by graph databases. Right off the bat, seasoned graph database experts and authors Dave Bechberger and Josh Perryman introduce you to just enough graph theory, the graph database ecosystem, and a variety of datastores. You’ll also explore modeling basics in action with real-world examples, then go hands-on with querying, coding traversals, parsing results, and other essential tasks as you build your own graph-backed social network app complete with a recommendation engine!

With valuable firsthand experience under your belt, you’re ready for advanced concepts including query tuning, data model tuning, evolving your graphs, and pitfalls and anti-patterns like supernodes, hidden entities, and anemic edges. All examples are presented in the open source Apache TinkerPop framework and the Gremlin language, and almost all concepts and constructs are compatible with Cypher/openCypher databases such as Neo4j. With this comprehensive guide, you’ll be building graph-powered applications that dramatically increase the value of data—as well as your professional value to the companies savvy enough to use them!
Table of Contents

Part 1: Getting Started with Graph Databases

1 Introduction to Graphs

1.1 What is a graph?

1.1.1 What is a graph database?

1.1.2 Comparison with other types of databases

1.1.3 Why Can’t I Use SQL?

1.2 Is my problem a graph problem?

1.2.1 Explore the questions

1.2.2 I’m still confused… Is this a graph problem?

1.4 Summary

2 Graph Data Modeling

2.1 The Data Modeling Process

2.1.1 Data Modeling Terms

2.1.2 Four Step Process for Data Modeling

2.2 Understand the problem

2.3 Developing the whiteboard model

2.3.1 Identifying and grouping entities

2.3.2 Identifying relationships between entities

2.4 Constructing the logical data model

2.4.1 Translate entities to vertices

2.4.2 Translate relationships to edges

2.4.3 Find and assign properties

2.5 Check our model

2.6 Summary

3 Running Basic and Recursive Traversals

3.1 Setting up your Environment

3.2 Traversing a graph

3.2.1 Fundamental Concepts of Traversing a Graph

3.2.2 Writing traversals in Gremlin

3.3 Recursive Traversals

3.3.1 Writing Recursive Traversals in Gremlin

3.4 Summary

4 Pathfinding Traversals and Mutating a Graph

4.1 Mutating a Graph

4.1.1 Creating Vertices and Edges

4.1.2 Removing Data From our Graph

4.1.3 Updating a Graph

4.1.4 Extending our Graph

4.2 Paths

4.2.1 Cycles in Graphs

4.2.2 Finding the Simple Path

4.3 Traversing and Filtering Edges

4.3.1 Introduction of “E” and “V” steps for Traversing Edges

4.3.2 Filtering with Edge Properties

4.3.3 Include Edges in Path Results

4.3.4 Performant Edge Counts and Denormalization

4.4 Summary

5 Formatting Results

5.1 Review of Values Steps

5.2 Constructing our Result Payload

5.2.1 Applying Aliases in Gremlin

5.3 Organizing our Results

5.3.1 Ordering results returned from a graph traversal

5.3.2 Grouping results returned from a graph traversal

5.3.3 Limiting Results

5.4 Combining steps into complex traversals

5.5 Summary

6 Developing an Application

6.1 Starting the project

6.1.1 Selecting our Tools

6.1.2 Software Project Setup

6.1.3 Obtaining a Driver: Apache TinkerPop’s Gremlin Driver

6.1.4 Prepare Database Server Instance

6.2 Connecting to our database

6.2.1 Build the cluster configuration

6.2.2 Setup the GraphTraversalSource

6.3 Retrieving Data

6.3.1 Retrieving a Vertex

6.4 Adding/Modifying/Deleting data

6.4.1 Adding Vertices

6.4.2 Adding Edges

6.4.3 Updating Properties

6.4.4 Deleting Elements

6.5 Translating our List and Path Traversals

6.5.1 Lists of Results

6.5.2 Implement recursive traversals

6.5.3 Implementing Paths

6.6 Summary

Part 2: Building on Graph Databases

7 Advanced Data Modeling Techniques

7.1 Reviewing our Current Data Models

7.2 Extending our Logical Data Model

7.3 Translate Entities to Vertices

7.3.1 Generic Labels

7.3.2 Data Denormalization

7.3.3 Translate Relationships to Edges

7.3.4 Find and Assign Properties

7.3.5 Moving Properties to Edges

7.3.6 Check our Model

7.4 Extending our Data Model for Personalization

7.5 Comparing the Results

7.6 Summary

8 Building Traversals Using Known Walks

8.1 Preparing to develop our traversals

8.1.1 Identifying the required elements

8.1.2 Selecting a starting place

8.2 Setting Up Test Data

8.3 Writing Our First Traversal

8.3.1 Designing Our Traversal

8.3.2 Developing the Traversal Code

8.4 Pagination and graph databases

8.5 Recommending the Highest Rated Restaurants

8.5.1 Designing Our Traversal

8.5.2 Developing our Traversal Code

8.6 Writing the Last Recommendation Engine Traversal

8.7 Summary

9 Working with Subgraphs

9.1 Working with Subgraphs

9.1.1 Extracting a Subgraph

9.1.2 Traversing a Subgraph

9.2 Building a subgraph for personalization

9.3 Building the traversal

9.3.1 Evaluating the individualized results of the subgraph

9.4 Implementing a subgraph() with a remote connection

9.5 Summary

Part 3: Moving Beyond the Basics

10 Performance, Pitfalls and Anti-patterns

10.1 Slow performing traversal

10.1.1 Explaining our traversal

10.1.2 Profiling our traversal

10.1.3 Indexes

10.2 Dealing with supernodes

10.2.1 What makes a supernode?

10.2.2 Monitoring for supernodes

10.2.3 What to do if you have a supernode

10.3 Application anti-patterns

10.3.1 Using graphs for non-graph use cases

10.3.2 “Dirty” Data

10.3.3 Lack of adequate testing

10.4 Traversal anti-patterns

10.4.1 Not using parameterized traversals

10.4.2 Using unlabeled filtering steps

10.5 Summary

11 What’s Next: Graph Analytics, Machine Learning, Resources

11.1 Graph Analytics

11.1.1 Path Finding

11.1.2 Centrality

11.1.3 Community Detection

11.1.4 Graphs and Machine Learning

11.1.5 Additional Resources

11.2 Final Thoughts

11.3 Summary

Appendixes

Appendix A: Apache TinkerPop Installation and Overview

A.1 Overview

A.1.1 Gremlin Traversal Language

A.1.2 TinkerGraph

A.1.3 Gremlin Console

A.1.4 Gremlin Language Variants

A.1.5 Gremlin Server

A.1.6 Documentation

A.2 Installation

A.2.1 Install and Verify the Java Runtime

A.2.2 Install Gremlin Console

A.2.3 Install Gremlin Server

A.2.4 Configure Gremlin Console to Connect to Gremlin Server

A.2.5 Using the Gremlin Console

What's inside

  • Graph database fundamentals
  • An overview of the graph database ecosystem
  • Relational vs. graph database modeling
  • Querying graphs using Gremlin
  • Real-world common graph use cases
  • Basic graph algorithms
  • A hands-on graph-backed application project
  • Performance tuning
  • Pitfalls and anti-patterns
  • Graph analytics

About the reader

For readers with basic Java and application development skills and experience building applications on an RDBMS such as Oracle, SQL Server, MySQL, or Postgres. No experience with graph databases is required.

About the authors

Dave Bechberger has extensive experience using graph databases as a product architect and a consultant. He’s spent his career leveraging cutting-edge technologies to build software in complex data domains such as bioinformatics, oil and gas, and supply chain management. He’s an active member of the graph community and has presented on a wide variety of graph-related topics at national and international conferences.

Josh Perryman is a technologist with over two decades of diverse experience building and maintaining complex systems, including high performance computing (HPC) environments. Since 2014 he has focused on graph databases, especially in distributed or big data environments, and he regularly blogs and speaks at conferences about graph databases.

Manning Early Access Program (MEAP): Read chapters as they are written, get the finished eBook as soon as it’s ready, and receive the pBook long before it's in bookstores.