Elasticsearch in Action
Radu Gheorghe, Matthew Lee Hinman, and Roy Russo
  • November 2015
  • ISBN 9781617291623
  • 496 pages
  • printed in black & white
ePub + Kindle available Dec 1, 2015

To understand how a modern search infrastructure works is a daunting task. Radu, Matt, and Roy make it an engaging, hands-on experience.

Sen Xu, Twitter Inc.

Elasticsearch in Action teaches you how to build scalable search applications using Elasticsearch. You'll ramp up fast, with an informative overview and an engaging introductory example. Within the first few chapters, you'll pick up the core concepts you need to implement basic searches and efficient indexing. With the fundamentals well in hand, you'll go on to gain an organized view of how to optimize your design. Perfect for developers and administrators building and managing search-oriented applications.

Table of Contents



about this book

author online

about the authors

about the cover illustration


1. Introducing Elasticsearch

1.1. Solving search problems with Elasticsearch

1.1.1. Providing quick searches

1.1.2. Ensuring relevant results

1.1.3. Searching beyond exact matches

1.2. Exploring typical Elasticsearch use cases

1.2.1. Using Elasticsearch as the primary back end

1.2.2. Adding Elasticsearch to an existing system

1.2.3. Using Elasticsearch with existing tools

1.2.4. Main Elasticsearch features

1.2.5. Extending Lucene functionality

1.2.6. Structuring your data in Elasticsearch

1.2.7. Installing Java

1.2.8. Downloading and starting Elasticsearch

1.2.9. Verifying that it works

1.3. Summary

2. Diving into the functionality

2.1. Understanding the logical layout: documents, types, and indices

2.1.1. Documents

2.1.2. Types

2.1.3. Indices

2.2. Understanding the physical layout: nodes and shards

2.2.1. Creating a cluster of one or more nodes

2.2.2. Understanding primary and replica shards

2.2.3. Distributing shards in a cluster

2.2.4. Distributed indexing and searching

2.3. Indexing new data

2.3.1. Indexing a document with cURL

2.3.2. Creating an index and mapping type

2.3.3. Indexing documents from the code samples

2.4. Searching for and retrieving data

2.4.2. Contents of the reply

2.4.4. Getting documents by ID

2.5. Configuring Elasticsearch

2.5.1. Specifying a cluster name in elasticsearch.yml

2.5.2. Specifying verbose logging via logging.yml

2.5.3. Adjusting JVM settings

2.6. Adding nodes to the cluster

2.6.1. Starting a second node

2.6.2. Adding additional nodes

2.7. Summary

3. Indexing, updating, and deleting data

3.1. Using mappings to define kinds of documents

3.1.1. Retrieving and defining mappings

3.1.2. Extending an existing mapping

3.2. Core types for defining your own fields in documents

3.2.1. String

3.2.2. Numeric

3.2.3. Date

3.2.4. Boolean

3.3. Arrays and multi-fields

3.3.1. Arrays

3.3.2. Multi-fields

3.4. Using predefined fields

3.4.1. Controlling how to store and search your documents

3.4.2. Identifying your documents

3.5. Updating existing documents

3.5.1. Using the update API

3.5.2. Implementing concurrency control through versioning

3.6. Deleting data

3.6.1. Deleting documents

3.6.2. Deleting indices

3.6.3. Closing indices

3.6.4. Re-indexing sample documents

3.7. Summary

4. Searching your data

4.1. Structure of a search request

4.1.1. Specifying a search scope

4.1.2. Basic components of a search request

4.1.3. URL-based search request

4.1.4. Request body-based search request

4.1.5. Understanding the structure of a response

4.2. Introducing the query and filter DSL

4.2.1. Match query and term filter

4.2.2. Most used basic queries and filters

4.2.3. Match query and term filter

4.2.4. Phrase_prefix query

4.3. Combining queries or compound queries

4.3.1. Bool query

4.3.2. Bool filter

4.4. Beyond match and filter queries

4.4.1. Range query and filter

4.4.2. Prefix query and filter

4.4.3. Wildcard query

4.5. Querying for field existence with filters

4.5.1. Exists filter

4.5.2. Missing filter

4.5.3. Transforming any query into a filter

4.6. Choosing the best query for the job

4.7. Summary

5. Analyzing your data

5.1. What is analysis?

5.1.1. Character filtering

5.1.2. Breaking into tokens

5.1.3. Token filtering

5.1.4. Token indexing

5.2. Using analyzers for your documents

5.2.1. Adding analyzers when an index is created

5.2.2. Adding analyzers to the Elasticsearch configuration

5.2.3. Specifying the analyzer for a field in the mapping

5.3. Analyzing text with the analyze API

5.3.1. Selecting an analyzer

5.3.2. Combining parts to create an impromptu analyzer

5.3.3. Analyzing based on a field’s mapping

5.3.4. Learning about indexed terms using the term vectors API

5.4. Analyzers, tokenizers, and token filters, oh my!

5.4.1. Built-in analyzers

5.4.2. Tokenization

5.4.3. Token filters

5.5. Ngrams, edge ngrams, and shingles

5.5.1. 1-grams

5.5.2. Bigrams

5.5.3. Trigrams

5.5.4. Setting min_gram and max_gram

5.5.5. Edge ngrams

5.5.6. Ngram settings

5.5.7. Shingles

5.6. Stemming

5.6.1. Algorithmic stemming

5.6.2. Stemming with dictionaries

5.6.3. Overriding the stemming from a token filter

5.7. Summary

6. Searching with relevancy

6.1. How scoring works in Elasticsearch

6.1.1. How scoring documents works

6.1.2. Term frequency

6.1.3. Inverse document frequency

6.1.4. Lucene’s scoring formula

6.2. Other scoring methods

6.2.1. Okapi BM25

6.3. Boosting

6.3.1. Boosting at index time

6.3.2. Boosting at query time

6.3.3. Queries spanning multiple fields

6.4. Understanding how a document was scored with explain

6.4.1. Explaining why a document did not match

6.5. Reducing scoring impact with query rescoring

6.6. Custom scoring with function_score

6.6.1. weight

6.6.2. Combining scores

6.6.3. field_value_factor

6.6.4. Script

6.6.5. random

6.6.6. Decay functions

6.6.7. Configuration options

6.7. Tying it back together

6.8. Sorting with scripts

6.9. Field data detour

6.9.1. The field data cache

6.9.2. What field data is used for

6.9.3. Managing field data

6.10. Summary

7. Exploring your data with aggregations

7.1. Understanding the anatomy of an aggregation

7.1.1. Structure of an aggregation request

7.1.2. Aggregations run on query results

7.1.3. Filters and aggregations

7.2. Metrics aggregations

7.2.1. Statistics

7.2.2. Advanced statistics

7.2.3. Approximate statistics

7.2.4. Terms aggregations

7.2.5. Range aggregations

7.2.6. Histogram aggregations

7.3. Nesting aggregations

7.3.1. Nesting multi-bucket aggregations

7.3.2. Nesting aggregations to get result grouping

7.3.3. Using single-bucket aggregations

7.4. Summary

8. Relations among documents

8.1. Overview of options for defining relationships among documents

8.1.1. Object type

8.1.2. Nested type

8.1.3. Parent-child relationships

8.1.4. Denormalizing

8.2. Having objects as field values

8.2.1. Mapping and indexing objects

8.2.2. Searching in objects

8.3. Nested type: connecting nested documents

8.3.1. Mapping and indexing nested documents

8.3.2. Searches and aggregations on nested documents

8.4. Parent-child relationships: connecting separate documents

8.4.1. Indexing, updating, and deleting child documents

8.4.2. Searching in parent and child documents

8.5. Denormalizing: using redundant data connections

8.5.1. Use cases for denormalizing

8.5.2. Indexing, updating, and deleting denormalized data

8.5.3. Querying denormalized data

8.6. Application-side joins

8.7. Summary


9. Scaling out

9.1. Adding nodes to your Elasticsearch cluster

9.1.1. Adding nodes to your cluster

9.2. Discovering other Elasticsearch nodes

9.2.1. Multicast discovery

9.2.2. Unicast discovery

9.2.3. Electing a master node and detecting faults

9.2.4. Fault detection

9.3. Removing nodes from a cluster

9.3.1. Decommissioning nodes

9.4. Upgrading Elasticsearch nodes

9.4.1. Performing a rolling restart

9.4.2. Minimizing recovery time for a restart

9.5. Using the _cat API

9.6. Scaling strategies

9.6.1. Over-sharding

9.6.2. Splitting data into indices and shards

9.6.3. Maximizing throughput

9.7. Aliases

9.7.1. What is an alias, really?

9.7.2. Alias creation

9.8. Routing

9.8.1. Why use routing?

9.8.2. Routing strategies

9.8.3. Using the _search_shards API to determine where a search is performed

9.8.4. Configuring routing

9.8.5. Combining routing with aliases

9.9. Summary

10. Improving performance

10.1. Grouping requests

10.1.1. Bulk indexing, updating, and deleting

10.1.2. Multisearch and multiget APIs

10.2. Optimizing the handling of Lucene segments

10.2.1. Refresh and flush thresholds

10.2.2. Merges and merge policies

10.2.3. Store and store throttling

10.3. Making the best use of caches

10.3.1. Filters and filter caches

10.3.2. Shard query cache

10.3.3. JVM heap and OS caches

10.3.4. Keeping caches up with warmers

10.4. Other performance trade-offs

10.4.1. Big indices or expensive searches

10.4.2. Tuning scripts or not using them at all

10.4.3. Trading network trips for less data and better distributed scoring

10.4.4. Trading memory for better deep paging

10.5. Summary

11. Administering your cluster

11.1. Improving defaults

11.1.1. Index templates

11.1.2. Default mappings

11.2. Allocation awareness

11.2.1. Shard-based allocation

11.2.2. Forced allocation awareness

11.3. Monitoring for bottlenecks

11.3.1. Checking cluster health

11.3.2. CPU: slow logs, hot threads, and thread pools

11.3.3. Memory: heap size, field, and filter caches

11.3.4. OS caches

11.3.5. Store throttling

11.4. Backing up your data

11.4.1. Snapshot API

11.4.2. Backing up data to a shared file system

11.4.3. Restoring from backups

11.4.4. Using repository plugins

11.5. Summary


Appendix A: Working with geospatial data

A.1. Points and distances between them

A.2. Adding distance to your sort criteria

A.2.1. Sorting by distance and other criteria at the same time by using scripts

A.3. Filter based on distance

A.4. Does a point belong to a shape?

A.4.1. Bounding boxes

A.4.2. Geohashes

A.5. Shape intersections

A.5.1. Indexing shapes

A.5.2. Filtering overlapping shapes

Appendix B: Plugins

B.1. Working with plugins

B.2. Installing plugins

B.3. Accessing plugins

B.4. Telling Elasticsearch to require certain plugins

B.5. Removing or updating plugins

Appendix C: Highlighting

C.1. Highlighting basics

C.1.1. What should be passed on to the user

C.1.2. Too many fields contain highlighted terms

C.2. Highlighting options

C.2.1. Size, order and number of fragments

C.2.2. Highlighting tags and fragment encoding

C.2.3. Highlight query

C.3. Highlighter implementations

C.3.1. Postings Highlighter

C.3.2. Fast Vector Highlighter

Appendix D: Elasticsearch monitoring plugins

D.1. BigDesk: visualize your cluster

D.2. ElasticHQ: monitoring with management

D.3. Head: advanced query building

D.4. Kopf: snapshots, warmers, and percolators

D.5. Marvel: fine-grained analysis

D.6. Sematext SPM: the Swiss Army knife

Appendix E: Turning search upside down with the percolator

E.1. Percolator basics

E.1.1. Define a mapping, register queries, then percolate documents

E.1.2. Percolator under the hood

E.2. Performance tips

E.2.1. Options for requests and replies

E.2.2. Separating and filtering percolator queries

E.3. Functionality tricks

E.3.1. Highlighting percolated documents

E.3.2. Ranking matching queries

E.3.3. Aggregations on matching query metadata

Appendix F: Using suggesters for autocomplete and did-you-mean

F.1. Did-you-mean suggesters

F.1.1. Term suggester

F.1.2. Phrase suggester

F.2. Autocomplete suggesters

F.2.1. Completion suggester

F.2.2. Context suggester

About the technology

Modern search seems like magic—you type a few words and the search engine appears to know what you want. With the Elasticsearch real-time search and analytics engine, you can give your users this magical experience without having to do complex low-level programming or understand advanced data science algorithms. You just install it, tweak it, and get on with your work.

About the book

Elasticsearch in Action teaches you how to write applications that deliver professional-quality search. As you read, you’ll learn to add basic search features to any application, enhance search results with predictive analysis and relevancy ranking, and use saved data from prior searches to give users a custom experience. This practical book focuses on Elasticsearch’s REST API via HTTP. Code snippets are written mostly in bash using cURL, so they’re easily translatable to other languages.

What's inside

  • What is a great search application?
  • Building scalable search solutions
  • Using Elasticsearch with any language
  • Configuration and tuning

About the reader

This book is for developers and administrators building and managing search-oriented applications.

About the authors

Radu Gheorghe is a search consultant and software engineer. Matthew Lee Hinman develops highly available, cloud-based systems. Roy Russo is a specialist in predictive analytics.


An indispensable guide to the challenges of search of semi-structured data.

Artur Nowak, Evidence Prime

The best resource for a complex topic. Highly recommended.

Daniel Beck, juris GmbH

Took me from confused to confident in a week.

Alan McCann, Givsum.com