Natural Language Processing in Action
Understanding, analyzing, and generating text with Python
Hobson Lane, Cole Howard, Hannes Hapke
  • MEAP began April 2017
  • Publication in November 2018 (estimated)
  • ISBN 9781617294631
  • 420 pages (estimated)
  • printed in black & white

The best NLP book for practitioners around by a long stretch (And I've read them all!)

Franco Arda, Machine Learning Engineer/MBA at

Natural Language Processing in Action is your guide to creating machines that understand human language using Python and its ecosystem of packages dedicated to NLP and AI. You'll start with a mental model of how a computer learns to read and interpret language. Then, you'll discover how to train a Python-based NLP machine to recognize patterns and extract information from text. As you explore the carefully chosen examples, you'll expand your machine's knowledge and apply it to a range of challenges, from building a search engine that finds documents based on their meaning rather than merely keywords, to training a chatbot that uses deep learning to answer questions and participate in a conversation.

Table of Contents


Part 1: Wordy Machines

1 Packets of thought (NLP overview)

1.1 Natural language vs. programming language

1.2 The magic

1.2.1 Machines that converse

1.2.2 The math

1.3 Practical applications

1.4 Language through a computer’s "eyes"

1.4.1 The language of locks (regular expressions)

1.4.2 A simple chatbot

1.4.3 Another way

1.5 A brief overflight of hyperspace

1.6 Word order and grammar

1.7 A chatbot natural language pipeline

1.8 Processing in depth

1.9 Natural language IQ

1.10 Summary

2 Build your vocabulary (word tokenization)

2.1 Challenges (a preview of stemming)

2.2 Building your vocabulary with a tokenizer

2.2.1 A token improvement

2.2.2 Extending your vocabulary with n-grams

2.2.3 Normalizing your vocabulary

2.3 Sentiment

2.3.1 VADER — A rule-based sentiment analyzer

2.3.2 Naive Bayes — a machine learning sentiment analyzer

2.4 Summary

3 Math with Words (TF-IDF Vectors)

3.1 Bag of Words

3.2 Vectorizing

3.2.1 Vector Spaces

3.3 Zipf’s Law

3.4 Topic Modeling

3.4.1 Return of Zipf

3.4.2 Relevance Ranking

3.4.3 Tools

3.4.4 Alternatives

3.4.5 Okapi BM25

3.5 Summary

4 Finding Meaning in Word Counts (Semantic Analysis)

4.1 From Word Counts to Topic Scores

4.1.1 TF-IDF Vectors and Lemmatization

4.1.2 Topic Vectors

4.1.3 Thought Experiment

4.1.4 An Algorithm for Scoring Topics

4.1.5 An LDA Classifier

4.2 Latent Semantic Analysis (LSA)

4.2.1 Our Thought Experiment Made Real

4.3 Singular Value Decomposition (SVD)

4.3.1 U — Left Singular Vectors

4.3.2 S — Singular Values

4.3.3 VT — Right Singular Vectors

4.3.4 SVD Matrix Orientation

4.3.5 Truncating the Topics

4.4 Principal Component Analysis (PCA)

4.4.1 PCA on 3D Vectors

4.4.2 Stop Horsing Around and Get Back to NLP

4.4.3 Using PCA for SMS Message Semantic Analysis

4.4.4 Using Truncated SVD for SMS Message Semantic Analysis

4.4.5 How well does LSA work for spam Classification?

4.5 Latent Dirichlet Allocation (LDiA)

4.5.1 The LDiA Idea

4.5.2 LDiA Topic Model for SMS Messages

4.5.3 LDiA + LDA = spam Classifier

4.5.4 A Fairer Comparison: 32 LDiA Topics

4.6 Distance and Similarity

4.7 Steering with Feedback

4.7.1 Linear Discriminant Analysis (LDA)

4.8 Topic Vector Power

4.9 Summary

Part 2: Deeper Learning (Neural Networks)

5 Baby Steps with Neural Networks (Perceptrons and Backpropagation)

5.1 Neural Networks, the Ingredient List

5.1.1 Perceptron

5.1.2 A Numerical Perceptron

5.1.3 Detour through Bias

5.1.4 A Pythonic Neuron

5.1.5 Class is in Session

5.1.6 Logic is a Fun Thing to Learn

5.1.7 Next Step

5.1.8 Emergence from the First AI Winter

5.1.9 Backpropagation

5.1.10 Derivative All the Things

5.1.11 Let’s Go Skiing - The Error Surface

5.1.12 Off the Chair Lift, Onto the Slope

5.1.13 Let’s Shake Things Up a Bit

5.1.14 Keras: Neural Networks in Python

5.1.15 Onward and Deepward

5.1.16 Normalization: Input with Style

5.2 Summary

6 Reasoning with Word Vectors (Word2vec)

6.1 Semantic Queries and Analogies

6.1.1 Analogy Questions

6.2 Word Vectors

6.2.1 Vector-Oriented Reasoning

6.2.2 How to compute the Word2Vec Representations?

6.2.3 How to use the gensim.word2vec module?

6.2.4 How to generate your own Word vector representations?

6.2.5 Word2vec vs GloVe (Global Vector)

6.2.6 fastText

6.2.7 Word2vec vs LSA

6.2.8 Visualizing Word Relationships

6.2.9 Unnatural Words

6.2.10 Document Similarity with Doc2vec

6.3 Summary

7 Getting Words in Order with Convolutional Neural Networks (CNNs)

7.1 Learning Meaning

7.1.1 Word Order

7.2 Toolkit

7.3 Convolutional Neural Nets

7.3.1 Building Blocks

7.3.2 Step Size

7.3.3 Filter Composition

7.3.4 Padding

7.3.5 Learning

7.4 Narrow Windows Indeed

7.4.1 Implementation in Keras: Prepping the Data

7.4.2 Convolutional Neural Network Architecture

7.4.3 Pooling

7.4.4 Dropout

7.4.5 The Cherry on the Sundae

7.4.6 Let’s Get to Learning (Training)

7.4.7 Using the Model in a Pipeline

7.4.8 Where Do We Go From Here?

7.5 Summary

8 Loopy (Recurrent) Neural Networks (RNNs)

8.1 Remembering with Recurrent Networks

8.1.1 Backpropagation Through Time

8.1.2 When Do We Update What?

8.1.3 Recap

8.1.4 There’s Always a Catch

8.1.5 Recurrent Neural Net with Keras

8.2 Putting Things Together

8.3 Let’s Get to Learning Our Past Selves

8.3.1 Hyperparameters

8.4 Predicting

8.4.1 Statefulness

8.4.2 Two Way Street

8.4.3 What is this thing?

8.5 Summary

9 Improving Retention with Long Short-Term Memory Networks (LSTMs)

9.1 LSTM

9.1.1 Backpropagation Through Time

9.1.2 In Practice

9.1.3 Where does the rubber hit the road?

9.1.4 Dirty Data

9.1.5 Back to Our Dirty Data

9.1.6 Words are hard. Letters are easier.

9.1.7 My Turn to Talk

9.1.8 My Turn to Speak More Clearly

9.1.9 Learned How to Say, but not yet What.

9.1.10 Other Kinds of Memory

9.1.11 Going Deeper

9.2 Summary

10 Sequence to Sequence Models and Attention (Generative Models)

10.1 Sequence-to-Sequence Networks

10.2 How are seq2seq networks implemented?

10.2.1 Preparing your dataset for the sequence-to-sequence training

10.2.2 Sequence-to-Sequence in Keras

10.2.3 The Sequence-to-Sequence Encoder

10.2.4 The Sequence-to-Sequence Decoder

10.2.5 Assembling the Sequence-to-Sequence Network

10.3 Training the Sequence-to-Sequence Network

10.3.1 Generate output sequences

10.4 Building a chatbot using seq2seq networks

10.4.1 Preparing the corpus for our training

10.4.2 Building our character dictionary

10.4.3 Generate one-hot encoded training sets

10.4.4 Train your sequence-to-sequence chatbot

10.4.5 Assemble the model for sequence generation

10.4.6 Predicting a sequence

10.4.7 Generating a response

10.4.8 Converse with your chatbot

10.5 Enhancements

10.5.1 Reduce Training Complexity by using Bucketing

10.5.2 Paying Attention

10.6 In the Real World

10.7 Summary

Part 3: Getting Real (Real World NLP Challenges)

11 Information Extraction (Named Entity Extraction and Question Answering)

11.1 Named Entities and Relations

11.1.1 A Knowledge Base

11.1.2 Information Extraction

11.2 Regular Patterns

11.2.1 Regular Expressions

11.2.2 Information Extraction as ML Feature Extraction

11.3 Information Worth Extracting

11.3.1 Extracting Numbers

11.3.2 GPS Locations

11.3.3 Dates

11.4 Extracting Relationships (Relations)

11.4.1 POS Tagging

11.4.2 Entity Name Normalization

11.4.3 Relation Normalization and Extraction

11.4.4 Word Patterns

11.4.5 Segmentation

11.4.6 split('.!?') Won’t Work

11.4.7 Sentence Segmentation with Regular Expressions

11.5 In the Real World

11.6 Summary

12 Getting Chatty (Dialog Engines)

12.1 Language Skill

12.1.1 Modern Approaches

12.1.2 A Hybrid Approach

12.2 1. Pattern Matching

12.2.1 A Pattern-Matching Chatbot with Artificial Intelligence Markup Language (AIML)

12.3 2. Grounding

12.4.1 The Context Challenge

12.4.2 Example Retrieval-Based Chatbot

12.4.3 A Search-based Chatbot

12.5 4. Generative Models

12.5.1 Pros and Cons of Each Approach

12.6 Four Wheel Drive

12.6.1 The Will to Succeed

12.7 Design Process

12.8 Trickery

12.8.1 Ask Questions With Predictable Answers

12.8.2 Be Entertaining/Likable

12.8.5 Be a Connector/Networker

12.8.6 Getting Emotional

12.9 In the Real World

12.10 Summary

13 Scaling Up (Optimization, Parallelization and Batch Processing)

13.1 Too Much of a Good Thing (Data)

13.2 Optimizing NLP Algorithms

13.2.1 Indexing

13.2.2 Advanced Indexing

13.2.3 Advanced Indexing with Annoy

13.2.4 Why Use Approximate Indexes at All?

13.2.5 An Indexing Workaround: Discretizing

13.3 Constant RAM Algorithms

13.3.1 Gensim

13.3.2 Graph Computing

13.4 Parallelizing Your NLP Computations

13.4.1 Training NLP models on GPUs

13.4.2 Rent vs. Buying

13.4.3 GPU Rental Options

13.4.4 Tensor Processing Units

13.5 Reducing the Memory Footprint during Model Training

13.6 Gaining Model Insights with TensorBoard

13.6.1 How to Visualize Word Embeddings

13.7 Summary


Appendix A: Your NLP tools

A.1 Anaconda3

A.2 Install NLPIA


A.4 Ubuntu package manager

A.5 Mac

A.5.1 A Mac package manager

A.5.2 Some packages

A.5.3 Tuneups

A.6 Windows

A.6.1 Get Virtual

A.7 NLPIA automagic

Appendix B: Playful Python and regular expressions

B.1 Working with strings

B.1.1 String types (str and bytes)

B.1.2 Templates in Python (.format())

B.2 Mapping in Python (dict and OrderedDict)

B.3 Regular expressions

B.3.1 | - "OR"

B.3.2 [ ] - Character classes

B.3 Mastery

Appendix C: Vectors and matrices (linear algebra fundamentals)

C.1 Vectors

C.1.1 Distances

Appendix D: Machine learning tools and techniques

D.1 Data selection (and avoiding bias)

D.2 How fit is fit?

D.3 Knowing is half the battle

D.4 Cross-fit training

D.5 Holding your model back

D.5.1 Regularization

D.5.2 Dropout

D.5.3 BatchNorm

D.6 Imbalanced training sets

D.6.1 Oversampling

D.6.2 Undersampling

D.6.3 Augmenting your data

D.7 Performance metrics

D.7.1 Classification

D.7.2 Regression

D.8 Pro tips

Appendix E: Resources

E.1 Applications and project ideas

E.2 Courses and tutorials

E.3 Research papers and talks

E.3.2 Finance

E.3.3 Question answering systems

E.3.4 Deep learning

E.3.5 LSTMs and RNNs

E.4 Competitions and awards

E.5 Datasets

E.6 Search engines

E.6.1 Search algorithms

E.6.2 Open source search engines

E.6.3 Open source full-text indexers

E.6.4 Manipulative search engines

E.6.5 Less manipulative search engines

E.6.6 Distributed search engines

Appendix F: Glossary

F.1 Acronyms

F.2 Terms

Appendix G: Setting up your AWS GPU

G.1 Steps to create your AWS GPU instance

G.1.1 Cost control

Appendix H: Locality sensitive hashing

H.1 High-dimensional vectors are different

H.2 "Like" prediction

About the Technology

Most humans are pretty good at reading and interpreting text; computers...not so much. Natural Language Processing (NLP) is the discipline of teaching computers to read more like people, and you see examples of it in everything from chatbots to the speech-recognition software on your phone. Modern NLP techniques based on machine learning radically improve the ability of software to recognize patterns, use context to infer meaning, and accurately discern intent from poorly structured text. NLP promises to help you improve customer interactions, cut costs, and reinvent text-intensive applications like search or product support.

What's inside

  • Working with Keras, TensorFlow, Gensim, scikit-learn, and more
  • Parsing and normalizing text
  • Rule-based (Grammar) NLP
  • Data-based (Machine Learning) NLP
  • Deep Learning NLP
  • End-to-end chatbot pipeline with training data
  • Scalable NLP pipelines
  • Hyperparameter optimization algorithms

About the reader

While all examples are written in Python, experience with any modern programming language will allow readers to get the most from this book. A basic understanding of machine learning will also be helpful.

About the authors

Hobson Lane has more than 15 years of experience building autonomous systems that make important decisions on behalf of humans. Hannes Hapke is an Electrical Engineer turned Data Scientist with experience in deep learning. Cole Howard is a carpenter and writer turned Deep Learning expert.

Manning Early Access Program (MEAP): Read chapters as they are written, get the finished eBook as soon as it's ready, and receive the pBook long before it's in bookstores.

MEAP combo $49.99 pBook + eBook + liveBook

MEAP eBook $39.99 pdf + ePub + kindle + liveBook
