Getting Started with Natural Language Processing
Ekaterina Kochmar
  • MEAP began October 2019
  • Publication in Early 2021 (estimated)
  • ISBN 9781617296765
  • 325 pages (estimated)
  • printed in black & white

Great content, tone, presentation, figures, and code. Pedagogical and thorough—friendly and engaging.

Erik Hansson
Getting Started with Natural Language Processing gives you everything you need to get started with NLP in a friendly, understandable tutorial. Full of Python code and hands-on projects, each chapter provides a concrete example with practical techniques that you can put into practice right away. If you’re a beginner to NLP and want to upgrade your applications with functions and features like information extraction, user profiling, and automatic topic labeling, this is the book for you.

About the Technology

Natural Language Processing is a set of data science techniques that enable machines to make sense of human text and speech. Advances in machine learning and deep learning have made NLP more efficient and reliable than ever, leading to a huge number of new tools and resources. From improving search applications to sentiment analysis, the possible applications of NLP are vast and growing.

About the book

Getting Started with Natural Language Processing is a hands-on guide to NLP with practical techniques you can put into action right away. By following the numerous Python-based examples and real-world case studies, you’ll apply NLP to search, information extraction, sentiment analysis, user profiling, and more. When you’re done, you’ll have a solid grounding in NLP that will serve as a foundation for further learning.
Table of Contents

Part 1: First steps

1 Introduction

1.1 A brief history of NLP

1.2 Typical tasks

1.2.2 Advanced Information Search: Asking the machine precise questions

1.2.3 Conversational agents and Intelligent virtual assistants

1.2.4 Text prediction and Language generation

1.2.5 Spam filtering

1.2.6 Machine translation

1.2.7 Spell- and grammar checking

1.3 Summary

2 Your first NLP example

2.1 Introducing NLP in practice: spam filtering

2.2 Understanding the task

2.3 Implementing your own spam filter

2.3.1 Step 1: Define the data and classes

2.3.2 Step 2: Split the text into words

2.3.3 Step 3: Extract and normalize the features

2.3.4 Step 4: Train the classifier

2.3.5 Step 5: Evaluate your classifier

2.4 Deploying your spam filter in practice

2.5 Summary

Part 2: Practical NLP

3 Introduction to Information Search

3.1 Understanding the task

3.1.1 Data and data structures

3.1.2 Boolean search algorithm

3.2 Processing the data further

3.2.1 Preselecting the words that matter: stopwords removal

3.2.2 Matching forms of same word: morphological processing

3.3 Information weighing

3.3.1 Weighing words with term frequency

3.3.2 Weighing words with inverse document frequency

3.4 Practical use of the search algorithm

3.4.1 Retrieval of the most similar documents

3.4.2 Evaluation of the results

3.4.3 Deploying search algorithm in practice

3.5 Summary

4 Information Extraction

4.1 Use cases

4.2 Understanding the task

4.3 Detecting word types with part-of-speech tagging

4.3.1 Understanding word types

4.3.2 Part-of-speech tagging with spaCy

4.4 Understanding sentence structure with syntactic parsing

4.4.1 Why sentence structure is important

4.4.2 Dependency parsing with spaCy

4.5 Building your own Information Extraction algorithm

4.6 Summary

5 Author Profiling as a Machine Learning Task

5.1 Understanding the task

5.2 Machine Learning pipeline at a first glance

5.2.1 Original data

5.2.2 Testing generalization behavior

5.2.3 Setting up the benchmark

5.3 A closer look at the machine learning pipeline

5.3.1 Decision Trees classifier basics

5.3.2 Evaluating which tree is better using node impurity

5.3.3 Selection of the best split in Decision Trees

5.3.4 Decision Trees on language data

5.4 Summary

6 Linguistic Feature Engineering for Author Profiling

6.1 Another close look at the machine learning pipeline

6.1.1 Evaluating the performance of your classifier

6.1.2 Further evaluation measures

6.2 Feature engineering for authorship attribution

6.2.1 Word and sentence length statistics as features

6.2.2 Counts of stopwords and proportion of stopwords as features

6.2.3 Distributions of parts-of-speech as features

6.2.4 Distribution of word suffixes as features

6.2.5 Unique words as features

6.3 Practical use of authorship attribution and user profiling

6.4 Summary

7 Your First Sentiment Analyzer using Sentiment Lexicons

7.1 Use cases

7.2 Understanding your task

7.2.1 Aggregating sentiment score with the help of a lexicon

7.2.2 Learning to detect sentiment in a data-driven way

7.3 Setting up the pipeline: data loading and analysis

7.3.1 Data loading and preprocessing

7.3.2 A closer look into the data

7.4 Aggregating sentiment scores with a sentiment lexicon

7.4.1 Collecting sentiment scores from a lexicon

7.4.2 Applying sentiment scores to detect review polarity

7.5 Summary

8 Sentiment Analysis with a Data-Driven Approach

8.1 Addressing multiple senses of a word with SentiWordNet

8.2 Addressing dependence on context with machine learning

8.2.1 Data preparation

8.2.2 Extracting features from text

8.2.3 Sklearn’s machine learning Pipeline

8.2.4 Full scale evaluation with cross-validation

8.3 Varying the length of the sentiment-bearing features

8.4 Negation handling for sentiment analysis

8.5 Further practice

8.6 Summary

9 Topic Labeling

10 Named Entity Recognition

11 Summarization

Part 3: Next steps

12 Further guidance for an NLP practitioner

Appendixes: Reference guide to the essential building blocks

Appendix A: NLP essentials: core terminology

Appendix B: Machine Learning cheat sheet

Appendix C: Your essential toolset

What's inside

  • Extracting information from raw text
  • Named entity recognition
  • Automating summarization of key facts
  • Topic labeling

About the reader

For beginners to NLP with basic Python skills.

About the author

Ekaterina Kochmar is an Affiliated Lecturer and a Senior Research Associate at the Natural Language and Information Processing group of the Department of Computer Science and Technology, University of Cambridge. She holds an MA degree in Computational Linguistics, an MPhil in Advanced Computer Science, and a PhD in Natural Language Processing.

Manning Early Access Program (MEAP) Read chapters as they are written, get the finished eBook as soon as it’s ready, and receive the pBook long before it's in bookstores.
print book $27.99 $39.99 pBook + eBook + liveBook

eBook $22.39 $31.99 3 formats + liveBook
