Natural Language Processing for Hackers
Learn to build apps that can understand people
George-Bogdan Ivanov
  • June 2019
  • ISBN 9781617296567
  • 176 pages
Natural Language Processing (NLP) is a collection of techniques to analyze, interpret, and create human-understandable text and speech. Advances in machine learning have pushed NLP to new levels of accuracy and uncanny realism. Natural Language Processing for Hackers lays out everything you need to crawl, clean, build, fine-tune, and deploy natural language models from scratch—all with easy-to-read Python code.

Distributed by Manning Publications

This book was created independently by AI expert George-Bogdan Ivanov and is distributed by Manning Publications.

About the Technology

Thanks to NLP, computers are capable of highly accurate text- and speech-based interaction with humans. NLP capitalizes on powerful machine learning techniques that can detect patterns and extract meaning from human-generated text. As well as improving raw data processing, NLP technology is behind cutting-edge UI developments such as chatbots and voice assistants that can process written and spoken commands and generate realistic and helpful responses.

About the book

Natural Language Processing for Hackers covers NLP end-to-end, giving you the skills and techniques that allow your computers to speak human. Unlike many research-oriented books that use the kind of clean datasets you would never find in the real world, this practical guide takes on NLP as you’ll actually use it. You’ll learn the key concepts of NLP by coding your own tools and projects, from a text analysis service right up to a full-featured chatbot. Everything is written in concise, easy-to-read Python code to ensure you’ll grok the most important aspects of Natural Language Processing. When you’re done, you will be able to apply the complete range of NLP techniques to build practical applications—even with messy real-world data.
Table of Contents

Part 1: Introduction to NLTK

NLTK Fundamentals

Installing NLTK

Splitting Text

Building a vocabulary

Fun with Bigrams and Trigrams

Part Of Speech Tagging

Named Entity Recognition

Getting started with Wordnet

Wordnet Structure

Lemma Operations

Lemmatizing and Stemming

How stemmers work

How lemmatizers work

Part 2: Create a Text Analysis service

Introduction to Machine Learning

A Practical Machine Learning Example

Getting Started with Scikit-Learn

Installing Scikit-Learn and building a dataset

Training a Scikit-Learn Model

Making Predictions

Finding the data

Existing corpora

Ideas for Gathering Data

Getting the Data

Learning to Classify Text

Text Feature Extractor

Scikit-Learn Feature Extraction

Text Classification with Naive Bayes

Persisting models

Building the API

Building a Flask API

Deploy to Heroku

Part 3: Create a Social Media Monitoring Service

Basics of Sentiment Analysis

Be Aware of Negations

Machine Learning doesn’t get Humour

Multiple and Mixed Sentiments

Non-Verbal Communication

Twitter Sentiment Data

Twitter Corpora

Other Sentiment Analysis Corpora

Building a Tweets Dataset

Sentiment Analysis - A First Attempt

Better Tokenization

Fine Tuning

Try a different classifier

Use Ngrams Instead of Words

Using a Pipeline

Cross Validation

Picking the Best Tokenizer

Building the Twitter Listener

Classification Metrics

Binary Classification

Multi-Class Metrics

The Confusion Matrix

Part 4: Build Your Own NLP Toolkit

Build Your Own Part-Of-Speech Tagger

Part-Of-Speech Corpora

Building Toy Models

About Feature Extraction

Using the NLTK Base Classes

Writing the Feature Extractor

Training the Tagger

Out-Of-Core Learning

Build a Chunker

IOB Tagging

Implementing the Chunk Parser

Chunker Feature Detection

Build a Named Entity Extractor

NER Corpora

The Groningen Meaning Bank Corpus

Feature Detection

NER Training

Build a Dependency Parser

Understanding the Problem

Step 0

Step 1 - LEFT-ARC

Step 2 - SHIFT

Step 3 - SHIFT

Step 4 - LEFT-ARC

Step 5 - SHIFT

Step 6 - RIGHT-ARC

Step 7 - LEFT-ARC (draw an arc from the ROOT node to the remaining node)

Greedy Transition-Based Parsing

Dependency Dataset

Writing the Dependency Parser Class

Adding Labels to the Parser

Learning to Label Dependencies

Training our Labelled Dependency Parser

Part 5: Build Your Own Chatbot Engine

General Architecture

Train the Platform via Examples

Action Handlers

Building the Core

Chatbot Base Class and Training Set

Training the Chatbot

Everything together


The Movie DB API

Small-Talk Handlers

Simple Handlers

Execution Handlers

MovieBot on Facebook

Installing ngrok

Setting up Facebook

Trying it Out

What Next?

What's inside

  • Constructing your own Text Analysis engine
  • Building a Twitter listener that performs Sentiment Analysis on a certain subject
  • Assembling your own NLP toolbox, complete with a Part-Of-Speech Tagger, Shallow Parser, Named Entity Extractor, and Dependency Parser
  • Cleaning and standardising messy datasets

About the reader

This book requires familiarity with Python, but no prior knowledge of natural language processing or machine learning is necessary.

About the author

George-Bogdan Ivanov is a software engineer with an MSc in AI and a soft spot for applying Natural Language Processing to practical problems. He loves building and being part of startups and small businesses in general. Among other projects, he is the creator of a Natural Language Processing API for the Romanian language.
