Deep Learning with Structured Data
Mark Ryan
  • MEAP began August 2019
  • Publication in December 2020 (estimated)
  • ISBN 9781617296727
  • 273 pages (estimated)
  • printed in black & white

An excellent companion on the journey to mastering deep learning. A must-read for MS and PhD students who wish to apply deep learning in their research and development projects.

Irfan Ullah
Deep learning offers the potential to identify complex patterns and relationships hidden in data of all sorts. Deep Learning with Structured Data shows you how to apply powerful deep learning analysis techniques to the kind of structured, tabular data you'll find in the relational databases that real-world businesses depend on. Filled with practical, relevant applications, this book teaches you how deep learning can augment your existing machine learning and business intelligence systems.

About the Technology

Most businesses are far more interested in accurate forecasting and fraud detection using their existing structured datasets than in identifying cats in YouTube videos. Powerful deep learning techniques can efficiently extract insight from the kind of structured data collected by most businesses and organizations. Deep learning demands less feature tuning than other machine learning methods, takes less code to maintain, and can be automated to crawl your business’s databases to detect unanticipated patterns a human would never even notice. Thanks to the availability of cloud environments adapted to deep learning and to recent improvements in deep learning frameworks, deep learning is now a viable approach to solving problems with structured data.

About the book

Deep Learning with Structured Data shows you how to bring powerful deep learning techniques to your business’s structured data to predict trends and unlock hidden insights. In it, deep learning advocate Mark Ryan takes you through cleaning and preparing structured data for deep learning. You’ll learn the architecture of a Keras deep learning model, along with techniques for training, deploying, and maintaining your model. You’ll discover ways to get quick wins that can rapidly show whether your models are working, and techniques for monitoring your model’s ongoing functionality. Throughout, an end-to-end example using an open source transit delay dataset illustrates deep learning’s potential for unraveling problems and making predictions from large volumes of structured data.
Table of Contents

1 Why Deep Learning with Structured Data?

1.1 Overview of deep learning

1.2 Benefits and drawbacks of deep learning

1.3 Overview of the deep learning stack

1.4 Structured vs. unstructured data

1.5 Objections to deep learning with structured data

1.6 Why investigate deep learning with a structured data problem?

1.7 An overview of the code accompanying this book

1.8 What you need to know

1.9 Summary

2 Introduction to the Example Problem and Pandas Dataframes

2.1 Development environment options for deep learning

2.2 Code for exploring Pandas

2.3 Pandas dataframes in Python

2.4 Ingesting CSV files into Pandas dataframes

2.5 Using Pandas to do what you would do with SQL

2.6 The major example: predicting streetcar delays

2.7 Why is a real-world dataset critical for learning about deep learning?

2.8 Format and scope of the input dataset

2.9 The destination: an end-to-end solution

2.10 More details on the code that makes up the end-to-end solutions

2.11 Development environments: vanilla vs. deep learning enabled

2.12 A deeper look at the objections to deep learning

2.13 How has deep learning become more accessible?

2.14 A first taste of training a deep learning model

2.15 Summary
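As a taste of the Pandas patterns chapter 2 covers — ingesting a CSV file into a dataframe and doing with Pandas what you would do with SQL — here is a minimal sketch. It is not taken from the book's repository; the column names and values are invented for illustration.

```python
# Hedged sketch: ingest a CSV into a Pandas dataframe, then run an
# SQL-style aggregation. Column names are illustrative only.
import io
import pandas as pd

# Stand-in for a CSV file of streetcar delay records
csv_data = io.StringIO(
    "route,delay_minutes\n"
    "501,10\n"
    "501,5\n"
    "504,20\n"
)
df = pd.read_csv(csv_data)

# Pandas equivalent of:
#   SELECT route, AVG(delay_minutes) FROM delays GROUP BY route
avg_delay = df.groupby("route")["delay_minutes"].mean()
print(avg_delay.loc[501])  # 7.5
print(avg_delay.loc[504])  # 20.0
```

The same `groupby` idiom covers most of the `GROUP BY`/aggregate queries a reader coming from SQL will reach for first.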

3 Preparing the Data Part 1: Exploring and Cleansing the Data

3.1 Code for exploring and cleansing the data

3.2 Using config files with Python

3.3 Ingesting XLS files into a Pandas dataframe

3.4 Using pickle to save your Pandas dataframe from one session to another

3.5 Exploring the data

3.6 Categorizing data into continuous, categorical and text categories

3.7 Problems in the dataset: missing data, errors, and guesses

3.8 How much data does deep learning need?

3.9 Summary
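Section 3.4's idea — using pickle so a cleaned dataframe survives from one session to the next — can be sketched in a few lines. This is not the book's code; the file name and columns are assumptions for illustration.

```python
# Hedged sketch: persist a Pandas dataframe with pickle between sessions.
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"route": [501, 504], "delay_minutes": [10, 20]})

# Save the dataframe at the end of one session...
path = os.path.join(tempfile.gettempdir(), "cleaned_delays.pkl")
df.to_pickle(path)

# ...and reload it at the start of the next, with dtypes intact
restored = pd.read_pickle(path)
print(restored.equals(df))  # True
```

Unlike round-tripping through CSV, pickling preserves column dtypes, so cleansing work done in one session does not have to be repeated in the next.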

4 Preparing the Data Part 2: Transforming the Data

4.1 Code for preparing and transforming the data

4.2 Dealing with incorrect values: Routes

4.3 Why only one substitute for all bad values?

4.4 Dealing with incorrect values: Vehicles

4.5 Dealing with inconsistent values: Location

4.6 Locations: going the distance

4.7 Fixing type mismatches

4.8 Dealing with rows that still contain bad data

4.9 Creating derived columns

4.10 Preparing non-numeric data to train a deep learning model

4.11 Overview of the end-to-end solution

4.12 Summary
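The chapter 4 transformations — substituting a single placeholder for all bad values, fixing type mismatches, and creating derived columns — can be sketched as below. This is an illustrative sketch under assumed column names, not the book's actual cleanup code.

```python
# Hedged sketch of the chapter 4 cleanup steps on an invented dataframe.
import pandas as pd

df = pd.DataFrame({
    "route": ["501", "bad", "504"],      # strings, with one invalid entry
    "delay_minutes": ["10", "5", "20"],  # numeric data stored as text
})

# One substitute for all bad values keeps downstream handling simple
valid_routes = {"501", "504"}
df.loc[~df["route"].isin(valid_routes), "route"] = "UNKNOWN"

# Fix the type mismatch: delay_minutes should be numeric, not text
df["delay_minutes"] = pd.to_numeric(df["delay_minutes"])

# Derived column: a binary flag for "delay of 10 minutes or more"
df["long_delay"] = (df["delay_minutes"] >= 10).astype(int)
print(df["long_delay"].tolist())  # [1, 0, 1]
```

Funneling every kind of bad value into one sentinel like `"UNKNOWN"` means later steps (and the model's embedding layers) only ever have to handle one extra category per column.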

5 Preparing and Building the Model

5.1 Data leakage and features that are fair game for training the model

5.2 Domain expertise and minimal scoring tests to prevent data leakage

5.3 Preventing data leakage in the streetcar delay prediction problem

5.4 Code for exploring Keras and building the model

5.5 Deriving the dataframe we will use to train the model

5.6 Transforming the dataframe into the format expected by the Keras model

5.7 A brief history of Keras and TensorFlow

5.8 Migrating from TensorFlow 1.x to TensorFlow 2

5.9 TensorFlow vs. PyTorch

5.10 The structure of a deep learning model in Keras

5.11 How the data structure defines the Keras model

5.12 The power of embeddings

5.13 Code to build a Keras model automatically based on the data structure

5.14 Exploring your model

5.15 Model parameters

5.16 Summary

6 Training the Model and Running Experiments

6.1 Code for training the deep learning model

6.2 Reviewing the process of training a deep learning model

6.3 Reviewing the overall goal of the streetcar delay prediction model

6.4 Selecting the train, validation and test datasets

6.5 Initial training run

6.6 Measuring the performance of your model

6.7 Keras callbacks – getting the best out of your training runs

6.8 Getting identical results from multiple training runs

6.9 Shortcuts to scoring

6.10 Explicitly saving trained models

6.11 Running a series of training experiments

6.12 Summary
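Section 6.4's step of selecting train, validation, and test datasets — and section 6.8's concern with reproducible runs — can be sketched with Pandas alone. The 80/10/10 proportions and the fixed seed here are assumptions for illustration, not the book's settings.

```python
# Hedged sketch: a reproducible train / validation / test split in Pandas.
import pandas as pd

df = pd.DataFrame({"x": range(100), "y": [i % 2 for i in range(100)]})

# Shuffle once with a fixed seed so repeated runs produce the same split
shuffled = df.sample(frac=1.0, random_state=42).reset_index(drop=True)

# 80 / 10 / 10 split by position
n = len(shuffled)
train = shuffled.iloc[: int(n * 0.8)]
valid = shuffled.iloc[int(n * 0.8) : int(n * 0.9)]
test = shuffled.iloc[int(n * 0.9) :]
print(len(train), len(valid), len(test))  # 80 10 10
```

Pinning `random_state` is one small part of getting identical results from multiple training runs; the deep learning framework's own seeds have to be fixed as well.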

7 More Experiments with the Trained Model

7.1 Code for more experiments with the model

7.2 Validating whether removing bad values improves the model

7.3 Validating whether embeddings for columns improve the performance of the model

7.4 Comparing the deep learning model with XGBoost

7.5 Possible next steps for improving the deep learning model

7.6 Summary

8 Deploying the Model

8.1 Overview of model deployment

8.2 If deployment is so important, why is it so hard?

8.3 Review of one-off scoring

8.4 The user experience with web deployment

8.5 Steps to deploy your model with web deployment

8.6 Behind the scenes with web deployment

8.7 The user experience with Facebook Messenger deployment

8.8 Behind the scenes with Facebook Messenger deployment

8.9 More background on Rasa

8.10 Steps to deploy your model in Facebook Messenger with Rasa

8.11 Introduction to pipelines

8.12 Defining pipelines in the model training phase

8.13 Applying pipelines in the scoring phase

8.14 Maintaining a model after deployment

8.15 Summary

9 Recommended Next Steps

9.1 Reviewing what we have covered so far

9.2 What we could do next with the streetcar delay prediction project

9.3 Adding location details to the streetcar delay prediction project

9.4 Training our deep learning model with weather data

9.5 Adding season or time of day to the streetcar delay prediction project

9.6 Imputation – an alternative to removing records with bad values

9.7 Making the web deployment of the streetcar delay prediction model generally available

9.8 Adapting the streetcar delay prediction model to an entirely new dataset: overview

9.9 Adapting the streetcar delay prediction model to an entirely new dataset: preparing the dataset and training the model

9.10 Adapting the streetcar delay prediction model to an entirely new dataset: deploying the model with web deployment

9.11 Adapting the streetcar delay prediction model to an entirely new dataset: deploying the model with Facebook Messenger

9.12 An example of adapting the approach in this book to an entirely different dataset

9.13 Resources for additional learning

9.14 Summary


Appendix A: Using Google Colaboratory

A.1 Introduction to Google Colaboratory (Colab)

A.2 Making Google Drive available in your Colab session

A.3 Making the repo available in Colab and running notebooks

A.4 Pros and cons of Colab and Paperspace

A.5 Summary

What's inside

  • The benefits and drawbacks of deep learning
  • Organizing data for your deep learning model
  • The deep learning stack
  • Measuring performance of your models

About the reader

For readers with an intermediate knowledge of Python, Jupyter notebooks, and machine learning.

About the author

Mark Ryan has 20 years of experience leading teams delivering IBM’s premier relational database product. He holds a Master's degree in Computer Science from the University of Toronto.

Manning Early Access Program (MEAP): read chapters as they are written, get the finished eBook as soon as it’s ready, and receive the pBook long before it's in bookstores.