Deep Learning for Vision Systems
Mohamed Elgendy
  • October 2020
  • ISBN 9781617296192
  • 480 pages
  • printed in black & white
ePub + Kindle available Nov 2, 2020

From text and object detection to DeepDream and facial recognition... this book is comprehensive, approachable, and relevant for modern applications of deep learning to computer vision systems!

Bojan Djurkovic, DigitalOcean
Computer vision is central to many leading-edge innovations, including self-driving cars, drones, augmented reality, facial recognition, and much, much more. Amazing new computer vision applications are developed every day, thanks to rapid advances in AI and deep learning (DL). Deep Learning for Vision Systems teaches you the concepts and tools for building intelligent, scalable computer vision systems that can identify and react to objects in images, videos, and real life. With author Mohamed Elgendy's expert instruction and illustration of real-world projects, you’ll finally grok state-of-the-art deep learning techniques, so you can build, contribute to, and lead in the exciting realm of computer vision!

About the Technology

How much has computer vision advanced? One ride in a Tesla is the only answer you’ll need. Deep learning techniques have led to exciting breakthroughs in facial recognition, interactive simulations, and medical imaging, but nothing beats seeing a car respond to real-world stimuli while speeding down the highway.

About the book

How does the computer learn to understand what it sees? Deep Learning for Vision Systems answers that by applying deep learning to computer vision. Using only high school algebra, this book illuminates the concepts behind visual intuition. You'll understand how to use deep learning architectures to build vision system applications for image generation and facial recognition.
Table of Contents

Part 1: Deep learning foundation

1 Welcome to Computer Vision

1.1 Computer vision intuition

1.1.1 What is visual perception?

1.1.2 Vision systems

1.1.3 Human vision systems

1.1.4 AI vision systems

1.1.5 Sensing device

1.1.6 Interpreting device

1.1.7 Can machine learning achieve better performance than the human brain?

1.2 Applications of computer vision

1.2.1 Image classification

1.2.2 Object detection and localization

1.2.3 Automatic image captioning

1.2.4 Generate art (Style Transfer)

1.2.5 Create images

1.2.6 Conclusion

1.3 Computer Vision Pipeline - The big picture

1.4 Input image

1.4.1 Image as functions

1.4.2 How do computers see images?

1.4.3 Color images

1.5 Image preprocessing

1.5.1 What is image processing?

1.5.2 Why image preprocessing?

1.6 Feature extraction

1.6.1 What is a feature in computer vision?

1.6.2 What makes a good (useful) feature?

1.6.3 Extracting features (hand-crafted vs. automatic extraction)

1.6.4 Traditional machine learning uses hand-crafted features

1.6.5 Deep learning automatically extracts features

1.6.6 Why use features?

1.7 Classifier learning algorithm

1.8 Chapter summary and takeaways

2 Deep learning and neural networks

2.1 The Perceptron intuition

2.1.1 What is a perceptron?

2.1.2 How does the perceptron learn?

2.1.3 Is one neuron enough to solve complex problems?

2.2 Multi-Layer Perceptron (MLP)

2.2.1 Multi-Layer Perceptron Architecture

2.2.2 What are the Hidden Layers?

2.2.3 How many layers and how many nodes in each layer?

2.2.4 MLP Takeaways

2.3 Activation functions

2.3.1 Linear Transfer Function

2.3.2 Heaviside Step Function (Binary classifier)

2.3.3 Sigmoid/Logistic function

2.3.4 Softmax Function

2.3.5 Hyperbolic Tangent Function (tanh)

2.3.6 Rectified Linear Unit (ReLU)

2.3.7 Leaky ReLU
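
The activation functions listed in sections 2.3.3 through 2.3.7 can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the book's own listing; the function names are my own:

```python
import math

def sigmoid(z):
    # Squashes any real input into (0, 1) -- section 2.3.3
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    # Turns a list of scores into probabilities summing to 1 -- section 2.3.4
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def relu(z):
    # Rectified Linear Unit: max(0, z) -- section 2.3.6
    return max(0.0, z)

def leaky_relu(z, alpha=0.01):
    # Like ReLU, but lets a small signal through for z < 0 -- section 2.3.7
    return z if z > 0 else alpha * z
```

For example, sigmoid(0) is 0.5 and relu(-3) is 0.0; softmax always returns values that sum to 1.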

2.4 Feedforward

2.4.1 Feedforward calculations

2.4.2 Feature learning

2.5 Error functions

2.5.1 What is the error function?

2.5.2 Why do we need an error function?

2.5.3 Error is always positive

2.5.4 Mean Square Error (MSE)

2.5.5 Cross Entropy

2.5.6 A final note on errors and weights
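
The two error functions named in sections 2.5.4 and 2.5.5 reduce to short formulas; here is a minimal sketch in plain Python (my own illustrative helpers, not the book's code), assuming one-hot targets for cross entropy:

```python
import math

def mse(targets, predictions):
    # Mean Square Error (section 2.5.4): the average squared difference.
    # Squaring guarantees the error is always positive (section 2.5.3).
    n = len(targets)
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / n

def cross_entropy(targets, predictions):
    # Cross entropy (section 2.5.5): penalizes confident wrong predictions,
    # given one-hot targets and predicted class probabilities.
    return -sum(t * math.log(p) for t, p in zip(targets, predictions) if t > 0)
```

A perfect prediction gives zero error in both cases; the worse the prediction, the larger the (always positive) error.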

2.6 Optimization algorithms

2.6.1 What is Optimization?

2.6.2 Batch Gradient Descent (BGD)

2.6.3 Stochastic Gradient Descent (SGD)

2.6.4 Mini-batch Gradient Descent (MB-GD)

2.6.5 Gradient descent takeaways

2.7 Backpropagation

2.7.1 What is backpropagation?

2.7.2 Backpropagation takeaways

2.8 Chapter summary and takeaways

2.9 Project: Build Your first Neural Network

3 Convolutional Neural Networks (CNNs)

3.1 Image classification using MLP

3.1.1 Input layer

3.1.2 Hidden Layers

3.1.3 Output Layer

3.1.4 Putting it all together

3.1.5 Drawbacks of MLPs in processing images

3.1.6 Conclusion

3.2 CNNs Architecture

3.2.1 The big picture

3.2.2 A closer look at feature extraction

3.2.3 A closer look at classification

3.3 Basic components of the CNN

3.3.1 Convolutional layers (CONV)

3.3.2 Pooling layers or subsampling (POOL)

3.3.3 Why use a pooling layer?

3.3.4 Fully connected layers (FC)

3.4 Image classification using CNNs

3.4.1 Build the model architecture

3.4.2 Model summary

3.4.3 Number of parameters (weights)

3.5 Add Dropout layers to avoid overfitting

3.5.1 What is overfitting?

3.5.2 What is a dropout layer?

3.5.3 Why do we need dropout layers?

3.5.4 Where does the dropout layer go in the CNN architecture?

3.6 Convolution over colored images (3D images)

3.7 Chapter summary and takeaways

3.8 Project: Image classification for colored images (CIFAR-10 dataset)

3.8.1 Load the dataset

3.8.2 Image preprocessing

3.8.3 Define the model architecture

3.8.4 Compile the model

3.8.5 Train the model

3.8.6 Load the model with the best val_acc

3.8.7 Evaluate the model

4 Structuring Deep Learning Projects and Hyperparameter Tuning

4.1 Define the performance metrics

4.1.1 Is accuracy the best metric to evaluate the model?

4.1.2 Confusion matrix

4.1.3 Precision and Recall

4.1.4 F-Score
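
The evaluation metrics in sections 4.1.2 through 4.1.4 are all computed from the confusion-matrix counts: true positives (tp), false positives (fp), and false negatives (fn). A minimal sketch in plain Python (illustrative only, not the book's code):

```python
def precision(tp, fp):
    # Of everything predicted positive, what fraction was right? (section 4.1.3)
    return tp / (tp + fp)

def recall(tp, fn):
    # Of all actual positives, what fraction did we find? (section 4.1.3)
    return tp / (tp + fn)

def f1_score(tp, fp, fn):
    # F-score (section 4.1.4): the harmonic mean of precision and recall,
    # so a model must do well on both to score well.
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)
```

For example, a model with 8 true positives, 2 false positives, and 8 false negatives has precision 0.8 but recall only 0.5, which the F-score pulls down to about 0.62.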

4.2 Design a baseline model

4.3 Get your data ready for training

4.3.1 Split your data into train/validation/test datasets

4.3.2 Data preprocessing

4.4 Evaluate the model and interpret its performance (error analysis)

4.4.1 Diagnose for overfitting and underfitting

4.4.2 Plot the learning curves

4.4.3 Exercise: build, train, evaluate a simple network

4.5 Improve the network and tune hyperparameters

4.5.1 When to collect more data vs tuning hyperparameters?

4.5.2 Parameters vs. hyperparameters

4.5.3 Neural networks hyperparameters

4.5.4 Network architecture

4.5.5 Learning and optimization

4.5.6 Regularization techniques to avoid overfitting

4.6 Batch normalization (BN)

4.6.1 The covariate shift problem

4.6.2 Covariate shift in neural networks

4.6.3 How does batch normalization work?

4.6.4 Batch normalization implementation in Keras

4.6.5 Batch normalization recap
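
The core computation behind section 4.6.3 can be sketched for a single feature in plain Python: normalize the values across a mini-batch to zero mean and unit variance, then scale and shift by the learnable parameters gamma and beta. This is an illustrative sketch under those assumptions, not the Keras implementation referenced in section 4.6.4:

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    # Normalize one feature across a mini-batch (section 4.6.3):
    # subtract the batch mean, divide by the batch standard deviation
    # (eps avoids division by zero), then apply the learnable
    # scale (gamma) and shift (beta).
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]
```

In Keras this is a single BatchNormalization layer; gamma and beta are learned during training rather than fixed as here.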

4.7 Chapter summary and takeaways

4.7.1 Final thoughts

4.7.2 Tips on hyperparameter tuning

4.7.3 Deep learning foundation takeaways

4.7.4 What should I do next?

4.7.5 Advice from the author

4.8 Project: Achieve >90% accuracy on the CIFAR-10 image classification project

4.8.1 Import dependencies

4.8.2 Get the data ready for training

4.8.3 Build the model architecture

4.8.4 Train the model

4.8.5 Evaluate the model

4.8.6 Plot learning curves

4.8.7 Further improvements

Part 2: Image Classification and Object Detection

5 Advanced CNN Architectures

5.1 CNN design patterns

5.1.1 Pattern #1

5.1.2 Pattern #2

5.1.3 Pattern #3

5.2 LeNet-5

5.2.1 LeNet architecture

5.2.2 LeNet-5 implementation in Keras

5.2.3 Set up the learning hyperparameters

5.2.4 LeNet performance on MNIST dataset

5.3 AlexNet

5.3.1 AlexNet architecture

5.3.2 Novel features of AlexNet

5.3.3 AlexNet implementation in Keras

5.3.4 Set up the learning hyperparameters

5.3.5 AlexNet performance on CIFAR dataset

5.4 VGGNet

5.4.1 Novel features of VGGNet

5.4.2 VGGNet Configurations

5.4.3 VGG-16 in Keras

5.4.4 Learning hyperparameters

5.4.5 VGGNet performance on CIFAR dataset

5.5 Inception and GoogLeNet

5.5.1 Novel features of Inception

5.5.2 Inception module - naive version

5.5.3 Inception module with dimensionality reduction

5.5.4 Inception architecture

5.5.5 GoogLeNet in Keras

5.5.6 Learning hyperparameters

5.5.7 Inception performance on CIFAR dataset

5.6 ResNet

5.6.1 Novel features of ResNet

5.6.2 Residual blocks

5.6.3 ResNet implementation in Keras

5.6.4 Learning hyperparameters

5.6.5 ResNet performance on CIFAR dataset

5.7 Summary and takeaways

6 Transfer Learning

6.1 What are the problems that transfer learning is solving?

6.2 What is transfer learning?

6.3 How transfer learning works

6.3.1 How do neural networks learn features?

6.3.2 What about the transferability of features extracted at later layers in the network?

6.4 Transfer learning approaches

6.4.1 Pretrained network as a classifier

6.4.2 Pretrained network as a feature extractor

6.4.3 Fine-tuning

6.5 Choose the appropriate level of transfer learning

6.5.1 Scenario #1: target dataset is small and similar to source dataset

6.5.2 Scenario #2: target dataset is large and similar to the source dataset

6.5.3 Scenario #3: target dataset is small and different from the source dataset

6.5.4 Scenario #4: target dataset is large and different from the source dataset

6.5.5 Recap of the transfer learning scenarios

6.6 Open-source datasets

6.6.1 MNIST

6.6.2 Fashion-MNIST

6.6.3 CIFAR-10

6.6.4 ImageNet

6.6.5 Microsoft’s COCO

6.6.6 Google’s Open Images

6.6.7 Kaggle

6.7 Chapter summary and takeaways

6.8 Project 1: A pretrained network as a feature extractor

6.9 Project 2: Fine-tuning

7 Object Detection with R-CNN, SSD, and YOLO

7.1 General object detection framework

7.1.1 Region proposals

7.1.2 Network predictions

7.1.3 Non-maximum suppression (NMS)

7.1.4 Object detector evaluation metrics
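
Two ideas from sections 7.1.3 and 7.1.4, non-maximum suppression and the Intersection over Union (IoU) overlap measure it relies on, can be sketched in plain Python. This is an illustrative sketch assuming boxes given as (x1, y1, x2, y2) corner coordinates, not the book's implementation:

```python
def iou(a, b):
    # Intersection over Union of two boxes given as (x1, y1, x2, y2)
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    # Greedy non-maximum suppression (section 7.1.3): keep the
    # highest-scoring box, drop every remaining box that overlaps it
    # beyond the threshold, and repeat until no boxes are left.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep
```

Given two heavily overlapping detections of the same object plus one detection elsewhere, NMS keeps only the higher-scoring box of the overlapping pair.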

7.2 Region-Based Convolutional Neural Networks (R-CNNs)

7.2.1 R-CNN

7.2.2 Fast R-CNN

7.2.3 Faster R-CNN

7.2.4 Recap of the R-CNN family

7.3 Single Shot Detection (SSD)

7.3.1 High level SSD architecture

7.3.2 Base network

7.3.3 Multi-scale feature layers

7.3.4 Non-maximum Suppression

7.4 You Only Look Once (YOLO)

7.4.1 How YOLOv3 works

7.4.2 YOLOv3 Architecture

7.5 Chapter summary and takeaways

Part 3: Generative Models and Visual Embeddings

8 Generative Adversarial Networks (GANs)

8.1 GANs Architecture

8.1.1 The Discriminator Model

8.1.2 The Generator Model

8.1.3 Training the GAN

8.1.4 GAN Minimax Function

8.2 Evaluate GAN models

8.2.1 Inception score

8.2.2 Fréchet Inception Distance (FID)

8.2.3 Which evaluation scheme to use?

8.3 Popular GAN applications

8.3.1 Text-to-Photo Synthesis

8.3.2 Image-to-image translation (Pix2Pix GAN)

8.3.3 Image Super-Resolution GAN (SRGAN)

8.3.4 Ready to get your hands dirty?

8.4 Building your own GAN project

8.4.1 Import libraries

8.4.2 Download and visualize the dataset

8.4.3 Build the generator

8.4.4 Build the discriminator

8.4.5 Build the combined model

8.4.6 Build the training function

8.4.7 Train and observe results

8.4.8 Closing

8.5 Summary and takeaways

9 DeepDream and Neural Style Transfer

9.1 How convolutional neural networks see the world

9.1.1 Revisit how neural networks work

9.1.2 Visualize CNN features

9.1.3 Implement feature visualizer

9.1.4 Wrapping up

9.2 DeepDream

9.2.1 How does the DeepDream algorithm work?

9.2.2 DeepDream implementation in Keras

9.2.3 Wrapping up

9.3 Neural Style Transfer

9.3.1 The content loss

9.3.2 The style loss

9.3.3 Total variance loss

9.3.4 Network Training

9.3.5 Wrapping up

10 Visual embeddings

10.1 Applications of visual embeddings

10.1.1 Face recognition (FR)

10.1.2 Image recommendation systems

10.1.3 Object re-identification

10.2 Learning Embedding

10.3 Loss functions

10.3.1 Problem Setup and Formalization

10.3.2 Cross entropy loss

10.3.3 Contrastive Loss

10.3.4 Triplet Loss
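
The two embedding losses in sections 10.3.3 and 10.3.4 can be sketched in plain Python, assuming Euclidean distance between embedding vectors. These are illustrative scalar versions with my own function names, not the book's implementation:

```python
import math

def euclidean(x, y):
    # Distance between two embedding vectors
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def contrastive_loss(x, y, same, margin=1.0):
    # Contrastive loss (section 10.3.3): matching pairs are pulled
    # together; non-matching pairs are pushed at least `margin` apart.
    d = euclidean(x, y)
    return d ** 2 if same else max(0.0, margin - d) ** 2

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Triplet loss (section 10.3.4): the positive must end up closer to
    # the anchor than the negative, by at least `margin`.
    d_ap = euclidean(anchor, positive)
    d_an = euclidean(anchor, negative)
    return max(0.0, d_ap - d_an + margin)
```

A triplet whose negative is already much farther from the anchor than the positive incurs zero loss, which is exactly why the mining strategies in section 10.4 look for the informative triplets that still violate the margin.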

10.4 Mining informative data

10.4.1 Dataloader

10.4.2 Informative data mining: Finding useful triplets

10.4.3 Batch All (BA)

10.4.4 Batch Hard (BH)

10.4.5 Batch Weighted (BW)

10.4.6 Batch sample (BS)

10.5 Project: Train an embedding network

10.5.1 Task 1: Fashion - get me items similar to this

10.5.2 Task 2: Vehicle re-identification

10.5.3 Implementation

10.6 Testing a trained model

10.6.1 Task 1: In-shop retrieval

10.6.2 Task 2: Vehicle Re-identification

10.7 Bonus: pushing the boundaries of current accuracy

10.8 Chapter summary and takeaways

10.9 References


Appendix A: Getting Set Up

A.1 Download code repository

A.2 Install Anaconda

A.3 Set up your deep learning environment

A.3.1 Set up your development environment

A.3.2 Save and load environments

A.4 Set up your AWS EC2 environment

A.4.1 Create an AWS account

A.4.2 Remotely connecting to your instance

A.4.3 Run Jupyter notebook

What's inside

  • Image classification and object detection
  • Advanced deep learning architectures
  • Transfer learning and generative adversarial networks
  • DeepDream and neural style transfer
  • Visual embeddings and image search

About the reader

For intermediate Python programmers.

About the author

Mohamed Elgendy is the VP of Engineering at Rakuten. A seasoned AI expert, he has previously built and managed AI products at Amazon and Twilio.
