Grokking Deep Learning for Computer Vision
Computer vision is central to many leading-edge innovations, including self-driving cars, drones, augmented reality, facial recognition, and much, much more. Amazing new computer vision applications are developed every day, thanks to rapid advances in AI and deep learning (DL). Grokking Deep Learning for Computer Vision teaches you the concepts and tools for building intelligent, scalable computer vision systems that can identify and react to objects in images, videos, and real life. With author Mohamed Elgendy’s expert instruction and illustration of real-world projects, you’ll finally grok state-of-the-art deep learning techniques, so you can build, contribute to, and lead in the exciting realm of computer vision!
If you're in the market for a great book on deep learning for computer vision, I suggest you look no further. Recommended.
Table of Contents
Part 1: Deep learning foundation
1 Welcome to Computer Vision
1.1 Computer vision intuition
1.1.1 What is visual perception?
1.1.2 Vision systems
1.1.3 Sensing device
1.1.4 Interpreting device
1.2 Applications of computer vision
1.2.1 Image classification
1.2.2 Object detection and localization
1.2.3 Automatic image captioning
1.2.4 Generate art (Style Transfer)
1.2.5 Create images
1.3 Computer Vision Pipeline - The big picture
1.4 Input image
1.4.1 Image as functions
1.4.2 How computers see images?
1.4.3 Color images
1.5 Image preprocessing
1.5.1 What is image processing?
1.5.2 Why image preprocessing?
1.6 Feature extraction
1.6.1 What is a feature in computer vision?
1.6.2 What makes a good (useful) feature?
1.6.3 Extracting features (hand-craft vs automatic extracting)
1.6.4 Why use features?
1.7 Classifier learning algorithm
1.8 Chapter summary and takeaways
2 Deep learning and neural networks
2.1 The Perceptron intuition
2.1.1 What is a perceptron?
2.1.2 How does the perceptron learn?
2.1.3 Is one neuron enough to solve complex problems?
2.2 Multi-Layer Perceptron (MLP)
2.2.1 Multi-Layer Perceptron Architecture
2.2.2 What are the Hidden Layers?
2.2.3 How many layers and how many nodes in each layer?
2.2.4 MLP Takeaways
2.3 Activation functions
2.3.1 Linear Transfer Function
2.3.2 Heaviside Step Function (Binary classifier)
2.3.3 Sigmoid/Logistic function
2.3.4 Softmax Function
2.3.5 Hyperbolic Tangent Function (tanh)
2.3.6 Rectified Linear Unit (ReLU)
2.3.7 Leaky ReLU
2.4.1 Feedforward calculations
2.4.2 Feature learning
2.5 Error functions
2.5.1 What is the error function?
2.5.2 Why do we need an error function?
2.5.3 Error is always positive
2.5.4 Mean Square Error (MSE)
2.5.5 Cross Entropy
2.5.6 A final note on errors and weights
2.6 Optimization algorithms
2.6.1 What is Optimization?
2.6.2 Batch Gradient Descent (BGD)
2.6.3 Stochastic Gradient Descent (SGD)
2.6.4 Mini-batch Gradient Descent (MN-GD)
2.6.5 Gradient descent takeaways
2.7.1 What is backpropagation?
2.7.2 Backpropagation takeaways
2.8 Chapter summary and takeaways
2.9 Project: Build Your first Neural Network
3 Convolutional Neural Networks (CNNs)
3.1 Image classification using MLP
3.1.1 Input layer
3.1.2 Hidden Layers
3.1.3 Output Layer
3.1.4 Putting it all together
3.1.5 Drawbacks of MLPs in processing images
3.2 CNNs Architecture
3.2.1 The big picture
3.2.2 A closer look on feature extraction
3.2.3 A closer look on classification
3.3 Basic components of the CNN
3.3.1 Convolutional layers (CONV)
3.3.2 Pooling layers or subsampling (POOL)
3.3.3 Why use a pooling layer?
3.3.4 Fully connected layers (FC)
3.4 Image classification using CNNs
3.4.1 Build the model architecture
3.4.2 Model summary
3.4.3 Number of parameters (weights)
3.5 Add Dropout layers to avoid overfitting
3.5.1 What is overfitting?
3.5.2 What is a dropout layer?
3.5.3 Why do we need dropout layers?
3.5.4 Where does dropout layer go in the CNN architecture?
3.6 Convolution over colored images (3D images)
3.7 Chapter summary and takeaways
3.8 Project: Image classification for colored images (CIFAR-10 dataset)
3.8.1 Load the dataset
3.8.2 Image preprocessing
3.8.3 Define the model architecture
3.8.4 Compile the model
3.8.5 Train the model
3.8.6 Load the model with the best val_acc
3.8.7 Evaluate the model
4 Structuring Deep Learning Projects and Hyperparameters tuning
4.1 Define the performance metrics
4.1.1 Is accuracy the best metric to evaluate the model?
4.1.2 Confusion matrix
4.1.3 Precision and Recall
4.2 Design a baseline model
4.3 Get your data ready for training
4.3.1 Split your data into train/validation/test datasets
4.3.2 Data preprocessing
4.4 Evaluate the model and interpret its performance (error analysis)
4.4.1 Diagnose for overfitting and underfitting
4.4.2 Plot the learning curves
4.4.3 Exercise: build, train, evaluate a simple network
4.5 Improve the network and tune hyperparameters
4.5.1 When to collect more data vs tuning hyperparameters?
4.5.2 Parameters vs. hyperparameters
4.5.3 Neural networks hyperparameters
4.5.4 Network architecture
4.5.5 Learning and optimization
4.5.6 Regularization techniques to avoid overfitting
4.6 Batch normalization (BN)
4.6.1 The covariate shift problem
4.6.2 Covariate shift in neural networks
4.6.3 How does batch normalization work?
4.6.4 Batch normalization implementation in Keras
4.6.5 Batch normalization recap
4.7 Chapter summary and takeaways
4.7.1 Final thoughts
4.7.2 Tips on hyperparameters tuning
4.7.3 Deep learning foundation takeaways
4.7.4 What should I do next?
4.7.5 An advice from the author
4.8 Project: Achieve >90% accuracy on the CIFAR-10 image classification project
4.8.1 Import dependencies
4.8.2 Get the data ready for training
4.8.3 Build the model architecture
4.8.4 Train the model
4.8.5 Evaluate the model
4.8.6 Plot learning curves
4.8.7 Further improvements
Part 2: Image Classification and Object Detection
5 Advanced CNN Architectures
5.1.1 LeNet architecture
5.1.2 LeNet-5 implementation in Keras
5.1.3 Set up the learning hyperparameters
5.1.4 LeNet performance on MNIST dataset
5.2.1 AlexNet architecture
5.2.2 Novel features of AlexNet
5.2.3 AlexNet implementation in Keras
5.2.4 Set up the learning hyperparameters
5.2.5 AlexNet performance on CIFAR dataset
5.3.1 Novel features of VGGNet
5.3.2 VGGNet Configurations
5.3.3 VGG-16 in Keras
5.3.4 Learning hyperparameters
5.3.5 VGGNet performance on CIFAR dataset
5.4 Inception and GoogLeNet
5.4.1 Novel features of Inception
5.4.2 Inception module - naive version
5.4.3 Inception module with dimensionality reduction
5.4.4 Inception architecture
5.4.5 GoogleNet in Keras
5.4.6 Learning hyperparameters
5.4.7 Inception performance on CIFAR dataset
5.5.1 Novel features of ResNet
5.5.2 Residual blocks
5.5.3 ResNet implementation in Keras
5.5.4 Learning hyperparameters
5.5.5 ResNet performance on CIFAR dataset
5.6 Summary and takeaways
6 Transfer Learning
7 Object detection with YOLO, SSD and R-CNN
Part 3: Advanced computer vision
8 Recurrent Neural Networks (RNNs)
9 Image captioning
10 Generative adversarial networks (GANs)
11 Style transfer
Part 4: Closing and last words
12 The Ethics of Artificial Intelligence
About the Technology
By using deep neural networks, AI systems make decisions based on their perceptions of their input data. Deep learning-based computer vision (CV) techniques, which enhance and interpret visual perceptions, make tasks like image recognition, generation, and classification possible. Exciting advances in CV have led to solutions in a wide range of industries including robotics, automation, agriculture, healthcare, and security, just to name a few. In many cases, CV is deemed more accurate than human vision, which is an important distinction when you think about what that means for CV programs that can detect skin cancer or find anomalies in medical diagnostic scans. Whether we’re talking about self-driving cars or life-saving medical programs, there’s no denying that the application of deep learning for computer vision is changing the world.
About the book
Grokking Deep Learning for Computer Vision teaches you to apply deep learning techniques to solve real-world computer vision problems. In his straightforward and accessible style, DL and CV expert Mohamed Elgendy introduces you to the concept of visual intuition: how a machine learns to understand what it sees. Then you’ll explore the DL algorithms used in different CV applications. You’ll drill down into the different parts of the CV interpreting system, or pipeline. Using Python, OpenCV, Keras, TensorFlow, and Amazon’s MXNet, you’ll discover advanced DL techniques for solving CV problems.
Applications of focus include image classification, segmentation, captioning, and generation, as well as face recognition and analysis. You’ll also cover the most important deep learning architectures, including artificial neural networks (ANNs), convolutional networks (CNNs), and recurrent networks (RNNs). You can apply this knowledge to related deep learning disciplines like natural language processing and voice user interfaces. Real-life, scalable projects from Amazon, Google, and Facebook drive it all home. With this invaluable book, you’ll gain the essential skills for building amazing end-to-end CV projects that solve real-world problems.
- Introduction to computer vision
- Deep learning and neural networks
- Transfer learning and advanced CNN architectures
- Image classification and captioning
- Object detection with YOLO, SSD and R-CNN
- Style transfer
- AI ethics
- Real-world projects
About the reader
For readers with intermediate Python, math, and machine learning skills. Experience with the Matplotlib and Pandas machine learning libraries is helpful.
About the author
Mohamed Elgendy is the head of engineering at Synapse Technology, a leading AI company that builds proprietary computer vision applications to detect threats at security checkpoints worldwide. Previously, Mohamed was an engineering manager at Amazon, where he developed and taught the deep learning for computer vision course at Amazon’s Machine Learning University. He also built and managed Amazon’s computer vision think tank, among many other noteworthy machine learning accomplishments. Mohamed regularly speaks at many AI conferences like Amazon’s DevCon, O'Reilly’s AI conference and Google’s I/O.
Manning Early Access Program (MEAP)
Read chapters as they are written, get the finished eBook as soon as it’s ready, and receive the pBook long before it's in bookstores.