Overview

1 What Is Generative AI and Why PyTorch?

Generative AI has rapidly moved from research to mainstream since late 2022, reshaping workflows and creative processes across industries. This chapter sets the stage by clarifying what generative AI is, how it differs from non-generative (discriminative) systems, and why its ability to synthesize new text, images, audio, and more is so disruptive. It frames the core questions of how the technology works and why it matters, then positions Python and PyTorch as the practical foundation for learning and experimentation throughout the book.

The chapter introduces the main families of models you will build: Generative Adversarial Networks (GANs) and Transformers, along with variational autoencoders and diffusion models. GANs pair a generator and a discriminator in a competitive loop to learn data distributions and produce convincing new samples, enabling tasks from image synthesis to style and attribute translation. Transformers, powered by the self-attention mechanism, handle sequences efficiently, capture long-range dependencies, and scale via parallel training—properties that underpin large language models and modern multimodal systems. Diffusion models and text-to-image pipelines illustrate how iterative refinement and conditioning unlock high-quality visual generation.

To make these ideas concrete, the book adopts a build-from-scratch approach using Python and PyTorch. PyTorch’s dynamic computation graph, clear syntax, GPU acceleration, and rich ecosystem make it well suited to rapid prototyping, transfer learning, and integration with familiar scientific libraries. By implementing models end to end, readers develop intuition for architectures, learn to control outputs (for example, selecting attributes through latent variables), and gain the skills to adapt or fine-tune pre-trained models for downstream tasks. This deeper understanding not only improves practical effectiveness but also equips readers to evaluate the capabilities and risks of generative AI with greater rigor and responsibility.
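To give a feel for this build-from-scratch workflow, here is a minimal PyTorch sketch of a toy generator. The layer sizes and activations are illustrative assumptions for this overview, not the book's exact architecture.

```python
import torch
import torch.nn as nn

# A toy generator: maps a 100-dim latent vector to a flattened 28x28 image.
# Layer sizes here are illustrative, not the book's exact architecture.
generator = nn.Sequential(
    nn.Linear(100, 256),
    nn.ReLU(),
    nn.Linear(256, 28 * 28),
    nn.Tanh(),  # squashes pixel values into [-1, 1]
)

z = torch.randn(16, 100)    # a batch of 16 random latent vectors
fake_images = generator(z)  # forward pass builds the graph on the fly
print(fake_images.shape)    # torch.Size([16, 784])
```

Because PyTorch builds the computation graph during the forward pass, you can inspect intermediate tensors with ordinary Python tools while prototyping.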

A comparison of generative models versus discriminative models. A discriminative model (top half of the figure) takes data as inputs and produces probabilities of different labels, which we denote by Prob(dog) and Prob(cat). In contrast, a generative model (bottom half) acquires an in-depth understanding of the defining characteristics of these images to synthesize new images representing dogs and cats.
Generative Adversarial Networks (GANs) architecture and its components. GANs employ a dual-network architecture comprising a generative model (left) tasked with capturing the underlying data distribution and a discriminative model (center) that serves to estimate the likelihood that a given sample originates from the authentic training dataset (considered as "real") rather than being a product of the generative model (considered as "fake").
Examples from the Anime faces training dataset.
Generated Anime face images by the trained generator in DCGAN.
Changing hair color with CycleGAN. If we feed images with blond hair (first row) to a trained CycleGAN model, the model converts blond hair to black hair in these images (second row). The same trained model can also convert black hair (third row) to blond hair (bottom row).
The Transformer architecture. The encoder in the Transformer (left side of the diagram) learns the meaning of the input sequence (e.g., the English phrase “How are you?”) and converts it into an abstract representation that captures its meaning before passing it to the decoder (right side of the diagram). The decoder constructs the output (e.g., the French translation of the English phrase) by predicting one word at a time, based on previous words in the sequence and the abstract representation from the encoder.
The diffusion model adds more and more noise to the images and learns to reconstruct them. The left column contains four original flower images. As we move to the right, some noise is added to the images in each step, until at the right column, the four images are completely noisy images. We then use these images to train a diffusion-based model to progressively remove noise to generate new data samples.
Image generated by DALL-E 2 with text prompt “an astronaut in a space suit riding a unicorn”.

Summary

  • Generative AI is technology that can produce diverse forms of new content, including text, images, code, music, audio, and video.
  • Discriminative models specialize in assigning labels while generative models generate new instances of data.
  • PyTorch, with its dynamic computational graphs and support for GPU-accelerated training, is well suited to deep learning and generative modeling.
  • GANs are a generative modeling method consisting of two neural networks: a generator and a discriminator. The generator aims to create data samples realistic enough that the discriminator judges them to be real; the discriminator aims to correctly distinguish fake samples from real ones.
  • Transformers are deep neural networks that use the attention mechanism to capture long-range dependencies among elements in a sequence. The original Transformer has an encoder and a decoder. When it’s used for English-to-French translation, for example, the encoder converts the English sentence into an abstract representation before passing it to the decoder. The decoder generates the French translation one word at a time, based on the encoder’s output and the previously generated words.

FAQ

What is generative AI, and how does it differ from discriminative AI?
Generative AI learns data distributions to create new content (text, images, audio, etc.). Discriminative AI focuses on labeling or classifying existing data. Statistically, discriminative models estimate P(Y|X), while generative models aim to learn the joint distribution P(X, Y) (or P(X)) and sample new X from it.
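As a toy illustration of this distinction (the architectures here are arbitrary, untrained stand-ins used only to show the two directions of modeling):

```python
import torch
import torch.nn as nn

# Discriminative: estimate P(Y|X) -- map an image to label probabilities.
classifier = nn.Sequential(nn.Linear(784, 2), nn.Softmax(dim=1))
x = torch.randn(1, 784)  # one (random stand-in) flattened image
probs = classifier(x)    # e.g., [Prob(dog), Prob(cat)], summing to 1

# Generative: learn P(X) (or P(X, Y)) and sample new X from it.
generator = nn.Sequential(nn.Linear(100, 784), nn.Tanh())
z = torch.randn(1, 100)  # random latent input
new_x = generator(z)     # a brand-new synthetic "image"
```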
Why does this book use Python and PyTorch for generative AI?
Python offers readable syntax, cross-platform support, and a vast ecosystem. PyTorch complements it with a flexible, Pythonic API, dynamic computational graphs that simplify experimentation and debugging, strong GPU acceleration, and an active community, all of which suit fast-moving generative AI work.
What is a dynamic computational graph in PyTorch, and why does it matter?
A dynamic computational graph is built and modified on the fly as your code runs. This makes model architectures easier to vary, experiments faster to iterate, and bugs simpler to diagnose—key advantages when building and training custom generative models.
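A small sketch of what "built on the fly" means in practice. The module below is hypothetical: ordinary Python control flow decides, per call, how deep the graph is.

```python
import torch
import torch.nn as nn

class DynamicNet(nn.Module):
    """The graph is rebuilt on every forward call, so plain Python
    control flow (loops, conditionals) can vary the architecture per input."""
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(10, 10)

    def forward(self, x, n_repeats):
        for _ in range(n_repeats):  # depth chosen at run time
            x = torch.relu(self.layer(x))
        return x

net = DynamicNet()
x = torch.randn(4, 10)
shallow = net(x, n_repeats=1)  # a one-layer graph this call
deep = net(x, n_repeats=5)     # a five-layer graph this call
```

In a static-graph framework, changing depth per input would require extra machinery; here it is just a loop.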
How do Generative Adversarial Networks (GANs) work at a high level?
GANs pit two networks against each other: a generator that synthesizes data and a discriminator that distinguishes real from fake. Through iterative training in this zero-sum game, the generator learns to produce outputs that the discriminator cannot reliably tell apart from real samples.
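The adversarial loop can be sketched as follows on toy 1-D data. The networks, learning rates, and step counts are illustrative assumptions, not a recipe from the book.

```python
import torch
import torch.nn as nn

# Toy GAN on 1-D "data" drawn from N(4, 1).
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
loss_fn = nn.BCELoss()
opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_D = torch.optim.Adam(D.parameters(), lr=1e-3)

for step in range(200):
    real = torch.randn(32, 1) + 4.0  # samples from the real distribution
    z = torch.randn(32, 8)
    fake = G(z)

    # Discriminator: label real samples 1, generated samples 0.
    opt_D.zero_grad()
    d_loss = loss_fn(D(real), torch.ones(32, 1)) + \
             loss_fn(D(fake.detach()), torch.zeros(32, 1))
    d_loss.backward()
    opt_D.step()

    # Generator: fool the discriminator into outputting 1 for fakes.
    opt_G.zero_grad()
    g_loss = loss_fn(D(G(z)), torch.ones(32, 1))
    g_loss.backward()
    opt_G.step()
```

Note the `detach()` when training the discriminator: it blocks gradients from flowing into the generator during the discriminator's update.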
What role does the latent vector Z play in GANs?
The latent vector Z is the generator’s input “task description.” Sampling different Z values produces diverse outputs and lets you explore or control characteristics in generated content. Later extensions (e.g., conditional GANs) further steer specific attributes.
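One common way to explore the latent space is linear interpolation between two Z values. The generator below is an untrained stand-in used only to show the mechanics; with a trained model, intermediate z values typically yield smooth transitions between outputs.

```python
import torch
import torch.nn as nn

generator = nn.Sequential(nn.Linear(100, 784), nn.Tanh())  # untrained stand-in

z1, z2 = torch.randn(100), torch.randn(100)
for w in torch.linspace(0, 1, steps=5):
    z = (1 - w) * z1 + w * z2           # linear interpolation in latent space
    sample = generator(z.unsqueeze(0))  # one generated sample per step
```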
What practical applications do GANs have beyond image synthesis?
GANs support image-to-image translation (e.g., changing hair color with CycleGAN), data augmentation, style transfer, and even music generation. They can also reduce production costs by generating realistic previews (such as customized product images) before physical manufacturing.
Why did Transformers overtake RNNs/LSTMs for sequence tasks?
Transformers use self-attention to capture long-range dependencies and process tokens in parallel, enabling much faster training on large datasets. This scalability and context modeling outperform sequential RNN processing for many tasks.
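The core computation can be sketched as scaled dot-product attention. This minimal version omits the learned query/key/value projections and multiple heads of a real Transformer, but it shows why every position can attend to every other position in one parallel operation.

```python
import math
import torch

def self_attention(x):
    """Minimal single-head self-attention (no learned projections)."""
    d = x.size(-1)
    scores = x @ x.transpose(-2, -1) / math.sqrt(d)  # (seq, seq) similarities
    weights = torch.softmax(scores, dim=-1)          # attention distribution
    return weights @ x                               # weighted mix of values

tokens = torch.randn(5, 16)   # a 5-token sequence of 16-dim embeddings
out = self_attention(tokens)  # same shape; each token now carries context
```

Unlike an RNN, nothing here is computed token by token: the whole (seq, seq) score matrix is produced in one matrix multiply, which is what makes training parallelizable.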
What are the main Transformer variants and their typical use cases?
  • Encoder-only (e.g., BERT): understanding tasks such as classification and named entity recognition.
  • Decoder-only (e.g., GPT-2/ChatGPT): text generation and language modeling.
  • Encoder-decoder: sequence-to-sequence and multimodal tasks like translation, speech recognition, or text-to-image.
How do diffusion models relate to text-to-image systems like DALL·E?
Diffusion models learn to remove noise step by step to generate high-quality images. Text-to-image systems condition this process on prompts and often pair Transformer components with diffusion-style iterative refinement to align outputs with the textual description.
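The forward (noising) half of the process can be sketched as follows. The linear blend below is a simplification of the fixed noise schedules real diffusion models (e.g., DDPM) use, but it shows the trajectory from clean image to pure noise that the model learns to invert.

```python
import torch

# Forward diffusion: blend an image with Gaussian noise at increasing levels.
image = torch.rand(3, 64, 64)  # a stand-in "clean" image in [0, 1]
steps = 10
for t in range(1, steps + 1):
    alpha = 1.0 - t / steps    # how much signal survives at step t
    noise = torch.randn_like(image)
    noisy = alpha * image + (1 - alpha) * noise
# At t == steps, alpha == 0 and the sample is pure noise. A diffusion model
# is trained to run this process in reverse, predicting the noise to remove.
```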
Why build generative models from scratch instead of only using pre-trained ones?
Implementing models yourself deepens understanding, improves troubleshooting, and enables precise control (e.g., attribute steering in GANs). It also equips you to adapt or fine-tune pre-trained LLMs for downstream tasks and to evaluate benefits and risks of AI more responsibly.
