Overview

1 Making our First Image: “A damn fine cup of coffee”

The chapter introduces Stable Diffusion through an analogy to early hip‑hop: just as DJs remixed familiar records into something new, the model recombines learned visual concepts to produce novel images. It frames generative imaging as a playful way to realize ideas we’ve long doodled in notebooks, emphasizing that while first results come quickly, mastery takes experimentation, technique, and iteration. The open‑source nature of Stable Diffusion empowers users to run, customize, and extend the system on consumer hardware, supported by a fast‑evolving ecosystem of community tools and methods.

Readers are guided through creating their first text‑to‑image outputs with the prompt “A damn fine cup of coffee,” learning core controls and trade‑offs. The chapter explains iterations versus batch size (time, VRAM, and practicality), the role of seeds in reproducibility (default 42) and variety, and how image dimensions—especially aspect ratio—meaningfully shape composition and content. A key habit is to generate many candidates and curate the best, treating the model less as an all‑imaginative artist and more like a search engine over possible images, responsive to precise wording and settings.

Prompt engineering fundamentals follow: prefer clear, descriptive phrasing over poetic ambiguity; add scene context to steer composition; and specify stylistic cues (for example, surrealist painting, wood etching) to avoid the uncanny valley and evoke desired aesthetics. The chapter also demonstrates the power of open source by exposing the built‑in NSFW safety check that can replace outputs with a placeholder image and shows how to optionally disable it in code—underscoring user agency. It concludes by reinforcing an iterative workflow of prompt refinement, parameter tuning, and selective curation as the path to images that truly match one’s intent.

Getting my imagination on the screen
Browsing an infinite library of Pulp Sci-Fi that never was.
Who knew monks were such avid readers of sci-fi?
Envisioning ancient aliens.
The initial 6 images created by our prompt: “A damn fine cup of coffee.”
Average seconds to create an image, comparing iterations and batch.
Generating 30 images at once.
Creating 6 different images with seed 12345.
Images with a 5:3 landscape aspect ratio.
Images with a 3:5 aspect ratio using the same seed.
Images with a 3:7 aspect ratio using the same seed.
Images with a 4:1 aspect ratio using the same seed.
A poetic prompt does not always yield poetic images.
A straightforward prompt yields more cups of black coffee.
Adding a scene to an image can help provide context.
Choosing a landscape aspect ratio helps display the counter.
Creating surrealistic images.
Images in the style of a wood etching.
I would say that’s a damn fine cup of coffee!
Being “Rick-rolled” by Stable Diffusion.

Summary

  • Generating with Stable Diffusion is an iterative process, in which we are constantly revising our settings and prompts.
  • Despite the many ways to improve images, it’s always a good idea to generate a variety of images to see if we find a particular one that stands out to us as pleasing.
  • Our prompts should be clear and descriptive. Giving some context for the object we’re prompting can change the image dramatically. Describing the style of the image can further let us change the feeling of the images we’re generating.
  • The aspect ratio that we use to generate an image can have a major impact on the way the image looks. Consider whether the image you want to create would look better as a square, a landscape, or a portrait.
  • Because Stable Diffusion is open source we (as well as the entire community of users) can change and extend its behavior.

FAQ

How do I generate my first image with the prompt “A damn fine cup of coffee”?
Activate your environment, then run the txt2img script from the root of the Stable Diffusion repository:

  conda activate ldm
  python ./scripts/txt2img.py --prompt="A damn fine cup of coffee"

If your checkpoint isn’t in the default path, add --ckpt=/path/to/model.ckpt.
I get “model.ckpt not found.” How do I fix it?
Place the v1.5 checkpoint where the script expects it (see the book’s appendix), or pass the full path explicitly:

  python ./scripts/txt2img.py --prompt="A damn fine cup of coffee" --ckpt=/models/sd-v1-5.ckpt
Where are the generated images saved by default?
In the CompVis implementation, txt2img saves to stable-diffusion/outputs/txt2img-samples/samples.
What’s the difference between n_iter and n_samples (batch size)?
  • n_iter: how many sequential iterations (rounds) to run.
  • n_samples: how many images to generate in parallel per iteration (the batch size).
Larger batches use more GPU VRAM, and in practice they don’t provide a big speedup per image; many users keep the batch small (often 1) and raise n_iter for volume.
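The relationship between the two flags is simple multiplication, which this illustrative sketch makes explicit (the helper name is hypothetical, not part of the txt2img script):

```python
# Illustrative arithmetic only: the total number of images produced by
# the CompVis txt2img script is iterations times batch size.
def total_images(n_iter: int, n_samples: int) -> int:
    """Images produced by n_iter rounds of n_samples each."""
    return n_iter * n_samples

# Six rounds with a batch of five yields thirty images.
print(total_images(6, 5))  # → 30
```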
How can I generate 30 images in one go?
Choose iterations and batch size whose product is 30. For example:

  python ./scripts/txt2img.py --prompt="A damn fine cup of coffee" --n_samples=5 --n_iter=6

If a batch of 5 exceeds your VRAM, lower n_samples and raise n_iter to compensate.
Why did the first 6 images in my 30-image run repeat?
The script uses a fixed random seed for reproducibility (the default is 42). If you don’t change it, the first set of images repeats. Set a new seed, e.g. --seed=12345.
What is the seed and why does it matter?
The seed initializes the pseudo-random number generator. With the same prompt, settings, and seed, results are exactly repeatable. Change the seed to explore new variations, or keep it fixed to reproduce a result later (or to match the book’s outputs).
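A toy illustration of the idea, using Python’s standard-library PRNG rather than Stable Diffusion’s: the same seed always restarts the same pseudo-random sequence, which is exactly why a fixed seed makes generations repeatable.

```python
import random

random.seed(42)
first = [random.random() for _ in range(3)]

random.seed(42)          # re-seeding restarts the identical sequence
second = [random.random() for _ in range(3)]
print(first == second)   # → True

random.seed(12345)       # a different seed gives a different sequence
third = [random.random() for _ in range(3)]
print(first == third)    # → False
```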
How do I set image size and aspect ratio, and are there limits?
Use --H (height) and --W (width). Dimensions must be multiples of 8; the chapter uses multiples of 128 for simplicity. Larger H×W needs more VRAM. Aspect ratio strongly affects composition (landscape suits wide scenes, portrait suits tall subjects). For high-resolution output, generate at a modest size and upscale later via a dedicated workflow.
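Since arbitrary dimensions aren’t valid, a small hypothetical helper (not part of the txt2img script) can snap a requested size down to the nearest allowed multiple — 8 as the hard requirement, or 128 if you follow the chapter’s convention:

```python
# Round a requested dimension down to the nearest valid multiple.
# Stable Diffusion requires multiples of 8; the chapter sticks to
# multiples of 128 for simplicity.
def snap(dimension: int, multiple: int = 8) -> int:
    return (dimension // multiple) * multiple

print(snap(500))        # → 496 (nearest multiple of 8 at or below 500)
print(snap(500, 128))   # → 384
print(snap(512))        # → 512 (already valid, unchanged)
```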
How can I improve results with prompt engineering?
  • Prefer clear, descriptive phrasing over poetic language (e.g., “A cup of black coffee”).
  • Add context or a scene (e.g., “on a diner counter”).
  • Specify a style (e.g., “surrealist painting”, “wood etching”).
  • Iterate: generate, review, tweak the prompt and settings, and repeat.
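The pattern above — subject, then scene context, then a style cue — can be sketched as a tiny hypothetical helper (the function and its joining convention are illustrative, not part of any Stable Diffusion tooling):

```python
# Compose a prompt from the chapter's three ingredients:
# a subject, optional scene context, and an optional style cue.
def build_prompt(subject: str, context: str = "", style: str = "") -> str:
    parts = [subject]
    if context:
        parts.append(context)
    if style:
        parts.append(style)
    return ", ".join(parts)

print(build_prompt("A cup of black coffee",
                   "on a diner counter",
                   "wood etching"))
# → A cup of black coffee, on a diner counter, wood etching
```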
What’s the “Rick Astley” image, and can I disable the NSFW safety checker?
When the safety checker flags an image as NSFW, it replaces the output with a placeholder (often the “Rickroll” image). Because CompVis Stable Diffusion is open source, you can disable this in scripts/txt2img.py by redefining check_safety to return the original image (e.g., def check_safety(x_image): return x_image, False). Do this only if you understand and accept the implications.
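Spelled out, the override from the chapter is a pass-through that keeps the second return value (the NSFW flag) in the shape the caller expects:

```python
# The replacement for check_safety in scripts/txt2img.py suggested in
# the chapter: return the images untouched and report "not NSFW",
# so flagged outputs are never swapped for the placeholder image.
def check_safety(x_image):
    return x_image, False
```

With this in place, the rest of the script continues to unpack (images, flag) exactly as before; only the substitution step is neutralized.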
