A Damn Fine Stable Diffusion Book

Create awesome images with AI

Will Kurt

MEAP began April 2024
Last updated February 2026
Publication in July 2026 (estimated)

ISBN 9781633436800
275 pages (estimated)

Included with a Manning Online subscription

printed in black & white

available in Russian

resources: Book forum

table of contents

INTRODUCTION

0 Introduction

PART 1: GETTING STARTED

1 Making our First Image: “A damn fine cup of coffee”

1.1 Getting Started with A1111

1.2 The Basics of Text-to-Image Creation

1.2.1 The Prompt

1.2.2 Creating more images: Batch Size vs Batch Count

1.2.3 Random Number Generators and The Seed

1.2.4 Adjusting Height and Width

1.3 Prompt Engineering

1.3.1 Favor clear, descriptive prompts

1.3.2 Give your image context

1.3.3 Describe a style for your image

1.4 Summary

2 Easier Image Generation with Stable Diffusion Webui

2.1 Returning to A1111

2.1.1 Quick UI refresher

2.1.2 Viewing the images you’ve created

2.2 New prompting tricks

2.2.1 Negative Prompting

2.2.2 Prompt Attention

2.2.3 “Sandworm” or “Sand worm”? Understanding Tokens

2.3 Other options to tweak

2.3.1 Classifier-Free Guidance (CFG) Scale

2.3.2 Sampling Steps

2.4 PNG Info

2.5 Summary

3 Programming Stable Diffusion with Hugging Face Diffusers

3.1 Why use Python to work with Stable Diffusion?

3.2 Creating a Page for a Graphic Novel

3.3 Working in Python Notebooks

3.3.1 Basics of the Notebook Interface

3.3.2 Notebooks in the Cloud

3.4 Hugging Face Diffusers

3.4.1 Imports

3.4.2 The DiffusionPipeline

3.4.3 Creating our PRNG

3.4.4 Variables

3.4.5 Creating an Image

3.5 Creating our Graphic Novel Page

3.6 Summary

4 Understanding How Stable Diffusion Works

4.1 Artificial Intelligence, Machine Learning and Neural Networks

4.1.1 Artificial Intelligence

4.1.2 Machine Learning

4.1.3 Neural Networks

4.2 Overview of How Stable Diffusion Works

4.2.1 A High Level overview of how Diffusion models work.

4.3 The Main Components of Stable Diffusion

4.3.1 Compressing Images with the Variational Autoencoder

4.3.2 Transforming Text with the CLIP Encoder

4.3.3 Estimating Noise with the U-NET

4.3.4 Putting it all together with the Scheduler/Sampler

4.4 There and back again, Stable Diffusion in code.

4.5 Summary

PART 2: CREATING BETTER IMAGES

5 Using Custom Checkpoints for Better Images

5.1 Customizing Stable Diffusion

5.1.1 Models, Fine-tuning and checkpoints.

5.1.2 Starting with our base checkpoint.

5.2 CivitAI

5.2.1 epiCPhotoGasm

5.2.2 Store Bought Gyoza

5.3 Hugging Face

5.3.1 Using Hugging Face models with A1111

5.3.2 Using Hugging Face Model with Diffusers

5.3.3 Conclusion

5.4 Summary

6 Improving Images with Samplers

6.1 Understanding Sampling Methods and Schedule Types

6.1.1 Sampling Methods

6.1.2 Schedule Type

6.2 Exploring Samplers and Schedule Types

6.2.1 Samplers we’ll be exploring

6.2.2 Exploring our 3 Samplers

6.2.3 Exploring Schedule Types

6.2.4 Ancestral Samplers

6.3 Conclusion

6.4 Summary

7 Creating High Resolution Images

7.1 Getting Started

7.1.1 A Blade Runner Themed Desktop Wallpaper

7.1.2 Why Can’t I Just Increase the Resolution?

7.1.3 Creating Our Low Resolution Image

7.2 Using Upscalers

7.2.1 Configuring the Scale Factor

7.2.2 Denoising Strength

7.2.3 Hi-res steps

7.3 Image-to-Image Generation

7.3.1 Changing the Seed

7.3.2 Using Different Checkpoints for Img2img

7.3.3 Img2img2img2img

7.3.4 Change that Prompt

7.4 Conclusion

7.5 Summary

8 Building Advanced Workflows with ComfyUI

8.1 Making Anime-Style Ramen Real

8.1.1 A1111 Workflow

8.1.2 Introducing ComfyUI

8.2 Creating an Image with ComfyUI

8.2.1 Adding Our First Node - KSampler

8.2.2 Loading Our Checkpoint

8.2.3 Adding Our Prompt (Positive and Negative)

8.2.4 Empty Latent Image

8.2.5 Organizing Nodes Into A Group

8.2.6 VAE Decoding and Saving Our Image

8.3 Implementing Hi-res Fix in ComfyUI

8.3.1 Upscaling Latents

8.3.2 Adding another VAE Decode and Save Image

8.4 Implementing Img2Img Upscaling

8.4.1 Using Existing ComfyUI Workflows

8.4.2 Generating Realistic Ramen Images From Anime Images.

8.5 Summary

PART 3: ADVANCED TECHNIQUES

9 Better Image Generation with Flux

9.1 Why Flux

9.1.1 Enhanced Image Quality

9.1.2 Better Human Anatomy

9.1.3 High Fidelity Text Generation

9.1.4 Prompt Adherence

9.2 Why Stable Diffusion 1.5 and XL?

9.2.1 Resource Requirements

9.2.2 Style

9.3 Conclusion

9.4 Summary

10 The Flux Workflow

10.1 Getting Started

10.1.1 Loading the Workflow

10.1.2 Getting the Necessary Files

10.1.3 Quantized Models and Loading the Model Files

10.2 Prompt and Guidance

10.3 Sampling

10.3.1 Setting Seeds with RandomNoise

10.3.2 ModelSamplingFlux - Base Shift and Max Shift

10.3.3 The BasicScheduler

10.4 The Remaining nodes

10.5 Summary

11 Deeper Customization Using LoRA

11.1 VRAM and Diffusion Models

11.2 What are LoRAs?

11.3 Visualizing Kafka

11.4 Using Your First LoRA

11.4.1 Setting up the LoRA

11.4.2 Modifying LoRA Streams

11.4.3 A Tarot Card LoRA

11.4.4 Using Trigger Words

11.5 Concept LoRAs

11.6 Combining LoRAs

11.7 Conclusion

11.8 Summary

12 Using ControlNets

12.1 Getting Started with ControlNets

12.1.1 The Basics of Control Nets

12.1.2 Image Preprocessing with ComfyUI Controlnet Aux

12.1.3 General Workflow for using ControlNets

12.1.4 ControlNets and Checkpoint files

12.2 Creating Hidden Images with QR Code Control

12.2.1 QR Code control reference image

12.2.2 Generating our Hidden Image

12.3 Stylizing Real Images with Scribble

12.3.1 Setup for Scribble ControlNet

12.3.2 Generating our Anime Style Crowmeo

12.4 Controlling Poses with Open Pose

12.4.1 Setup for Open Pose ControlNet

12.4.2 Generating our Meditating Cyber-Ninja

12.5 Composing Scenes with Semantic Segmentation

12.5.1 Setup for the Semantic Segmentation ControlNet

12.5.2 Generating our Street Scene in the Autumn

12.6 Conclusion

12.7 Summary

PART 4: CREATING YOUR OWN MODELS

13 Training your own LoRA

14 Checkpoint Merging

15 Fine-tuning your own Stable Diffusion Model

Appendix

Appendix A: Installing Stable Diffusion

Overview

10 The Flux Workflow

This chapter introduces Flux.1-dev, a powerful open image generation model from members of the original Stable Diffusion team, and walks through how to use it in ComfyUI. It contrasts Flux’s workflow with SD1.5 and SDXL, emphasizing where it differs and what new configuration options it brings. The focus is on getting the most out of Flux despite its higher memory demands, understanding its new guidance system and sampler controls, and learning practical tweaks that improve prompt adherence, style handling, and realism.

Getting started involves loading a dedicated ComfyUI workflow and, unlike classic single-checkpoint setups, separately loading Flux’s components: two text encoders (CLIP-L and T5-XXL), the Flux VAE, and the UNet. Because Flux.1-dev is VRAM-hungry, the chapter shows how to swap in a quantized fp8 UNet to roughly halve memory use with only a minor quality trade-off, enabling generation on more constrained GPUs. It also explains the DualCLIPLoader, which combines two encoders to boost prompt understanding, and notes that while ComfyUI lets you “just run” the graph, understanding these parts helps you troubleshoot and improve results.

Flux replaces classic CFG with a Guidance mechanism (via FluxGuidance and BasicGuider), where lowering guidance often improves style adherence and reduces the plasticky look in photorealistic prompts. Sampling is more elaborate: RandomNoise adds seed behaviors (fixed, increment, decrement, randomize); ModelSamplingFlux exposes base_shift and max_shift to shape the denoising schedule (max_shift has clear visual impact, base_shift is subtle and can be resolution-dependent); and BasicScheduler keeps familiar scheduler/steps while adding a sensitive denoise control that should be tuned in very small increments. The chapter wraps with practical notes on latent size and batch size constraints, and underscores that Flux’s new levers can materially change outcomes and partially overcome current limitations, with deeper style solutions coming later via LoRAs.

Drag this image to ComfyUI to get the Flux workflow.

The Flux workflow is notably different from SD1.5 and SDXL.

Our adorable first image from Flux

The Load Diffusion Model node allows you to select the U-NET.

Using the fp8 model has only a minor impact on the quality of our output.

The DualCLIPLoader allows us to use two text encoders.

FluxGuidance and BasicGuider nodes.

The Guidance parameter in Flux can help the model adhere to style better.

Lower Guidance also helps to make images look less “plasticky”.

Sampling is notably more complex when working with Flux.

RandomNoise adds a control_after_generate option.

Noise Schedule when max_shift and base_shift are both 0.0.

With max_shift of 1.5, the noise removal is no longer uniform.

Max shift can have a notable impact on the resulting image generated.

Base shift slightly modifies the denoising curve.

While there is a difference in images, it is more subtle.

Under some aspect ratios, base_shift has no impact at all!

Denoise should be changed in very small increments.

Summary

The Flux workflow has quite a few differences from the standard Stable Diffusion workflow.
When working with models requiring high memory, it is helpful to look for quantized versions of these models.
Part of the reason Flux is so good at prompt adherence is that it uses two text encoders (CLIP-L and T5-XXL).
Guidance in Flux is similar to CFG in SD1.5 and SDXL, but doesn’t allow for a negative prompt.
Reducing Guidance can help the model pay more attention to the style recommended in the prompt.
Flux allows for much more nuanced control over scheduling.
Max shift can have a pretty major impact on your final result, but base shift is less impactful and in many cases will have no impact at all.
Reducing the Denoise value can make the image look nicer, but should only be done in very small steps as it can quickly degrade the image.

FAQ

How do I load and run the official Flux workflow in ComfyUI?

Go to the ComfyUI Flux examples page at https://comfyanonymous.github.io/ComfyUI_examples/flux/ and drag the example image into ComfyUI. The workflow is embedded in the image. After you place the required model files (see below), enter a prompt and click “Queue Prompt.”
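
To make "the workflow is embedded in the image" concrete, here is a minimal sketch that reads text metadata out of a PNG using only the standard library. It assumes the graph is stored as JSON in an uncompressed tEXt chunk under the keyword `workflow`; that keyword and the helper names are assumptions for illustration, not something this page confirms.

```python
import json
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text_chunks(data):
    """Return the uncompressed tEXt chunks of a PNG as {keyword: text}."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    chunks = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk is: 4-byte length, 4-byte type, data, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if chunk_type == b"tEXt":
            keyword, _, text = body.partition(b"\x00")
            chunks[keyword.decode("latin-1")] = text.decode("latin-1")
        if chunk_type == b"IEND":
            break
        pos += 12 + length  # skip length + type + data + CRC
    return chunks


def extract_workflow(data, keyword="workflow"):
    """Parse the embedded workflow JSON, or return None if it is absent.

    The default keyword is an assumption about how the image was saved.
    """
    text = read_png_text_chunks(data).get(keyword)
    return json.loads(text) if text is not None else None
```

Usage would look like `extract_workflow(open("flux_example.png", "rb").read())`, where the file name is a placeholder. The sketch does not verify CRCs or handle compressed text chunks.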

Which files are required for Flux.1-dev and where do I put them?

- Text encoders (to ComfyUI/models/clip): clip_l.safetensors and t5xxl_fp16.safetensors (https://huggingface.co/comfyanonymous/flux_text_encoders/tree/main)
- VAE (to ComfyUI/models/vae): ae.safetensors (https://huggingface.co/black-forest-labs/FLUX.1-schnell/blob/main/ae.safetensors)
- UNet (to ComfyUI/models/unet): flux1-dev.safetensors (https://huggingface.co/black-forest-labs/FLUX.1-dev/tree/main). Note: place it in models/unet, not models/checkpoints.
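
A missing or misplaced file is an easy mistake to make here, so a quick check can save a confusing error in ComfyUI. The sketch below tests the folder layout listed above. The function names are made up for illustration, and the UNet file name is passed in as an argument so the same check works for whichever UNet file you downloaded.

```python
from pathlib import Path


def required_flux_files(unet_name):
    """Relative paths (from the ComfyUI root) that the Flux workflow expects."""
    return [
        "models/clip/clip_l.safetensors",
        "models/clip/t5xxl_fp16.safetensors",
        "models/vae/ae.safetensors",
        f"models/unet/{unet_name}",
    ]


def missing_flux_files(comfy_root, unet_name):
    """Return the required relative paths that do not exist under comfy_root."""
    root = Path(comfy_root)
    return [rel for rel in required_flux_files(unet_name)
            if not (root / rel).is_file()]
```

Calling `missing_flux_files("ComfyUI", "flux1-dev-fp8.safetensors")` returns an empty list when everything is in place, and otherwise the paths still to be filled.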

I’m running out of VRAM. How can I make Flux fit on my GPU?

- Use the quantized UNet: flux1-dev-fp8.safetensors (https://huggingface.co/Comfy-Org/flux1-dev/blob/main/flux1-dev-fp8.safetensors). It uses ~half the VRAM with only minor quality loss.
- In “Load Diffusion Model,” set unet_name to flux1-dev-fp8.safetensors.
- Keep batch_size at 1 and reduce resolution if needed.
- Macs with large unified memory (M‑series) can run Flux more easily.
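
The "about half" figure follows from simple arithmetic on bytes per weight. The sketch below assumes a parameter count of roughly 12 billion for Flux.1-dev, a figure that does not come from this page, and it counts weights only. The text encoders, the VAE, and activations all add to the real footprint.

```python
def weight_memory_gib(num_params, bytes_per_param):
    """Memory needed just to hold the weights, in GiB."""
    return num_params * bytes_per_param / 1024**3


# Assumption: approximate parameter count, used only for illustration.
FLUX_DEV_PARAMS = 12e9

mem_16bit = weight_memory_gib(FLUX_DEV_PARAMS, 2)  # 16-bit weights: 2 bytes each
mem_fp8 = weight_memory_gib(FLUX_DEV_PARAMS, 1)    # fp8 weights: 1 byte each

print(f"16-bit weights: ~{mem_16bit:.1f} GiB")
print(f"fp8 weights:    ~{mem_fp8:.1f} GiB")
```

Whatever the exact parameter count, moving from two bytes per weight to one halves the weight memory, which is why the fp8 file is the first thing to try on a smaller GPU.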

Where do I select the model in the workflow, and what is weight_dtype?

Use the “Load Diffusion Model” node. Set unet_name to the file you want (e.g., flux1-dev.safetensors or flux1-dev-fp8.safetensors). Leave weight_dtype as “default” unless you have a specific reason to change it.

Why does Flux require two text encoders, and what is DualCLIPLoader?

Flux uses two encoders (CLIP-L and T5-XXL) via the DualCLIPLoader to improve prompt understanding. In the node, set type to “flux” and pick clip_l.safetensors and t5xxl_fp16.safetensors. You can also experiment with “sd3” or “sdxl” types in other workflows.

How is Flux Guidance different from CFG, and how should I tune it?

Flux’s Guidance (via FluxGuidance + BasicGuider) isn’t the classic CFG with a negative prompt. It strongly affects style and realism: lower values often improve style adherence and reduce the “plastic” look, while very high values can distort images. Start around 3.5 (default) and try lowering toward ~1.3–2.5 for styles or more natural realism.
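
One way to explore this range systematically is to sweep Guidance over a few evenly spaced values while keeping the seed fixed. The sketch below builds such a list and patches a workflow dictionary with each value. It assumes the workflow is in ComfyUI's API-style layout, a dict of node id to `{"class_type": ..., "inputs": {...}}`, and that the FluxGuidance input is named `guidance`. Both are assumptions to check against your own exported workflow.

```python
import copy


def guidance_sweep(low=1.3, high=3.5, count=5):
    """Evenly spaced Guidance values from high down to low (2 decimals)."""
    if count < 2:
        return [high]
    step = (high - low) / (count - 1)
    return [round(high - i * step, 2) for i in range(count)]


def with_guidance(api_workflow, value):
    """Return a copy of the workflow with every FluxGuidance node set to value."""
    patched = copy.deepcopy(api_workflow)
    for node in patched.values():
        if node.get("class_type") == "FluxGuidance":
            node["inputs"]["guidance"] = value
    return patched
```

Each patched copy can then be queued in turn. Because the original dictionary is left untouched, the sweep is easy to rerun with a different range.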

My seed changes every run. How do I keep it fixed?

In the RandomNoise node, set control_after_generate to “fixed” and enter your noise_seed. Other options: “increment,” “decrement,” or “randomize” (the default, which picks a new seed each time).
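
The four modes are easy to picture as a function that takes the seed used for one run and returns the seed for the next. This is an emulation for illustration, not ComfyUI's own code. The 64-bit seed range and the clamping at the ends are assumptions.

```python
import random

MAX_SEED = 2**64 - 1  # assumption: seeds are unsigned 64-bit integers


def next_seed(seed, mode, rng=None):
    """Return the seed for the next run under a control_after_generate mode."""
    rng = rng or random
    if mode == "fixed":
        return seed
    if mode == "increment":
        return min(seed + 1, MAX_SEED)
    if mode == "decrement":
        return max(seed - 1, 0)
    if mode == "randomize":
        return rng.randint(0, MAX_SEED)
    raise ValueError(f"unknown mode: {mode!r}")
```

With "fixed", repeated runs reuse the same noise, which is what you want when comparing the effect of a single setting such as Guidance or max_shift.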

What do max_shift and base_shift in ModelSamplingFlux do?

They shape the denoising schedule (how much noise is removed per step). max_shift has the largest practical impact: higher values shift more denoising to later steps, often improving color separation and changing overall look. base_shift is subtle and, at 1024×1024 (and some other settings), may have no effect at all. Tip: tune max_shift first; leave base_shift at 0 unless you’re exploring non‑square resolutions.
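
The behavior described above can be reproduced with a few lines of arithmetic. The sketch below follows my understanding of how ComfyUI derives a single shift value from max_shift, base_shift, and the image size, and then bends a uniform schedule with it. The constants 256 and 4096 and the exact formulas are assumptions, and the real BasicScheduler builds its noise levels differently, so treat this as a model of the idea rather than the implementation.

```python
import math


def flux_shift(width, height, max_shift, base_shift):
    """Interpolate the shift linearly in the number of image tokens."""
    tokens = width * height / (8 * 8 * 2 * 2)  # 8x VAE downscale, 2x2 patches
    x1, x2 = 256, 4096  # token counts where base_shift / max_shift apply exactly
    slope = (max_shift - base_shift) / (x2 - x1)
    return base_shift + slope * (tokens - x1)


def shifted_sigma(t, shift):
    """Map a uniform noise level t in (0, 1] through the shift curve."""
    return math.exp(shift) / (math.exp(shift) + (1 / t - 1))


def schedule(steps, shift):
    """Noise level before each step, ending at 0.0."""
    return [shifted_sigma(1 - i / steps, shift) for i in range(steps)] + [0.0]
```

Under these assumptions a 1024×1024 image has exactly 4096 tokens, so `flux_shift` returns max_shift whatever base_shift is, which would explain why base_shift has no visible effect at that size. A shift of 0 leaves the schedule uniform, while a positive shift keeps noise levels higher for longer and so pushes more of the denoising into the later steps.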

What does the BasicScheduler’s denoise slider do?

denoise controls how much of the total noise is removed (1.0 = full). Slightly lowering it (e.g., ~0.9) can add a more illustrative quality, but going much lower quickly degrades the image. If you tweak it, do so in very small increments.
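
A small model shows why this setting is so sensitive. My understanding, which is an assumption here, is that a denoise value below 1.0 makes the scheduler build a longer schedule of about steps / denoise steps and keep only its tail. The sketch uses a plain linear schedule as a stand-in for the real one.

```python
def truncated_schedule(steps, denoise):
    """Keep the last steps + 1 noise levels of a longer linear schedule."""
    if not 0.0 < denoise <= 1.0:
        raise ValueError("denoise must be in (0, 1]")
    total_steps = int(steps / denoise)
    full = [1 - i / total_steps for i in range(total_steps + 1)]
    return full[-(steps + 1):]


for d in (1.0, 0.9, 0.5):
    start = truncated_schedule(20, d)[0]
    print(f"denoise={d}: sampling starts at noise level {start:.3f}")
```

In this model, lowering denoise means sampling begins below the full noise level even though the starting latent is pure noise, so some of that noise is never accounted for. That is one plausible reading of why small reductions change the look and larger ones degrade the image.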

Where do I set image size and batch size, and are there any gotchas?

Use EmptySD3LatentImage for batch_size (usually 1 for Flux) and connect height/width nodes. Keep height/width nodes’ control_after_generate set to “fixed.” Randomly changing size across runs is rarely useful and can complicate reproducibility and memory use.

pro $24.99 per month

access to all Manning books, MEAPs, liveVideos, liveProjects, and audiobooks!
choose one free eBook per month to keep
exclusive 50% discount on all purchases
renews monthly, pause or cancel renewal anytime

lite $19.99 per month

access to all Manning books, including MEAPs!

team

5, 10 or 20 seats+ for your team - learn more

eBook

pdf, ePub, online

$39.99 $23.99

you save $16.00 (40%)
