Beyond Slop

Professional AI image creation

Will Kurt

MEAP began April 2024
Last updated May 2026
Publication in September 2026 (estimated)

ISBN 9781633436800
275 pages (estimated)

Included with a Manning Online subscription

printed in black & white

available in Russian


table of contents

INTRODUCTION

0 Creator, Not Consumer

0.1 What are “Generative Models?”

0.1.1 What are “models”?

0.1.2 Generative vs discriminative models

0.2 Open Models and Open Source Tools

0.2.1 Open model vs Open Source

0.2.2 Open Source tools we’ll be using

0.3 Hardware for Running Image Generation Models

0.3.1 Running models with a GPU

0.3.2 Working with Unified Memory on Apple computers

0.4 Quick Glimpse of Stable Diffusion in Action

0.5 Where we’re going!

0.6 Creating “Art” with AI?!

0.7 Summary

PART 1: GETTING STARTED

1 Making our First Image: “A damn fine cup of coffee”

1.1 Getting Started with A1111

1.2 The Basics of Text-to-Image Creation

1.2.1 The Prompt

1.2.2 Creating more images: Batch Size vs Batch Count

1.2.3 Random Number Generators and The Seed

1.2.4 Adjusting Height and Width

1.3 Prompt Engineering

1.3.1 Favor clear, descriptive prompts

1.3.2 Give your image context

1.3.3 Describe a style for your image

1.4 Summary

2 Easier Image Generation with Stable Diffusion Webui

2.1 Returning to A1111

2.1.1 Quick UI refresher

2.1.2 Viewing the images you’ve created

2.2 New prompting tricks

2.2.1 Negative Prompting

2.2.2 Prompt Attention

2.2.3 “Sandworm” or “Sand worm”? Understanding Tokens

2.3 Other options to tweak

2.3.1 Classifier-Free Guidance (CFG) Scale

2.3.2 Sampling Steps

2.4 PNG Info

2.5 Summary

3 Programming Stable Diffusion with Hugging Face Diffusers

3.1 Why use Python to work with Stable Diffusion?

3.2 Creating a Page for a Graphic Novel

3.3 Working in Python Notebooks

3.3.1 Basics of the Notebook Interface

3.3.2 Notebooks in the Cloud

3.4 Hugging Face Diffusers

3.4.1 Imports

3.4.2 The DiffusionPipeline

3.4.3 Creating our PRNG

3.4.4 Variables

3.4.5 Creating an Image

3.5 Creating our Graphic Novel Page

3.6 Summary

4 Understanding How Stable Diffusion Works

4.1 Artificial Intelligence, Machine Learning and Neural Networks

4.1.1 Artificial Intelligence

4.1.2 Machine Learning

4.1.3 Neural Networks

4.2 Overview of How Stable Diffusion Works

4.2.1 A High Level overview of how Diffusion models work.

4.3 The Main Components of Stable Diffusion

4.3.1 Compressing Images with the Variational Autoencoder

4.3.2 Transforming Text with the CLIP Encoder

4.3.3 Estimating Noise with the U-NET

4.3.4 Putting it all together with the Scheduler/Sampler

4.4 There and back again, Stable Diffusion in code.

4.5 Summary

PART 2: CREATING BETTER IMAGES

5 Using Custom Checkpoints for Better Images

5.1 Customizing Stable Diffusion

5.1.1 Models, Fine-tuning and checkpoints.

5.1.2 Starting with our base checkpoint.

5.2 CivitAI

5.2.1 epiCPhotoGasm

5.2.2 Store Bought Gyoza

5.3 Hugging Face

5.3.1 Using Hugging Face models with A1111

5.3.2 Using Hugging Face Model with Diffusers

5.3.3 Conclusion

5.4 Summary

6 Improving Images with Samplers

6.1 Understanding Sampling Methods and Schedule Types

6.1.1 Sampling Methods

6.1.2 Schedule Type

6.2 Exploring Samplers and Schedule Types

6.2.1 Samplers we’ll be exploring

6.2.2 Exploring our 3 Samplers

6.2.3 Exploring Schedule Types

6.2.4 Ancestral Samplers

6.3 Conclusion

6.4 Summary

7 Creating High Resolution Images

7.1 Getting Started

7.1.1 A Blade Runner Themed Desktop Wallpaper

7.1.2 Why Can’t I Just Increase the Resolution?

7.1.3 Creating Our Low Resolution Image

7.2 Using Upscalers

7.2.1 Configuring the Scale Factor

7.2.2 Denoising Strength

7.2.3 Hi-res steps

7.3 Image-to-Image Generation

7.3.1 Changing the Seed

7.3.2 Using Different Checkpoints for Img2img

7.3.3 Img2img2img2img

7.3.4 Change that Prompt

7.4 Conclusion

7.5 Summary

8 Building Advanced Workflows with ComfyUI

8.1 Making Anime-Style Ramen Real

8.1.1 A1111 Workflow

8.1.2 Introducing ComfyUI

8.2 Creating an Image with ComfyUI

8.2.1 Adding Our First Node - KSampler

8.2.2 Loading Our Checkpoint

8.2.3 Adding Our Prompt (Positive and Negative)

8.2.4 Empty Latent Image

8.2.5 Organizing Nodes Into A Group

8.2.6 VAE Decoding and Saving Our Image

8.3 Implementing Hi-res Fix in ComfyUI

8.3.1 Upscaling Latents

8.3.2 Adding another VAE Decode and Save Image

8.4 Implementing Img2Img Upscaling

8.4.1 Using Existing ComfyUI Workflows

8.4.2 Generating Realistic Ramen Images From Anime Images.

8.5 Summary

PART 3: ADVANCED TECHNIQUES

9 Better Image Generation with Flux

9.1 Why Flux

9.1.1 Enhanced Image Quality

9.1.2 Better Human Anatomy

9.1.3 High Fidelity Text Generation

9.1.4 Prompt Adherence

9.2 Why Stable Diffusion 1.5 and XL?

9.2.1 Resource Requirements

9.2.2 Style

9.3 Conclusion

9.4 Summary

10 The Flux Workflow

10.1 Getting Started

10.1.1 Loading the Workflow

10.1.2 Getting the Necessary Files

10.1.3 Quantized Models and Loading the Model Files

10.2 Prompt and Guidance

10.3 Sampling

10.3.1 Setting Seeds with RandomNoise

10.3.2 ModelSamplingFlux - Base Shift and Max Shift

10.3.3 The BasicScheduler

10.4 The Remaining nodes

10.5 Summary

11 Deeper Customization Using LoRA

11.1 VRAM and Diffusion Models

11.2 What are LoRAs?

11.3 Visualizing Kafka

11.4 Using Your First LoRA

11.4.1 Setting up the LoRA

11.4.2 Modifying LoRA Streams

11.4.3 A Tarot Card LoRA

11.4.4 Using Trigger Words

11.5 Concept LoRAs

11.6 Combining LoRAs

11.7 Conclusion

11.8 Summary

12 Using ControlNets

12.1 Getting Started with ControlNets

12.1.1 The Basics of Control Nets

12.1.2 Image Preprocessing with ComfyUI Controlnet Aux

12.1.3 General Workflow for using ControlNets

12.1.4 ControlNets and Checkpoint files

12.2 Creating Hidden Images with QR Code Control

12.2.1 QR Code control reference image

12.2.2 Generating our Hidden Image

12.3 Stylizing Real Images with Scribble

12.3.1 Setup for Scribble ControlNet

12.3.2 Generating our Anime Style Crowmeo

12.4 Controlling Poses with Open Pose

12.4.1 Setup for Open Pose ControlNet

12.4.2 Generating our Meditating Cyber-Ninja

12.5 Composing Scenes with Semantic Segmentation

12.5.1 Setup for the Semantic Segmentation ControlNet

12.5.2 Generating our Street Scene in the Autumn

12.6 Conclusion

12.7 Summary

PART 4: CREATING YOUR OWN MODELS

13 Training a Flux LoRA

13.1 Getting Started with the AI-Toolkit

13.2 Training Data

13.2.1 Collecting Images

13.2.2 Building Your Dataset

13.3 Training our LoRA

13.4 Running our training job

13.5 Exploring the results of our training

13.6 Summary

14 Checkpoint Merging

14.1 How Checkpoint Merging Works

14.1.1 Refresher on Neural Networks

14.1.2 Merging The Model Weights

14.2 Stable Diffusion 1.5 Merging with Automatic 1111

14.2.1 Merging Checkpoint in the A1111 Interface

14.2.2 Using Our Merged Checkpoint

14.3 Stable Diffusion XL Merging with ComfyUI

14.3.1 The ComfyUI Workflow

14.3.2 Generating our Images

14.4 Conclusion

14.5 Summary

15 Fine-tuning Stable Diffusion 1.5

15.1 Better Monster Generation

15.2 Generate our Training Data

15.2.1 Using AI to build our training data

15.2.2 Generating our training data

15.2.3 Storing our images and labels

15.3 Setting up OneTrainer for fine-tuning

15.3.1 Installation and running OneTrainer

15.3.2 Getting started and General Settings

15.3.3 Model Settings

15.3.4 Concepts

15.3.5 The training tab

15.3.6 Sampling from the model while fine-tuning

15.3.7 Backing up our model and saving checkpoints

15.4 Running our training job

15.5 Examining the results of our fine-tuning

15.5.1 Exploring our monster generation

15.5.2 Major early improvements in humans

15.6 Conclusion

15.7 Summary

Appendix

Appendix A: Installing Stable Diffusion

Overview

1 Making our First Image: “A damn fine cup of coffee”

This chapter introduces the basic workflow for creating images with Stable Diffusion using AUTOMATIC1111’s Stable Diffusion Webui, or A1111. It frames image generation as a mix of wonder, unpredictability, and technique: the user types a text description, the model interprets it, and the results may range from impressive to strange. The chapter emphasizes that good results come from learning both the tool and the process, especially by generating many options and selecting the most promising ones for refinement.

The chapter walks through the first text-to-image experiment using the prompt “A damn fine cup of coffee.” It explains that prompts are natural-language descriptions that guide Stable Diffusion, then shows how batch count and batch size can be used to create many images at once. Batch count runs generations sequentially, while batch size creates multiple images in parallel and uses more GPU memory. The chapter also introduces seeds, explaining that Stable Diffusion’s randomness can be controlled by setting a specific seed, making outputs reproducible and allowing users to compare changes more reliably.

The chapter then explores how image dimensions and prompt engineering affect results. Changing width and height alters not only the aspect ratio but also the content and feel of the generated image, with extreme ratios often producing odd results. Prompt engineering is presented as an iterative, experimental practice rather than a precise science: clear descriptions usually work better than poetic language, adding context helps shape the scene, and specifying an artistic style can dramatically change the output. By refining the coffee prompt with clearer wording, a diner-counter setting, and styles such as surrealist painting or wood etching, the chapter demonstrates the core loop of Stable Diffusion work: explore broadly, choose what works, and refine deliberately.

Stable Diffusion sure can create strange things; let’s try to avoid going too far in that direction.

Image of the upper portion of the A1111 UI.

Entering our prompt into A1111

The initial image created by our prompt: “A damn fine cup of coffee.”

Batch count and Batch size theoretically offer different ways to increase the number of images generated.

One configuration for generating 30 images at once.

30 different answers to the prompt “A damn fine cup of coffee”

Setting the value to -1 will give us a ‘random’ seed each time we hit ‘Generate’.

The recycle button will give us the seed we used previously.

The options for setting our Width and Height in the UI.

Images with a 5:3 landscape aspect ratio.

Using the same seed but reverting to 512x512 shows us the impact of aspect ratio and image size.

We can easily swap Width and Height values in A1111.

Images with a 3:5 aspect ratio.

Images with a 3:7 aspect ratio.

Images with a 4:1 aspect ratio.

A poetic prompt does not always yield poetic images.

A straightforward prompt yields more cups of black coffee.

Adding a scene to an image can help provide context.

Choosing a landscape aspect ratio helps display the counter.

Creating surrealistic images.

Images in the style of a wood etching.

I would say that’s a damn fine cup of coffee!

Summary

Generating with Stable Diffusion is an iterative process, in which we are constantly revising our settings and prompts.
Despite the many ways to improve images, it’s always a good idea to generate a variety of images to see if we find a particular one that stands out to us as pleasing.
Our prompts should be clear and descriptive. Giving some context for the object we’re prompting can change the image dramatically. Describing the style of the image can further let us change the feeling of the images we’re generating.
The aspect ratio that we use to generate an image can have a major impact on the way the image looks. Consider whether the image you want to create would look better as a square, a landscape, or a portrait.

FAQ

What is the main goal of this chapter?

The chapter introduces the basics of creating images from text with Stable Diffusion. It focuses on using AUTOMATIC1111’s Stable Diffusion Webui, writing better prompts, generating multiple images, controlling randomness with seeds, and understanding how image size and aspect ratio affect results.

What is AUTOMATIC1111’s Stable Diffusion Webui, or A1111?

A1111 is an open source graphical interface for creating and customizing images with Stable Diffusion. Instead of using Stable Diffusion only through the command line, A1111 provides a more beginner-friendly web interface for entering prompts, changing settings, generating batches of images, and managing outputs.

How do you start A1111 after installing it?

After installing A1111, run ./webui.sh from the installation directory on Linux or macOS. On Windows, run ./webui.bat. Once it is running, open a browser and go to http://127.0.0.1:7860 to access the interface.

What is text-to-image generation?

Text-to-image generation is the process of creating an image from a written description. In Stable Diffusion, the user enters a prompt, such as “A damn fine cup of coffee”, and the model generates an image based on what it interprets that text to mean.

What is a prompt in Stable Diffusion?

A prompt is the natural language description of the image you want Stable Diffusion to generate. It gives the model guidance about the subject, setting, style, and other visual details. For example, A cup of black coffee, on a diner counter, wood etching is a prompt that describes the object, scene, and artistic style.
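The subject, scene, and style structure described above can be sketched as a small helper. This is only an illustration: the function name `build_prompt` and its parameters are hypothetical, not part of A1111 or any library, and joining the parts with commas is simply the convention the example prompt itself follows.

```python
def build_prompt(subject: str, context: str = "", style: str = "") -> str:
    """Join the non-empty parts of a prompt with commas.

    Hypothetical helper for illustration only; A1111 simply takes the
    final string typed into its prompt box.
    """
    parts = [part.strip() for part in (subject, context, style)]
    return ", ".join(part for part in parts if part)


# Start with the subject alone, then add context, then a style.
print(build_prompt("A cup of black coffee"))
print(build_prompt("A cup of black coffee", "on a diner counter"))
print(build_prompt("A cup of black coffee", "on a diner counter", "wood etching"))
```

Keeping the three parts separate makes it easy to vary one of them, such as the style, while holding the others fixed.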

Why should you generate many images instead of only one?

Stable Diffusion includes an element of chance, so a single generation may not produce the best result. Generating many images lets you compare outputs and choose the strongest one. The chapter describes this as an explore-and-refine process: first cast a wide net, then deliberately iterate on the best result.

What is the difference between batch size and batch count?

Batch count is the number of sequential rounds or iterations Stable Diffusion runs. Batch size is the number of images generated at the same time in each round. For example, a batch size of 5 and a batch count of 6 produces 30 images total. Larger batch sizes use more GPU VRAM.
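The arithmetic behind that example is a simple product, sketched below. The helper names and the `max_batch_size` cap are hypothetical and chosen for illustration; the batch size your own GPU can handle depends on its VRAM.

```python
def total_images(batch_size: int, batch_count: int) -> int:
    """Images produced by `batch_count` sequential rounds of `batch_size` images each."""
    if batch_size < 1 or batch_count < 1:
        raise ValueError("batch_size and batch_count must both be at least 1")
    return batch_size * batch_count


def configurations(target: int, max_batch_size: int) -> list[tuple[int, int]]:
    """List every (batch_size, batch_count) pair that yields exactly `target` images.

    `max_batch_size` is an assumed cap standing in for the VRAM limit.
    """
    return [
        (size, target // size)
        for size in range(1, max_batch_size + 1)
        if target % size == 0
    ]


print(total_images(batch_size=5, batch_count=6))
print(configurations(30, max_batch_size=8))
```

Every pair returned by `configurations` produces the same number of images; the pairs differ only in how much work runs in parallel, and therefore in how much GPU memory each round needs.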

What does the seed setting do?

The seed controls the pseudo-random starting point used to generate an image. If the seed is set to -1, A1111 chooses a random seed each time. If you use a specific seed, such as 42 or 1337, you can reproduce results more reliably when the prompt and other settings remain the same.

Why is setting a specific seed useful?

Using a specific seed makes image generation repeatable. This is helpful when you want to test how changes to the prompt, size, or other settings affect the result. If the seed keeps changing, it becomes harder to know whether an improvement came from your change or from random luck.
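Seeded repeatability can be demonstrated without running a model at all. The sketch below uses Python's standard `random` module as a stand-in for the noise generator, so it shows the principle rather than how A1111 is implemented; `starting_noise` and `resolve_seed` are hypothetical names, and the 32-bit seed range is an assumption made for the example.

```python
import random


def resolve_seed(seed: int) -> int:
    """Mimic the -1 convention: -1 means pick a fresh seed, anything else is kept."""
    if seed == -1:
        return random.randrange(2**32)  # assumed seed range, for illustration
    return seed


def starting_noise(seed: int, n: int = 4) -> list[float]:
    """Draw `n` Gaussian values from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


# The same seed gives the same starting values every time.
print(starting_noise(42) == starting_noise(42))

# A different seed gives a different starting point.
print(starting_noise(42) == starting_noise(1337))

# With -1, record the seed that was actually chosen so the run can be repeated.
chosen = resolve_seed(-1)
print(starting_noise(chosen) == starting_noise(chosen))
```

Recording the resolved seed is what makes a lucky random result reusable later, which is the role the recycle button plays in the UI.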

How do width, height, and aspect ratio affect Stable Diffusion images?

Width and height determine the image dimensions and aspect ratio. Stable Diffusion generally requires dimensions that are multiples of 8, and A1111 helps restrict values to valid options. Changing the aspect ratio can significantly affect the content of the image, not just its shape. For example, a landscape ratio may suit a diner counter scene, while a portrait ratio may encourage taller subjects.
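The multiple-of-8 constraint and the aspect ratio calculation can be sketched as follows. The function names are hypothetical, and the 800x480 size is only an assumed example of a 5:3 landscape image, not necessarily the dimensions used in the chapter.

```python
from math import gcd


def snap_to_multiple(value: int, multiple: int = 8) -> int:
    """Round `value` to the nearest multiple (ties round up), with a floor of one multiple."""
    snapped = (value + multiple // 2) // multiple * multiple
    return max(multiple, snapped)


def aspect_ratio(width: int, height: int) -> tuple[int, int]:
    """Reduce width:height to lowest terms, e.g. 800x480 -> (5, 3)."""
    divisor = gcd(width, height)
    return width // divisor, height // divisor


def orientation(width: int, height: int) -> str:
    """Classify the image shape as square, landscape, or portrait."""
    if width == height:
        return "square"
    return "landscape" if width > height else "portrait"


width, height = snap_to_multiple(801), snap_to_multiple(483)
print(width, height, aspect_ratio(width, height), orientation(width, height))

# Swapping width and height turns the landscape ratio into a portrait one.
print(aspect_ratio(height, width), orientation(height, width))
```

Snapping first and computing the ratio afterwards reflects the order in which the UI applies the constraint: the dimensions you end up generating with are the valid ones, not necessarily the ones you typed.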

What is prompt engineering?

Prompt engineering is the process of changing and refining prompts to get closer to the desired image. The chapter emphasizes that it is an iterative process rather than a perfectly predictable formula. You generate images, evaluate them, adjust the prompt or settings, and repeat.

Why are clear, descriptive prompts better than poetic prompts?

Stable Diffusion does not always interpret poetic language the way a human might. A phrase like “black as midnight on a moonless night” may sound evocative, but it may not reliably produce black coffee. A clearer prompt such as “A cup of black coffee” usually gives the model more direct guidance.

pro $24.99 per month

access to all Manning books, MEAPs, liveVideos, liveProjects, and audiobooks!
choose one free eBook per month to keep
exclusive 50% discount on all purchases
renews monthly, pause or cancel renewal anytime

lite $19.99 per month

access to all Manning books, including MEAPs!

team

5, 10 or 20 seats+ for your team - learn more

eBook

pdf, ePub, online

$39.99 $25.19

you save $14.80 (37%)
