Overview

1 Making our First Image: “A damn fine cup of coffee”

This chapter introduces Stable Diffusion through a creative lens, comparing its remixing of visual culture to Grandmaster Flash’s early turntable experimentation. It frames text-to-image generation as an accessible way to turn daydreams into images, while emphasizing that mastery still requires iteration, technique, and tooling. Because Stable Diffusion is open source, learners benefit from a fast-evolving ecosystem, community knowledge, and the freedom to customize the software—all achievable on consumer hardware.

Readers are guided to create their first image with CompVis Stable Diffusion using the prompt “A damn fine cup of coffee,” learning core concepts along the way. The chapter stresses generating many candidates, clarifies the difference between iterations and batch size (with limited practical speed gains from batches and higher VRAM demands), and shows how seeds enable both reproducibility and variety (default 42). It demonstrates how changing width and height—especially aspect ratio—does more than reshape the canvas; it meaningfully shifts composition and subject emphasis.

Foundational prompt-engineering habits follow: favor clear, descriptive language over poetic phrasing, supply scene context to steer composition, and specify stylistic cues (for example, “surrealist painting” or “wood etching”) to avoid the uncanny valley and better match intent. The chapter closes by showcasing the power of open source through a small code edit that disables the built-in NSFW “rick-roll” replacement, underscoring user control over model behavior. Overall, it equips readers with practical commands, mental models, and iterative tactics to move from vague ideas to images that feel intentional and personal.

Getting my imagination on the screen
Browsing an infinite library of Pulp Sci-Fi that never was.
Who knew monks were such avid readers of sci-fi?
Envisioning ancient aliens.
The initial 6 images created by our prompt: “A damn fine cup of coffee.”
Average seconds to create an image, comparing iterations and batch.
Generating 30 images at once.
Creating 6 different images with seed 12345.
Images with a 5:3 landscape aspect ratio.
Images with a 3:5 aspect ratio using the same seed.
Images with a 3:7 aspect ratio using the same seed.
Images with a 4:1 aspect ratio using the same seed.
A poetic prompt does not always yield poetic images.
A straightforward prompt yields more cups of black coffee.
Adding a scene to an image can help provide context.
Choosing a landscape aspect ratio helps display the counter.
Creating surrealistic images.
Images in the style of a wood etching.
I would say that’s a damn fine cup of coffee!
Being “Rick-rolled” by Stable Diffusion.

Summary

  • Generating with Stable Diffusion is an iterative process, in which we are constantly revising our settings and prompts.
  • Even with the many ways to improve images, it’s a good idea to generate a variety of candidates and see whether one stands out as particularly pleasing.
  • Our prompts should be clear and descriptive. Giving some context for the object we’re prompting can change the image dramatically. Describing the style of the image can further let us change the feeling of the images we’re generating.
  • The aspect ratio we use to generate an image can have a major impact on the way the image looks. Consider whether the image you want to create would look better as a square, a landscape, or a portrait.
  • Because Stable Diffusion is open source we (as well as the entire community of users) can change and extend its behavior.

FAQ

What do I need installed to follow this chapter?
Install the CompVis Stable Diffusion repo (https://github.com/CompVis/stable-diffusion) and the v1.5 model checkpoint (https://huggingface.co/runwayml/stable-diffusion-v1-5). Activate the conda environment you created during setup (e.g., “ldm”).
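The setup described above can be sketched as the following commands; the environment name “ldm” and the default paths are taken from the chapter’s appendix setup and may differ on your machine:

```shell
# Clone the CompVis repo and enter it (repo URL from the chapter).
git clone https://github.com/CompVis/stable-diffusion
cd stable-diffusion

# Activate the conda environment created during setup.
conda activate ldm
```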
How do I generate my first image (“a damn fine cup of coffee”)?
From the stable-diffusion folder, run: python ./scripts/txt2img.py --prompt="A damn fine cup of coffee". If your model file isn’t in the default spot, add --ckpt=/path/to/model.ckpt.
I got “model.ckpt not found.” How do I fix it?
Put the checkpoint where the Appendix instructs, or pass the explicit path with --ckpt=/full/path/to/model.ckpt when running txt2img.py.
Where are images saved and how many are created by default?
Outputs are placed in stable-diffusion/outputs/txt2img-samples/samples. By default, 6 images are created (n_iter=2 times n_samples=3).
What’s the difference between iterations (n_iter) and batch size (n_samples)?
n_iter creates images sequentially; n_samples creates multiple images simultaneously on the GPU. Larger n_samples uses more VRAM and, in practice here, offers little speed-up per image. Generating more than one image at a time still helps amortize startup overhead.
How do I generate more images at once (e.g., 30)?
Use both flags, for example: python ./scripts/txt2img.py --prompt="A damn fine cup of coffee" --n_samples=5 --n_iter=6. If VRAM is limited, lower n_samples and raise n_iter to reach your target count.
Why did I get duplicate images across runs? What is the seed?
Stable Diffusion uses a pseudo-random number generator with a seed. The default seed is 42, so runs can repeat. Set --seed to a different value (e.g., --seed=12345) to get new images while keeping other settings the same.
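The reproducibility behavior above is a general property of seeded pseudo-random number generators, not something specific to Stable Diffusion. An illustrative sketch using Python’s standard library PRNG (not the model’s actual noise sampler):

```python
import random

# Seeding a PRNG makes its "random" sequence reproducible. Stable
# Diffusion starts image generation from pseudo-random noise, so a
# fixed seed (default 42) yields the same images run after run.
random.seed(42)
first = [random.random() for _ in range(3)]

random.seed(42)  # same seed, same settings...
second = [random.random() for _ in range(3)]

random.seed(12345)  # ...while a new seed gives new values
third = [random.random() for _ in range(3)]

print(first == second)  # True: identical sequences
print(first == third)   # False: different sequences
```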
How do I change size/aspect ratio, and are there constraints?
Use --H (height) and --W (width). Dimensions must be multiples of 8 (the chapter sticks to multiples of 128). VRAM usage grows with H×W. Aspect ratio significantly affects composition (e.g., landscape vs portrait), so use it to guide framing rather than chasing high resolution directly.
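Following the chapter’s multiples-of-128 convention, the landscape and portrait framings could look like the following; the specific 640×384 dimensions are an illustrative choice (they give the 5:3 ratio shown in the figures), not quoted from the chapter:

```shell
# 640x384 = 5:3 landscape framing.
python ./scripts/txt2img.py --prompt="A damn fine cup of coffee" --W=640 --H=384

# Swap the values for a 3:5 portrait framing of the same scene.
python ./scripts/txt2img.py --prompt="A damn fine cup of coffee" --W=384 --H=640
```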
How can I improve results with prompt engineering?
Be clear and descriptive (“A cup of black coffee”). Add scene/context (“on a diner counter”). Specify style (“surrealist painting”, “wood etching”). Iterate: generate, review, tweak words and settings, repeat.
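That refinement loop can be sketched as a progression of commands. The individual phrases come from the chapter’s examples; how they are combined into full prompts here is an assumption:

```shell
# Start plain and descriptive.
python ./scripts/txt2img.py --prompt="A cup of black coffee"

# Add scene context to steer the composition.
python ./scripts/txt2img.py --prompt="A cup of black coffee on a diner counter"

# Add a style cue to change the feeling of the image.
python ./scripts/txt2img.py --prompt="A surrealist painting of a cup of black coffee on a diner counter"
```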
Why did I see a Rick Astley image and can I disable the NSFW filter?
That placeholder appears when the safety checker flags content as NSFW. Because this code is open source, you can optionally disable it by editing scripts/txt2img.py and replacing the check_safety function to simply return the image and False. Do so only if you understand and accept the implications.
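A sketch of the replacement described above. The exact original signature in the CompVis repo is assumed here, not quoted; the idea is simply a pass-through that reports every image as safe:

```python
# Hypothetical replacement body for check_safety in scripts/txt2img.py.
# The original runs a safety-checker model and swaps flagged images for
# the Rick Astley placeholder; this version skips the check entirely.
def check_safety(x_image):
    # Return the images untouched and flag none as NSFW.
    has_nsfw_concept = [False] * len(x_image)
    return x_image, has_nsfw_concept
```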
