Overview

1 Making our First Image: “A damn fine cup of coffee”

The chapter introduces Stable Diffusion through an analogy to early hip‑hop: just as DJs remixed familiar records into something new, the model recombines learned visual concepts to produce novel images. It frames generative imaging as a playful way to realize ideas we’ve long doodled in notebooks, emphasizing that while first results come quickly, mastery takes experimentation, technique, and iteration. The open‑source nature of Stable Diffusion empowers users to run, customize, and extend the system on consumer hardware, supported by a fast‑evolving ecosystem of community tools and methods.

Readers are guided through creating their first text‑to‑image outputs with the prompt “A damn fine cup of coffee,” learning core controls and trade‑offs. The chapter explains iterations versus batch size (time, VRAM, and practicality), the role of seeds in reproducibility (default 42) and variety, and how image dimensions—especially aspect ratio—meaningfully shape composition and content. A key habit is to generate many candidates and curate the best, treating the model less as an all‑imaginative artist and more like a search engine over possible images, responsive to precise wording and settings.

Prompt engineering fundamentals follow: prefer clear, descriptive phrasing over poetic ambiguity; add scene context to steer composition; and specify stylistic cues (for example, surrealist painting, wood etching) to avoid the uncanny valley and evoke desired aesthetics. The chapter also demonstrates the power of open source by exposing the built‑in NSFW safety check that can replace outputs with a placeholder image and shows how to optionally disable it in code—underscoring user agency. It concludes by reinforcing an iterative workflow of prompt refinement, parameter tuning, and selective curation as the path to images that truly match one’s intent.

Getting my imagination on the screen
Browsing an infinite library of Pulp Sci-Fi that never was.
Who knew monks were such avid readers of sci-fi?
Envisioning ancient aliens.
The initial 6 images created by our prompt: “A damn fine cup of coffee.”
Average seconds to create an image, comparing iterations and batch.
Generating 30 images at once.
Creating 6 different images with seed 12345.
Images with a 5:3 landscape aspect ratio.
Images with a 3:5 aspect ratio using the same seed.
Images with a 3:7 aspect ratio using the same seed.
Images with a 4:1 aspect ratio using the same seed.
A poetic prompt does not always yield poetic images.
A straightforward prompt yields more cups of black coffee.
Adding a scene to an image can help provide context.
Choosing a landscape aspect ratio helps display the counter.
Creating surrealistic images.
Images in the style of a wood etching.
I would say that’s a damn fine cup of coffee!
Being “Rick-rolled” by Stable Diffusion.

Summary

  • Generating with Stable Diffusion is an iterative process, in which we are constantly revising our settings and prompts.
  • Despite the many ways to improve images, it’s always a good idea to generate a variety of images to see if we find a particular one that stands out to us as pleasing.
  • Our prompts should be clear and descriptive. Giving some context for the object we’re prompting can change the image dramatically. Describing the style of the image can further let us change the feeling of the images we’re generating.
  • The aspect ratio that we use to generate an image can have a major impact on the way the image looks. Consider whether the image you want to create would look better as a square, a landscape, or a portrait.
  • Because Stable Diffusion is open source we (as well as the entire community of users) can change and extend its behavior.

FAQ

How do I generate my first image with the prompt “A damn fine cup of coffee”?
Activate your environment, then run the txt2img script from the root of the Stable Diffusion repository:

  conda activate ldm
  python ./scripts/txt2img.py --prompt="A damn fine cup of coffee"

If your checkpoint isn’t in the default path, add --ckpt=/path/to/model.ckpt.
I get “model.ckpt not found.” How do I fix it?
Place the v1.5 checkpoint where the script expects it (see the book’s appendix), or pass the full path explicitly:

  python ./scripts/txt2img.py --prompt="A damn fine cup of coffee" --ckpt=/models/sd-v1-5.ckpt
Where are the generated images saved by default?
In the CompVis implementation, txt2img saves to stable-diffusion/outputs/txt2img-samples/samples.
What’s the difference between n_iter and n_samples (batch size)?
  • n_iter: how many sequential iterations (rounds) to run.
  • n_samples: how many images to generate in parallel per iteration (the batch size).
Larger batches use more GPU VRAM, and in practice they don’t provide a big speedup per image; many users keep the batch small (often 1) and raise n_iter for volume.
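The relationship between the two flags is simple multiplication, which this illustrative sketch makes explicit (the helper name is hypothetical, not part of the txt2img script):

```python
# Illustrative arithmetic only: the total number of images produced by
# the CompVis txt2img script is iterations times batch size.
def total_images(n_iter: int, n_samples: int) -> int:
    """Images produced by n_iter rounds of n_samples each."""
    return n_iter * n_samples

# Six rounds with a batch of five yields thirty images.
print(total_images(6, 5))  # → 30
```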
How can I generate 30 images in one go?
Choose iterations and batch size whose product is 30. For example:

  python ./scripts/txt2img.py --prompt="A damn fine cup of coffee" --n_samples=5 --n_iter=6

If a batch of 5 exceeds your VRAM, lower n_samples and raise n_iter to compensate.
Why did the first 6 images in my 30-image run repeat?
The script uses a fixed random seed for reproducibility (the default is 42). If you don’t change it, the first set of images repeats. Set a new seed, e.g. --seed=12345.
What is the seed and why does it matter?
The seed initializes the pseudo-random number generator. With the same prompt, settings, and seed, results are exactly repeatable. Change the seed to explore new variations, or keep it fixed to reproduce a result later (or to match the book’s outputs).
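A toy illustration of the idea, using Python’s standard-library PRNG rather than Stable Diffusion’s: the same seed always restarts the same pseudo-random sequence, which is exactly why a fixed seed makes generations repeatable.

```python
import random

random.seed(42)
first = [random.random() for _ in range(3)]

random.seed(42)          # re-seeding restarts the identical sequence
second = [random.random() for _ in range(3)]
print(first == second)   # → True

random.seed(12345)       # a different seed gives a different sequence
third = [random.random() for _ in range(3)]
print(first == third)    # → False
```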
How do I set image size and aspect ratio, and are there limits?
Use --H (height) and --W (width). Dimensions must be multiples of 8; the chapter uses multiples of 128 for simplicity. Larger H×W needs more VRAM. Aspect ratio strongly affects composition (landscape suits wide scenes, portrait suits tall subjects). For high-resolution output, generate at a modest size and upscale later via a dedicated workflow.
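Since arbitrary dimensions aren’t valid, a small hypothetical helper (not part of the txt2img script) can snap a requested size down to the nearest allowed multiple — 8 as the hard requirement, or 128 if you follow the chapter’s convention:

```python
# Round a requested dimension down to the nearest valid multiple.
# Stable Diffusion requires multiples of 8; the chapter sticks to
# multiples of 128 for simplicity.
def snap(dimension: int, multiple: int = 8) -> int:
    return (dimension // multiple) * multiple

print(snap(500))        # → 496 (nearest multiple of 8 at or below 500)
print(snap(500, 128))   # → 384
print(snap(512))        # → 512 (already valid, unchanged)
```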
How can I improve results with prompt engineering?
  • Prefer clear, descriptive phrasing over poetic language (e.g., “A cup of black coffee”).
  • Add context or a scene (e.g., “on a diner counter”).
  • Specify a style (e.g., “surrealist painting”, “wood etching”).
  • Iterate: generate, review, tweak the prompt and settings, and repeat.
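The pattern above — subject, then scene context, then a style cue — can be sketched as a tiny hypothetical helper (the function and its joining convention are illustrative, not part of any Stable Diffusion tooling):

```python
# Compose a prompt from the chapter's three ingredients:
# a subject, optional scene context, and an optional style cue.
def build_prompt(subject: str, context: str = "", style: str = "") -> str:
    parts = [subject]
    if context:
        parts.append(context)
    if style:
        parts.append(style)
    return ", ".join(parts)

print(build_prompt("A cup of black coffee",
                   "on a diner counter",
                   "wood etching"))
# → A cup of black coffee, on a diner counter, wood etching
```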
What’s the “Rick Astley” image, and can I disable the NSFW safety checker?
When the safety checker flags an image as NSFW, it replaces the output with a placeholder (often the “Rickroll” image). Because CompVis Stable Diffusion is open source, you can disable this in scripts/txt2img.py by redefining check_safety to return the original image (e.g., def check_safety(x_image): return x_image, False). Do this only if you understand and accept the implications.
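Spelled out, the override from the chapter is a pass-through that keeps the second return value (the NSFW flag) in the shape the caller expects:

```python
# The replacement for check_safety in scripts/txt2img.py suggested in
# the chapter: return the images untouched and report "not NSFW",
# so flagged outputs are never swapped for the placeholder image.
def check_safety(x_image):
    return x_image, False
```

With this in place, the rest of the script continues to unpack (images, flag) exactly as before; only the substitution step is neutralized.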
