Build a Text-to-Image Generator (from Scratch)

With transformers and diffusions

Mark Liu

December 2025
ISBN 9781633435421
360 pages

Included with a Manning Online subscription

printed in black & white

pro $24.99 per month

access to all Manning books, MEAPs, liveVideos, liveProjects, and audiobooks!
choose one free eBook per month to keep
exclusive 50% discount on all purchases
renews monthly, pause or cancel renewal anytime

lite $19.99 per month

access to all Manning books, including MEAPs!

team

5, 10 or 20 seats+ for your team - learn more

eBook

pdf, ePub, online

$47.99 $33.59

you save $14.40 (30%)

include audio $24.99 $17.49

Build your own vision transformer and diffusion models for text-to-image generation, from scratch!

Build a Text-to-Image Generator (from Scratch) takes you step-by-step through creating your own AI models that can generate images from text. You’ll explore two methods of image generation—vision transformers and diffusion models—and learn vital AI development techniques as you go.

Build a Text-to-Image Generator (from Scratch) teaches you how to:

Build and train models to generate high resolution images based on text descriptions
Edit an existing image based on text prompts
Build and train a model to add captions to images
Build and train a vision transformer to classify images
Fine-tune LLMs for downstream tasks such as classification, text generation, or image generation
Better differentiate real images from deepfakes

Build a Text-to-Image Generator (from Scratch) dives into the powerful models behind AI image generators. The best way to learn is to build something from scratch, and in this book you’ll build your very own diffusion model and vision transformer. As you work through each stage of development, you’ll develop an understanding of how these models can be customized, applied, and integrated for impressive multimodal AI.

about the technology

AI-generated images appear everywhere from high-end advertising to casual social media feeds. Text-to-image tools like DALL-E, Midjourney, and Flux make it easy to create AI art, but how do they work? In this book, you’ll find out by building your own text-to-image generator!

about the book

Build a Text-to-Image Generator (from Scratch) explores both transformer-based image generation and diffusion models. You’ll work hands-on to build a pair of simple generation models that can classify images, automatically add captions, reconstruct images, and enhance existing graphics. Author Mark Liu guides you every step of the way with clear explanations, informative diagrams, and eye-opening examples you can build on your own laptop.

what's inside

Build a vision transformer to classify images
Edit images using text prompts
Fine-tune image models

about the reader

Requires basic knowledge of generative AI models and intermediate Python skills.

about the author

Mark Liu is the founding director of the Master of Science in Finance program at the University of Kentucky. He is also the author of Learn Generative AI with PyTorch.

A practical and readable introduction with working code and clear explanations.

Andrey Lukyanenko, Meta

Empowers you to unlock creativity at the intersection of text and imagery.

Bojan Tunguz, Tabul.AI

Amazingly comprehensive, hype-free, hands-on, and code-rich guidebook.

Kirk Borne, Data Leadership Group

Successfully brings together the theoretical foundations and practical applications, from transformers to diffusion models.

Raymond Cheung, Parity Technologies
