This chapter introduces Generative AI as a fundamentally different way to build software. Instead of crafting fully deterministic logic, you design around a probabilistic Large Language Model that reads and writes natural language. These models are powerful yet “black-box”: non-deterministic, non-explanatory, stateless, and pre-trained. The aim is to equip you with a practical understanding of what GenAI apps can do, where they fall short, and how to harness them effectively—ultimately moving from simply using tools like ChatGPT to confidently building your own with low-code tooling such as LangFlow.
Turning an LLM into a reliable product requires wrapping it with the right architecture. Applications simulate memory by storing and resending conversation history so each model call has the context it needs, and they inject fresh or private knowledge into prompts to overcome the model’s static training—later formalized as Retrieval-Augmented Generation. Beyond memory and retrieval, real systems integrate tools/APIs to trigger actions, use prompt engineering to control tone and behavior, and can orchestrate multiple specialized agents to collaborate. In short, successful GenAI apps are not just a prompt plus a model; they are carefully engineered pipelines that manage state, knowledge, control, and orchestration around the LLM.
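The "simulated memory" pattern described above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: `call_llm` is a hypothetical placeholder for whatever text-in/text-out model API you use, and the class simply resends the full transcript on every call.

```python
def call_llm(prompt: str) -> str:
    # Placeholder: a real app would call a provider SDK here.
    return f"(model reply given {len(prompt)} characters of context)"

class ChatSession:
    """Simulates memory: the app, not the stateless LLM, keeps the history."""

    def __init__(self, system_prompt: str):
        self.history: list[str] = [f"System: {system_prompt}"]

    def ask(self, user_message: str) -> str:
        self.history.append(f"User: {user_message}")
        # Every call is self-contained: the full transcript rides along.
        prompt = "\n".join(self.history) + "\nAssistant:"
        reply = call_llm(prompt)
        self.history.append(f"Assistant: {reply}")
        return reply

session = ChatSession("You are a helpful assistant.")
session.ask("My name is Ada.")
session.ask("What is my name?")  # history carries "Ada" into this call
```

In practice an application would also trim or select snippets from `history` to stay within the model's context window, rather than forwarding everything forever.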
The chapter also demystifies how LLMs work. It builds intuition with tokens, embeddings, and autoregressive next-token prediction, then contrasts a toy example with GPT-3’s real scale: large vocabularies, high-dimensional embeddings, deep transformer stacks with attention and positional encoding, and additional networks to score next tokens. It distinguishes training from inference, outlining massive pretraining corpora and compute, followed by human-guided refinement (supervised fine-tuning and RLHF) that shaped ChatGPT’s conversational abilities. Finally, it explains why knowledge is fixed to a cutoff date and models are stateless by default—and how fine-tuning plus retrieval mechanisms let applications stay accurate, up to date, and tailored to specialized domains.
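A toy version of this next-token loop makes the intuition concrete. Everything here is deliberately miniature and the numbers are arbitrary inventions: a four-word vocabulary, two-dimensional embeddings, a context computed by averaging, and a dot-product score standing in for a deep transformer.

```python
import math

# Toy vocabulary with hand-picked 2-D embeddings (arbitrary values).
EMBEDDINGS = {
    "the": [0.9, 0.1],
    "cat": [0.2, 0.8],
    "sat": [0.3, 0.7],
    "mat": [0.25, 0.75],
}

def context_vector(tokens):
    # Collapse the prompt into one vector by averaging its embeddings.
    dims = zip(*(EMBEDDINGS[t] for t in tokens))
    return [sum(d) / len(tokens) for d in dims]

def next_token(tokens):
    ctx = context_vector(tokens)
    # Score each candidate against the context, then softmax into probabilities.
    scores = {w: sum(a * b for a, b in zip(ctx, e)) for w, e in EMBEDDINGS.items()}
    exps = {w: math.exp(s) for w, s in scores.items()}
    total = sum(exps.values())
    probs = {w: v / total for w, v in exps.items()}
    return max(probs, key=probs.get)  # greedy decoding: pick the most likely

prompt = ["the", "cat"]
for _ in range(2):  # autoregression: feed each prediction back in
    prompt.append(next_token(prompt))
```

With arbitrary hand-picked weights like these, greedy decoding tends to loop on the same token; real models rely on billions of learned parameters, attention over token positions, and sampling strategies to produce varied, coherent text.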
GenAI apps have an LLM (Magic Black Box) somewhere.
The magic box. Gets text as input and generates text as output.
LLM relationships: every chat is a first date.
Taming the GenAI beast.
The three types of machine learning: Unsupervised, supervised, and reinforcement.
The learning stages of ChatGPT.
Words are numbers in the eyes of an LLM.
Given a prompt, you can calculate the context.
Guess the next best word by combining embeddings with context.
The GPT sentence completion process.
How a GPT architecture generates sentences.
The two stages of GPT-3: first it is trained, then sentence completion happens at inference time.
Enhancing a pre-trained model through fine-tuning.
FAQ
What makes GenAI programming different from traditional programming?
Traditional apps follow explicit, deterministic instructions written in a programming language. GenAI apps route part of the workflow through a Large Language Model (LLM) that takes natural-language prompts and produces non-deterministic outputs. This introduces randomness, ambiguity, and prompt design as core engineering concerns.

What is a Large Language Model (LLM), and why is it called a "black box"?
An LLM is a model that takes text as input and generates text as output. It operates probabilistically and does not expose an interpretable explanation of why it produced a specific answer, so its behavior is non-deterministic and non-explanatory; hence the "black box."

If LLMs are stateless and pre-trained, why do apps like ChatGPT seem to remember and learn?
Memory and "learning" are added by the application layer, not the LLM itself. The app preserves conversation history and resends relevant snippets with each request (simulating memory), and it injects fresh or domain-specific knowledge into prompts (simulating learning). The underlying LLM remains stateless and fixed after training.

How can I add "memory" to a GenAI application?
Store prior user messages and model responses, then append the relevant history to each new prompt. This makes every LLM call self-sufficient and preserves context across turns. Practical designs select and forward only the most relevant snippets to stay within the model's context window.

How do I provide up-to-date or private knowledge to the LLM?
Inject the needed facts directly into the prompt at inference time.
For small corpora, you can attach the full context; otherwise, use a deterministic search (and later, Retrieval-Augmented Generation) to fetch and include only the most relevant passages with the user's query.

What are tokens and embeddings, and how does a GPT-style model generate text?
Text is split into tokens (words, word pieces, or punctuation) and converted into numerical vectors called embeddings. The model computes a context from the prompt and predicts the most likely next token; it repeats this autoregressively to produce sentences, paragraphs, or full documents.

What training stages teach a model like ChatGPT to converse?
First, unsupervised pre-training teaches the model to predict the next token across massive text corpora. Then supervised fine-tuning conditions it for Q&A and dialog with curated examples. Finally, Reinforcement Learning from Human Feedback (RLHF) aligns responses with human preferences for helpfulness and safety.

How does a real model like GPT-3 differ from the simplified intuition?
GPT-3 operates at massive scale: tens of thousands of vocabulary tokens, high-dimensional embeddings, and deep transformer networks with hundreds of billions of parameters. It uses attention to account for word positions and dependencies, and it is limited by a finite context window (e.g., 2,048 tokens in GPT-3, much larger in modern models).

What's the difference between pre-training and inference, and how do I handle the knowledge cutoff?
Pre-training (done once) learns the model's parameters from huge datasets; inference (real time) uses those fixed parameters to answer prompts.
Because knowledge is fixed up to a cutoff date, you add recent or proprietary facts via retrieval in the prompt, and optionally fine-tune the model on new, domain-specific examples to specialize behavior.

Beyond calling an LLM, what else do real GenAI apps need?
They orchestrate memory management, knowledge retrieval (e.g., RAG), integration with external tools/APIs to take actions, careful prompt engineering to control tone and objectives, and sometimes multi-agent designs. The book demonstrates these patterns with low-code tooling like LangFlow.
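The deterministic-search idea from the FAQ (fetch only the most relevant passages and include them with the user's query) can be sketched with simple keyword overlap. The corpus, query, and scoring rule below are invented for illustration; a real RAG pipeline would replace this scoring with embedding-based similarity search over a vector store.

```python
import string

def words(text: str) -> set[str]:
    # Lowercase and strip punctuation so "policy?" matches "policy".
    return {w.strip(string.punctuation) for w in text.lower().split()}

def score(query: str, passage: str) -> int:
    # Deterministic relevance: count shared words between query and passage.
    return len(words(query) & words(passage))

def build_prompt(query: str, corpus: list[str], top_k: int = 2) -> str:
    # Keep only the best-matching passages so the prompt stays small.
    best = sorted(corpus, key=lambda p: score(query, p), reverse=True)[:top_k]
    context = "\n".join(f"- {p}" for p in best)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

corpus = [
    "Our refund policy allows returns within 30 days.",
    "The office is closed on public holidays.",
    "A refund is issued to the original payment method.",
]
prompt = build_prompt("refund policy", corpus)
```

The resulting prompt carries the two refund-related passages plus the question, which is exactly the shape of a retrieval-augmented call: the model answers from injected facts rather than from its fixed training data.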