Overview

1 Peeking inside the black box

Generative AI applications feel natural to use, yet they are very different to build. Unlike traditional software, where deterministic steps in code yield predictable results, GenAI systems rely on Large Language Models (LLMs) that take natural language as input and produce natural language as output with probabilistic, non-deterministic behavior. These models are black boxes: they don’t explain their reasoning, are stateless between calls, and carry fixed, pre-trained knowledge. The chapter sets the goal of understanding what GenAI can and cannot do, and introduces a practical path to building useful applications, often with low-code tooling like LangFlow, by embracing these unique constraints.

To make real-world apps work, developers must wrap the LLM with supporting capabilities. Memory is simulated by capturing conversation history and resending relevant snippets with each prompt, while fresh or private knowledge is supplied at inference time—manually at first, and later with systematic retrieval methods such as RAG. Beyond context and knowledge, robust systems also need to trigger external actions via tools/APIs, shape the model’s tone and behavior through prompt engineering, and, at times, coordinate multiple specialized agents. Products like ChatGPT and Copilot do much of this orchestration behind the scenes; the chapter lays the groundwork for reproducing these strategies so you can “tame the beast” for your own domain.
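The memory pattern is easy to sketch. The snippet below is a minimal illustration, assuming the OpenAI Python SDK; the model name is a placeholder, and the same loop works with any chat-completion API:

```python
# Minimal sketch: the model is stateless, so "memory" is the app
# re-sending the accumulated history with every call.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
history = [{"role": "system", "content": "You are a helpful assistant."}]

def chat(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; use any chat model
        messages=history,     # the full transcript travels with each prompt
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply
```

Private knowledge works the same way: the app appends relevant document snippets to the transcript (or the system message) before calling the model.
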

The chapter also peeks inside the LLM to demystify how it generates text. It explains tokenization, embeddings, and autoregressive next-token prediction, then scales up to modern transformer architectures that use attention and billions of parameters (e.g., GPT-3) trained on vast datasets with massive compute. It distinguishes training from inference, highlights techniques like supervised fine-tuning and RLHF for aligning behavior, and clarifies why knowledge cutoffs exist—and how to overcome them by augmenting prompts and retrieving current information. With this mental model, you can design applications that reliably manage context, incorporate up-to-date knowledge, and choose the right model size (from SLMs to LLMs) for the task at hand.
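To make the first steps of that pipeline concrete, here is a toy sketch of tokenization and embedding lookup; the five-word vocabulary and eight-dimensional vectors are invented stand-ins for a real subword tokenizer and GPT-3's 12,288-dimensional embeddings:

```python
import numpy as np

# Toy vocabulary and randomly initialized embedding table; a real
# model learns these vectors during pre-training.
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}
embedding_table = np.random.default_rng(0).normal(size=(len(vocab), 8))

def embed(text: str) -> np.ndarray:
    token_ids = [vocab[word] for word in text.split()]  # "tokenize"
    return embedding_table[token_ids]                   # ids -> vectors

print(embed("the cat sat").shape)  # (3, 8): one vector per token
```
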

Figures in this chapter:

  • GenAI apps have an LLM (a magic black box) somewhere.
  • The magic box: it takes text as input and generates text as output.
  • LLM relationships: every chat is a first date.
  • Taming the GenAI beast.
  • The three types of machine learning: unsupervised, supervised, and reinforcement.
  • The learning stages of ChatGPT.
  • Words are numbers in the eyes of an LLM.
  • Given a prompt, you can calculate the context.
  • Guess the next best word by combining embeddings with context.
  • The GPT sentence-completion process.
  • How a GPT architecture generates sentences.
  • The two stages of GPT-3: first it gets trained, then sentence completions are inferred.
  • Enhancing a pre-trained model through fine-tuning.

FAQ

What makes GenAI programming different from traditional software development?
GenAI apps incorporate a Large Language Model (LLM) that introduces non-determinism and relies on natural-language instructions. Unlike traditional, largely deterministic code that follows explicit rules, GenAI workflows must handle probabilistic outputs, manage prompt wording, and orchestrate supporting elements (like memory and knowledge retrieval) around the LLM to produce useful results.

What is a Large Language Model (LLM), and what are its key limitations?
An LLM is a “black box” that takes text as input and generates text as output. Its key characteristics: it’s non-deterministic (probabilistic), non-explanatory (can’t show why it produced an answer), stateless (no built-in memory between calls), and pre-trained (its knowledge is fixed at training time). App designers must work around these limits.

If LLMs are stateless, how do tools like ChatGPT seem to remember previous messages?
They don’t “remember” internally. The application manages conversation state by collecting prior user messages and model outputs and re-sending the relevant history with each new prompt. This creates the appearance of memory within the model’s context window (e.g., GPT‑3 allowed ~2,048 tokens; modern models support far more).

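One practical consequence is that the app must keep the re-sent transcript inside the model’s window. A sketch, using the tiktoken library to count tokens (the 2,048-token limit and the drop-oldest policy are illustrative):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
MAX_TOKENS = 2048  # GPT-3-era window; modern models allow far more

def trim_history(history: list[dict]) -> list[dict]:
    # Drop the oldest turns until the transcript fits the window.
    def total_tokens(msgs):
        return sum(len(enc.encode(m["content"])) for m in msgs)
    trimmed = list(history)
    while len(trimmed) > 1 and total_tokens(trimmed) > MAX_TOKENS:
        trimmed.pop(0)
    return trimmed
```
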
How can GenAI apps answer questions about private or up-to-date information if LLMs are pre-trained?
By injecting the needed knowledge into the prompt at inference time. Apps can attach snippets from company documents, policies, or recent updates so the LLM can ground its answer. In production, this is typically automated via retrieval (e.g., Retrieval-Augmented Generation, or RAG), which finds and attaches relevant context on demand.

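A toy version of the retrieval step: score a handful of documents against the question and prepend the best match to the prompt. The documents and the word-overlap scoring are invented stand-ins; production RAG uses embedding similarity over a vector store:

```python
documents = [
    "Vacation policy: employees accrue 1.5 days of leave per month.",
    "Expense policy: meals are reimbursed up to 40 EUR per day.",
]

def retrieve(question: str) -> str:
    # Crude relevance score: count words shared with the question.
    q_words = set(question.lower().split())
    return max(documents, key=lambda d: len(q_words & set(d.lower().split())))

def grounded_prompt(question: str) -> str:
    return (f"Answer using only this context:\n{retrieve(question)}\n\n"
            f"Question: {question}")

print(grounded_prompt("How many vacation days do I accrue per month?"))
```
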
How do AI, Machine Learning (ML), and Generative AI (GenAI) relate to each other?
AI is the broad goal of replicating human-like intelligence. ML is a subset of AI that learns patterns from data. GenAI is a subset of ML focused on generating content (text, images, code) that is coherent and human-like. Tools like ChatGPT are GenAI systems within ML, which itself sits within AI.

How did ChatGPT learn to converse effectively?
Through three stages: (1) unsupervised pre-training (GPT learns to predict the next token from massive text), (2) supervised fine-tuning (learning from example dialogues to behave in Q&A/chat formats), and (3) Reinforcement Learning from Human Feedback (RLHF), where human preferences guide better conversational behaviors.

How does GPT generate text one token at a time?
It tokenizes text, converts tokens to high-dimensional embeddings (numbers), computes a context using transformer neural networks with attention (which weighs relevant parts of the input), and then produces a probability distribution over possible next tokens. It selects a token, appends it to the prompt, and repeats, an autoregressive loop that yields sentences and paragraphs.

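The loop is simple enough to sketch. Below, random logits stand in for the transformer’s context computation; everything else mirrors the autoregressive process just described:

```python
import numpy as np

vocab = ["the", "cat", "sat", "on", "mat", "<end>"]
rng = np.random.default_rng(42)

def next_token_probs(tokens: list[str]) -> np.ndarray:
    # A real LLM computes logits with attention layers; random
    # numbers stand in for that computation here.
    logits = rng.normal(size=len(vocab))
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()  # softmax -> probability distribution

tokens = ["the", "cat"]
while tokens[-1] != "<end>" and len(tokens) < 10:
    tokens.append(rng.choice(vocab, p=next_token_probs(tokens)))

print(" ".join(tokens))
```
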
What are embeddings and why are they so high-dimensional?
Embeddings map tokens to vectors that capture meaning and relationships. In GPT‑3, each token is represented by 12,288 numbers, and the vocabulary is ~50,000 tokens. The high dimensionality allows the model to encode subtle semantic nuances so that related concepts lie “near” each other in this numeric space.

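A small illustration of “nearness” via cosine similarity; the three-dimensional vectors are invented for the example (real embeddings are learned during training, and far larger):

```python
import numpy as np

king, queen, apple = (np.array([0.8, 0.65, 0.1]),
                      np.array([0.75, 0.7, 0.15]),
                      np.array([0.1, 0.2, 0.9]))

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(king, queen))  # close to 1: related concepts
print(cosine(king, apple))  # much lower: unrelated concepts
```
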
What’s the difference between pre-training and inference, and why is training so resource-intensive?
Pre-training determines the model’s parameters by learning from vast text (e.g., ~45 TB for GPT‑3) using GPUs and massive compute (on the order of thousands of petaflop-days). Inference is the later stage where the fixed parameters are applied to new prompts to generate completions. GPT‑3’s architecture involves ~175B parameters, making training and storage extremely heavy compared to inference.

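A back-of-envelope calculation hints at the scale; the 2-bytes-per-parameter figure assumes 16-bit weights:

```python
params = 175e9               # GPT-3 parameter count
bytes_per_param = 2          # assuming fp16 storage
print(f"~{params * bytes_per_param / 1e9:.0f} GB of weights")  # ~350 GB
```
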
How do we keep an LLM’s knowledge current or tailor it to a domain?
For recency, use retrieval to enrich prompts with up-to-date sources (RAG). For specialization, perform fine-tuning, that is, additional supervised training on domain-specific prompts and desired completions, to adapt behavior, style, or expertise beyond the base model’s pre-trained knowledge.

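For illustration, a supervised fine-tuning dataset often boils down to prompt/completion pairs in a JSONL file; the exact schema varies by provider, and the examples here are invented:

```python
import json

examples = [
    {"prompt": "Summarize our refund policy.",
     "completion": "Refunds are issued within 14 days of purchase."},
    {"prompt": "What is our support email?",
     "completion": "support@example.com"},  # invented values
]

with open("finetune.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```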
