Overview

2 Core components: Large Language Models, prompting, and agents

This chapter establishes the core building blocks of effective AI agents: large language models as the probabilistic “brain,” prompt engineering as the primary control surface, and an agents SDK to orchestrate roles, tools, and workflows. It frames agents as systems that turn raw language generation into directed, auditable action by combining sound LLM fundamentals with disciplined prompting and structured interfaces. Readers are guided from conceptual grounding to hands‑on construction, assembling a minimal agent and iteratively expanding it with determinism controls, typed I/O, tool use, and execution tracing.

The chapter first demystifies LLMs as token predictors, explaining tokenization, training/inference dynamics, and why input size and format affect cost and uncertainty. It shows how parameters such as temperature, top‑p, and penalties shape variability and length, while stressing that these knobs only influence behavior—true reliability comes from well‑crafted prompts. A practical prompt toolbox follows: define a clear persona, front‑load directives, use delimiters, be specific, add examples, encourage step‑by‑step reasoning, state positive rules, remove ambiguity, pick suitable models/settings, and iterate. It also warns against common pitfalls—overlong or contradictory instructions, micro‑prompt fragmentation, inconsistent delimiters, and over‑specification—encouraging structured, workflow‑oriented prompts that “think like an LLM,” and noting that agents, unlike single LLM calls, can persist through loops and decisions.

Building on this foundation, the chapter uses the OpenAI Agents SDK to implement a minimal research planner, then tunes model settings for consistency and cost control. It introduces strongly typed outputs (via data models) to reduce variability and prevent brittle handoffs between steps, advocating strict schemas over permissive parsing. Execution tracing is highlighted as a cornerstone for debugging and optimization. Finally, the chapter equips agents with tools through lightweight decorators, explains tool‑chaining patterns, and outlines guardrails: limit tool count to cap overhead and complexity, anticipate failures and retries, and grant only the authority you can safely accept. The result is a pragmatic recipe for assembling reliable, extensible agents: prompt‑driven, parameter‑aware, strictly typed, tool‑enabled, and thoroughly traced.

The training cycle and inference/generation process of a large language model. On the left is the training process of the model where documents are ingested to train the LLM in a first pass. On the right, a user enters a prompt which is first tokenized and fed into the model which then outputs probabilities it uses to sample and produce the next token.
A comparison of tokenization of regular text compared to JSON.
The various parameters that can be used to modify an LLM's output. The right side of the figure provides an expanded view of the model predicting and sampling tokens to generate output. On the left are the various parameters that can be used to alter the sampling of the next token within a model.
A complex workflow a search researcher may perform. The workflow illustrates the tasks an agent may undertake, complete with decision points (circles) and the flow from one task to another. It shows how we want an LLM or agent to perform a series of tasks and consider each task's output when making decisions. The workflow details are not critical; the complexity is deliberate, demonstrating how elaborate instruction prompts can be constructed by following good prompt engineering practices. (Don't be alarmed if you can't follow this figure; it is just a toy example of complexity taken to the extreme.)
Comparing how an agent workflow may run with and without typed outputs/inputs. At the top, the agent does not use strict outputs and could respond in any fashion, increasing variability when passing output to the next agent. Conversely, at the bottom, the agent provides strict outputs, which reduces variability and improves clarity for agents or processes receiving the output.
An example of the OpenAI Traces screen for reviewing your agent execution. At the top you can see the workflow/trace from the agent, and below that the specific details about calls to the underlying LLM, including the inputs, model, tokens, instructions, and output from the call.
The Traces page for the Deep Research Workflow. This step represents the Research Planner agent making a call to the LLM and receiving a JSON output of a plan (list of tasks) to perform to achieve a research goal.
The various patterns for an agent to consume tools. Tools may be internal code functions or external connections to MCP servers, hosted locally or remotely and connected through using MCP protocols.
OpenAI Traces page showing the various tool calls and LLM responses executed by the agent.

Summary

  • Large Language Models are probabilistic token-predictors; understanding tokenization and probability drives effective cost, context, and quality control.
  • Text length ≠ token length—measure tokens (e.g., with tiktoken or the Agents SDK telemetry) to keep budgets and context windows in check.
  • Generation knobs such as temperature, top-p, max_tokens, and penalty terms let you trade off creativity, consistency, and expense for each agent role.
  • Carefully crafted prompts follow the basic rules of clear persona, front-loaded instructions, structured delimiters, few-shot examples, and chain-of-thought, steering LLMs toward reliable, on-spec output.
  • Well-structured prompts avoid common pitfalls (over-complexity, contradictions, ambiguous delimiters, or variable output) and make agents safer and cheaper.
  • The OpenAI Agents SDK turns a prompt into a runnable agent; you can pin specific models and parameter settings to match the agent’s task profile.
  • Typed input/output schemas (Pydantic) eliminate brittle string parsing and keep multi-agent workflows stable despite the LLM’s stochastic nature.
  • Built-in tracing in the OpenAI Agents SDK exposes every LLM interaction and tool invocation, giving vital observability for debugging and optimization.
  • Granting agents tools—local functions now, MCP-hosted services later—provides true agency; limit the tool list to reduce token overhead and failure risk.
  • Tool chaining lets an agent sequence multiple tool calls autonomously; trace data reveals the decision path and highlights performance bottlenecks.
  • Combining prompt engineering, model tuning, typed schemas, tracing, and curated tool sets yields production-ready agents that can confidently plan, reason, and execute deep research tasks.

FAQ

What does it mean that LLMs are “probabilistic token machines”?
LLMs read input text as tokens (numeric IDs) and predict the next token based on learned probabilities. During training, the model minimizes error between predicted and expected tokens over billions of examples (often refined via RLHF). At inference, it repeatedly samples the next token until an end-of-sequence token is reached. The “intelligence” comes from the internal probability landscape learned during training, not from true understanding.
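The sampling loop described above can be sketched with a toy model. The hand-written probability table below stands in for a trained network; everything here is illustrative, not a real LLM.

```python
import random

# Toy "language model": a fixed table mapping the current token to a
# probability distribution over possible next tokens. A real LLM learns
# these probabilities over billions of examples; these values are made up
# purely to illustrate the sampling loop.
NEXT_TOKEN_PROBS = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the":     {"cat": 0.5, "dog": 0.5},
    "a":       {"cat": 0.5, "dog": 0.5},
    "cat":     {"sat": 0.7, "<eos>": 0.3},
    "dog":     {"sat": 0.7, "<eos>": 0.3},
    "sat":     {"<eos>": 1.0},
}

def generate(seed: int = 0, max_tokens: int = 10) -> list[str]:
    """Repeatedly sample the next token until <eos> (or a length cap)."""
    rng = random.Random(seed)
    tokens = ["<start>"]
    while len(tokens) < max_tokens:
        dist = NEXT_TOKEN_PROBS[tokens[-1]]
        choices, weights = zip(*dist.items())
        nxt = rng.choices(choices, weights=weights)[0]
        if nxt == "<eos>":
            break
        tokens.append(nxt)
    return tokens[1:]  # drop the <start> marker

print(" ".join(generate(seed=0)))
```

The same loop, with a neural network replacing the lookup table, is all that inference does; every generation knob discussed later simply reshapes the distribution before sampling.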
What is a token, and why does tokenization matter for cost and reliability?
A token is a word or word-piece the model uses as its atomic unit. Tokenization can make structured text like JSON much longer (in tokens) than the same content written as prose, so text length is not a reliable proxy for token count. More tokens usually increase cost (output tokens often cost more than input) and can raise uncertainty. Use tools like tiktoken or the OpenAI Agents SDK’s built-in accounting to measure tokens.
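A tokenizer such as tiktoken gives exact counts; as a rough stdlib-only illustration (assuming the common ~4-characters-per-token heuristic for English), the sketch below shows why the same content serialized as indented JSON eats more of a token budget than plain prose:

```python
import json

def rough_token_estimate(text: str) -> int:
    """Very rough heuristic: roughly 4 characters per token for English.
    For real budgets, count with an actual tokenizer such as tiktoken."""
    return max(1, len(text) // 4)

# The same facts, once as prose and once as pretty-printed JSON.
prose = "Alice is 30 years old and lives in Paris with two cats."
record = json.dumps(
    {"name": "Alice", "age": 30, "city": "Paris", "pets": ["cat", "cat"]},
    indent=2,
)

# The JSON version carries quotes, braces, and indentation, so its
# estimated token count is noticeably higher than the prose version.
print(rough_token_estimate(prose), rough_token_estimate(record))
```

The gap widens further with deeply nested structures, which is one reason to keep prompt payloads compact and measure real token counts before relying on them.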
How do temperature, top-p, presence/frequency penalties, and max_tokens affect output?
Temperature scales token probabilities (higher = more random/creative; lower = more deterministic). Top-p limits sampling to a nucleus of likely tokens for tighter control. Presence and frequency penalties reduce repetition and encourage novelty. Max_tokens caps response length to control cost and verbosity. In practice, adjust a few knobs to match the task and let prompting do most of the steering.
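Temperature and top-p can be made concrete with a small, self-contained sketch (the scores below are made up; a real model produces a distribution over tens of thousands of tokens):

```python
import math
import random

def sample_next(logits: dict[str, float], temperature: float = 1.0,
                top_p: float = 1.0, seed: int = 0) -> str:
    """Sketch of temperature plus nucleus (top-p) sampling over raw scores."""
    if temperature <= 0:                      # treat 0 as greedy decoding
        return max(logits, key=logits.get)
    # Temperature: divide scores before softmax; >1 flattens, <1 sharpens.
    scaled = {t: s / temperature for t, s in logits.items()}
    z = sum(math.exp(s) for s in scaled.values())
    probs = {t: math.exp(s) / z for t, s in scaled.items()}
    # Top-p: keep the smallest set of top tokens whose mass reaches top_p.
    nucleus, mass = [], 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        nucleus.append((tok, p))
        mass += p
        if mass >= top_p:
            break
    tokens, weights = zip(*nucleus)
    return random.Random(seed).choices(tokens, weights=weights)[0]

logits = {"cat": 4.0, "dog": 3.0, "lizard": 1.0, "teapot": 0.1}
print(sample_next(logits, temperature=0.0))   # greedy decoding picks "cat"
```

Raising the temperature flattens the distribution so "lizard" and even "teapot" become plausible; tightening top_p trims the long tail before sampling. This is exactly the trade between creativity and consistency the knobs expose.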
Which generation settings should I usually change for agents?
Most agents only need temperature (often 0.0 for consistency, higher for creativity) and max_tokens (to prevent rambling and control cost). Leave other parameters at defaults unless you have a specific issue (e.g., repetition). Remember that settings influence output but do not guarantee it; even temperature 0 can produce slight variation.
Why is prompt engineering essential, not hype?
Good prompts make outputs more predictable, safe, and cost-efficient. Core techniques include: assign a clear role/persona, front‑load instructions and use delimiters, be specific about length/audience/objective, provide few‑shot examples, request step‑by‑step reasoning, emphasize positive instructions, eliminate ambiguity with numeric bounds, pick the right model/settings, and iterate. If you use typed outputs via the OpenAI Agents SDK, you often don’t need to hard-specify output formats in the prompt.
What common prompt pitfalls should I avoid?
Avoid overlong, multi-topic prompts (split into steps/roles), contradictory rules (“be concise” vs “explain in depth”), and too many micro-prompts that cause latency. Keep delimiters consistent, don’t overload with dozens of constraints in one shot, and remember temperature 0 reduces but does not eliminate variability. Use guardrails, retries, and structured outputs to stabilize pipelines.
How do I build a minimal agent with the OpenAI Agents SDK?
Define clear instructions, create an Agent with a name and those instructions, then run it with Runner.run_sync(agent, input=...). Print result.final_output. Start simple (e.g., a plan with five concise steps) and iterate. Optionally set model and model_settings for consistency and length control.
Why and how should I use typed outputs (strict JSON) with agents?
Typed outputs reduce variability, prevent parsing errors, and allow safe handoffs between agents. Define a Pydantic BaseModel (or TypedDict structures) and pass output_type=YourModel to the Agent. If you hit strict JSON errors, fix the schema (e.g., prefer a list of TypedDict items over dict[int, str], and forbid extra fields) rather than disabling strict mode, which can hide bugs.
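A minimal sketch with Pydantic (the ResearchPlan and ResearchTask models are hypothetical examples for a planner agent, not types from the SDK):

```python
from pydantic import BaseModel, ConfigDict

class ResearchTask(BaseModel):
    model_config = ConfigDict(extra="forbid")  # reject unknown fields
    title: str
    rationale: str

class ResearchPlan(BaseModel):
    model_config = ConfigDict(extra="forbid")
    goal: str
    tasks: list[ResearchTask]  # prefer a list of objects over dict[int, str]

# With the Agents SDK you would pass output_type=ResearchPlan to Agent(...)
# so the model is constrained to emit JSON matching this schema.

raw = (
    '{"goal": "battery recycling", "tasks": ['
    '{"title": "Survey papers", '
    '"rationale": "Establish the state of the art"}]}'
)
plan = ResearchPlan.model_validate_json(raw)  # raises on any schema drift
print(plan.goal, len(plan.tasks))
```

Because validation raises on any deviation, a malformed handoff fails loudly at the boundary between agents instead of corrupting a later step.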
What is tracing, and how does it help in agent development?
Tracing (enabled by default in the OpenAI Agents SDK) records each agent step, including LLM inputs/outputs, tokens, settings, and tool calls. View traces in the OpenAI Dashboard to debug, optimize, and audit workflows. Use a trace("Your Workflow Name") context manager to label and group runs for easier analysis.
How do I give agents tools, and how many should I register?
Expose functions as tools (e.g., with a decorator) and register them via tools=[...]. Be explicit in prompts about when and why to use each tool. Limit tools to what’s necessary (often 5–10) because each tool adds token overhead, complexity, and failure points—and grants decision power to the agent. Expect and analyze tool chaining in traces to verify correct sequencing and performance.
