Overview

5 Agent Reasoning and Planning

This chapter explains how reasoning and planning transform LLM-powered agents from mere token predictors into purposeful problem solvers. Reasoning is framed as decomposing a goal into smaller tasks, while planning is the synthesis of these tasks into an executable strategy. The text contrasts non-reasoning models that need explicit prompting with newer models that natively exhibit structured thinking, then introduces a toolkit of patterns for eliciting, organizing, and controlling an agent’s thought process. It culminates by positioning the Sequential Thinking MCP server as a shared, model-agnostic scratchpad that helps agents manage multi-step reasoning and evolving plans.

The core techniques begin with Chain-of-Thought (CoT), which guides step-by-step reasoning and makes intermediate thoughts explicit, and ReAct, which interleaves thoughts with tool calls and observations in a feedback loop. Beyond single exchanges, planning equips agents with a global view over longer horizons so they can organize sub-tasks, act, observe, and revise. The chapter shows how concise instructions and tool definitions imbue agents with these behaviors, illustrates them on time-travel puzzles, and weighs trade-offs: transparency and reproducibility versus higher token usage and latency, plus the need to tailor prompts to each model’s style.

For harder problems, the text introduces Tree-of-Thoughts to explore multiple reasoning branches with evaluation and pruning, and Reflexion to iteratively improve solutions through a solver–critic feedback loop. These advanced strategies can be combined with CoT/ReAct and supported by the Sequential Thinking server to record, revise, verify, and coordinate plans. Practical guidance is offered for selecting patterns based on task complexity, cost, and responsiveness, with the caveat that outcomes vary by model. The chapter closes by underscoring reasoning and planning as core to capable agents and providing exercises that help readers practice, compare, and operationalize these approaches.

The chapter’s figures illustrate these patterns:

  • A comparison of prompting strategies for reasoning and non-reasoning models.
  • The LLM thought process when using CoT prompting (left) versus using no explicit reasoning instructions.
  • A sequential diagram of the ReAct reasoning paradigm: in a loop, the LLM first reasons and plans, then executes the plan; after each task (tool) execution, it observes the output and decides whether to continue looping or conclude that the plan is complete.
  • A partial workflow of the Tree-of-Thoughts process, in which thoughts branch into nodes and each node is evaluated to determine the best path to follow.
  • A step-by-step flowchart of the Reflexion reasoning strategy, in which each step is a task, decision, or output produced by the LLM or by deterministic code.
  • The Traces page for the Time Travel Agent execution, showing the agent moving through the ReAct pattern of reasoning, acting, observing, and reasoning again.

Summary

  • Large Language Models don’t “think” by default—they’re token-predictors—so agents must inject structured reasoning and planning to achieve multi-step goals.
  • Chain-of-Thought (CoT) prompting turns a model’s hidden intuition into explicit, step-by-step thoughts that are easy to debug, at the cost of extra tokens and latency.
  • ReAct augments CoT with tool calls: Reason → Act → Observe → Repeat, letting an agent gather information dynamically while iteratively refining its plan.
  • High-level planning goes beyond single chains: agents can ask an LLM to draft a strategic outline, then revise it as real-world feedback arrives.
  • Tree-of-Thoughts (ToT) explores many branches in parallel, pruning losers and expanding promising paths—powerful for complex search tasks but extremely token-hungry.
  • Reflexion wraps a solver–critic loop around any reasoning strategy: the critic provides feedback, the solver revises, and the cycle repeats until the answer passes a self-defined check.
  • Choosing a strategy is task-dependent: CoT for logic puzzles, ReAct for tool-heavy look-ups, ToT for deep planning, Reflexion for iterative improvement—mix and match as needs evolve.
  • The Sequential Thinking MCP server acts as a universal “scratchpad” tool; agents write, revise, and branch thoughts there while still using ReAct or ToT patterns externally.
  • Combining strategies scales up reasoning: e.g. CoT to draft a plan, ReAct+ToT to execute/branch it, Reflexion to self-grade and retry—expect high latency but high reliability.
  • Guardrails, schemas, and typed outputs remain essential; reasoning output should be validated just like any other agent I/O to avoid cascading errors.
  • Model choice matters: newer reasoning-native models (e.g. GPT-4o family) handle these patterns with fewer prompts, but even base models can reason when coached properly.
  • Keep tool lists lean; every additional tool inflates ReAct loops and Sequential Thinking calls—scope each agent to < 10 highly relevant tools.
  • A production-ready reasoning agent pairs structured prompts, the right reasoning strategy, Sequential Thinking for thought tracking, and guardrails for validation—yielding plans that can adjust, retry, and succeed autonomously.
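The guardrail point above can be made concrete: reasoning output that feeds later steps should be parsed into a typed structure and rejected when it doesn’t fit, so one malformed step can’t cascade. A minimal sketch, assuming the agent emits JSON steps with `thought`, `action`, and `confidence` fields (the field names are illustrative, not from the chapter):

```python
import json
from dataclasses import dataclass

@dataclass
class ReasoningStep:
    """Typed container for one step of an agent's reasoning output."""
    thought: str
    action: str
    confidence: float

def parse_step(raw: str) -> ReasoningStep:
    """Validate a JSON reasoning step before it reaches downstream tools."""
    data = json.loads(raw)        # raises ValueError on malformed JSON
    step = ReasoningStep(**data)  # raises TypeError on missing/extra keys
    if not 0.0 <= step.confidence <= 1.0:
        raise ValueError(f"confidence out of range: {step.confidence}")
    return step

step = parse_step(
    '{"thought": "check the timeline", "action": "lookup_event", "confidence": 0.9}'
)
```

Rejecting bad steps at this boundary is the same discipline as validating any other agent I/O.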

FAQ

What do “reasoning” and “planning” mean for LLM-powered agents?
Reasoning is how an LLM breaks a problem into smaller tasks; planning is how it organizes those tasks into an executable sequence to reach a goal. Together they enable agents to make decisions, take actions, and adjust as they progress.

Do I need a special “reasoning model,” or can prompting make any LLM reason?
Both work. Modern models often reason by default, but you can elicit reliable reasoning from any LLM with prompt patterns such as Chain-of-Thought (CoT) and ReAct. Even with reasoning models, prompts help you control style, structure, and reproducibility.
What is Chain-of-Thought (CoT) prompting and when should I use it?
CoT asks the model to think step by step before answering. Use it for logic, math, or procedural problems where explicit intermediate steps improve accuracy and debuggability. Trade-offs: more tokens, higher latency, and an occasional need to tune the step template to the model.
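A CoT instruction can be as small as a fixed template wrapped around the question. A minimal sketch (the step wording is illustrative and, as noted above, usually needs tuning to each model’s style):

```python
def cot_prompt(question: str) -> str:
    """Wrap a question in a simple Chain-of-Thought template."""
    return (
        "Answer the question below. Think step by step:\n"
        "1. Restate the problem in your own words.\n"
        "2. List the facts you know.\n"
        "3. Work through the logic one step at a time.\n"
        "4. State the final answer on a line starting with 'Answer:'.\n\n"
        f"Question: {question}"
    )

prompt = cot_prompt("If I leave 2030 and travel back 47 years, what year do I arrive in?")
```

The template is sent as the user (or system) message; the model’s intermediate steps then appear in its reply, where they can be inspected and debugged.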
How does ReAct differ from CoT?
ReAct interleaves reasoning with actions and observations: think → act (call a tool) → observe → adjust the plan. It’s ideal when external tools, APIs, or retrieval are needed. Compared with CoT, ReAct is more interactive and adaptive but still incurs extra tokens and complexity.
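Stripped of the LLM, the think → act → observe cycle reduces to a short control loop. A minimal sketch with a stubbed “LLM” policy and a single fake tool (all names and the loop shape are illustrative, not the chapter’s implementation):

```python
def react_loop(goal, llm, tools, max_turns=5):
    """ReAct skeleton: Reason -> Act -> Observe, repeated until done."""
    observations = []
    for _ in range(max_turns):
        decision = llm(goal, observations)            # Reason: pick next action
        if decision["action"] == "finish":
            return decision["answer"]
        tool = tools[decision["action"]]              # Act: call the chosen tool
        observations.append(tool(decision["input"]))  # Observe: record the result
    return None  # gave up after max_turns

# Stub "LLM" policy: look the year up first, then finish with the observation.
def fake_llm(goal, observations):
    if not observations:
        return {"action": "lookup_year", "input": goal}
    return {"action": "finish", "answer": observations[-1]}

tools = {"lookup_year": lambda query: "1985"}
answer = react_loop("What year does the DeLorean land in?", fake_llm, tools)
```

In a real agent, `fake_llm` is an LLM call whose prompt lists the available tools and the observations so far; the loop structure stays the same.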
What does “planning with LLMs” add beyond CoT/ReAct?
Planning introduces a global, longer-horizon view across multiple reasoning loops, organizing sub-goals and revisiting them as information changes. You can prompt the model to propose a plan first, execute it stepwise (often via ReAct), and then revise it based on observations.
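That propose–execute–revise shape can be sketched as a loop over a mutable plan. A minimal sketch with stubbed callables (in practice `propose_plan` and `revise_plan` are LLM calls and `execute_step` is often a ReAct loop; the names are illustrative):

```python
def plan_and_execute(goal, propose_plan, execute_step, revise_plan):
    """Draft a plan, execute it stepwise, and revise it when a step fails."""
    plan = propose_plan(goal)            # global outline, e.g. a list of steps
    done = []
    while plan:
        step = plan.pop(0)
        result = execute_step(step)      # carry out one sub-task
        done.append((step, result))
        if result == "failed":
            plan = revise_plan(goal, done, plan)  # re-plan from observations
    return done

# Stubs standing in for LLM calls.
propose = lambda goal: ["find a power source", "charge the flux capacitor"]
execute = lambda step: "ok"
revise = lambda goal, done, remaining: remaining

history = plan_and_execute("return to 1985", propose, execute, revise)
```

The key difference from a bare ReAct loop is that the plan persists across steps and can be rewritten mid-flight.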
What is Tree-of-Thoughts (ToT) and why use it?
ToT explores multiple reasoning branches instead of a single linear chain. It generates candidate thoughts, evaluates them, prunes weak branches, and continues expanding promising ones. It helps on complex, search-like problems but is token- and time-intensive.
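The generate–evaluate–prune cycle is essentially a beam search over thoughts. A minimal sketch using a toy numeric search in place of LLM-generated thoughts (in a real agent, `expand` and `score` are LLM calls; everything here is illustrative):

```python
def tree_of_thoughts(root, expand, score, beam_width=2, depth=3):
    """Breadth-first ToT: expand each frontier thought, keep the best few."""
    frontier = [root]
    for _ in range(depth):
        candidates = [t for thought in frontier for t in expand(thought)]
        if not candidates:
            break
        # Prune: keep only the highest-scoring branches.
        frontier = sorted(candidates, key=score, reverse=True)[:beam_width]
    return max(frontier, key=score)

# Toy search: "thoughts" are numbers, expanding adds 1 or 2, and the score
# prefers values close to the target 10.
best = tree_of_thoughts(
    0,
    expand=lambda t: [t + 1, t + 2],
    score=lambda t: -abs(10 - t),
    beam_width=2,
    depth=5,
)
```

The token cost the chapter warns about is visible here: every depth level multiplies the number of `expand` and `score` calls, each of which is an LLM invocation in practice.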
What is Reflexion and when is it appropriate?
Reflexion adds a critique-and-retry loop: attempt → feedback → revised attempt. A critic (or a heuristic/tool) provides brief feedback so the solver improves iteratively. It excels on ambiguous or hard tasks but typically needs a reliable feedback signal or ground-truth check.
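The solver–critic loop fits in a few lines once the two roles are separated. A minimal sketch with toy stand-ins (in practice both `solve` and `critique` are LLM or tool calls; the ground-truth check here is deliberately trivial and illustrative):

```python
def reflexion(task, solve, critique, max_attempts=3):
    """Solver-critic loop: retry with feedback until the critic approves."""
    feedback = None
    attempt = None
    for _ in range(max_attempts):
        attempt = solve(task, feedback)         # solver uses prior feedback
        ok, feedback = critique(task, attempt)  # critic checks the attempt
        if ok:
            return attempt
    return attempt  # best effort after exhausting retries

# Toy solver that "improves" once it receives feedback.
def solve(task, feedback):
    return 41 if feedback is None else 42

# Toy critic backed by a ground-truth check: the answer must equal 42.
def critique(task, attempt):
    if attempt == 42:
        return True, None
    return False, "the answer is off by one"

result = reflexion("compute the answer", solve, critique)
```

Note the dependence the chapter calls out: without a trustworthy `critique` signal, the loop can iterate without actually improving.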
How do I instruct an agent to apply these patterns?
Embed the pattern in the agent’s instructions: ask for step-by-step thinking for CoT; specify the reason → act → observe loop for ReAct; request multiple hypotheses with scoring and pruning for ToT; and add a critic/feedback loop for Reflexion. Provide or register the tools the agent can call.
What is the Sequential Thinking MCP server and what does it provide?
It’s a “thinking scratchpad” tool exposed via MCP. It doesn’t reason for the agent; instead, it structures, stores, and tracks thoughts (including revisions and branches) across steps. This supports longer plans, ReAct loops, and even ToT-style exploration with persistent context.
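To make the “scratchpad, not reasoner” distinction concrete, here is a local stand-in that only mimics the bookkeeping: ordered thoughts, in-place revisions, and named branches. The real server exposes this behavior as an MCP tool rather than a Python class; the class and method names below are illustrative assumptions, not the server’s API.

```python
class ThoughtScratchpad:
    """Local sketch of Sequential Thinking-style bookkeeping.
    It stores and tracks thoughts; it performs no reasoning itself."""

    def __init__(self):
        self.thoughts = []   # list of (number, text), in order
        self.branches = {}   # branch_id -> list of (branched_from, text)

    def add(self, text):
        """Append the next numbered thought."""
        self.thoughts.append((len(self.thoughts) + 1, text))

    def revise(self, number, text):
        """Replace an earlier thought in place, keeping its number."""
        self.thoughts[number - 1] = (number, text)

    def branch(self, branch_id, from_number, text):
        """Start or extend an alternative line of thinking."""
        self.branches.setdefault(branch_id, []).append((from_number, text))

pad = ThoughtScratchpad()
pad.add("Goal: return the DeLorean to 1985")
pad.add("Sub-task: find a 1.21-gigawatt power source")
pad.revise(2, "Sub-task: use the lightning strike as the power source")
pad.branch("plan-b", 2, "Alternative: plutonium, if any can be found")
```

Because the LLM does the reasoning and the scratchpad only records it, the same tool works with any model and alongside any of the patterns above.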
How can I improve reliability when the agent still gets answers wrong?
  • Tweak the CoT steps to match the model’s style and grammar.
  • Add small worked examples to the prompt.
  • Use ReAct with explicit tool calls and inspect the traces.
  • Store and revisit global plans via the Sequential Thinking server.
  • Add lightweight checks or critics (Reflexion) when a ground-truth or scoring tool is available.
  • Balance pattern choice against latency and token costs.
