1 Threat-modeling agentic pipelines
This chapter introduces AI agent pipelines as a structured way to bring large language models and autonomous reasoning into offensive security without losing control, accountability, or human judgment. It explains that modern testers already rely on chains of tools and artifacts, but the growing volume and complexity of scan results, logs, endpoints, and vulnerabilities make manual coordination increasingly fragile. AI can help by interpreting context, prioritizing findings, reducing cognitive load, and adapting strategy, but it must be embedded in disciplined workflows rather than used as ungoverned automation.
The chapter distinguishes between traditional automation, LLMs, agents, pipelines, and artifacts. Traditional tools execute repeatable actions, while LLMs provide probabilistic reasoning and interpretation. Agents extend LLMs by adding goals, tools, memory, knowledge, and orchestration so they can plan and act within limits. Pipelines then provide the architecture that makes agent behavior repeatable and auditable: they collect trusted inputs, interpret context, run controlled actions, evaluate results, and report findings. Artifacts such as open ports, resolved subdomains, HTTP metadata, endpoints, and error messages become the evidence that moves through this system and drives precise decisions.
A central theme is that AI should support offensive security professionals, not replace them. Humans define objectives, approve escalation, enforce scope, and remain responsible for ethical and legal boundaries. The chapter emphasizes best practices such as written authorization, avoiding production harm, protecting sensitive data, documenting actions, maintaining human oversight, and following responsible disclosure. For bug hunters, red teams, purple teams, blue teams, SOC analysts, and security leaders, AI pipelines offer faster triage, reusable playbooks, stronger detection feedback, safer training, and clearer accountability, turning creative security testing into a measurable and repeatable practice.
The conventional triage pipeline mental model. This diagram provides a high-level (macro) view of the conventional, human-driven security triage pipeline, as sketched in the notebook. It serves as a roadmap for this linear workflow, starting with data collection and proceeding sequentially through vulnerability assessment, risk scoring, and attack path planning. This entire, predictable sequence traditionally concludes with a human operator making a final decision or handoff.
Seven best practices for offensive security. 1) Authorized scope only: to avoid causing damage or exposing sensitive information, work only within pre-defined, scoped boundaries. 2) No production harm: do not take actions that will negatively affect production traffic. 3) Follow the law: stay compliant with applicable laws and regulations. 4) Human oversight: humans must be in the loop to review and validate findings. 5) Protect sensitive data: set up proper processes to ensure personally identifiable information is not exposed. 6) Document everything: store logs, traces, and other artifacts that allow the work to be audited. 7) Responsible disclosure: give the affected party a reasonable amount of time to fix issues before publicly revealing them. These best practices ensure that offensive security teams consistently deliver value and maintain professionalism within their organization.
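The first practice, authorized scope only, is one that can be enforced in code before any tool runs. The following is a minimal sketch, assuming the scope is expressed as IP networks; the `AUTHORIZED_NETWORKS` list and the `in_scope` helper are hypothetical names introduced for illustration, and the address ranges are placeholders.

```python
import ipaddress

# Hypothetical engagement scope. In practice this would be loaded from the
# written authorization rather than hard-coded.
AUTHORIZED_NETWORKS = [
    ipaddress.ip_network("10.10.0.0/16"),
    ipaddress.ip_network("192.0.2.0/24"),
]


def in_scope(target: str) -> bool:
    """Return True only if target is an IP address inside an authorized network."""
    try:
        address = ipaddress.ip_address(target)
    except ValueError:
        # Hostnames and malformed input are rejected rather than guessed at.
        return False
    return any(address in network for network in AUTHORIZED_NETWORKS)


if __name__ == "__main__":
    for candidate in ["10.10.5.20", "10.11.0.1", "example.com"]:
        print(candidate, "in scope" if in_scope(candidate) else "OUT OF SCOPE")
```

Failing closed on anything that does not parse is a deliberate choice here: an unrecognized target is treated as out of scope until a human says otherwise.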
An illustrative example of how an LLM generates the next token. As the LLM generates a sentence, it considers the context of the words generated so far. The LLM takes in the preceding tokens and assigns a score (logit) to each candidate next token, which corresponds to a probability. In the example above, green has the highest logit value, so it is selected as the next word in the sentence.
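The selection step described in this caption can be sketched in a few lines. This is an illustration only: the logit values are made up, the function names are hypothetical, and picking the single highest-scoring token (greedy selection) is just one of several decoding strategies a real system might use.

```python
import math


def softmax(logits: dict[str, float]) -> dict[str, float]:
    """Convert raw logit scores into probabilities that sum to 1."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {token: math.exp(score - peak) for token, score in logits.items()}
    total = sum(exps.values())
    return {token: value / total for token, value in exps.items()}


def greedy_next_token(logits: dict[str, float]) -> str:
    """Pick the candidate with the highest logit."""
    return max(logits, key=logits.get)


# Made-up scores for candidate next words; not taken from any real model.
candidate_logits = {"green": 3.1, "blue": 1.2, "red": 0.4}
probabilities = softmax(candidate_logits)
next_token = greedy_next_token(candidate_logits)

if __name__ == "__main__":
    print(probabilities)
    print("next token:", next_token)
```

Because softmax preserves ordering, the token with the highest logit is also the token with the highest probability.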
An overview of AI agent systems. AI agents consist of four components that are orchestrated together to produce an outcome: 1) the model, which is a foundation model, 2) tools, which are functions that the LLM can use to interact with the world (e.g., custom functions, APIs, MCP servers), 3) memory, where previous interactions are stored either in the context window or in a vector database, and 4) the knowledge base, where additional context (documents, old conversations, etc.) is stored in a vector database.
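One way to picture the four components is as fields on a single object. The sketch below is a simplification under stated assumptions: the `Agent` class and `use_tool` method are hypothetical, the model is any callable that maps a prompt to text, and plain Python lists stand in for the context window and vector database.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Agent:
    # 1) Model: a stand-in for a call to a foundation model.
    model: Callable[[str], str]
    # 2) Tools: named functions the agent is allowed to call.
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)
    # 3) Memory: previous interactions (a list instead of a vector database).
    memory: list[str] = field(default_factory=list)
    # 4) Knowledge base: additional reference context.
    knowledge_base: list[str] = field(default_factory=list)

    def use_tool(self, name: str, argument: str) -> str:
        """Call a registered tool and record the interaction in memory."""
        if name not in self.tools:
            raise KeyError(f"tool not registered: {name}")
        result = self.tools[name](argument)
        self.memory.append(f"{name}({argument}) -> {result}")
        return result


if __name__ == "__main__":
    agent = Agent(model=lambda prompt: "noop", tools={"echo": str.upper})
    print(agent.use_tool("echo", "abc"))
    print(agent.memory)
```

Restricting the agent to tools that were explicitly registered keeps its possible actions enumerable and reviewable.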
The Dynamic AI Agent System Mental Model. This diagram models the more dynamic system that results from introducing an AI Agent, representing the technology and its surrounding world. Unlike the linear sequence in Model 1, the central Agent creates a cyclical, event-driven workflow that allows it to initiate reconnaissance, penetration testing, or triage in response to new data. This model provides a framework for understanding the complex, parallel interactions and feedback loops unique to the AI-driven system. A reader can use this model to predict the AI's behavior or debug its emergent actions.
An example: reconnaissance agent pipeline. An AI agent system consists of 1) a data pipeline that feeds an LLM logs and other inputs from the system, 2) a reasoning component that allows AI models to determine appropriate actions and steps, 3) an evaluation component that assesses the impact of the changes, and 4) a reporting system for the security professionals.
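The four parts named in this caption can be wired together as plain functions. The following is a rough sketch, not the book's implementation: every function name is hypothetical, the records are fabricated placeholders, and a fixed rule stands in for the LLM reasoning step so the example runs without any model.

```python
# Hypothetical allowlist and rule set used only for this illustration.
ALLOWED_HOSTS = {"app.example.test"}
RISKY_SERVICES = {"telnet", "ftp"}


def collect() -> list[dict]:
    """Data pipeline: return structured inputs (fabricated sample records)."""
    return [
        {"host": "app.example.test", "port": 443, "service": "https"},
        {"host": "app.example.test", "port": 23, "service": "telnet"},
    ]


def reason(records: list[dict]) -> list[dict]:
    """Reasoning: a fixed rule standing in for an LLM's judgment."""
    return [
        {"record": r, "proposal": "review" if r["service"] in RISKY_SERVICES else "ignore"}
        for r in records
    ]


def evaluate(proposals: list[dict]) -> list[dict]:
    """Evaluation: keep only in-scope proposals that call for follow-up."""
    return [
        p for p in proposals
        if p["record"]["host"] in ALLOWED_HOSTS and p["proposal"] == "review"
    ]


def report(findings: list[dict]) -> str:
    """Reporting: render findings as one line each for a human reader."""
    return "\n".join(
        f"{f['record']['host']}:{f['record']['port']} ({f['record']['service']}) -> {f['proposal']}"
        for f in findings
    )


if __name__ == "__main__":
    print(report(evaluate(reason(collect()))))
```

Keeping each stage a separate function means any one of them can be logged, tested, or replaced without touching the others.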
Summary
- Large language models (LLMs) introduce contextual reasoning to security testing, turning raw data into actionable intelligence when guided by skilled professionals.
- Because LLMs are probabilistic systems, their outputs can be unreliable without validation; human oversight is essential to ensure accuracy and safety.
- AI agents build on LLMs by adding memory, planning, and tool-use capabilities, enabling reasoning systems that can act rather than merely respond.
- Pipelines provide the structure agents need to remain reliable and accountable—defining clear stages for input, reasoning, action, evaluation, and reporting.
- AI agent pipelines allow offensive security teams to scale intelligence without losing control—empowering individuals, red teams, and CISOs alike to achieve measurable, repeatable outcomes.
FAQ
What is the main idea of “Threat-modeling agentic pipelines”?
The chapter explains how AI reshapes offensive security by introducing agentic pipelines: structured workflows where AI agents interpret artifacts, prioritize work, and support decisions while humans define objectives and approve escalation. The goal is not uncontrolled automation, but repeatable, auditable, and ethical security testing.
How do AI agents differ from traditional offensive security tools?
Traditional tools execute specific tasks, apply rules, and produce output. AI agents add reasoning, planning, memory, and tool use. Instead of only running commands, an agent can interpret results, decide what matters, refine a strategy, and suggest or trigger controlled next steps within a defined pipeline.
Why are tools alone no longer enough in modern offensive security?
Modern environments generate too much data for humans to interpret manually at scale. Scanners, logs, headers, certificates, and reconnaissance outputs can overwhelm analysts. The limiting factor is no longer execution, but understanding: deciding what matters, what to do next, and why. AI agents help reduce this cognitive burden by reasoning across results and prioritizing effort.
What is an artifact in an offensive security pipeline?
An artifact is a structured record of what a tool observed, such as open ports, resolved subdomains, HTTP response metadata, discovered endpoints, authentication behavior, error messages, or scan results. Artifacts are evidence, not interpretation. Pipelines move artifacts forward so agents or humans can decide which ones matter.
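A structured record of this kind might look like the following. This is one possible shape, assumed for illustration: the `Artifact` class and its field names are hypothetical, and the values are placeholders rather than real scan output.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)  # frozen: evidence should not be edited after capture
class Artifact:
    tool: str         # which tool made the observation
    kind: str         # e.g. "open_port", "subdomain", "http_metadata"
    target: str       # what was observed
    value: str        # the observation itself, with no interpretation attached
    observed_at: str  # ISO 8601 timestamp


artifact = Artifact(
    tool="port-scanner",
    kind="open_port",
    target="192.0.2.10",
    value="443/tcp",
    observed_at="2024-01-01T00:00:00Z",
)

# Serialize to JSON so the record can move between pipeline stages.
record = json.dumps(asdict(artifact), sort_keys=True)

if __name__ == "__main__":
    print(record)
```

Note that the record holds no severity or recommendation field; interpretation is left to a later stage, which keeps evidence and judgment separate.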
What role do pipelines play in AI-assisted offensive security?
Pipelines define how information moves from one stage of testing to the next. They route artifacts, enforce decision gates, log actions, and make workflows reproducible. In this model, tools execute, agents interpret, and pipelines govern the flow of information.
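A decision gate with logging can be sketched as a single function. The example below is an assumption-laden illustration: the action labels in `HIGH_RISK_ACTIONS`, the `gate` function, and the `approve` callback (which stands in for a real human approval step) are all hypothetical.

```python
from typing import Callable

# Hypothetical labels for actions that require human sign-off.
HIGH_RISK_ACTIONS = {"exploit", "credential_test"}


def gate(action: str, approve: Callable[[str], bool], audit_log: list[str]) -> bool:
    """Decide whether an action may proceed, and log the decision either way."""
    if action in HIGH_RISK_ACTIONS:
        # High-risk actions are escalated to a human reviewer.
        allowed = approve(action)
        audit_log.append(f"{action}: {'approved' if allowed else 'denied'} by human")
    else:
        allowed = True
        audit_log.append(f"{action}: auto-allowed (low risk)")
    return allowed


if __name__ == "__main__":
    log: list[str] = []
    deny_all = lambda action: False  # simulated reviewer who denies everything
    print(gate("port_scan", deny_all, log))
    print(gate("exploit", deny_all, log))
    print(log)
```

Every call appends to the log regardless of outcome, so denied actions leave the same audit trail as approved ones.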
How does an LLM become an AI agent?
An LLM becomes part of an AI agent when it is combined with orchestration, tools, memory, goals, and sometimes a knowledge base. The LLM provides reasoning and language generation, while the agent system determines next steps, calls tools, evaluates results, and loops until a goal is met or a limit is reached.
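The loop described in this answer, which repeats until a goal is met or a limit is reached, can be sketched as follows. This is a simplified illustration: `run_agent`, `decide`, and the tool names are hypothetical, and `decide` is a scripted stand-in for the LLM that would normally choose the next step.

```python
from typing import Callable, Optional


def run_agent(
    decide: Callable[[list[str]], Optional[str]],
    tools: dict[str, Callable[[], str]],
    max_steps: int = 5,
) -> list[str]:
    """Call tools chosen by `decide` until it returns None or the step limit hits."""
    history: list[str] = []
    for _ in range(max_steps):
        tool_name = decide(history)
        if tool_name is None:  # the decision step signals the goal is met
            break
        history.append(f"{tool_name}: {tools[tool_name]()}")
    return history


def scripted_decide(history: list[str]) -> Optional[str]:
    """Stand-in for LLM planning: resolve first, then probe, then stop."""
    plan = ["resolve", "probe"]
    return plan[len(history)] if len(history) < len(plan) else None


# Placeholder tools returning fixed strings instead of touching a network.
demo_tools = {"resolve": lambda: "192.0.2.10", "probe": lambda: "200 OK"}

if __name__ == "__main__":
    print(run_agent(scripted_decide, demo_tools))
```

The `max_steps` bound is what keeps a confused or looping decision step from running indefinitely.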
What are the core components of an AI security pipeline?
A well-designed AI security pipeline typically collects trusted input from tools, logs, or APIs; interprets context using an LLM or agent; acts through controlled tools or scripts; evaluates results against rules or safety checks; and reports findings in a structured format. Each stage should be monitored, logged, and reviewable.
Why is human oversight important in agentic offensive security workflows?
Human oversight ensures that AI-assisted testing remains ethical, legal, and safe. Humans define objectives, approve escalation, validate findings, and review actions that could cause data loss, service disruption, or privacy violations. AI can accelerate analysis, but it should not replace professional judgment.
What are the key safety practices for offensive security testing with AI?
The chapter emphasizes several practices: stay within authorized written scope, avoid harming production systems, follow applicable laws, require human oversight for high-risk actions, protect sensitive data, document prompts and decisions, and follow responsible disclosure when real vulnerabilities are found.
How do AI agent pipelines benefit different security roles?
Bug hunters can use pipelines to triage data and draft reports faster. Red teams and penetration testers can turn engagements into reusable playbooks. Purple teams and detection engineers can use offensive traces to improve defenses. Blue teams and SOC analysts can replay recorded runs in safe environments. Security leaders gain visibility, accountability, and auditable evidence of testing outcomes.