4 Securing GenAI
This chapter explains why securing GenAI requires a different playbook from traditional software security. Because large language models interpret meaning rather than structure, they cannot reliably separate instructions from data, making prompt injection and jailbreaks a systematic risk rather than an edge case. The text introduces a compounding exposure model—environment, model, input, data access, ability to make changes, and agency—and shows through the ClaimAssist narrative how each step up in capability (from FAQ bot to autonomous agent) widens the blast radius of mistakes or attacks. Real-world incidents and large-scale red-teaming underline that once systems can read sensitive data, take actions, or plan autonomously, misinterpretation becomes a security incident, not just a bad answer.
The chapter catalogs the major failure modes across that ladder: indirect and multimodal prompt injection (including document and configuration poisoning), data exfiltration via the “lethal trifecta” of untrusted input, sensitive data access, and outbound channels, and environmental weaknesses such as vendor gaps, fragile supply chains, secrets sprawl, and denial-of-wallet resource exhaustion. It details model-specific risks—safety degradation from fine-tuning, backdoored or poisoned weights, and model theft—and enterprise access pitfalls like the confused deputy pattern and overshared permissions that retrieval makes newly visible. With write access and agency, risks shift to tool misuse, memory poisoning that persists across sessions, cascading multi-agent failures, and rogue behavior, all amplified by emerging tool ecosystems like MCP that can silently expand capabilities and trust boundaries.
Controls are mapped to exposure rather than one-size-fits-all. Baselines include hardened environments and vendors, comprehensive and privacy-aware logging, SBOM and supply‑chain hygiene, strict secret handling, quotas and circuit breakers, and input/output guardrails with narrowed retrieval scopes and document sanitization. For data access, the chapter emphasizes user‑context queries, least privilege, tenant isolation plus metadata filtering, and real‑time authorization. Once systems can act, it prescribes short‑lived scoped credentials, pre‑execution validations, explicit action exclusions, calibrated human‑in‑the‑loop approval tiers, and rate/scope limits. For agency, it moves enforcement outside the model with policy engines and approved tool registries, adds MCP gateways, sandboxed execution, memory isolation with paraphrased persistence, behavioral monitoring with reasoning traces, graceful degradation, kill switches, and agent identity governance—rounded out by staged red‑teaming and a pragmatic stance on shadow AI: meet user needs with sanctioned options to reduce unsafe workarounds.
ClaimAssist v1 on the left is a simple policy chatbot. ClaimAssist v4 on the right can accept user uploaded images and change a claim status as “ready for adjuster review”.
Security Risk Factors. Each additional factor compounds the risk.
Real life prompt injection attack
Image prompt injection causing a faulty decision in ChatGPT-5.
Summary
Generative AI introduces security challenges that traditional software controls were not designed to address. Large language models cannot reliably distinguish between data they should process and instructions they should follow. When you connect these systems to sensitive data, let them take actions, or grant them autonomy to pursue goals, you create attack surfaces that traditional controls cannot protect. Firewalls, access controls, and input validation were not designed for systems that blur the line between data and instruction.
The six risk factors introduced in Section 4.2 determine how much exposure a GenAI system carries. Environment and model risks set the foundation: an insecure underlying platform or a compromised model undermines everything built on top. Untrustworthy input and data security risks compound: prompt injection against a system with data access turns every input manipulation into a potential breach. Action capability and agency take it to its worst form: from “the model said something wrong” to “the model did something wrong at scale, across multiple sessions, without anyone asking it to.”
The Lethal Trifecta (untrusted input, access to sensitive data, and a channel for external communication) is the condition that makes agentic systems most dangerous. Break any one element and the worst attacks fail. But useful AI assistants often need all three, making containment rather than prevention the realistic goal.
The controls at each layer follow a consistent architecture: match permissions to the requesting user’s identity, not a shared service account; enforce authorization decisions outside the model’s reasoning process; require human review proportional to consequence; maintain complete audit trails for actions, not just outputs; contain damage through sandboxing and rate limits; and monitor reasoning traces, not just network traffic.
The attacks described in this chapter have affected production systems at major vendors and compromised customer data. Organizations that handle these threats successfully will match controls to exposure, treat security as a governance discipline, and constrain capabilities when adequate protections do not yet exist.
AI Governance ebook for free