4 Securing GenAI
Generative AI changes the security equation from structural flaws to behavioral risk. Unlike traditional software, models infer and improvise, so the attack surface is linguistic and contextual: prompts, retrieved content, memory, and downstream actions. Real-world failures span autonomy gone wrong, prompt injection and jailbreaks, feedback/data poisoning, insecure output handling, system-prompt leakage, and trojaned or backdoored models. Classical controls (firewalls, RBAC, encryption) still matter, but they cannot prevent a model from misinterpreting instructions, inventing policies, or producing unsafe commands that other systems trust. Securing GenAI therefore requires a new layer: treat prompts and context as attack surfaces, validate and constrain outputs before they trigger actions, and verify the provenance and integrity of models, adapters, datasets, and serving stacks.
Risk and responsibility shift with the deployment model. In SaaS, ease comes with limited control: consumers face shadow AI, prompt injection, RAG leakage, unsafe plug-ins, over-trust, audit gaps, and cross-tenant exposure, while providers must enforce isolation, guardrails, and insider controls. API integrators gain power—and accountability—over input sanitization, RAG curation and ACLs, function/tool abuse, quotas and spend guards, secrets and supply chain hygiene, and full-fidelity audit logging; they also depend on providers for isolation, stable versions, and clear change notices. Model hosters “own the engine”: they must vet artifacts and formats (prefer data-only over executable), patch vulnerable runtimes, guard against behavior drift and backdoors (from fine-tunes, LoRAs, merges), and prevent theft or extraction of model weights and capabilities.
Effective defense combines process and technical guardrails. Establish a governance lifecycle: diligent vendor selection, secure configuration, continuous monitoring, incident response, and iterative hardening. Apply practical controls: curate and sign sources; enforce permissions at answer time; detect and neutralize injection; require citations and validate structured outputs; keep humans in the loop for high-impact actions; restrict and sandbox tools/plug-ins with least-privilege tokens; rate-limit and cap spend; centralize egress through a gateway with logging, redaction, and policy checks; pin versions, canary test changes, and keep rollback paths; sandbox first loads and disable risky deserialization; manage secrets in a vault with rotation; minimize data, prefer zero/short retention; and continuously red-team for jailbreaks, bias, drift, and unsafe behavior. The model proposes—controls, monitoring, and people decide.
Example of malicious user prompting
Real life prompt injection attack
Threat Landscape for GenAI SaaS Adoption: Consumer vs. Provider Responsibilities
Image prompt injection causing a faulty decision in a ChatGPT 5
Essential Steps for Secure Integration of Vendors Data Governance Steps
Threat Landscape for GenAI API Adoption: Integrator vs. Provider Responsibilities
Input sanitization by a prompt guard rail
Summary
In the cloud era we secured networks, apps, and data stores. With GenAI we also have to secure behavior: what the model can be asked to do, what it’s allowed to say, and what happens to its outputs. Across the three deployment patterns (SaaS consumer, API integrator, and model hoster) the same theme repeats: you can’t outsource all the risk. Vendors run infrastructure, but you decide how the assistant is used, what it can touch, and whether its answers are trusted or verified.
For SaaS tools, your biggest risks live at the edges: employees pasting sensitive content (Shadow AI), prompt-injection and jailbreaks, weak or missing guardrails, and retrieval features that surface the wrong document to the wrong person. Because you can’t see or tune the provider’s system prompt or safety layers, the practical approach is scope and process: keep use cases narrow, require sources when possible, block high-impact automation, and make vendors “show their work” (testing, logging, change notices). Treat SaaS assistants like external services with data-handling rules, not magic boxes.
For API integrations, you gain control; and with it, obligations. You own input filters, output validation, quotas, and audit logs. You decide whether the model’s text can trigger actions (function/tool calling), and whether those actions require a human or policy checkpoint. Retrieval-augmented generation (RAG) behaves like a database: if you index junk or mis-scope permissions, the model will confidently quote it back. Secure the pipeline (ingestion -> indexing -> query-time access checks -> attribution), and assume model outputs are untrusted input until your code says otherwise.
For model hosters, the model and the runtime are now your software. The stack includes loaders, inference servers, containers, and GPUs (complete with CVEs, unsafe deserialization paths, and patch windows). Model packages can carry risky loaders; adapters and merges can shift behavior; fine-tunes can introduce backdoors or drift. Treat weights like secrets, loaders like untrusted code, and serving frameworks like production web servers: patch quickly, run with least privilege, pin and verify artifacts, and stage changes behind canaries and rollbacks.
Across all models, three patterns drive most incidents:
- Over-trust: not sanitizing what is sent to the model and taking fluent output as fact or letting it act without checks. Fix it with input and output sanitization, schema validation, policy checks, human-in-the-loop for high-impact steps, and clear limits on what the assistant can do.
- Unclear data paths: assistants pulling from stale, poisoned, or mis-scoped sources. Fix it with RAG hygiene: curation, per-document access control at query time, visible citations, and periodic re-indexing.
- Missing observability: no record of who asked what, what the model returned, or what actions fired. Fix it with audit logging across the whole chain (input prompt -> model response -> filters -> action) and simple analytics for spikes, long outputs, and repetitive probes.
The chapter also introduced a baseline vs. advanced playbook. Baseline focuses on what most teams can do this quarter: safe loaders, narrow use cases, pre- and post-call checks, quotas, simple RAG permissions, and essential logging. Advanced adds stronger isolation (sandboxed loading, policy gateways, segmented models), adversarial tests (hidden-trigger and injection probes), version pinning with fast rollbacks, and tighter API surfaces (reduced internals like log-probs, staged feature flags).
Procurement still matters. Ask vendors about data retention, memory controls, refusal testing, jailbreak resistance, logging access, and update practices. For API and self-hosted paths, extend that diligence to your own shop: secrets management, key rotation, egress controls, and who can change prompts, policies, or tool wiring. If a model touches money, customer records, or permissions, treat the assistant as a privileged workflow, not a chatbot.
At the very least, carry these three rules into implementation:
- Treat model outputs as untrusted until proven safe.
- Treat retrieval and memory as live access paths, not convenience features.
- Treat models and runtimes as production software, with patches, pins, logs, and rollbacks.
Do those three consistently (at SaaS, API, and self-host) and you’ll prevent the failures that look uniquely “AI,” but are, in practice, the same engineering mistakes we already know how to avoid.
FAQ
Why isn’t classical application security enough for Generative AI systems?
GenAI systems don’t just execute fixed rules—they infer and improvise in context. That makes the attack surface linguistic and behavioral, not only structural. Failures often arise from prompt/control gaps, context manipulation, and unsafe output handling (e.g., a model “suggests” a harmful command that a downstream system executes). Traditional controls (firewalls, RBAC, encryption) still matter, but must be complemented with prompt hardening, context boundaries, output validation, and continuous adversarial testing.What are the security tradeoffs of SaaS vs API integration vs self‑hosting?
- SaaS (fastest to adopt): Lowest control/visibility; risks include weak/opaque guardrails, insider misuse, cross‑tenant exposure, limited auditing, and shadow AI. Mitigate via contracts (no training, retention, residency), admin controls, redaction, narrow use cases, and output validation before acting.
- API integration (more control, more responsibility): You own input/output sanitation, RAG hardening, quotas/rate limits, logging, secrets, and guardrails. Still depend on provider isolation, stability, and model updates. Build gateways, validations, and SIEM pipelines.
- Self‑hosting (maximum control, maximum ops burden): Risks include malicious model artifacts (pickle), serving stack CVEs, model theft, behavior drift. Mitigate with safetensors, provenance/signatures and pinning, sandboxed first‑loads, patching, least‑privilege runtimes, encrypted weights, and usage analytics.
How do prompt injections and jailbreaks work—and how do we defend against them?
Attackers craft inputs (or embed hidden instructions in docs/web pages) that override instructions, leak system prompts, or induce unsafe behavior—often without any code exploit. Defenses:- Harden prompts and separate channels (system vs user vs tools); never let user/retrieved text populate system prompts.
- Add “anti‑phishing” style checks: flag urgency/override language and sensitive‑data requests; require escalation for risky asks.
- Constrain memory/context (disable or shorten; reset long threads).
- Moderate inputs/outputs; add policy checkers and refusal rules.
- Red‑team regularly with adversarial prompts, including indirect injections via RAG/web.
What is RAG data poisoning and leakage, and how can we mitigate it?
RAG can surface poisoned or overscoped content and bypass permissions if not enforced at answer time. Mitigations:- Curate sources with allow‑lists; avoid open/unvetted corpora.
- Enforce document‑level ACLs at query time, not just ingestion.
- Scrub documents (hidden text, scripts, “ignore prior instructions” phrases) before indexing.
- Require citations and last‑updated timestamps; prefer “I don’t know” over guessing.
- Keep indexes fresh; version and log changes.
- Plant canary docs to detect injection/leakage; demand retrieval logs (doc IDs, user/time) for auditability.
What is “execution by suggestion” (insecure output handling) and how do we prevent it?
Downstream systems sometimes trust model outputs as commands or configurations. A “helpful” suggestion can become destructive when executed blindly. Prevent by:- Treating outputs as untrusted; require strict schemas and validators.
- Adopting “propose vs execute”: humans or validated automations approve high‑impact actions (refunds, deletes, payouts).
- Adding two‑phase commits and allow‑lists for sensitive ops.
- Running tools/actions in sandboxes with least privilege and rate limits.
How do we safely use third‑party plug‑ins and LLM function/tool calling?
Plug‑ins and function calls expand capability but widen blast radius. Safeguards:- Allow‑list tools; disable by default; scope per role/session; add rate limits.
- Validate function names/arguments independently; treat all as untrusted input.
- Add human checkpoints for high‑impact actions; use two‑person rule where appropriate.
- Sandbox connectors (limited egress, no secrets); log proposed/executed calls and rejections.
- Vetting for SaaS plug‑ins; restrict who can install; require vendor transparency and logs.
What is Shadow AI and how can organizations reduce it?
Shadow AI is unapproved or untracked GenAI use (personal accounts, ad‑hoc scripts, stealth connectors). Reduce by:- Discovering usage (network/CASB/DLP/browser telemetry; surveys).
- Publishing a clear AI policy and procurement checks.
- Offering approved, easy alternatives (enterprise accounts, short retention, audit logs).
- Training for safe use and data minimization.
- Creating chokepoints: route all LLM traffic via a gateway; enforce keys, quotas, and logging; monitor egress to AI domains.
What should GenAI audit logging include?
Keep a reproducible chain: timestamp, request ID(s) on both sides, user/session, model name/version, prompt or hash, input/output token counts, retrieved doc IDs, validator/guardrail results (pass/fail + reason), and the final decision (executed/rejected) with action IDs. Protect access, set retention (e.g., 90–180 days hot), export to SIEM, and log rejections/timeouts/moderation hits as first‑class events.How do API integrators contain cost, abuse, and upstream failures?
- Quotas and rate limits: per‑user/team RPM/TPM aligned with provider caps; truncate oversize prompts/outputs.
- Spend guardrails: daily/hourly caps, circuit breakers, kill‑switches.
- Cost isolation: separate keys per app/team; anomaly alerts on spikes/long outputs/agent loops.
- Resilience: timeouts/retries/circuit breakers; fallback models; validate responses before use.
- Secrets discipline: vault storage, rotation, least‑privilege keys, server‑side proxy (no client keys).
What vendor risks matter most in SaaS/API, and what should we demand?
Key risks: insider misuse, weak/opaque guardrails, cross‑tenant leakage, model poisoning, undisclosed updates, developer‑portal attacks. Ask for: zero/short retention modes; tenant‑visible admin/access logs; per‑tenant isolation and deletion SLAs; model/version pinning and change notices; audit/export APIs; configurable safety filters; SSO/MFA/RBAC on portals; emergency key revocation; documented incident response.What extra steps are required when self‑hosting models securely?
- Model artifacts/provenance: prefer safetensors; disable auto code execution; pin by hash; sandbox first‑loads; track origin/approvals; keep rollbacks.
- Serving/runtime: patch vLLM/Ollama/Triton; run least‑privilege containers; restrict egress; monitor for crashes/anomalies; stage canary tests.
- Tuning/drift: version fine‑tunes/adapters; run safety smoke tests and adversarial probes after every change; constrain/chat memory; visible/clearable memory.
- Theft/misuse: encrypt weights; restrict/track exports; trim API signals (e.g., logprobs), add quotas/analytics to spot extraction; segment high‑value models behind brokers.
AI Governance ebook for free