1 Building on quicksand: the challenges of vibe engineering
AI-assisted development delivers rapid prototyping and early learning, but speed without discipline—“vibe coding”—creates brittle systems, security holes, and code no one truly owns. The chapter argues that model upgrades won’t rescue poor process: scaling has hit diminishing returns, so advantage shifts from raw horsepower to engineering rigor. “Vibe engineering” is proposed as the remedy: combine creative LLM prototyping with professional practices—clear intent, tight abstractions, verification, and operational guardrails—so teams turn intuition into reliable, production-safe software.
Through real incidents—hacks within days of launch, destructive file operations, a supply-chain compromise, and an overzealous agent deleting production data—the text exposes a new class of systemic risks rooted in unverified, context-detached code and automation bias. The hidden cost is “trust debt”: short-term velocity that offloads verification to reviewers and senior engineers, eroding vigilance and ownership. The cure is a verify-then-merge culture anchored in executable specifications that act as contracts: specs and tests first, then generation; PR checklists and policy gates; sandboxing, canaries, and fast rollback; retrieval for grounding; guarded automation; and CI pipelines that enforce security, performance, and correctness before a human approves.
The human role shifts from line-by-line author to system designer and validator, with ownership measured by the quality of the mental model, not who typed the code. To counter the “70% problem” (easy scaffolding, hard last mile) and the comprehension bottleneck, the chapter prescribes a repeatable loop—Vibe → Specify/Plan → Task/Verify → Refactor/Own—where specifications, property tests, SLO gates, and domain invariants define success up front, and agents work within those boundaries. Tools amplify this approach, but the mindset is the real change: treat LLMs like costed compute, master context and orchestration, and elevate craft into engineering by making taste and intent explicit, auditable, and executable.
The autonomy-risk spectrum: each step grants more leverage but demands tighter verification, governance, and engineering discipline
High-velocity, AI-powered app generation without professional rigor creates brittle, misleading progress.
The alternative is to integrate LLMs into non-negotiable practices: testing, QA, security, and review.
Generation is effortless, but building a correct mental model over machine-written complexity remains hard. Real ownership depends on understanding, not just producing, code.
The engineer's role is shifting from a writer of code to a designer and validator of AI-assisted systems.
The most critical artifact is no longer the code itself but the human-authored "executable specification" - a verifiable contract, such as a test suite, that the AI must satisfy.
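The chapter's own illustration of such a contract is an ISBN-13 validator verified by a pytest suite. The sketch below shows the shape of that workflow under stated assumptions: the test functions are the human-authored spec written first, and `isbn13_valid` stands in for whatever implementation a model generates to satisfy them (the function name and test cases here are illustrative, not taken from the book).

```python
def isbn13_valid(s: str) -> bool:
    """One possible model-generated implementation; in a spec-first
    flow, this part is disposable and the tests below are not."""
    digits = s.replace("-", "")
    if len(digits) != 13 or not digits.isdigit():
        return False
    # ISBN-13 checksum: alternating weights 1 and 3 must sum to 0 mod 10.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0

# The executable specification: written before any implementation,
# model-agnostic, and the sole arbiter of correctness.
def test_known_valid_isbn():
    assert isbn13_valid("9780306406157")

def test_hyphenated_form_accepted():
    assert isbn13_valid("978-0-306-40615-7")

def test_checksum_off_by_one_rejected():
    assert not isbn13_valid("9780306406158")

def test_malformed_input_rejected():
    assert not isbn13_valid("978-not-an-isbn")
    assert not isbn13_valid("12345")
```

Any implementation from any model that passes this suite is acceptable; one that fails is rejected, regardless of how plausible the code looks.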
Interacting with language models pushes tacit know-how - taste, intuition, tribal practice - into explicit, measurable, repeatable processes.
The AI transition elevates software work to a higher level of abstraction and reliability, which requires strong communication, delegation, and planning skills.
The goal of this book is to deliver practical patterns for the AI era: migrating legacy code, defining precise prompts and contexts, collaborating with agents, building realistic cost models, adopting new team topologies, and applying staff-level techniques (e.g. squeezing out performance).
FAQ
What is “vibe coding,” and why is it risky in production?
Vibe coding is an intuition-first, LLM-powered way to spin up working software fast, often without tests, security hygiene, or deep verification. It creates an illusion of speed: code “looks functional” but is brittle, opaque, and easy to exploit. Documented failures include a startup hacked within days, an AI CLI command that effectively erased a project, a supply-chain trojan via an AI-authored PR, and an agent that “cleaned” production data by deleting thousands of records.

How does “vibe engineering” differ from vibe coding?
Vibe engineering is systematic and evidence-driven. It wraps the probabilistic core of LLMs in a deterministic shell of human intent, anchored by executable specifications. It emphasizes rigorous testing, security, error handling, edge cases, non-functional requirements (performance, scalability, reliability), and production stability. It treats the model as a replaceable component; correctness comes from the process and contracts, not the provider.

What is “trust debt,” and how does it accumulate?
Trust debt is the hidden, compounding cost of shipping AI-generated code without adequate verification. It grows under “dump-and-review” habits, where authors offload verification to reviewers and automation bias dulls vigilance. Symptoms include over-trusting green tests, late cognitive handoffs, and senior engineers spending weeks reverse-engineering AI output during incidents—costs that velocity dashboards don’t show.

Why won’t the “next, bigger model” solve these problems?
Scale now shows diminishing returns: hallucinations, context blind spots, and the need for human verification persist. Data scarcity and reuse further limit gains. Competitive advantage shifts from having the strongest model to mastering usage: clear intent, retrieval, orchestration, testing, and cost/latency-aware operations.
Process quality, not raw horsepower, closes the gap from “impressive demo” to “production-safe.”

What is an executable specification, and why is it central to reliability?
An executable spec is a human-authored, runnable contract (tests, properties, API schemas, perf gates) that defines correctness before code exists. LLMs generate implementations that must satisfy the spec. Example: an ISBN-13 validator verified by a pytest suite. Different models produced different code, but all passed the same tests—proving correctness comes from the spec, not the model.

What concrete techniques define vibe engineering in day-to-day work?
- Systematic prompt engineering with code slices and tests so outputs compile and pass immediately
- Retrieval-augmented and grounded answers with citations to cut hallucinations
- Model-driven first-pass PR review via fixed checklists (validation, auth, perf)
- Incident triage: log summarization and reversible fix drafts
- Guarded automation: agents propose PRs, CI/policy gates enforce, auto-rollback on regressions
- Sandbox, canary, and policy gates (security, compliance, licensing) with full provenance

How should teams transition from prototype to production without accruing trust debt?
Adopt the loop: Vibe → Specify/Plan → Task/Verify → Refactor/Own. Use early prototyping to learn the domain, then freeze intent in executable specs, decompose into ≤2h tasks with clear “Done” checks, verify-then-merge (not dump-and-review), and finish with refactor/own so the team understands and documents the code it ships.

What is the autonomy–risk ladder, and how should we govern it?
It progresses from token completion → block suggestions → conversational IDE agents → local autonomous agents → near-fully autonomous developer agents. Each rung adds leverage and failure complexity, demanding tighter verification, governance, and staged rollout (sandbox → canary → broad). Never scale autonomy faster than you scale verification and auditability.

What is the “70% problem,” and what lives in the hard 30%?
AI accelerates the first ~70% (scaffolding and common patterns) but struggles with the last 30%: edge cases, architectural fit, comprehensive verification (property, mutation, performance, security), compliance, and performance/scalability. That final mile requires human judgment, domain context, and adversarial thinking—exactly what executable specs and strong CI gates enforce.

How do teams maintain real code ownership when AI writes much of the code?
- Make specs the source of truth; treat prompts/specs as versioned, reviewable artifacts
- Prefer small, staged commits with rationale and provenance
- Keep humans engaged at the right level (intent, invariants, risks), not just line-reading
- Avoid “machine verifying the machine” by curating adversarial tests and mutation thresholds
- Instrument production, enable fast rollback, and ask the litmus question: “Would you go on-call for this system?”
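The CI/policy gates mentioned throughout (verify-then-merge, mutation thresholds, SLO checks) can be sketched as a deterministic policy function over build metrics. The metric names and thresholds below are illustrative assumptions, not values from the book; a real gate would read these from versioned CI configuration and scanner output.

```python
from dataclasses import dataclass

@dataclass
class BuildMetrics:
    tests_passed: bool      # did the executable spec pass?
    mutation_score: float   # fraction of injected mutants killed by tests
    p95_latency_ms: float   # measured performance, checked against an SLO
    secrets_found: int      # hits reported by a secret scanner

# Illustrative thresholds; in practice these live in reviewable policy files.
MUTATION_FLOOR = 0.80
P95_SLO_MS = 250.0

def merge_allowed(m: BuildMetrics) -> tuple[bool, list[str]]:
    """Return (verdict, reasons) so the gate is auditable, not a silent block."""
    reasons: list[str] = []
    if not m.tests_passed:
        reasons.append("executable spec failing")
    if m.mutation_score < MUTATION_FLOOR:
        reasons.append(f"mutation score {m.mutation_score:.2f} below {MUTATION_FLOOR}")
    if m.p95_latency_ms > P95_SLO_MS:
        reasons.append(f"p95 {m.p95_latency_ms}ms exceeds SLO {P95_SLO_MS}ms")
    if m.secrets_found > 0:
        reasons.append("secret scanner reported hits")
    return (not reasons, reasons)
```

An agent-proposed PR that passes returns `(True, [])`; one that fails returns `(False, [...])` with explicit reasons, giving the human approver provenance for the decision instead of a bare red X.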