Chaos Engineering
Crash test your applications
Mikolaj Pawlikowski
  • MEAP began April 2020
  • Publication in Early 2021 (estimated)
  • ISBN 9781617297755
  • 325 pages (estimated)
  • printed in black & white

Excellent concept and execution! The author provides a reasonable approach to something that seems terribly complicated from the outside.

Burk Hufnagel
Auto engineers test the safety of a car by intentionally crashing it and carefully observing the results. Chaos engineering applies the same principles to software systems. In Chaos Engineering: Crash test your applications, you’ll learn to run your applications and infrastructure through a series of tests that simulate real-life failures. You’ll get the most out of chaos engineering by learning to think like a chaos engineer and to design experiments that verify the reliability of your software. With examples that cover a whole spectrum of software, you’ll be ready to run an intensive testing regime on anything from a simple WordPress site to a massive distributed system running on Kubernetes.

About the Technology

Rather than just looking for code bugs and errors, chaos engineering examines how your software responds to calamity, including partial infrastructure outages, hardware failure, and other major problems that can befall a production system. By observing a system in distress or under attack, chaos engineering helps you verify the reliability and resiliency of your software, especially for hard-to-test distributed systems with lots of moving parts and little scope for downtime.

About the book

In Chaos Engineering: Crash test your applications, you’ll learn to design and execute controlled failure experiments that reveal the hidden problems in your software. Using a toolbox of open source tools, you’ll inject system-shaking failures at every level, from your Docker containers, to your Kubernetes deployment, to the UI. You’ll learn Linux monitoring for observing system metrics and evaluating your results, and even how to apply chaos engineering to make your human teams more reliable and more resilient when handling failures. Best of all, all tools and examples come with a downloadable Linux VM image, letting you easily experiment without risk to your own systems.
Table of Contents

PART 1 Chaos Engineering Fundamentals

1 Into the world of chaos engineering

1.1 What is chaos engineering?

1.2 Motivations for chaos engineering

1.2.1 Risk, cost and service-level indicators, objectives, and agreements (SL{I,O,A})

1.2.2 Testing a system as a whole

1.2.3 Emergent properties

1.3 Four steps to chaos engineering

1.3.1 Observability

1.3.2 Steady state

1.3.3 Hypothesis for our experiment

1.3.4 Run the experiment and prove (or refute) your hypothesis

1.4 What chaos engineering is not

1.5 A taste of chaos engineering

1.5.1 FizzBuzz as a service

1.5.2 A long, dark night

1.5.3 Post-mortem

1.5.4 Chaos engineering in a nutshell

1.6 Summary

2 First cup of chaos & blast radius

2.1 Setup - working with the code in this book

2.2 Scenario

2.3 Linux forensics 101

2.3.1 Exit codes

2.3.2 Killing processes

2.3.3 Out Of Memory (OOM) Killer

2.4 The first chaos experiment

2.4.1 Visibility

2.4.2 Steady state

2.4.3 Hypothesis

2.4.4 Run the experiment

2.5 Blast radius

2.6 Digging deeper

2.6.1 Saving the world

2.7 Summary

3 Observability

3.1 The app is slow

3.2 The USE method

3.3 Resources

3.3.1 System overview

3.3.1.1 uptime

3.3.2 Block IO

3.3.2.1 df
3.3.2.2 iostat
3.3.2.3 biotop

3.3.3 Networking

3.3.3.1 sar
3.3.3.2 tcptop

3.3.4 RAM

3.3.4.1 free
3.3.4.2 top
3.3.4.3 vmstat
3.3.4.4 oomkill

3.3.5 CPU

3.3.5.1 top
3.3.5.2 mpstat -P ALL 1
3.3.5.3 My dog ate my CPU - how do I fix it?

3.3.6 OS

3.3.6.1 opensnoop
3.3.6.2 execsnoop

3.3.7 Other tools

3.4 Application

3.4.1 cProfile

3.4.2 BCC and Python

3.5 Automation - using time series

3.5.1 Prometheus & Grafana

3.6 Further reading

3.7 Summary

4 Database trouble & testing in production

4.1 We’re doing WordPress

4.2.1 Experiment 1: slow disks

4.2.2 Experiment 2: slow connection

4.3 Testing in production

4.4 Summary

PART 2 Chaos Engineering in Action

5 Poking Docker

6 Kubernetes

7 Who you gonna call? Syscall-busters!

8 Who ate from my JVM?

9 Application-level fault injection

10 There’s a monkey in my browser!

11 Trouble with time travel

PART 3 Chaos Engineering beyond machines

12 Chaos engineering (for) people

Appendixes

Appendix A: Installing chaos engineering tools

What's inside

  • Design, run and analyze Chaos Engineering experiments
  • See how applications react to database connection latency
  • Experiment with Docker container isolation
  • Test software running on Kubernetes and the platform itself
  • Inject failure into software running in the JVM

About the reader

For developers with basic knowledge of scripting and Linux.

About the author

Mikolaj Pawlikowski has been practicing chaos engineering for four years, beginning with a large distributed Kubernetes-based microservices platform at Bloomberg. He is the creator of the Kubernetes chaos engineering tool PowerfulSeal and the networking visibility tool Goldpinger. He is an active member of the chaos engineering community and speaks at numerous conferences.

Manning Early Access Program (MEAP) Read chapters as they are written, get the finished eBook as soon as it’s ready, and receive the pBook long before it's in bookstores.
print book $49.99 pBook + eBook + liveBook
eBook $19.99 $39.99 3 formats + liveBook