Table of contents

1 Thinking in distributed systems: Models, mindsets, and mechanics

1.1 Software engineering and mental models

1.1.1 Mental models: The foundation of reasoning

1.1.2 Correct mental models

1.1.3 Complete mental models

1.2 Mental model of software systems

1.3 Different types of models

1.3.1 Different models describing the same aspects

1.3.2 Different models describing different aspects of a system

1.4 Thinking about distributed systems

1.4.1 Correctness

1.4.2 Scalability and reliability

1.4.3 Responsiveness

1.5 Two big ideas

1.5.1 Systems of systems

1.5.2 Global view vs. local view

1.6 Distributed Systems Incorporated

1.7 Navigating complexity

1.7.1 Simple yet complex

1.7.2 Emergent behavior

1.7.3 Changing perspective

1.7.4 Think globally; act locally

1.8 Thinking above the code

2 System models, order, and time

2.1 System models

2.1.1 Theory and practice

2.1.2 Synchronous distributed systems

2.1.3 Asynchronous distributed systems

2.1.4 Partially synchronous systems

2.1.5 Component and network behavior

2.1.6 Realistic system models

2.2 Order and time

2.2.1 The happened-before relationship

2.2.2 Time and clocks

2.2.3 Physical time and physical clocks

2.2.4 Logical time and logical clocks

2.2.5 Physical clocks vs. logical clocks

3 Failure tolerance

3.1 In theory

3.2 Types of failure tolerance

3.2.1 Masking failure tolerance

3.2.2 Nonmasking failure tolerance

3.2.3 Fail-safe failure tolerance

3.2.4 None of the above

3.3 In practice

3.3.1 System model

3.3.2 Failure handling

3.3.3 Failure classification

3.3.4 Failure detection

3.3.5 Failure mitigation

3.3.6 Putting everything together

4 Message delivery and processing

4.1 Exchanging messages

4.2 The uncertainty principle of message delivery and processing

4.2.1 Before sending the request

4.2.2 After sending the request and before receiving a response

4.2.3 After receiving a response

4.3 Silence and chatter

4.4 Exactly-once processing semantics

4.5 Idempotence

4.6 Case study: Charging a credit card

5 Transactions

5.1 Abstractions

5.2 The magic of transactions

5.2.1 Concurrency

5.2.2 Failure

5.3 The model of transactions

5.3.1 Correctness

5.3.2 Serializability

5.3.3 Completeness

5.3.4 Application-level abort

5.3.5 Platform-level abort

6 Distributed transactions

6.1 Atomic commitment: From a single RM to multiple RMs

6.1.1 Transaction on a single RM

6.1.2 Transaction on multiple RMs

6.1.3 Blocking and nonblocking

6.2 The essence of distributed transactions

6.3 Two-Phase Commit protocol

6.3.1 In the absence of failure

6.3.2 In the presence of failure

6.3.3 Improvement

7 Partitioning

7.1 Encyclopedias and volumes

7.2 Thinking in partitions

7.3 The mechanics of partitioning and balancing

7.4 (Re)partitioning

7.4.1 Types of partitioning

7.4.2 Data item to partition assignment strategies

7.5 Common item-based assignment strategies

7.5.1 Range partitioning

7.5.2 Hash partitioning

7.6 Repartitioning

7.6.1 Range partitioning

7.6.2 Hash partitioning

7.7 Consistent hashing

7.8 (Re)balancing and overpartitioning

8 Replication

8.1 Redundancy

8.2 Thinking about replication and consistency

8.3 Replication

8.4 The mechanics of replication

8.4.1 System model

8.4.2 Replication lag

8.4.3 Synchronous vs. asynchronous replication

8.4.4 State-based vs. log-based replication

8.4.5 Single-leader, multileader, and leaderless systems

9 Consistency

9.1 Consistency models

9.1.1 Common consistency models

9.1.2 Virtues and limitations

9.2 Linearizability

9.2.1 Queue and stack

9.2.2 Formal definition of linearizability

9.3 Eventual consistency

9.3.1 The shopping cart

9.3.2 Variants of eventual consistency

9.3.3 Implementation

9.4 Consistency, availability, and partition tolerance

9.4.1 History

9.4.2 Conjecture vs. theorem

9.4.3 CAP theorem

10 Distributed consensus

10.1 The challenge of reaching agreement

10.2 System model

10.3 State machine replication

10.4 The origin—and irony—of consensus

10.5 Implementing consensus

10.5.1 Leader-based consensus

10.5.2 Quorum-based consensus

10.5.3 Combining leader and quorum

10.6 Raft

10.6.1 The log

10.6.2 Terms

10.6.3 Leader Election protocol

10.6.4 Log Replication protocol

10.6.5 State machine safety

10.7 Raft puzzles

10.7.1 Puzzle 1

10.7.2 Puzzle 2

10.7.3 Puzzle 3

11 Durable executions

11.1 The pitfalls of partial executions

11.2 System model

11.2.1 Process definition

11.2.2 Process execution

11.3 The concept of failure-transparent recovery

11.4 Strategies of failure-transparent recovery

11.4.1 Restart

11.4.2 Resume

11.5 Implementation of failure-transparent recovery

11.5.1 Application-level implementation: Sagas

11.5.2 Platform-level implementation: Durable execution

12 Cloud and services

12.1 From proactive to reactive

12.2 Cloud computing

12.3 Cloud-native computing

12.4 Serverless computing

12.4.1 Traditional

12.4.2 Serverless

12.4.3 Cold path vs. hot path

12.5 Service

12.5.1 Global view vs. local view

12.5.2 Example recommendation service

12.6 Final thoughts

Overview

11 Durable executions

Durable executions are presented as a systems-level abstraction that, like database transactions, hides the messiness of failures while a process runs. The chapter shows why partial executions are dangerous: even if each step is atomic, their sequential composition is not, so a crash between steps can leave the system in an inconsistent state. It distinguishes process definitions (code) from executions (running code), models processes as sequences of failure-atomic actions, and clarifies that multi-step executions don’t automatically inherit atomicity. It also frames short- versus long-running work in logical time (number of steps), not wall-clock time.

Failure-transparent recovery means a failed-then-recovered execution is observationally equivalent to a failure-free one, judged by an application-defined equivalence function. Exactly-once behavior is the ideal but often unattainable; practical equivalences allow duplicating the last event(s) or even restarting from the beginning. Idempotence is therefore essential. Two recovery strategies are highlighted: restart, which is simple with idempotent steps but can misbehave with delays and nondeterminism; and resume, which continues from a persisted save point between steps.

The chapter contrasts application-level and platform-level implementations. At the application level, Sagas make definitions failure-aware, often via state machines that persist state and drive the next step. At the platform level, Durable Executions keep definitions failure-agnostic while delivering failure-transparent executions. Two platform approaches appear: log-based, which records each step’s output and replays with deduplication (simple, but requires determinism and grows logs), and state-based, which persists the continuation after each step (no replay, tolerates nondeterminism, but needs runtime support for serializable continuations). The net effect is clearer business logic when the platform shoulders failure handling.

The sequential composition of two atomic actions is not itself atomic.
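Here is a minimal Python sketch of this point. It assumes each step is a single in-memory update, so each step is trivially failure-atomic on its own, and it models a crash as a hypothetical `CrashStop` exception. The step names follow the charge-card/create-account example used in this chapter.

```python
class CrashStop(Exception):
    """Simulated crash-stop failure between steps (hypothetical, for illustration)."""

def run_process(state, crash_after_first=False):
    # Step a: failure-atomic on its own (one dict update).
    state["card_charged"] = True
    if crash_after_first:
        raise CrashStop("process crashed between step a and step b")
    # Step b: also failure-atomic on its own.
    state["account_created"] = True
    return state

ok = run_process({})
print(ok)  # {'card_charged': True, 'account_created': True}

partial = {}
try:
    run_process(partial, crash_after_first=True)
except CrashStop:
    pass
print(partial)  # {'card_charged': True}  <- the "half-done" outcome
```

Each step is all-or-nothing, yet the composition a • b is not: the crash leaves the card charged without an account.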

Definition versus execution

A process P consists of one step: a.

A process P consists of two steps: a and b.

Resume

Failure handling

Summary

Failure transparency refers to the property of a system where failure-free executions are indistinguishable from failed and subsequently recovered executions.
Failure transparency can be achieved at two levels: the application level and the platform level.
At the application level, failure transparency relies on failure-aware process definitions, resulting in failure-transparent process executions.
At the platform level, failure transparency is achieved through failure-agnostic process definitions, enabling failure-transparent process executions.
Durable Executions are an emerging approach to implementing failure transparency at the platform level.
Durable Executions follow two implementation strategies: log-based and state-based.
In log-based implementations, the system records the output of each step in a durable log and, upon execution failure, replays the process while deduplicating previously executed events.
In state-based implementations, the system records the state (continuation) after each step and, upon failure, restores the continuation to resume execution without replaying steps.
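The log-based strategy can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular product works: `LogBasedRuntime` is a hypothetical name, an in-memory list stands in for the durable log, and a global flag injects a single crash.

```python
from typing import Any, Callable

class CrashStop(Exception):
    """Simulated crash-stop failure (test harness only)."""

class LogBasedRuntime:
    """Hypothetical log-based runtime; `log` stands in for a durable log."""

    def __init__(self):
        self.log: list = []   # recorded output of each completed step
        self.executions = 0   # how many times a step's action really ran
        self._cursor = 0      # position in the log during the current run

    def run(self, process: Callable[["LogBasedRuntime"], Any]) -> Any:
        self._cursor = 0      # every (re)run replays from the beginning
        return process(self)

    def step(self, action: Callable[[], Any]) -> Any:
        if self._cursor < len(self.log):
            result = self.log[self._cursor]  # deduplicate: reuse recorded output
        else:
            result = action()                # first execution of this step
            self.executions += 1
            self.log.append(result)          # record output before continuing
        self._cursor += 1
        return result

crashed = False  # fault-injection flag (test harness only)

def signup(rt: LogBasedRuntime):
    # Apart from the injected crash, the process must be deterministic:
    # replay has to issue the same steps in the same order.
    global crashed
    charge_id = rt.step(lambda: "charge-1")
    if not crashed:
        crashed = True
        raise CrashStop()  # simulated crash between the two steps
    account_id = rt.step(lambda: "account-1")
    return charge_id, account_id

rt = LogBasedRuntime()
try:
    rt.run(signup)
except CrashStop:
    pass
result = rt.run(signup)       # recovery: replay with deduplication
print(result, rt.executions)  # ('charge-1', 'account-1') 2
```

On recovery the whole process runs again, but the first step returns its logged output instead of executing a second time. This is also why the log grows with every step.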

FAQ

What are durable executions, in a nutshell?

Durable executions are a platform-level abstraction that makes long-running processes in distributed systems behave as if failures did not occur. Much like transactions in databases, they conceal partial failures so that a failure-free run is equivalent to a failed-then-recovered run.

Why are partial executions problematic in distributed systems?

Because the sequential composition of atomic steps is not itself atomic. If a process crashes between steps, you can observe a “half-done” outcome (for example, charging a card but failing to create an account), leaving the system in an inconsistent state.

What’s the difference between concurrency atomicity and failure atomicity?

- Concurrency atomicity (isolation): intermediate states are not observable by other processes; execution appears uninterrupted with respect to concurrency.
- Failure atomicity (all-or-nothing): the overall effect is either fully applied or not applied at all, though intermediate states may be observable during execution. In this chapter, “atomic” refers to failure atomicity.

How do “short-running” and “long-running” executions differ?

They differ by logical, not physical, time. A short-running execution has a single step; a long-running execution has multiple steps. Single-step executions inherit failure atomicity; multi-step executions do not, so they need explicit recovery strategies.

What’s the distinction between a process definition and a process execution?

- Process definition: a sequence of failure-atomic steps (P = A • P’ | ε).
- Process execution: the observable trace of events (t), which may end in success (✓) or crash-stop (×). The execution can halt at any time, so reasoning must account for partial traces.
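To make the distinction concrete, the following Python sketch treats a definition as a list of steps and an execution as the trace it produces. The `crash_at` parameter is a hypothetical fault-injection hook, assumed here to model a crash-stop immediately before the given step.

```python
from typing import Callable, List, Optional

Step = Callable[[], str]  # a failure-atomic step that emits one event

def execute(definition: List[Step], crash_at: Optional[int] = None) -> List[str]:
    """Run a process definition and return its observable trace."""
    trace = []
    for i, step in enumerate(definition):
        if crash_at == i:
            trace.append("×")  # crash-stop: the execution halts here
            return trace
        trace.append(step())
    trace.append("✓")          # successful completion
    return trace

P = [lambda: "a", lambda: "b"]  # P = a • b • ε
print(execute(P))               # ['a', 'b', '✓']
print(execute(P, crash_at=1))   # ['a', '×']
```

The same definition yields different executions depending on where (or whether) a crash occurs, which is why reasoning must cover partial traces.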

What is failure-transparent recovery, and what role do equivalence functions play?

Recovery is failure-transparent if a recovered execution produces a sequence of events equivalent to some failure-free execution. The application defines the equivalence function, e.g., identity (exactly-once), “valid to duplicate last event,” “valid to duplicate last n events,” or “restart from the beginning.” Practical systems often avoid strict identity and rely on idempotent actions to tolerate duplicates.
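Three of these equivalence functions might be written as predicates over traces, as in the Python sketch below. The interpretations are assumptions for illustration: "duplicate last event" is read as "at most one event appears twice in a row", and "restart from the beginning" covers a single restart only.

```python
def identity(recovered, failure_free):
    # Exactly-once: the recovered trace must match exactly.
    return recovered == failure_free

def dup_last_event(recovered, failure_free):
    # Allow one event (the step re-executed after the crash) to appear twice in a row.
    if recovered == failure_free:
        return True
    for i in range(1, len(recovered)):
        if recovered[i] == recovered[i - 1] and recovered[:i] + recovered[i + 1:] == failure_free:
            return True
    return False

def restart_from_beginning(recovered, failure_free):
    # A prefix of the failure-free run, followed by one complete failure-free run.
    n = len(recovered) - len(failure_free)
    return n >= 0 and recovered[n:] == failure_free and recovered[:n] == failure_free[:n]

ff = ["charge", "create_account"]
rec = ["charge", "charge", "create_account"]
rec2 = ["charge", "create_account", "charge", "create_account"]
print(identity(rec, ff), dup_last_event(rec, ff), restart_from_beginning(rec, ff))  # False True True
print(dup_last_event(rec2, ff), restart_from_beginning(rec2, ff))                   # False True
```

Each relaxation accepts more recovered traces than the one before it, which is exactly why the duplicated events must be harmless, that is, idempotent.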

Why is idempotence critical for recovery?

When failures cause retries or replays, steps may execute more than once. If each step is idempotent, repeating it yields the same effect as doing it once, making strategies like restart or resume safe despite potential duplicates.
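One common way to make a step idempotent is an idempotency key, sketched below in Python. `PaymentService` is a hypothetical stand-in and not any real provider's API: a retried request carrying the same key has no additional effect.

```python
from typing import Dict

class PaymentService:
    """Hypothetical payment service that deduplicates requests by idempotency key."""

    def __init__(self):
        self.charges: Dict[str, int] = {}  # idempotency key -> amount charged

    def charge(self, key: str, amount: int) -> int:
        # Only the first request with a given key takes effect; repeats return the same result.
        if key not in self.charges:
            self.charges[key] = amount
        return self.charges[key]

svc = PaymentService()
svc.charge("order-42", 100)
svc.charge("order-42", 100)       # retry after an ambiguous failure
print(sum(svc.charges.values()))  # 100, not 200
```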

What are the restart and resume strategies, and when does restart fall short?

- Restart: re-execute the process from the beginning; simple but assumes idempotent steps and determinism. It can be problematic with delays/timeouts (they get reset) and with non-deterministic actions (time, randomness).
- Resume: continue from the most recent save point by persisting state/continuations between steps; avoids resetting timers and reduces sensitivity to non-determinism.
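The difference between the two strategies shows up in the recovered trace. In this Python sketch, a plain dict stands in for durable storage holding the save point (an assumption), and the process crashes once before its second step.

```python
class CrashStop(Exception):
    """Simulated crash-stop failure (test harness only)."""

def run(steps, store, resume, crash_before=None):
    # Restart always begins at step 0; resume begins at the persisted save point.
    start = store.get("next", 0) if resume else 0
    for i in range(start, len(steps)):
        if i == crash_before:
            raise CrashStop()
        steps[i]()
        store["next"] = i + 1  # save point between steps

def simulate(resume):
    executed = []
    steps = [lambda: executed.append("a"), lambda: executed.append("b")]
    store = {}  # stand-in for durable storage
    try:
        run(steps, store, resume, crash_before=1)
    except CrashStop:
        pass
    run(steps, store, resume)  # recovery
    return executed

print(simulate(resume=False))  # ['a', 'a', 'b']
print(simulate(resume=True))   # ['a', 'b']
```

Restart repeats step a, so it is safe only if a is idempotent. Resume skips it because the save point was persisted after a completed.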

How do Sagas differ from Durable Executions?

Sagas are application-level (failure-aware) definitions: developers encode state transitions and persistence/compensation to achieve failure transparency. Durable executions are platform-level (failure-agnostic): the runtime captures progress and handles recovery so business logic remains free of failure-handling code.
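A failure-aware definition in the Saga style might look like the Python sketch below. The state names, the `TRANSITIONS` table, and the `drive` function are hypothetical, and a dict stands in for a database. The point is that the developer, not the platform, persists each transition and decides where to continue after a crash. Compensation is omitted to keep the sketch short.

```python
class CrashStop(Exception):
    """Simulated crash-stop failure (test harness only)."""

# Failure-aware definition: each state names the step to run and its successor.
TRANSITIONS = {
    "CHARGE_CARD": "CREATE_ACCOUNT",
    "CREATE_ACCOUNT": "DONE",
}

def drive(db, actions, crash_in=None):
    """Advance the saga from its persisted state until it reaches DONE."""
    state = db.get("state", "CHARGE_CARD")
    while state != "DONE":
        if state == crash_in:
            raise CrashStop()
        actions[state]()         # perform the step (should be idempotent)
        state = TRANSITIONS[state]
        db["state"] = state      # the application persists the transition explicitly
    return state

log = []
actions = {
    "CHARGE_CARD": lambda: log.append("charge"),
    "CREATE_ACCOUNT": lambda: log.append("create_account"),
}
db = {}  # stand-in for a durable store
try:
    drive(db, actions, crash_in="CREATE_ACCOUNT")
except CrashStop:
    pass
print(db["state"])         # CREATE_ACCOUNT
result = drive(db, actions)
print(result, log)         # DONE ['charge', 'create_account']
```

Failure handling, including persistence and the choice of where to continue, lives in the business logic itself. Durable executions aim to move exactly this code into the platform.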

What are log-based and state-based implementations of durable executions?

- Log-based: persist each step’s outputs in a durable log and replay on recovery with deduplication. Pros: simple, minimal runtime support. Cons: requires determinism; log can grow large.
- State-based: persist the execution’s continuation (state) after each step and restore it on recovery, resuming from the last completed step. Pros: no determinism requirement and no ever-growing log. Cons: requires serializable continuations, which many runtimes don’t yet support (though some emerging languages do).
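Python cannot serialize a running function's frame, so the state-based sketch below hand-rolls the continuation as a JSON-serializable dict holding a program counter and local variables. This is an assumption for illustration. Real state-based runtimes are expected to capture the continuation automatically. The step names are hypothetical, and the first step is deliberately nondeterministic.

```python
import json
import random

class CrashStop(Exception):
    """Simulated crash-stop failure (test harness only)."""

calls = {"pick_coupon": 0}  # counts real executions of the nondeterministic step

def pick_coupon(k):
    calls["pick_coupon"] += 1
    k["locals"]["coupon"] = random.randint(1, 1_000_000)  # nondeterministic

def send_email(k):
    k["locals"]["sent"] = f"coupon {k['locals']['coupon']}"

STEPS = [pick_coupon, send_email]  # hypothetical two-step process

def run(storage, crash_at=None):
    # Restore the persisted continuation, or start a fresh one.
    k = json.loads(storage["k"]) if "k" in storage else {"pc": 0, "locals": {}}
    while k["pc"] < len(STEPS):
        if k["pc"] == crash_at:
            raise CrashStop()
        STEPS[k["pc"]](k)
        k["pc"] += 1
        storage["k"] = json.dumps(k)  # persist the continuation after each step
    return k["locals"]

storage = {}  # stand-in for durable storage
try:
    run(storage, crash_at=1)  # crash after pick_coupon, before send_email
except CrashStop:
    pass
result = run(storage)         # recovery restores the continuation; nothing is replayed
print(calls["pick_coupon"])   # 1
```

Because the random coupon is stored in the continuation and not recomputed by a replay, the step does not need to be deterministic. The cost is that every piece of state must be serializable.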
