Distributed transactions coordinate changes across multiple resource managers so that a set of local transactions behaves as a single atomic unit. The chapter frames atomic commit as the core concern: all participants must either commit or abort together. It introduces a clear mental model in which each local transaction advances through working, prepared, and finally committed or aborted states, and defines correctness via safety (no conflicting outcomes among participants) and liveness (everyone eventually decides). Blocking versus non-blocking commit protocols are contrasted, with non-blocking protocols tolerating a single participant failure without preventing a decision.
The chapter then presents Two-Phase Commit (2PC), the most widely used atomic commit protocol. A client drives work at each resource manager but delegates the final decision to a single coordinator, which runs two phases. In the Prepare phase, the coordinator asks all participants to vote; each either votes to commit (after durably logging that intent) or unilaterally aborts. In the Commit phase, the coordinator decides: if all vote to commit, it logs and broadcasts commit; otherwise, it logs and broadcasts abort, and each participant durably records and applies the outcome. This design ensures atomicity across systems and, in failure-free conditions, provides both safety and liveness.
Under failures, 2PC remains safe but can block. Participant failures are recoverable without blocking: before voting, recovery implies abort; after voting to commit, a participant queries the coordinator; after recording the final outcome, it performs REDO or UNDO. Coordinator failures are the crux: once any participant has voted to commit, those participants may be stuck until the coordinator recovers, because they cannot know the global decision. Variants try to reduce blocking—such as letting participants consult each other—but subtle issues (for example, timeouts and imperfect clocks) can reintroduce safety risks. The takeaway is that 2PC is fundamentally a blocking protocol whose careful logging, state transitions, and recovery rules uphold safety while leaving liveness vulnerable to coordinator failure.
From a single RM to multiple RMs
How do we coordinate and guarantee multiple commits?
A distributed transaction consists of two or more non-distributed transactions.
State Machine of non-distributed transactions
Global transaction (outstanding messages not illustrated)
Two Phase Commit protocol
A resource manager fails before persistently recording ⟨Vote-To-Commit⟩ or ⟨Abort⟩.
A resource manager fails after persistently recording ⟨Vote-To-Commit⟩ or ⟨Abort⟩.
A resource manager fails after persistently recording ⟨Commit⟩.
A resource manager fails after persistently recording ⟨Abort⟩.
Failure of transaction coordinator after the first commit
Failure of transaction coordinator before the first commit
Summary
Distributed transactions extend non-distributed transactions to span multiple resource managers.
A distributed transaction, also referred to as a global transaction, consists of two or more non-distributed transactions, also referred to as local transactions.
Atomic Commit Protocols ensure distributed transactions achieve a unanimous commit or abort decision, upholding atomicity across resource managers.
Blocking commit protocols guarantee safety but not liveness in the presence of failure.
Non-blocking commit protocols guarantee safety and liveness in the presence of failure.
The Two-Phase Commit (2PC) protocol is the most well-known and the most well-studied atomic commit protocol.
2PC divides participants into a transaction coordinator and resource manager and operates in two phases: the Prepare Phase and the Commit Phase.
2PC guarantees safety and liveness in the case of a resource manager failure.
2PC guarantees safety in the case of the transaction coordinator failure.
FAQ
What is a distributed transaction and what is a resource manager (RM)?A distributed transaction spans changes across multiple systems. Each participating system is called a resource manager (RM). RMs include databases and other systems such as message queues. In this chapter, “resource manager” and “database system” are used interchangeably.Why do we need atomic commit protocols across multiple RMs?When a logical operation (like a money transfer) touches data on different RMs, we must prevent disagreement where one side commits while the other aborts. Atomic commit protocols ensure all sub-transactions unanimously commit or unanimously abort.How is atomicity ensured on a single RM versus across multiple RMs?On a single RM, atomicity is achieved by one atomic write to its local log: write Commit or Abort. If the RM fails before writing either, it recovers as if Abort was written. Across multiple RMs, atomicity is achieved by running an atomic commit protocol that coordinates all participants.What do safety and liveness mean, and what distinguishes blocking from non-blocking commit protocols?Safety: no two participants reach conflicting decisions (no one commits while another aborts). Liveness: every participant eventually reaches a final decision (commit or abort). Blocking protocols guarantee safety but not liveness under participant failures. Non-blocking protocols guarantee both safety and liveness in the presence of a single participant failure.What are the states of a local (non-distributed) transaction during a distributed transaction?Local transactions move through: Working (executing operations), Prepared (waiting for commit/abort decision), and then a final state: Committed or Aborted. From Working, a transaction can abort (e.g., constraint violation) or prepare; from Prepared, it commits or aborts upon instruction.How does Two-Phase Commit (2PC) work when there are no failures?2PC has a coordinator and two or more RMs. Phase 1 (Prepare): the coordinator logs Prepare and asks all RMs to vote. Each RM either logs Vote-to-Commit and replies yes, or logs Abort, replies Abort, and aborts locally. Phase 2 (Commit): if all vote yes, the coordinator logs Commit and instructs all to commit; otherwise (any Abort or timeout), it logs Abort and instructs all to abort. Each RM logs the final decision and applies it.Why can participants abort unilaterally but not commit during 2PC’s Prepare phase?Commit must be coordinated so all participants make the same decision; a single RM cannot safely commit alone. However, any RM can always abort safely on its own (e.g., on error or doubt). Hence the asymmetry: Vote-to-Commit versus Abort.How does 2PC handle resource manager failures?2PC remains safe and live despite RM failures. On recovery: if an RM failed before logging Vote-to-Commit or Abort, it logs Abort and informs the coordinator. If it failed after logging Vote-to-Commit, it must inquire the coordinator for the outcome. If it failed after logging Commit, it performs REDO; after logging Abort, it performs UNDO.Why is 2PC considered a blocking protocol, and what happens if the coordinator fails?2PC can block if the transaction coordinator fails after one or more RMs have voted to commit. Any RM that voted yes and is in Prepared cannot decide by itself; it must wait for the coordinator (or a reliable substitute) to learn whether to commit or abort. Example: the coordinator crashes after telling only one RM to commit—others are stuck.What improvements reduce 2PC blocking, and why can naive timeouts violate safety?Variants let RMs consult each other: if any RM has already committed or aborted, others can follow, reducing blocking. However, some executions still block. Simply timing out and aborting when all voted yes but no decision arrived can break safety due to unsynchronized clocks—some RMs might commit while others time out and abort.
pro $24.99 per month
access to all Manning books, MEAPs, liveVideos, liveProjects, and audiobooks!