| | Trademarks | xiv |
| Preface | xv |
| Introduction | xvi |
| A User's Guide to This Book | xxviii |
PART I BASIC DISTRIBUTED COMPUTING TECHNOLOGIES
|
CHAPTER 1 Fundamentals
| 3
|
| 1.1 | Introduction | 4 |
| 1.2 | Components of a Reliable Distributed Computing System | 7 |
| | 1.2.1 | Communication Technology | 11 |
| | 1.2.2 | Basic Transport and Network Services | 13 |
| | 1.2.3 | Reliable Transport Software and Communication Support | 14 |
| | 1.2.4 | Middleware: Software Tools, Utilities, and Programming Languages | 15 |
| | 1.2.5 | Distributed Computing Environments | 16 |
| | 1.2.6 | End-User Applications | 17 |
| 1.3 | Critical Dependencies | 18 |
| 1.4 | Next Steps | 19 |
| 1.5 | Related Reading | 20 |
| |
CHAPTER 2 Communication Technologies
| 21
|
| 2.1 | Types of Communication Devices | 22 |
| 2.2 | Properties | 23 |
| 2.3 | Ethernet | 25 |
| 2.4 | Fiber Distributed Data Interface | 27 |
| 2.5 | B-ISDN and the Intelligent Network | 28 |
| 2.6 | Asynchronous Transfer Mode | 31 |
| 2.7 | Cluster and Parallel Architectures | 36 |
| 2.8 | Next Steps | 36 |
| 2.9 | Related Reading | 37 |
| |
CHAPTER 3 Basic Communication Services
| 38
|
| 3.1 | Communication Standards | 39 |
| 3.2 | Addressing | 39 |
| 3.3 | Internet Protocols | 44 |
| | 3.3.1 | Internet Protocol: IP Layer | 44 |
| | 3.3.2 | Transmission Control Protocol: TCP | 45 |
| | 3.3.3 | User Datagram Protocol: UDP | 45 |
| | 3.3.4 | Internet Packet Multicast Protocol: IP Multicast | 46 |
| 3.4 | Routing | 47 |
| 3.5 | End-to-End Argument | 48 |
| 3.6 | O/S Architecture Issues: Buffering and Fragmentation | 49 |
| 3.7 | Xpress Transfer Protocol | 52 |
| 3.8 | Next Steps | 54 |
| 3.9 | Related Reading | 55 |
| |
CHAPTER 4 Remote Procedure Calls and the Client/Server Model
| 56
|
| 4.1 | The Client/Server Model | 57 |
| 4.2 | RPC Protocols and Concepts | 59 |
| 4.3 | Writing an RPC-Based Client or Server Program | 62 |
| 4.4 | The RPC Binding Problem | 65 |
| 4.5 | Marshaling and Data Types | 66 |
| 4.6 | Associated Services | 68 |
| | 4.6.1 | Naming Services | 69 |
| | 4.6.2 | Time Services | 70 |
| | 4.6.3 | Security Services | 71 |
| | 4.6.4 | Threads Packages | 72 |
| 4.7 | The RPC Protocol | 75 |
| 4.8 | Using RPC in Reliable Distributed Systems | 77 |
| 4.9 | Related Reading | 81 |
| |
CHAPTER 5 Streams
| 82
|
| 5.1 | Sliding Window Protocols | 83 |
| | 5.1.1 | Error Correction | 84 |
| | 5.1.2 | Flow Control | 85 |
| | 5.1.3 | Dynamic Adjustment of Window Size | 86 |
| | 5.1.4 | Burst Transmission Concept | 87 |
| 5.2 | Negative Acknowledgment Only | 87 |
| 5.3 | Reliability, Fault Tolerance, and Consistency in Streams | 88 |
| 5.4 | RPC over a Stream | 90 |
| 5.5 | Related Reading | 91 |
| |
CHAPTER 6 CORBA and Object-Oriented Environments
| 92
|
| 6.1 | The ANSA Project | 93 |
| 6.2 | Beyond ANSA to CORBA | 95 |
| 6.3 | OLE-2 and Network OLE (Active/X) | 96 |
| 6.4 | The CORBA Reference Model | 96 |
| 6.5 | TINA | 104 |
| 6.6 | IDL and ODL | 104 |
| 6.7 | ORB | 105 |
| 6.8 | Naming Service | 106 |
| 6.9 | ENSThe CORBA Event Notification Service | 106 |
| 6.10 | Life-Cycle Service | 108 |
| 6.11 | Persistent Object Service | 108 |
| 6.12 | Transaction Service | 108 |
| 6.13 | Interobject Broker Protocol | 109 |
| 6.14 | Future CORBA Services | 109 |
| 6.15 | Properties of CORBA Solutions | 109 |
| 6.16 | Related Reading | 111 |
| |
CHAPTER 7 Client/Server Computing
| 112
|
| 7.1 | Stateless and Stateful Client/Server Interactions | 113 |
| 7.2 | Major Uses of the Client/Server Paradigm | 113 |
| 7.3 | Distributed File Systems | 118 |
| 7.4 | Stateful File Servers | 122 |
| 7.5 | Distributed Database Systems | 129 |
| 7.6 | Applying Transactions to File Servers | 136 |
| 7.7 | Message-Oriented Middleware | 138 |
| 7.8 | Related Topics | 138 |
| 7.9 | Related Reading | 140 |
| |
CHAPTER 8 Operating System Support for High-Performance Communication
| 141
|
| 8.1 | Lightweight RPC | 143 |
| 8.2 | Fbufsand thex-Kernel Project | 146 |
| 8.3 | Active Messages | 148 |
| 8.4 | Beyond Active Messages: U-Net | 151 |
| 8.5 | Protocol Compilation Techniques | 154 |
| 8.6 | Related Reading | 155 |
| |
PART II THE WORLD WIDE WEB
|
CHAPTER 9 The World Wide Web
| 159
|
| 9.1 | The World Wide Web | 160 |
| 9.2 | Web Security and Reliability | 162 |
| 9.3 | Related Reading | 165 |
| |
CHAPTER 10 The Major Web Technologies
| 166
|
| 10.1 | Components of the Web | 167 |
| 10.2 | HyperText Markup Language | 168 |
| 10.3 | Virtual Reality Markup Language | 169 |
| 10.4 | Uniform Resource Locators | 170 |
| 10.5 | HyperText Transfer Protocol | 170 |
| 10.6 | Representations of Image Data | 174 |
| 10.7 | Authorization and Privacy Issues | 175 |
| 10.8 | Web Proxy Servers | 178 |
| 10.9 | Java, HotJava, and Agent-Based Browsers | 179 |
| 10.10 | GUI Builders and Other Distributed CASE Tools | 184 |
| 10.11 | TACOMA and the Agent Push Model | 185 |
| 10.12 | Web Search Engines and Web Crawlers | 187 |
| 10.13 | Browser Extensibility Features: Plug-in Technologies | 188 |
| 10.14 | Important Web Servers | 189 |
| 10.15 | Future Challenges | 190 |
| 10.16 | Related Reading | 192 |
| |
CHAPTER 11 Related Internet Technologies
| 193
|
| 11.1 | File Transfer Tools | 194 |
| 11.2 | Electronic Mail | 194 |
| 11.3 | Network Bulletin Boards (Newsgroups) | 195 |
| 11.4 | Message-Oriented Middleware Systems (MOMS) | 197 |
| 11.5 | Message Bus Architectures | 198 |
| 11.6 | Internet Firewalls and Gateways | 201 |
| 11.7 | Related Reading | 202 |
| |
PART III RELIABLE DISTRIBUTED COMPUTING
|
CHAPTER 12 How and Why Computer Systems Fail
| 205
|
| 12.1 | Hardware Reliability and Trends | 206 |
| 12.2 | Software Reliability and Trends | 207 |
| 12.3 | Other Sources of Downtime | 208 |
| 12.4 | Complexity | 209 |
| 12.5 | Detecting Failures | 210 |
| 12.6 | Hostile Environments | 211 |
| 12.7 | Related Reading | 213 |
| |
CHAPTER 13 Guaranteeing Behavior in Distributed Systems
| 214
|
| 13.1 | Consistent Distributed Behavior | 215 |
| 13.2 | Warning: Rough Road Ahead! | 216 |
| 13.3 | Membership in a Distributed System | 217 |
| 13.4 | Time in Distributed Systems | 218 |
| 13.5 | Failure Models and Reliability Goals | 224 |
| 13.6 | Relable Computing in a Static Membership Model | 225 |
| | 13.6.1 | The Distributed Commit Problem | 227 |
| | 13.6.2 | Reading and Updating Replicated Data with Crash Failures | 238 |
| 13.7 | Replicated Data with Nonbenign Failure Modes | 241 |
| 13.8 | Reliability in Asynchronous Environments | 244 |
| | 13.8.1 | Three-Phase Commit and Consensus | 247 |
| 13.9 | The Dynamic Group Membership Problem | 249 |
| 13.10 | The Group Membership Problem | 253 |
| | 13.10.1 | GMS Controversy | 254 |
| | 13.10.2 | GMS and Other System Processes | 255 |
| | 13.10.3 | Protocol Used to Track GMS Membership | 258 |
| | 13.10.4 | GMS Protocol to Handle Client Add and Join Events | 260 |
| | 13.10.5 | GMS Notifications with Bounded Delay | 261 |
| | 13.10.6 | Extending the GMS to Allow Partition and Merge Events | 264 |
| 13.11 | Dynamic Process Groups and Group Communication | 265 |
| | 13.11.1 | Group Communication Primitives | 266 |
| 13.12 | Delivery Ordering Options | 269 |
| | 13.12.1 | Nonuniform Failure-Atomic Group Multicast | 273 |
| | 13.12.2 | Dynamically Uniform Failure-Atomic Group Multicast | 275 |
| | 13.12.3 | Dynamic Process Groups | 276 |
| | 13.12.4 | View-Synchronous Failure Atomicity | 278 |
| | 13.12.5 | Summary of GMS Properties | 280 |
| | 13.12.6 | Ordered Multicast | 282 |
| 13.13 | Communication from Nonmembers to a Group | 294 |
| | 13.13.1 | Scalability | 296 |
| 13.14 | Communication from a Group to a Nonmember | 297 |
| 13.15 | Summary | 297 |
| 13.16 | Related Reading | 299 |
| |
CHAPTER 14 Point-to-Point and Multigroup Considerations
| 301
|
| 14.1 | Causal Communication outside of a Process Group | 302 |
| 14.2 | Extending Causal Order to Multigroup Settings | 305 |
| 14.3 | Extending Total Order to Multigroup Settings | 307 |
| 14.4 | Causal and Total Ordering Domains | 308 |
| 14.5 | Multicasts to Multiple Groups | 309 |
| 14.6 | Multigroup View Management Protocols | 310 |
| 14.7 | Related Reading | 311 |
| |
CHAPTER 15 The Virtually Synchronous Execution Model
| 312
|
| 15.1 | Virtual Synchrony | 313 |
| 15.2 | Extended Virtual Synchrony | 318 |
| 15.3 | Virtually Synchronous Algorithms and Tools | 322 |
| | 15.3.1 | Replicated Data and Synchronization | 322 |
| | 15.3.2 | State Transfer to a Joining Process | 327 |
| | 15.3.3 | Load-Balancing | 329 |
| | 15.3.4 | Primary-Backup Fault Tolerance | 331 |
| | 15.3.5 | Coordinator-Cohort Fault Tolerance | 333 |
| 15.4 | Related Reading | 334 |
| |
CHAPTER 16 Consistency in Distributed Systems
| 335
|
| 16.1 | Consistency in the Static and Dynamic Membership Models | 336 |
| 16.2 | General Remarks concerning Causal and Total Ordering | 346 |
| 16.3 | Summary and Conclusion | 349 |
| 16.4 | Related Reading | 350 |
| |
CHAPTER 17 Retrofitting Reliability into Complex Systems
| 351
|
| 17.1 | Wrappers and Toolkits | 352 |
| | 17.1.1 | Wrapper Technologies | 354 |
| | 17.1.2 | Introducing Robustness in Wrapped Applications | 359 |
| | 17.1.3 | Toolkit Technologies | 361 |
| | 17.1.4 | Distributed Programming Languages | 363 |
| 17.2 | Wrapping a Simple RPC Server | 364 |
| 17.3 | Wrapping a Web Server | 366 |
| 17.4 | Hardening Other Aspects of the Web | 366 |
| 17.5 | Unbreakable Stream Connections | 370 |
| | 17.5.1 | Reliability Options for Stream Communication | 371 |
| | 17.5.2 | An Unbreakable Stream That Mimics TCP | 373 |
| | 17.5.3 | Nondeterminism and Its Consequences | 374 |
| | 17.5.4 | Dealing with Arbitrary Nondeterminism | 375 |
| | 17.5.5 | Replicating the IP Address | 376 |
| | 17.5.6 | Maximizing Concurrency by Relaxing Multicast Ordering | 376 |
| | 17.5.7 | State Transfer Issues | 379 |
| | 17.5.8 | Discussion | 379 |
| 17.6 | Building a Replicated TCP Protocol Using a Toolkit | 380 |
| 17.7 | Reliable Distributed Shared Memory | 381 |
| | 17.7.1 | The Shared Memory Wrapper Abstraction | 382 |
| | 17.7.2 | Memory Coherency Options for Distributed Shared Memory | 384 |
| | 17.7.3 | False Sharing | 386 |
| | 17.7.4 | Demand Paging and Intelligent Prefetching | 387 |
| | 17.7.5 | Fault Tolerance Issues | 387 |
| | 17.7.6 | Security and Protection Considerations | 388 |
| | 17.7.7 | Summary and Discussion | 388 |
| 17.8 | Related Reading | 389 |
| |
CHAPTER 18 Reliable Distributed Computing Systems
| 390
|
| 18.1 | Architectural Considerations in Reliable Systems | 391 |
| 18.2 | Horus: A Flexible Group Communication System | 394 |
| | 18.2.1 | A Layered Process Group Architecture | 395 |
| 18.3 | Protocol Stacks | 397 |
| 18.4 | Using Horus to Build a Robust Groupware Application | 399 |
| 18.5 | Using Horus to Harden CORBA Applications | 401 |
| 18.6 | Basic Performance of Horus | 403 |
| 18.7 | Masking the Overhead of Protocol Layering | 405 |
| | 18.7.1 | Reducing Header Overhead | 406 |
| | 18.7.2 | Eliminating Layered Protocol Processing Overhead | 407 |
| | 18.7.3 | Message Packing | 408 |
| | 18.7.4 | Performance of Horus with the Protocol Accelerator | 409 |
| 18.8 | Scalability | 410 |
| 18.9 | Related Reading | 413 |
| |
CHAPTER 19 Security Options for Distributed Settings
| 414
|
| 19.1 | Security Options for Distributed Settings | 415 |
| 19.2 | Perimeter Defense Technologies | 417 |
| 19.3 | Access Control Technologies | 420 |
| 19.4 | Authentication Schemes and Kerberos | 422 |
| | 19.4.1 | RSA and DES | 422 |
| | 19.4.2 | Kerberos | 424 |
| | 19.4.3 | ONC Security and NFS | 426 |
| | 19.4.4 | Fortezza | 427 |
| 19.5 | Availability and Security | 429 |
| 19.6 | Related Reading | 431 |
| |
CHAPTER 20 Clock Synchronization and Synchronous Systems
| 432
|
| 20.1 | Clock Synchronization | 433 |
| 20.2 | Timed-Asynchronous Protocols | 438 |
| 20.3 | Adapting Virtual Synchrony for Real-Time Settings | 445 |
| 20.4 | Related Reading | 448 |
| |
CHAPTER 21 Transactional Systems
| 449
|
| 21.1 | Review of Transactional Model | 450 |
| 21.2 | Implementation of a Transactional Storage System | 453 |
| | 21.2.1 | Write-Ahead Logging | 453 |
| | 21.2.2 | Persistent Data Seen through an Updates List | 454 |
| | 21.2.3 | Nondistributed Commit Actions | 455 |
| 21.3 | Distributed Transactions and Multiphase Commit | 456 |
| 21.4 | Transactions on Replicated Data | 456 |
| 21.5 | Nested Transactions | 457 |
| | 21.5.1 | Comments on the Nested Transaction Model | 460 |
| 21.6 | Weak Consistency Models | 463 |
| | 21.6.1 | Epsilon Serializability | 463 |
| | 21.6.2 | Weak and Strong Consistency in Partitioned Database Systems | 464 |
| | 21.6.3 | Transactions on Multidatabase Systems | 465 |
| | 21.6.4 | Linearizability | 466 |
| | 21.6.5 | Transactions in Real-Time Systems | 466 |
| 21.7 | Advanced Replication Techniques | 467 |
| 21.8 | Related Reading | 471 |
| |
CHAPTER 22 Probabilistic Protocols
| 472
|
| 22.1 | Designing Probabilistic Protocols | 473 |
| 22.2 | Other Applications of Gossip Protocols | 475 |
| 22.3 | Hayden'spbcastPrimitive | 475 |
| | 22.3.1 | UnorderedpbcastProtocol | 477 |
| | 22.3.2 | Adding Total Ordering | 478 |
| | 22.3.3 | Probabilistic Reliability and the Bimodal Delivery Distribution | 478 |
| | 22.3.4 | An Extension topbcast | 481 |
| | 22.3.5 | Evaluation and Scalability | 481 |
| 22.4 | An Unscalable System Model | 482 |
| 22.5 | Replicated Data usingpbcast | 483 |
| | 22.5.1 | Representation of Replicated Data | 483 |
| | 22.5.2 | Update Protocol | 483 |
| | 22.5.3 | Read Protocol | 484 |
| | 22.5.4 | Locking Protocol | 484 |
| 22.6 | Related Reading | 485 |
| |
CHAPTER 23 Distributed System Management
| 486
|
| 23.1 | The Challenge of Distributed System Management | 487 |
| 23.2 | A Relational System Model | 488 |
| 23.3 | Instrumentation Issues: Sensors, Actuators | 489 |
| 23.4 | Management Information Bases: SNMP and CMIP | 490 |
| | 23.4.1 | Sensors and Events | 491 |
| | 23.4.2 | Actuators | 494 |
| 23.5 | Reactive Control in Distributed Settings | 495 |
| 23.6 | Fault Tolerance by State Machine Replication | 497 |
| 23.7 | Visualization of Distributed System States | 498 |
| 23.8 | Correlated Events | 498 |
| 23.9 | Information Warfare and Defensive Tactics | 499 |
| 23.10 | Related Reading | 503 |
| |
CHAPTER 24 Cluster Computer Architectures
| 504
|
| 24.1 | Introduction | 505 |
| 24.2 | Inside a High-Availability Cluster Product: The Stratus RADIO | 506 |
| 24.3 | Reliability Goals for Cluster Servers | 509 |
| 24.4 | Comparison with Fault-Tolerant Hardware | 511 |
| 24.5 | Protocol Optimizations | 512 |
| 24.6 | Cluster API Goals and Implementation | 514 |
| 24.7 | Related Reading | 515 |
| |
CHAPTER 25 Reasoning About Distributed Systems
| 516
|
| 25.1 | Dimensions of the System Validation Problem | 517 |
| 25.2 | Process- and Message-Oriented Models | 521 |
| 25.3 | System Definition Languages | 524 |
| 25.4 | High-Level Languages and Logics | 526 |
| |
CHAPTER 26 Other Distributed and Transactional Systems
| 528
|
| 26.1 | Related Work in Distributed Computing | 529 |
| | 26.1.1 | Amoeba | 529 |
| | 26.1.2 | Chorus | 530 |
| | 26.1.3 | Delta-4 | 530 |
| | 26.1.4 | Harp | 530 |
| | 26.1.5 | The Highly Available System (HAS) | 531 |
| | 26.1.6 | The Isis Toolkit | 532 |
| | 26.1.7 | Locus | 533 |
| | 26.1.8 | Sender-Based Logging and Manetho | 533 |
| | 26.1.9 | NavTech | 534 |
| | 26.1.10 | Phoenix | 534 |
| | 26.1.11 | Psync | 535 |
| | 26.1.12 | Rampart | 535 |
| | 26.1.13 | Relacs | 535 |
| | 26.1.14 | RMP | 536 |
| | 26.1.15 | StormCast | 536 |
| | 26.1.16 | Totem | 537 |
| | 26.1.17 | Transis | 538 |
| | 26.1.18 | The V System | 539 |
| 26.2 | Systems That Implement Transactions | 539 |
| | 26.2.1 | Argus | 540 |
| | 26.2.2 | Arjuna | 540 |
| | 26.2.3 | Avalon | 541 |
| | 26.2.4 | Bayou | 541 |
| | 26.2.5 | Camelot and Encina | 542 |
| |
| Appendix: Problems | 543 |
| Bibliography | 557 |
| Index | 581 |