Table of Contents

Trademarks  xiv
Preface  xv
Introduction  xvi
A User's Guide to This Book  xxviii

PART I  BASIC DISTRIBUTED COMPUTING TECHNOLOGIES

CHAPTER 1  Fundamentals  3

1.1  Introduction  4
1.2  Components of a Reliable Distributed Computing System  7
1.2.1  Communication Technology  11
1.2.2  Basic Transport and Network Services  13
1.2.3  Reliable Transport Software and Communication Support  14
1.2.4  Middleware: Software Tools, Utilities, and Programming Languages  15
1.2.5  Distributed Computing Environments  16
1.2.6  End-User Applications  17
1.3  Critical Dependencies  18
1.4  Next Steps  19
1.5  Related Reading  20
 

CHAPTER 2  Communication Technologies  21

2.1  Types of Communication Devices  22
2.2  Properties  23
2.3  Ethernet  25
2.4  Fiber Distributed Data Interface  27
2.5  B-ISDN and the Intelligent Network  28
2.6  Asynchronous Transfer Mode  31
2.7  Cluster and Parallel Architectures  36
2.8  Next Steps  36
2.9  Related Reading  37
 

CHAPTER 3  Basic Communication Services  38

3.1  Communication Standards  39
3.2  Addressing  39
3.3  Internet Protocols  44
3.3.1  Internet Protocol: IP Layer  44
3.3.2  Transmission Control Protocol: TCP  45
3.3.3  User Datagram Protocol: UDP  45
3.3.4  Internet Packet Multicast Protocol: IP Multicast  46
3.4  Routing  47
3.5  End-to-End Argument  48
3.6  O/S Architecture Issues: Buffering and Fragmentation  49
3.7  Xpress Transfer Protocol  52
3.8  Next Steps  54
3.9  Related Reading  55
 

CHAPTER 4  Remote Procedure Calls and the Client/Server Model  56

4.1  The Client/Server Model  57
4.2  RPC Protocols and Concepts  59
4.3  Writing an RPC-Based Client or Server Program  62
4.4  The RPC Binding Problem  65
4.5  Marshaling and Data Types  66
4.6  Associated Services  68
4.6.1  Naming Services  69
4.6.2  Time Services  70
4.6.3  Security Services  71
4.6.4  Threads Packages  72
4.7  The RPC Protocol  75
4.8  Using RPC in Reliable Distributed Systems  77
4.9  Related Reading  81
 

CHAPTER 5  Streams  82

5.1  Sliding Window Protocols  83
5.1.1  Error Correction  84
5.1.2  Flow Control  85
5.1.3  Dynamic Adjustment of Window Size  86
5.1.4  Burst Transmission Concept  87
5.2  Negative Acknowledgment Only  87
5.3  Reliability, Fault Tolerance, and Consistency in Streams  88
5.4  RPC over a Stream  90
5.5  Related Reading  91
 

CHAPTER 6  CORBA and Object-Oriented Environments  92

6.1  The ANSA Project  93
6.2  Beyond ANSA to CORBA  95
6.3  OLE-2 and Network OLE (Active/X)  96
6.4  The CORBA Reference Model  96
6.5  TINA  104
6.6  IDL and ODL  104
6.7  ORB  105
6.8  Naming Service  106
6.9  ENS: The CORBA Event Notification Service  106
6.10  Life-Cycle Service  108
6.11  Persistent Object Service  108
6.12  Transaction Service  108
6.13  Interobject Broker Protocol  109
6.14  Future CORBA Services  109
6.15  Properties of CORBA Solutions  109
6.16  Related Reading  111
 

CHAPTER 7  Client/Server Computing  112

7.1  Stateless and Stateful Client/Server Interactions  113
7.2  Major Uses of the Client/Server Paradigm  113
7.3  Distributed File Systems  118
7.4  Stateful File Servers  122
7.5  Distributed Database Systems  129
7.6  Applying Transactions to File Servers  136
7.7  Message-Oriented Middleware  138
7.8  Related Topics  138
7.9  Related Reading  140
 

CHAPTER 8  Operating System Support for High-Performance Communication  141

8.1  Lightweight RPC  143
8.2  Fbufs and the x-Kernel Project  146
8.3  Active Messages  148
8.4  Beyond Active Messages: U-Net  151
8.5  Protocol Compilation Techniques  154
8.6  Related Reading  155
 

PART II  THE WORLD WIDE WEB

CHAPTER 9  The World Wide Web  159

9.1  The World Wide Web  160
9.2  Web Security and Reliability  162
9.3  Related Reading  165
 

CHAPTER 10  The Major Web Technologies  166

10.1  Components of the Web  167
10.2  HyperText Markup Language  168
10.3  Virtual Reality Markup Language  169
10.4  Uniform Resource Locators  170
10.5  HyperText Transfer Protocol  170
10.6  Representations of Image Data  174
10.7  Authorization and Privacy Issues  175
10.8  Web Proxy Servers  178
10.9  Java, HotJava, and Agent-Based Browsers  179
10.10  GUI Builders and Other Distributed CASE Tools  184
10.11  TACOMA and the Agent Push Model  185
10.12  Web Search Engines and Web Crawlers  187
10.13  Browser Extensibility Features: Plug-in Technologies  188
10.14  Important Web Servers  189
10.15  Future Challenges  190
10.16  Related Reading  192
 

CHAPTER 11  Related Internet Technologies  193

11.1  File Transfer Tools  194
11.2  Electronic Mail  194
11.3  Network Bulletin Boards (Newsgroups)  195
11.4  Message-Oriented Middleware Systems (MOMS)  197
11.5  Message Bus Architectures  198
11.6  Internet Firewalls and Gateways  201
11.7  Related Reading  202
 

PART III  RELIABLE DISTRIBUTED COMPUTING

CHAPTER 12  How and Why Computer Systems Fail  205

12.1  Hardware Reliability and Trends  206
12.2  Software Reliability and Trends  207
12.3  Other Sources of Downtime  208
12.4  Complexity  209
12.5  Detecting Failures  210
12.6  Hostile Environments  211
12.7  Related Reading  213
 

CHAPTER 13  Guaranteeing Behavior in Distributed Systems  214

13.1  Consistent Distributed Behavior  215
13.2  Warning: Rough Road Ahead!  216
13.3  Membership in a Distributed System  217
13.4  Time in Distributed Systems  218
13.5  Failure Models and Reliability Goals  224
13.6  Reliable Computing in a Static Membership Model  225
13.6.1  The Distributed Commit Problem  227
13.6.2  Reading and Updating Replicated Data with Crash Failures  238
13.7  Replicated Data with Nonbenign Failure Modes  241
13.8  Reliability in Asynchronous Environments  244
13.8.1  Three-Phase Commit and Consensus  247
13.9  The Dynamic Group Membership Problem  249
13.10  The Group Membership Problem  253
13.10.1  GMS Controversy  254
13.10.2  GMS and Other System Processes  255
13.10.3  Protocol Used to Track GMS Membership  258
13.10.4  GMS Protocol to Handle Client Add and Join Events  260
13.10.5  GMS Notifications with Bounded Delay  261
13.10.6  Extending the GMS to Allow Partition and Merge Events  264
13.11  Dynamic Process Groups and Group Communication  265
13.11.1  Group Communication Primitives  266
13.12  Delivery Ordering Options  269
13.12.1  Nonuniform Failure-Atomic Group Multicast  273
13.12.2  Dynamically Uniform Failure-Atomic Group Multicast  275
13.12.3  Dynamic Process Groups  276
13.12.4  View-Synchronous Failure Atomicity  278
13.12.5  Summary of GMS Properties  280
13.12.6  Ordered Multicast  282
13.13  Communication from Nonmembers to a Group  294
13.13.1  Scalability  296
13.14  Communication from a Group to a Nonmember  297
13.15  Summary  297
13.16  Related Reading  299
 

CHAPTER 14  Point-to-Point and Multigroup Considerations  301

14.1  Causal Communication outside of a Process Group  302
14.2  Extending Causal Order to Multigroup Settings  305
14.3  Extending Total Order to Multigroup Settings  307
14.4  Causal and Total Ordering Domains  308
14.5  Multicasts to Multiple Groups  309
14.6  Multigroup View Management Protocols  310
14.7  Related Reading  311
 

CHAPTER 15  The Virtually Synchronous Execution Model  312

15.1  Virtual Synchrony  313
15.2  Extended Virtual Synchrony  318
15.3  Virtually Synchronous Algorithms and Tools  322
15.3.1  Replicated Data and Synchronization  322
15.3.2  State Transfer to a Joining Process  327
15.3.3  Load-Balancing  329
15.3.4  Primary-Backup Fault Tolerance  331
15.3.5  Coordinator-Cohort Fault Tolerance  333
15.4  Related Reading  334
 

CHAPTER 16  Consistency in Distributed Systems  335

16.1  Consistency in the Static and Dynamic Membership Models  336
16.2  General Remarks concerning Causal and Total Ordering  346
16.3  Summary and Conclusion  349
16.4  Related Reading  350
 

CHAPTER 17  Retrofitting Reliability into Complex Systems  351

17.1  Wrappers and Toolkits  352
17.1.1  Wrapper Technologies  354
17.1.2  Introducing Robustness in Wrapped Applications  359
17.1.3  Toolkit Technologies  361
17.1.4  Distributed Programming Languages  363
17.2  Wrapping a Simple RPC Server  364
17.3  Wrapping a Web Server  366
17.4  Hardening Other Aspects of the Web  366
17.5  Unbreakable Stream Connections  370
17.5.1  Reliability Options for Stream Communication  371
17.5.2  An Unbreakable Stream That Mimics TCP  373
17.5.3  Nondeterminism and Its Consequences  374
17.5.4  Dealing with Arbitrary Nondeterminism  375
17.5.5  Replicating the IP Address  376
17.5.6  Maximizing Concurrency by Relaxing Multicast Ordering  376
17.5.7  State Transfer Issues  379
17.5.8  Discussion  379
17.6  Building a Replicated TCP Protocol Using a Toolkit  380
17.7  Reliable Distributed Shared Memory  381
17.7.1  The Shared Memory Wrapper Abstraction  382
17.7.2  Memory Coherency Options for Distributed Shared Memory  384
17.7.3  False Sharing  386
17.7.4  Demand Paging and Intelligent Prefetching  387
17.7.5  Fault Tolerance Issues  387
17.7.6  Security and Protection Considerations  388
17.7.7  Summary and Discussion  388
17.8  Related Reading  389
 

CHAPTER 18  Reliable Distributed Computing Systems  390

18.1  Architectural Considerations in Reliable Systems  391
18.2  Horus: A Flexible Group Communication System  394
18.2.1  A Layered Process Group Architecture  395
18.3  Protocol Stacks  397
18.4  Using Horus to Build a Robust Groupware Application  399
18.5  Using Horus to Harden CORBA Applications  401
18.6  Basic Performance of Horus  403
18.7  Masking the Overhead of Protocol Layering  405
18.7.1  Reducing Header Overhead  406
18.7.2  Eliminating Layered Protocol Processing Overhead  407
18.7.3  Message Packing  408
18.7.4  Performance of Horus with the Protocol Accelerator  409
18.8  Scalability  410
18.9  Related Reading  413
 

CHAPTER 19  Security Options for Distributed Settings  414

19.1  Security Options for Distributed Settings  415
19.2  Perimeter Defense Technologies  417
19.3  Access Control Technologies  420
19.4  Authentication Schemes and Kerberos  422
19.4.1  RSA and DES  422
19.4.2  Kerberos  424
19.4.3  ONC Security and NFS  426
19.4.4  Fortezza  427
19.5  Availability and Security  429
19.6  Related Reading  431
 

CHAPTER 20  Clock Synchronization and Synchronous Systems  432

20.1  Clock Synchronization  433
20.2  Timed-Asynchronous Protocols  438
20.3  Adapting Virtual Synchrony for Real-Time Settings  445
20.4  Related Reading  448
 

CHAPTER 21  Transactional Systems  449

21.1  Review of Transactional Model  450
21.2  Implementation of a Transactional Storage System  453
21.2.1  Write-Ahead Logging  453
21.2.2  Persistent Data Seen through an Updates List  454
21.2.3  Nondistributed Commit Actions  455
21.3  Distributed Transactions and Multiphase Commit  456
21.4  Transactions on Replicated Data  456
21.5  Nested Transactions  457
21.5.1  Comments on the Nested Transaction Model  460
21.6  Weak Consistency Models  463
21.6.1  Epsilon Serializability  463
21.6.2  Weak and Strong Consistency in Partitioned Database Systems  464
21.6.3  Transactions on Multidatabase Systems  465
21.6.4  Linearizability  466
21.6.5  Transactions in Real-Time Systems  466
21.7  Advanced Replication Techniques  467
21.8  Related Reading  471
 

CHAPTER 22  Probabilistic Protocols  472

22.1  Designing Probabilistic Protocols  473
22.2  Other Applications of Gossip Protocols  475
22.3  Hayden's pbcast Primitive  475
22.3.1  Unordered pbcast Protocol  477
22.3.2  Adding Total Ordering  478
22.3.3  Probabilistic Reliability and the Bimodal Delivery Distribution  478
22.3.4  An Extension to pbcast  481
22.3.5  Evaluation and Scalability  481
22.4  An Unscalable System Model  482
22.5  Replicated Data using pbcast  483
22.5.1  Representation of Replicated Data  483
22.5.2  Update Protocol  483
22.5.3  Read Protocol  484
22.5.4  Locking Protocol  484
22.6  Related Reading  485
 

CHAPTER 23  Distributed System Management  486

23.1  The Challenge of Distributed System Management  487
23.2  A Relational System Model  488
23.3  Instrumentation Issues: Sensors, Actuators  489
23.4  Management Information Bases: SNMP and CMIP  490
23.4.1  Sensors and Events  491
23.4.2  Actuators  494
23.5  Reactive Control in Distributed Settings  495
23.6  Fault Tolerance by State Machine Replication  497
23.7  Visualization of Distributed System States  498
23.8  Correlated Events  498
23.9  Information Warfare and Defensive Tactics  499
23.10  Related Reading  503
 

CHAPTER 24  Cluster Computer Architectures  504

24.1  Introduction  505
24.2  Inside a High-Availability Cluster Product: The Stratus RADIO  506
24.3  Reliability Goals for Cluster Servers  509
24.4  Comparison with Fault-Tolerant Hardware  511
24.5  Protocol Optimizations  512
24.6  Cluster API Goals and Implementation  514
24.7  Related Reading  515
 

CHAPTER 25  Reasoning About Distributed Systems  516

25.1  Dimensions of the System Validation Problem  517
25.2  Process- and Message-Oriented Models  521
25.3  System Definition Languages  524
25.4  High-Level Languages and Logics  526
 

CHAPTER 26  Other Distributed and Transactional Systems  528

26.1  Related Work in Distributed Computing  529
26.1.1  Amoeba  529
26.1.2  Chorus  530
26.1.3  Delta-4  530
26.1.4  Harp  530
26.1.5  The Highly Available System (HAS)  531
26.1.6  The Isis Toolkit  532
26.1.7  Locus  533
26.1.8  Sender-Based Logging and Manetho  533
26.1.9  NavTech  534
26.1.10  Phoenix  534
26.1.11  Psync  535
26.1.12  Rampart  535
26.1.13  Relacs  535
26.1.14  RMP  536
26.1.15  StormCast  536
26.1.16  Totem  537
26.1.17  Transis  538
26.1.18  The V System  539
26.2  Systems That Implement Transactions  539
26.2.1  Argus  540
26.2.2  Arjuna  540
26.2.3  Avalon  541
26.2.4  Bayou  541
26.2.5  Camelot and Encina  542
 
Appendix: Problems  543
Bibliography  557
Index  581