It may not be immediately obvious that this is a non-trivial problem. Consider, for example, a centralized scheduling process that grants requests in the order they are received. A process a sends a request to process b. It then causes another process c to send a request to b (here, 'causes' means that a sent a message to c). It is quite possible in this case for c's request to arrive at b before a's request does. This clearly violates condition (2). It is now easy to recognize the importance of a total ordering that respects causality for solving this problem (in fact, we will see later that such a total ordering, in the absence of out-of-band communication, is what completely solves the problem).
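To make the scenario concrete, here is a minimal sketch, in Python, of the logical-clock rule Lamport uses (increment on each local or send event, max-merge on receive) and how timestamps repair the anomaly above. The class and variable names are illustrative, not taken from the original.

```python
class LamportClock:
    """Logical clock: increment on send events, max-merge on receive."""

    def __init__(self, pid):
        self.pid = pid
        self.time = 0

    def send(self):
        # Sending is an event: advance the clock and stamp the message.
        self.time += 1
        return self.time

    def receive(self, msg_time):
        # Receiving is an event: jump past the sender's timestamp.
        self.time = max(self.time, msg_time) + 1


# The scenario above: a requests from b, then causes c to request from b.
a, b, c = LamportClock("a"), LamportClock("b"), LamportClock("c")
t_req_a = a.send()    # a's request to b (timestamp 1)
t_msg = a.send()      # a's message to c (timestamp 2)
c.receive(t_msg)      # c's clock jumps to 3
t_req_c = c.send()    # c's request to b (timestamp 4)

# Even if c's request physically reaches b first, the (timestamp, pid)
# pairs order a's request earlier, as causality demands.
assert (t_req_a, "a") < (t_req_c, "c")
```

Ties between equal timestamps are broken by process id, which is what turns the causal partial order into a total order.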
Lamport presents a distributed algorithm for solving the problem of mutual exclusion using total ordering. One assumption is necessary here: messages are transferred over a reliable, order-preserving (stream) protocol. Condition (2) of the problem definition is satisfiable because the total ordering extends the causal partial ordering. The algorithm assumes that every process in the system maintains a request queue to which only that process has access. The steps of the algorithm are as follows:
Requesting process
- Enters its request in its own queue (ordered by timestamps).
- Sends a request to every other process.
- Waits for replies from all other processes. If its own request is at the head of the queue and all replies have been received, it enters the critical section.
- Upon exiting the critical section, sends a release message to every other process.
Other processes
- Upon receiving a request, enters it in the request queue (ordered by timestamps) and sends a reply.
- Upon receiving a release message, removes the corresponding request from the request queue.
- If its own request is at the head of the queue and all replies have been received, enters the critical section.
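The steps above can be sketched in Python as follows. This is a simplified simulation, not Lamport's original pseudocode: messages are delivered by direct method calls, which stands in for the reliable, order-preserving channels the algorithm assumes, and all names are illustrative.

```python
import heapq

class Process:
    """One node running Lamport's mutual exclusion (synchronous simulation)."""

    def __init__(self, pid, network):
        self.pid = pid
        self.network = network    # pid -> Process; stands in for the channels
        self.clock = 0            # Lamport logical clock
        self.queue = []           # request queue: heap of (timestamp, pid)
        self.replies = set()
        self.my_request = None

    def tick(self, received=0):
        # Logical-clock rule: advance past local time and any received stamp.
        self.clock = max(self.clock, received) + 1
        return self.clock

    def request(self):
        # Enqueue own request, then send it to every other process.
        self.my_request = (self.tick(), self.pid)
        heapq.heappush(self.queue, self.my_request)
        self.replies = set()
        for p in self.network.values():
            if p is not self:
                p.on_request(*self.my_request)

    def on_request(self, ts, sender):
        # Other processes: enqueue the request, send a timestamped reply.
        self.tick(ts)
        heapq.heappush(self.queue, (ts, sender))
        self.network[sender].on_reply(self.tick(), self.pid)

    def on_reply(self, ts, sender):
        self.tick(ts)
        self.replies.add(sender)

    def can_enter(self):
        # Enter only if own request heads the queue and everyone has replied.
        others = set(self.network) - {self.pid}
        return bool(self.queue) and self.queue[0] == self.my_request \
            and self.replies >= others

    def release(self):
        # On exit: drop own request locally, broadcast a release message.
        self._drop(self.my_request)
        self.tick()
        for p in self.network.values():
            if p is not self:
                p.on_release(self.clock, self.my_request)

    def on_release(self, ts, req):
        self.tick(ts)
        self._drop(req)

    def _drop(self, req):
        self.queue = [e for e in self.queue if e != req]
        heapq.heapify(self.queue)
```

With three such processes sharing a `network` dictionary, if P0 requests before P1 then only P0 can enter until it releases, at which point P1's request reaches the head of every queue.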
Condition 1 of the problem definition holds because a release message is broadcast at the time of release, and for every release message received, the corresponding request is dequeued. A consequence of satisfying condition 1 is that the request of any process Pi will eventually precede every other request in its queue, which proves condition 3.
In the algorithm described above, it is important to observe that the resource is granted to a process only if:

- its request precedes every pending request from other processes, and
- it has learned of all requests that precede its own.
The first clause clearly asks for a total ordering. The assumption of a reliable stream protocol brings in-order (sequential) delivery into that total ordering. These two, together with the causal consistency obtained by extending the partial ordering, solve the problem at hand.
State machines and distributed systems.
The idea of using state machines to implement distributed systems was mentioned in an earlier section. The problem discussed in the previous section can serve as an example. The state machine comprises a set of states S, a set of commands C, and a transition function e : C × S → S. The set C for this example consists of the commands "Pi requests resource" and "Pi releases resource" for every process Pi. A state is a queue of waiting request commands, where the process whose request is at the head of the queue holds the resource. Each process independently simulates its copy of the state machine; for the copies to agree, every process must learn the entire sequence of commands issued by all processes.
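As a concrete sketch of e, the state can be modelled as a timestamp-ordered tuple of pending requests and the two command kinds as tagged tuples. The representation below is an illustrative assumption, not the paper's notation:

```python
def transition(command, state):
    """e : C x S -> S. `state` is a tuple of (timestamp, pid) requests,
    ordered by timestamp; the request at the head holds the resource."""
    kind = command[0]
    if kind == "request":              # command: ("request", ts, pid)
        _, ts, pid = command
        return tuple(sorted(state + ((ts, pid),)))
    if kind == "release":              # command: ("release", pid)
        _, pid = command
        return tuple(req for req in state if req[1] != pid)
    raise ValueError(f"unknown command: {kind}")
```

Because e is deterministic, every process that applies the same totally ordered command sequence reaches the same state, which is exactly why the total ordering of the previous section matters here.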
That is just a glimpse of how time isn't really as simple as it seems from outside a distributed system. This scheme for total ordering was notably the first demonstration of the importance and complexity of time in distributed systems. The scheme Lamport described was path-breaking, although it had some drawbacks. The importance of his work lies in the fact that it not only showed that timing is non-trivial in distributed systems, but also simplified it a great deal by demonstrating an ordering scheme that is both intuitive and simple. Lamport's work was followed by a host of better mutual exclusion algorithms that aimed to minimize the amount of message exchange among nodes, and by more practical ordering schemes for events (or simply better logical clocks) in distributed systems.