It may not be immediately obvious that this is a non-trivial problem. Consider, for example, a centralized scheduling process that grants requests in the order they are received. A process a sends a request to process b. It then causes another process c to send a request to b (here, 'causes' means that a sent a message to c). It is quite possible in this case for c's request to arrive at b before a's request does. This clearly violates condition (2). It is now easy to recognize the importance of a total ordering that respects causality for solving this problem (in fact, we will see later that such a total ordering, in the absence of out-of-band communication, is what completely solves the problem).
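To make the scenario concrete, here is a minimal sketch, in Python, of the logical-clock rule Lamport uses (increment on each local or send event, max-merge on receive) and how timestamps repair the anomaly above. The class and variable names are illustrative, not taken from the original.

```python
class LamportClock:
    """Logical clock: increment on send events, max-merge on receive."""

    def __init__(self, pid):
        self.pid = pid
        self.time = 0

    def send(self):
        # Sending is an event: advance the clock and stamp the message.
        self.time += 1
        return self.time

    def receive(self, msg_time):
        # Receiving is an event: jump past the sender's timestamp.
        self.time = max(self.time, msg_time) + 1


# The scenario above: a requests from b, then causes c to request from b.
a, b, c = LamportClock("a"), LamportClock("b"), LamportClock("c")
t_req_a = a.send()    # a's request to b (timestamp 1)
t_msg = a.send()      # a's message to c (timestamp 2)
c.receive(t_msg)      # c's clock jumps to 3
t_req_c = c.send()    # c's request to b (timestamp 4)

# Even if c's request physically reaches b first, the (timestamp, pid)
# pairs order a's request earlier, as causality demands.
assert (t_req_a, "a") < (t_req_c, "c")
```

Ties between equal timestamps are broken by process id, which is what turns the causal partial order into a total order.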
Lamport presents a distributed algorithm for solving the problem of mutual exclusion using total ordering. One assumption is necessary here: messages are transferred over a reliable, order-preserving (stream) protocol. Condition (2) of the problem definition is satisfiable because the total ordering extends the causal partial ordering. The algorithm assumes that every process in the system maintains a request queue to which only that process has access. The steps of the algorithm are as follows:
Requesting process
- Enters its request in its own queue (ordered by timestamps).
- Sends a request to every other process.
- Waits for replies from all other processes. If its own request is at the head of the queue and all replies have been received, it enters the critical section.
- Upon exiting the critical section, sends a release message to every other process.
Other processes
- Upon receiving a request, enters it in the request queue (ordered by timestamps) and sends a reply.
- Upon receiving a release message, removes the corresponding request from the request queue.
- If its own request is at the head of the queue and all replies have been received, enters the critical section.
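The steps above can be sketched in Python as follows. This is a simplified simulation, not Lamport's original pseudocode: messages are delivered by direct method calls, which stands in for the reliable, order-preserving channels the algorithm assumes, and all names are illustrative.

```python
import heapq

class Process:
    """One node running Lamport's mutual exclusion (synchronous simulation)."""

    def __init__(self, pid, network):
        self.pid = pid
        self.network = network    # pid -> Process; stands in for the channels
        self.clock = 0            # Lamport logical clock
        self.queue = []           # request queue: heap of (timestamp, pid)
        self.replies = set()
        self.my_request = None

    def tick(self, received=0):
        # Logical-clock rule: advance past local time and any received stamp.
        self.clock = max(self.clock, received) + 1
        return self.clock

    def request(self):
        # Enqueue own request, then send it to every other process.
        self.my_request = (self.tick(), self.pid)
        heapq.heappush(self.queue, self.my_request)
        self.replies = set()
        for p in self.network.values():
            if p is not self:
                p.on_request(*self.my_request)

    def on_request(self, ts, sender):
        # Other processes: enqueue the request, send a timestamped reply.
        self.tick(ts)
        heapq.heappush(self.queue, (ts, sender))
        self.network[sender].on_reply(self.tick(), self.pid)

    def on_reply(self, ts, sender):
        self.tick(ts)
        self.replies.add(sender)

    def can_enter(self):
        # Enter only if own request heads the queue and everyone has replied.
        others = set(self.network) - {self.pid}
        return bool(self.queue) and self.queue[0] == self.my_request \
            and self.replies >= others

    def release(self):
        # On exit: drop own request locally, broadcast a release message.
        self._drop(self.my_request)
        self.tick()
        for p in self.network.values():
            if p is not self:
                p.on_release(self.clock, self.my_request)

    def on_release(self, ts, req):
        self.tick(ts)
        self._drop(req)

    def _drop(self, req):
        self.queue = [e for e in self.queue if e != req]
        heapq.heapify(self.queue)
```

With three such processes sharing a `network` dictionary, if P0 requests before P1 then only P0 can enter until it releases, at which point P1's request reaches the head of every queue.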
Condition 1 of the problem definition holds because a release message is broadcast at the time of release, and for every release message received, the corresponding request is dequeued. A consequence of satisfying condition 1 is that the request of any process Pi will eventually precede every other request in its queue, which proves condition 3.
In the algorithm described above, it is important to observe that the resource is granted to a process only if:

- its request precedes every pending request from other processes, and
- it has learned of all requests that precede its own.
The first clause clearly asks for a total ordering. The assumption of a reliable stream protocol brings in-order (sequential) delivery into that total ordering. These two, together with the causal consistency obtained by extending the partial ordering, solve the problem at hand.
State machines and distributed systems.
The idea of using state machines to implement distributed systems was mentioned in an earlier section. The problem discussed in the previous section can serve as an example. The state machine comprises a set of states S, a set of commands C, and a transition function e : C × S → S. The set C for this example consists of the commands "Pi requests resource" and "Pi releases resource" for every process Pi. A state is a queue of waiting request commands, where the process whose request is at the head of the queue holds the resource. Each process independently simulates its copy of the state machine; for the copies to agree, every process must learn the entire sequence of commands issued by all processes.
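As a concrete sketch of e, the state can be modelled as a timestamp-ordered tuple of pending requests and the two command kinds as tagged tuples. The representation below is an illustrative assumption, not the paper's notation:

```python
def transition(command, state):
    """e : C x S -> S. `state` is a tuple of (timestamp, pid) requests,
    ordered by timestamp; the request at the head holds the resource."""
    kind = command[0]
    if kind == "request":              # command: ("request", ts, pid)
        _, ts, pid = command
        return tuple(sorted(state + ((ts, pid),)))
    if kind == "release":              # command: ("release", pid)
        _, pid = command
        return tuple(req for req in state if req[1] != pid)
    raise ValueError(f"unknown command: {kind}")
```

Because e is deterministic, every process that applies the same totally ordered command sequence reaches the same state, which is exactly why the total ordering of the previous section matters here.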
That is just a glimpse of how time isn't really as simple as it seems from outside a distributed system. This scheme for total ordering was notably the first demonstration of the importance and complexity of time in distributed systems. The scheme Lamport described was path-breaking, although it had some drawbacks. The importance of his work lies in the fact that it not only showed that timing is non-trivial in distributed systems, but also simplified it a great deal by demonstrating an ordering scheme that is both intuitive and simple. Lamport's work was followed by a host of better mutual exclusion algorithms that aimed to minimize the amount of message exchange among nodes, and by more practical ordering schemes for events (or simply better logical clocks) in distributed systems.