失效链接处理 |
分布式快照算法 PDF 下载
本站整理下载:
相关截图:
主要内容:
3. THE ALGORITHM
3.1. Motivation for the Steps of the Algorithm
The global-state recording algorithm works as follows: Each process records its
own state, and the two processes that a channel is incident on cooperate in
recording the channel state. We cannot ensure that the states of all processes
and channels will be recorded at the same instant because there is no global
clock; however, we require that the recorded process and channel states form a
“meaningful” global system state.
The global-state recording algorithm is to be superimposed on the underlying
computation, that is, it must run concurrently with, but not alter, the underlying
computation. The algorithm, may send messages and require processes to carry
out computations; however, the messages and computation required to record the
global state must not interfere with the underlying computation.
We now consider an example to motivate the steps of the algorithm. In the
example we shall assume that we can record the state of a channel instantaneously; we postpone discussion of how the channel state is recorded. Let c be a
channel from p to 9. The purpose of the example is to gain an intuitive
understanding of the relationship between the instant at which the state of
channel c is to be recorded and the instants at which the states of processes p
and q are to be recorded.
Example 3.1. Consider the single-token conservation system. Assume that
the state of process p is recorded in global state in-p. Then the state recorded for
p shows the token in p. Now assume that the global state transits to in-c (because
p sends the token). Suppose the states of channels c and c’ and of process q were
recorded in global state in-c, so the state recorded for channel c shows it with the
token and the states recorded for channel c’ and process q show them not in
possession of the token. The composite global state recorded in this fashion
would show two tokens in the system, one in p and the other in c. But a global
state with two tokens is unreachable from the initial global state in a single-t&en
conservation system! The inconsistency arises because the state of p is recorded
before p sent a message along c and the state of c is recorded after p sent the
message. Let n be the number of messages sent along c beforep’s state is recorded,
and let n’ be the number of messages sent along c before c’s state is recorded.
Our example suggests that the recorded global state may be inconsistent if n <
I n.
Now consider an alternate scenario. Suppose the state of c is recorded in global
state in-p, the system then transits to global state in-c, and the states of c’, p,
and q are recorded in global state in-c. The recorded global state shows no tokens
in the system. This example suggests that the recorded global state may be
inconsistent if the state of c is recorded before p sends a message along c and the
state of p is recorded after p sends a message along c, that is, if n > n’. Cl
We learn from these examples that (in general) a consistent global state
requires
n = n’. (1)
ACM Transactions on Computer Systems, Vol. 3, No. 1, February 1985.
70 l K. M. Chandy and L. Lamport
Let m be the number of messages received along c before q’s state is recorded.
Let m’ be the number of messages received along c before c’s state is recorded.
We leave it up to the reader to extend the example to show that consistency
requires
m = m’. (2)
In every state, the number of messages received along a channel cannot exceed
the number of messages sent along that channel, that is,
From the above equations,
n’ 2 m’. (3)
n 2 m. (4)
The state of channel c that is recorded must be the sequence of messages sent
along the channel before the sender’s state is recorded, excluding the sequence
of messages received along the channel before the receiver’s state is recordedthat is, if n’ = m’, the recorded state of c must be the empty sequence, and if n’
> m’, the recorded state of c must be the (m’ + l)st, . . . , n’th messages sent by
p along c. This fact and eqs. (l)-(4) suggest a simple algorithm by which q can
record the state of channel c. Process p sends a special message, called a marker,
after the nth message it sends along c (and before sending further messages along
c). The marker has no effect on the underlying computation. The state of c is
the sequence of messages received by q after q records its own state and before q
receives the marker along c. To ensure eq. (4), q must record its state, if it has
not done so already, after receiving a marker along c and before q receives further
messages along c.
Our example suggests the following outline for a global state detection algorithm.
3.2 Global-State-Detection Algorithm Outline
Marker-Sending Rule for a Process p. For each channel c, incident on, and
directed away from p:
p sends one marker along c after p records its state and before p sends further messages
along c.
Marker-Receiving Rule for a Process q. On receiving a marker along a channel
C:
if q has not recorded its state then
begin q records its state;
q records the state c as the empty sequence
end
else q records the state of c as the sequence of messages received along c after q’s state
was recorded and before q received the marker along c.
3.3 Termination of the Algorithm
The marker receiving and sending rules guarantee that if a marker is received
along every channel, then each process will record its state and the states of all
ACM Transactions on Computer Systems, Vol. 3, No. 1, February 1985.
Distributed Snapshots l 71
incoming channels. To ensure that the global-state recording algorithm terminates in finite time, each process must ensure that (Ll) no marker remains
forever in an incident input channel and (L2) it records its state within finite
time of initiation of the algorithm.
The algorithm can be initiated by one or more processes, each of which records
its state spontaneously, without receiving markers from other processes; we
postpone discussion of what may cause a process to record its state spontaneously.
If process p records its state and there is a channel from p to a process 4, then q
will record its state in finite time because p will send a marker along the channel
and q will receive the marker in finite time (Ll). Hence if p records its state and
there is a path (in the graph representing the system) from p to a process q, then
q will record its state in finite time because, by induction, every process along
the path will record its state in finite time. Termination in finite time is ensured
if for every process q: q spontaneously records its state or there is a path from a
process p, which spontaneously records its state, to q.
In particular, if the graph is strongly connected and at least one process
spontaneously records its state, then all processes will record their states in finite
time (provided Ll is ensured).
The algorithm described so far allows each process to record its state and the
states of incoming channels. The recorded process and channel states must be
collected and assembled to form the recorded global state. We shall not describe
algorithms for collecting the recorded information because such algorithms have
been described elsewhere [4, lo]. A simple algorithm for collecting information
in a system whose topology is strongly connected is for each process to send the
information it records along all outgoing channels, and for each process receiving
information for the first time to copy it and propagate it along all of its outgoing
channels. All the recorded information will then get to all the processes in finite
time, allowing all processes to determine the recorded global state.
4. PROPERTIES OF THE RECORDED GLOBAL STATE
To gain an intuitive understanding of the properties of the global state recorded
by the algorithm, we shall study Example 2.2. Assume that the state of p is
recorded in global state So (Figure 7), so the state recorded for p is A. After
recording its state, p sends a marker along channel c. Now assume that the
iystem goes to global state Si, then Sz, and then S3 while the marker is still in
transit, and the marker is received by q when the system is in global state SB. On
receiving the marker, q records its state, which is D, and records the state of c to
be the empty sequence. After recording its state, q sends a marker along channel
c’. On receiving the marker, p records the state of c’ as the sequence consisting
of the single message M’. The recorded global state S* is shown in Figure 8. The
recording algorithm was initiated in global state 5’0 and terminated in global state
s3.
Observe that the global state S* recorded by the algorithm is not identical to
any of the global states So, S1, Sz, S3 that occurred in the computation. Of what
use is the algorithm if the recorded global state never occurred? We shall now
answer this question.
|