6.2.2: Problem Sets 11-20

Last updated
Save as PDF

Page ID: 76175

$ \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } $ $ \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} $$\newcommand{\id}{\mathrm{id}}$ $ \newcommand{\Span}{\mathrm{span}}$ $ \newcommand{\kernel}{\mathrm{null}\,}$ $ \newcommand{\range}{\mathrm{range}\,}$ $ \newcommand{\RealPart}{\mathrm{Re}}$ $ \newcommand{\ImaginaryPart}{\mathrm{Im}}$ $ \newcommand{\Argument}{\mathrm{Arg}}$ $ \newcommand{\norm}[1]{\| #1 \|}$ $ \newcommand{\inner}[2]{\langle #1, #2 \rangle}$ $ \newcommand{\Span}{\mathrm{span}}$ $\newcommand{\id}{\mathrm{id}}$ $ \newcommand{\Span}{\mathrm{span}}$ $ \newcommand{\kernel}{\mathrm{null}\,}$ $ \newcommand{\range}{\mathrm{range}\,}$ $ \newcommand{\RealPart}{\mathrm{Re}}$ $ \newcommand{\ImaginaryPart}{\mathrm{Im}}$ $ \newcommand{\Argument}{\mathrm{Arg}}$ $ \newcommand{\norm}[1]{\| #1 \|}$ $ \newcommand{\inner}[2]{\langle #1, #2 \rangle}$ $ \newcommand{\Span}{\mathrm{span}}$$\newcommand{\AA}{\unicode[.8,0]{x212B}}$

Problem Set 11: RaidCo$^*$

$^*$ Credit for developing this problem set goes to Stephen A. Ward.

2007–2–11

(Chapter 2)

RaidCo is a company that makes pin-compatible hard disk replacements using tiny, chip-sized hard disks ("microdrives") that have become available cheaply. Each RaidCo product behaves like a hard disk, supporting the operations:

error ← GET (nblocks, starting_block_number, buffer_address)
error ← PUT (nblocks, starting_block_number, buffer_address)

to get or put an integral number of consecutive blocks from or to the disk array. Each operation returns a status value indicating whether an error has occurred.

The models are as follows:

R0: sector-level striping across all twelve microdrives, no redundancy/error correction
R1: six pairs of two mirrored microdrives, no striping
R2: 12-microdrive RAID 2, using bit-level striping, error detection, and error correction); microdrive’s internal sector-level error detection is disabled.
R3: 12-microdrive RAID 3, using sector-level striping and error correction.
R4: 12-microdrive RAID 4, no striping, dedicated parity disk
R5: 12-microdrive RAID 5, no striping, distributed parity

The microdrives each conform to the same read/write API sketched above, each microdrive providing 100,000 sectors of 1,000 bytes each, and offering a uniform 10 millisecond seek time and a read/write bandwidth of 100 megabytes per second; thus the entire 100 megabytes of data on a microdrive can be fetched using a single GET operation in one second. The RaidCo products do no caching or buffering: each GET or PUT involves actual data transfer to or from the involved microdrives. Since the microdrives have uniform seek time, the RaidCo products do not need, and do not use, any seek optimizations.

Q11.1 As good as the students were at programming, they unfortunately left the documentation unfinished. Your job is to complete the following table, showing certain specifications for each model drive (i.e., the size and performance parameters of the API supported by each RAID system). Entries assume error-free operation, and ignore transfer times that are small compared to seek times encountered.

	R0	R1	R2	R3	R4	R5
Block size (kilobytes) exposed to `GET/PUT`	1 kB	1 kB		11 kB		1 kB
Capacity, in blocks						1,100,000
Max time for a single 100-megabyte `GET` (seconds)	1/12 s				1 s	1 s
Time for a 1-block `PUT` (milliseconds)	10 ms	10 ms	10 ms			20 ms
Typical number of microdrives involved in a 1-block `GET`	1					1
Typical number of microdrives involved in a 2-block `GET`		2			1	1
Typical number of microdrives involved in a 1-block `PUT`		2

Add exercises text here.

Problem Set 12: ColdFusion $^*$

$^*$ Credit for developing this problem set goes to Hari Balakrishnan.

2000–3–8…15

(Chapter 2, with a bit of Chapter 3)

Alyssa P. Hacker and Ben Bitdiddle are designing a hot new system, called ColdFusion, whose goal is to allow users to back up their storage systems with copies stored in a distributed network of ColdFusion servers. Users interact with ColdFusion using PUT and GET operations.

PUT (data, fid) takes a data buffer and reliably stores it under a unique identifier fid , a positive integer, on some subset of the servers. It returns SUCCESS if it was successful in storing it on all the machines in the chosen subset, and FAILURE otherwise.
GET (fid) returns the contents of the most recent successful PUT to the system for the file identified by fid .

Because high availability is a key competitive advantage, Alyssa decides to replicate user data on more than one server machine. But rather than replicate each file on every server, she decides to be clever and use only a subset of the servers for each file. Thus, the PUT of a file stores it on some number ($A$) of the servers, invoking a SERVER_PUT operation on each server. SERVER_PUT is atomic and is implemented by each server.

If PUT is unable to successfully store the file on $A$ servers, it returns FAILURE .

When a client does a GET of the file, the GET software attempts to retrieve the file from some subset of the servers and picks the version using an election algorithm. It chooses $B$ servers to read data from, using an atomic SERVER_GET operation implemented by each server, following which it calls PICK_MAJORITY (). PICK_MAJORITY returns valid data corresponding to a version that is shared by more than 50% of the $B$ copies retrieved, and NULL otherwise. Even though the client may not know which specific servers hold the current copy, the number of servers ($A$) in PUT and the number ($B$) in GET are chosen so that if a client GET succeeds, it is certain to have received the most recent copy.

Alyssa and Ben write the following code for PUT and GET . There are $S$ servers in all, and $S > 2$. The particular ordering of the servers in the code below may be different at different clients, but all clients have the same list. They hire you as a consultant to help them figure out the missing parameters ($A$ and $B$) and analyze the system for correctness.

// Internet addresses of the S servers, S > 2 ip_address server[S] // each client may have a different ordering

procedure PUT (byte data[], integer fid) integer ntried, nputs ← 0 // # of servers tried and # successfully put for ntried from 0 to S do // Put “data” into a file identified by fid at server[ntried] status ← SERVER_PUT (data, fid, server[ntried]) if status = success then nputs ← nputs + 1 if nputs ≥ A then // yes! have SERVER_PUT () to A servers! return SUCCESS; return FAILURE // found < A servers to SERVER_PUT() to

procedure GET (integer fid, byte data[]) integer ntried, ngets ← 0 // # times tried and # times server_get returned success byte files[S][MAX_FILE_LENGTH] // array of files; an entry is a copy from a server integer index byte data[] for ntried from 0 to S do // Get file fid into buffer files[ntried] from server[ntried] status ← SERVER_GET (fid, files[ntried], server[ntried]) if status = success then ngets ← ngets + 1 if ngets ≥ B then // yes! have gotten data from B servers // PICK_MAJORITY () takes the array of files and magically knows which ones are // valid. It scans the ngets valid ones and returns an index in the files[] array // for one of the good copies, which corresponds to a version returned by more // than 50% of the servers. Otherwise, it returns –1. If ngets = 1, // PICK_MAJORITY () simply returns an index to that version. index ← PICK_MAJORITY (files, ngets); if index ≠ –1 then COPY (data, files[index]) // copy into data buffer return SUCCESS else return FAILURE return FAILURE // didn’t find B servers to SERVER_GET from

For questions 12.1 through 12.4, assume that operations execute serially (i.e., there is no concurrency). Assume also that the end-to-end protocol correctly handles all packet losses and delivers messages in to a recipient in the same order that the sender dispatched them. In other words, no operations are prevented from completing because of lost or reordered packets. However, servers may crash and subsequently recover.

Q12.1 Which reliability technique is the best description of the one being attempted by Alyssa and Ben?

A. Fail-safe design.
B. N-modular redundancy.
C. Pair-and-compare.
D. Temporal redundancy.

Q12.2 Which of the following combinations of $A$ and $B$ in the code above ensures that GET returns the results of the last successful PUT , as long as no servers fail? (Here, $ \lfloor x \rfloor$ is the largest integer $\leq x$, and $ \lceil y \rceil$ is the smallest integer $\geq y$. Thus $\lfloor 2.3 \rfloor = 2$, and $\lceil 2.3 \rceil = 3$. Remember also that $S > 2$.)

A. $A = 1 \quad\quad\quad\quad\quad\ B = S$
B. $A = \lceil S/3 \rceil \quad\quad\quad B = S$
C. $A = \lceil S/2 \rceil \quad\quad\quad B = S$
D. $A = \lceil (3S)/4 \rceil \quad\,\,\, B = \lfloor S+2 \rfloor + 1$
E. $A = S \quad\quad\quad\quad\,\,\,\ B = 1$

Q12.3 Suppose that the number of servers $S$ is an odd number larger than 2, and that the number of servers used for PUT is $A = S ⁄ 2$. If only PUT and no GET operations are done, how does the mean time to failure (MTTF) of the PUT operation change as $S$ increases? The PUT operation fails if the return value from PUT is FAILURE . Assume that the process that causes servers to fail is memoryless, and that no repairs are done.

A. As $S$ increases, there is more redundancy. So, the MTTF increases.
B. As $S$ increases, one still needs about one-half of the servers to be accessible for a successful PUT . So, the MTTF does not change with $S$.
C. As $S$ increases the MTTF decreases even though we have more servers in the system.
D. The MTTF is not a monotonic function of $S$; it first decreases and then increases.

Q12.4 Which of the following is true of ColdFusion's PUT and GET operations, for choices of $A$ and $B$ that guarantee that GET successfully returns the data from the last successful PUT when no servers fail?

A. A PUT that fails because some server was unavailable to it, done after a successful PUT , may cause subsequent GET attempts to fail, even if $B$ servers are available.
B. A failed PUT attempt done after a successful PUT cannot cause subsequent GET attempts to fail if $B$ servers are available.
C. A failed PUT attempt done after a successful PUT always causes subsequent GET attempts to fail, even if $B$ servers are available.
D. None of the above.

ColdFusion unveils their system for use on the Internet with $S$ = 15 servers, using $A = 2S/3$ and $B = 1 + 2S/3$. However, they find that the specifications are not always met—several times, GET does not return the data from the last PUT that returned SUCCESS .

In questions 12.5 through 12.8, assume that there may be concurrent operations.

Q12.5 Under which of these scenarios does ColdFusion always meet its specification (i.e., GET returns SUCCESS and the data corresponding to the last successful PUT )?

A. There is no scenario under which ColdFusion meets its specification for this choice of $A$ and $B$.
B. When a user PUT's data to a file with some fid , and at about the same time someone else PUT's different data to the same fid.
C. When a user PUT's a file successfully from her computer at home, drives to work and attempts to GET the file an hour later. In the meantime, no one performs any PUT operations to the same file, but three of the servers crash and are unavailable when she does her GET .
D. When the PUT of a file succeeds at some point in time, but some subsequent PUT's fail because some servers are unavailable, and then a GET is done to that file, which returns SUCCESS .

You tell Ben to pay attention to multisite coordination, and he implements his version of the two-phase commit protocol. Here, each server maintains a log containing READY (a new record he has invented), ABORT, and COMMIT records. The server always returns to a client the version of a file that last underwent COMMIT.

In Ben's protocol, when the client PUT's a file, the server returns SUCCESS or FAILURE as before. If it returns SUCCESS , the server appends a READY entry for this fid in its log. If the client sees that all the servers it asked returns SUCCESS , it sends a message asking them all to COMMIT . When a server receives this message, it writes a COMMIT entry in its log together with the file's fid . On the other hand, if even one of the servers returns FAILURE , the client sends a message to all the servers asking them to abort the operation, and each server writes an ABORT entry in its log. Finally, if a server gets a server put request for some fid that is in the READY state, it returns FAILURE to the requesting client.

Q12.6 Under which of these scenarios does ColdFusion always meet its specification (i.e., GET returns SUCCESS and the data corresponding to the last successful PUT )?

A. There is no scenario under which ColdFusion meets its specification for this choice of $A$ and $B$.
B. When a user PUT's data to a file with some fid , and at about the same time someone else PUT's different data to the same fid.
C. When a user PUT's a file successfully from her computer at home, drives to work and attempts to GET the file an hour later. In the meantime, no one performs any PUT operations to the same file, but three of the servers crash and are unavailable when she does her GET .
D. When the PUT of a file succeeds at some point in time and the corresponding COMMIT messages have reached the servers, but some subsequent PUTs fail because some servers are unavailable, and then a GET is done to that file, which returns SUCCESS .

Q12.7 When a server crashes and recovers, the original clients that initiated PUTs may be unreachable. This makes it hard for a server to know the status of its READY actions, since it cannot ask the clients that originated them. Assuming that no more than one server is unavailable at any time in the system, which of the following strategies allows a server to correctly learn the status of a past READY action when it recovers from a crash?

A. Contact any server that is up and running and call SERVER_GET with the file's fid ; if the server responds, change READY to COMMIT in the log.
B. Ask all the other servers that are up and running using server GET() with the file's fid ; if more than 50% of the other servers respond with identical data, just change READY to COMMIT .
C. Pretend to be a client and invoke GET with the file's fid; if GET is successful and the data returned is the same as what is at this server, just change READY to COMMIT .
D. None of the above.

To accommodate the possibility of users operating on entire directories at once, ColdFusion adds a two-phase locking protocol on individual files within a directory. Alyssa and Ben find that although this sometimes works, deadlocks do occur when a GET owns some locks that a PUT needs, and vice versa.

Q12.8 Ben analyzes the problem and comes up with several "solutions" (as usual). Which of his proposals will actually work, always preventing deadlocks from happening?

A. Ensure that the actions grab locks for individual files in increasing order of the fid of the file.
B. Ensure that no two actions grab locks for individual files in the same order.
C. Assign an incrementing timestamp to each action when it starts. If action A_i needs a lock owned by action A_j with a larger timestamp, abort action A_i and continue.
D. Assign an incrementing timestamp to each action when it starts. If action A_i needs a lock owned by action A_j with a smaller timestamp, abort action A_i and continue.

Problem Set 13: PigeonExpress.com

1999–3–5…14

(Chapter 3, based on Chapter 1)

After selling PigeonExpress!.com and taking a trip around the world, Ben Bitdiddle is planning his next start-up, AtomicPigeon!.com. AtomicPigeon improves over PigeonExpress by offering an atomic data delivery system.

Recall from problem set 2 that when sending a pigeon, Ben's software prints out a little header and writes a CD, both of which are given to the pigeon. The header contains the GPS coordinates of the sender and receiver, a type ( REQUEST or ACKNOWLEDGMENT ), and a sequence number:

structure header GPS source GPS destination integer type integer sequence_no

Ben starts with the code for the simple end-to-end protocol ( BEEP ) for PigeonExpress!.com. He makes a number of modifications to the sending and receiving code.

At the sender, Ben simplifies the code. The BEEP protocol transfers only a single CD:

shared next_sequence initially 0 // a globally shared sequence number.

procedure BEEP (target, CD[]) // send 1 CD to target header h // h is an instance of header. h.source ← MY_GPS // set source to my GPS coordinates h.destination ← target // set destination h.type ← REQUEST // this is a request message h.sequence_no ← next_sequence // set seq number // loop until we receive the corresponding ack, retransmitting if needed while h.sequence_no = next_sequence do send pigeon with h, CD // transmit wait 2,000 seconds

As before, pending and incoming acknowledgments are processed only when the sender is waiting:

procedure PROCESS_ACK (h) // process acknowledgment if h.sequence_no = sequence then // ack for current outstanding CD? next_sequence ← next_sequence + 1

Ben makes a small change to the code running on the receiving computer. He adds a variable expected_sequence at the receiver, which is used by PROCESS_REQUEST to filter duplicates:

integer expected_sequence initially 0 // duplicate filter.

procedure PROCESS_REQUEST (h, CD) // process request if h.sequence_no = expected_sequence then // the expected seq #? PROCESS (CD) // yes, process data expected_sequence ← expected_sequence + 1 // increase expectation h.destination ← h.source // send to where the pigeon came from h.source ← MY_GPS h.sequence_no ← h.sequence_no // unchanged h.type ← ACKNOWLEDGMENT; send pigeon with h // send an acknowledgment back

The assumptions for the pigeon network are the same as in problem set 2:

Some pigeons might get lost, but, if they arrive, they deliver data correctly (uncorrupted)
The network has one sender and one receiver
The sender and the receiver are single-threaded

Q13.1 Assume the sender and receiver do not fail (i.e., the only failures are that some pigeons may get lost). Does PROCESS in PROCESS_REQUEST process the value of CD exactly once?

A. Yes, since next_sequence is a nonce and the receiver processes data only when it sees a new nonce.
B. No, since next_sequence and expected_sequence may get out of sync because the receiver acknowledges requests even when it skips processing.
C. No, since if the acknowledgment isn't received within 2,000 seconds, the sender will send the same data again.
D. Yes, since pigeons with the same data are never retransmitted.

Ben's new goal is to provide atomicity, even in the presence of sender or receiver failures. The reason Ben is interested in providing atomicity is that he wants to use the pigeon network to provide P-commerce (something similar to E-commerce). He would like to write applications of the form:

procedure TRANSFER (amount, destination) WRITE (amount, CD) // write amount on a CD BEEP (destination, CD) // send amount

The amount always fits on a single CD.

If the sender or receiver fails, the failure is fail-fast. For now, let's assume that if the sender or receiver fails, it just stops and does not reboot; later, we will relax this constraint.

Q13.2 Given the current implementation of the BEEP protocol and assuming that only the sender may fail, what could happen during, say, the 100th call to TRANSFER ?

A. That TRANSFER might never succeed.
B. That TRANSFER might succeed.
C. PROCESS in PROCESS_REQUEST might process amount more than once.
D. PROCESS in PROCESS_REQUEST might process amount exactly once.

Ben's goal is to make transfer always succeed by allowing the sender to reboot and finish failed transfers. That is, after the sender fails, it clears volatile memory (including the nonce counter) and restarts the application. The application starts by running a recovery procedure, named RECOVER_SENDER , which retries a failed transfer if any exist.

To allow for restartable transfers, Ben supplies the sender and the receiver with durable storage that never fails. On the durable storage, Ben stores a log, in which each entry has the following form:

structure log_entry integer type // STARTED or COMMITTED integer sequence_no // a sequence number

The main objective of the sender's log is to allow RECOVER_SENDER to restore the value of next_sequence and to allow the application to restart an unfinished transfer, if any. Ben edits TRANSFER to use the log:

1 procedure TRANSFER (amount, destination) 2 WRITE (amount, CD) // write amount on a CD 3 ADD_LOG (STARTED, next_sequence) // append STARTED record 4 BEEP (destination, CD) // send amount (BEEP increases next_sequence) 5 ADD_LOG (COMMITTED, next_sequence – 1) // append COMMITTED record

ADD_LOG atomically appends a record to the log on durable storage. If ADD_LOG returns, the entry has been appended. Logs contain sufficient space for new records and they don't have to be garbage-collected.

Q13.3 Identify the line in this new version of TRANSFER that is the commit point.

Q13.4 How can the sender discover that a failure caused the transfer not to complete?

A. The log contains a STARTED record with no corresponding COMMITTED record.
B. The log contains a STARTED record with a corresponding ABORTED record.
C. The log contains a STARTED record with a corresponding COMMITTED record.
D. The log contains a COMMITTED record with no corresponding STARTED record.

Ben tries to write RECOVER_SENDER to recover next_sequence , but his editor crashes before committing the final editing and the expression in the if statement is missing, as indicated by a ? in the code below. Your job is to edit the code such that the correct expression is evaluated.

procedure RECOVER_SENDER () next_sequence ← 0 starting at end of log… for each entry in log do if ( ? ) then // What goes here? (See question 13.6) next_sequence ← (sequence_no of entry) + 1 break // terminate scan of log

Q13.5 After you edit RECOVER_SENDER for Ben, which of the following sequences could appear in the log? (The log records are represented as <type sequence_no>.

A. ..., <STARTED 1>, <COMMITTED 1>, <STARTED 2>, <COMMITTED 2>
B. ..., <STARTED 1>, <STARTED 1>, <COMMITTED 1>
C. ..., <STARTED 1>, <STARTED 1>, <STARTED 2>, <COMMITTED 1>, <STARTED 2>
D. ..., <STARTED 1>, <COMMITTED 1>, <COMMITTED 1>

Q13.6 What expression should replace the ? in the RECOVER_SENDER code above?

A. entry.type = COMMITTED
B. entry.type = STARTED
C. entry.type = ABORTED
D. FALSE

Q13.7 Given the current implementation of the BEEP protocol what could happen, say, during the 100th call to TRANSFER? (Remember only the sending computer may fail.)

A. If the sending computer keeps failing during recovery, that TRANSFER might never succeed.
B. That TRANSFER might succeed.
C. PROCESS in PROCESS_REQUEST might process amount more than once.
D. PROCESS in PROCESS_REQUEST might process amount exactly once.

Ben's next goal is to make PROCESS_REQUEST all-or-nothing. In the following questions, assume that whenever the receiving computer fails, it reboots, calls RECOVER_RECEIVER , and after RECOVER_RECEIVER is finished, it waits for messages and calls PROCESS_REQUEST on each message.

To make expected_sequence all-or-nothing, Ben tries to change the receiver in a way similar to the change he made to the sender. Again, his editor didn't commit all the changes in time. The missing code is marked by ? and #. The missing expression marked by ? evaluates the same expression as did the ? in RECOVER_SENDER . The new missing expressions are marked by # in PROCESS_REQUEST :

procedure RECOVER_RECEIVER () expected_sequence ← 0 starting at end of log… for each entry in log do if ( ? ) then // The expression of question 13.6 expected_sequence ← sequence_no of entry + 1 break // terminate scan of log

1 procedure PROCESS_REQUEST (h, CD) 2 if h.sequence_no = expected_sequence then // the expected seq #? 3 ADD_LOG (#, #) // See question 13.8. 4 PROCESS (CD) // yes, process data 5 expected_sequence ← expected_sequence + 1 // increase expectation 6 ADD_LOG (#, #) // See question 13.8. 7 h.destination ← source of h // send to where the pigeon came from 8 source of h.source ← MY_GPS 9 h.sequence_no ← h.sequence_no // unchanged 10 h.type ← ACKNOWLEDGMENT 11 send pigeon with h // send an acknowledgment back

As you can see from the code, Ben chose not to implement a write-ahead protocol because PROCESS is implemented by a third party, for example, a bank: PROCESS might be a call into the bank's transaction database system.

Q13.8 Complete the ADD_LOG calls on the lines 3 and 6 in PROCESS_REQUEST such that expected_sequence will be all-or-nothing.

3 ADD_LOG ( , ) 6 ADD_LOG ( , )

Q13.9 Can PROCESS in PROCESS_REQUEST be called multiple times for a particular call to TRANSFER ?

A. No, because expected_sequence is recovered and h.sequence_no is checked against it.
B. Yes, because failed transfers will be restarted and result in the acknowledgment being retransmitted.
C. Yes, because after 2,000 seconds a request will be retransmitted.
D. Yes, because the receiver may fail after PROCESS , but before it commits.

Q13.10 How should PROCESS, called by PROCESS_REQUEST, be implemented to guarantee exactly-once semantics for transfers? (Remember that the sender is persistent.)

A. As a normal procedure call;
B. As a remote procedure call;
C. As a nested transaction;
D. As a top-level transaction.

Problem Set 14: Sick Transit $^*$

$^*$ Credit for developing this problem set goes to Stephen A. Ward.

1997–3–6…15

(Chapter 3)

Gloria Mundi, who stopped reading the text before getting to Chapter 3, is undertaking to resurrect the failed London Ambulance Service as a new streamlined company called Sick Transit. She has built a new computer she intends to use for processing ST's activities.

A key component in Gloria's machine is a highly reliable sequential-access infinite tape, which she plans to use as an append-only log. Records can be appended to the tape, but once written are immutable and durable. Records on the tape can be read any number of times, from front-to-back or from back-to-front. There is no disk in the ST system; the tape is the only non-volatile storage.

Because of the high cost of the infinite tape, Gloria compromised on the quality of more conventional components like RAM and CPU, which fail frequently but fortunately are fail-fast: every error causes an immediate system crash. Gloria plans to ensure that, after a crash, a consistent state can be reconstructed from the log on the infinite tape.

Gloria's code uses transactions, each identified by a unique transaction ID. The visible effect of a completed transaction is confined to changes in global variables whose WRITE operations are logged. The log will contain entries recording the following operations:

BEGIN (tid) // start a new transaction, whose unique ID is tid COMMIT (tid) // commit a transaction ABORT (tid) // abort a transaction WRITE (tid, variable, old_value, new_value) // write a global variable, specifying previous & new values.

To keep the system simple, Gloria plans to use the above forms as the application-code interface, in addition to a READ (tid, variable) call which returns the current value of variable . Each of the calls will perform the indicated operation and write a log entry as appropriate. Reading an unwritten variable is to return ZERO .

Gloria begins by considering the single-threaded case (only one transaction is active at any time). She stores values of global variables in a table in RAM. Gloria is now trying to figure out how to reset variables to committed values following a crash, using the log tape.

Q14.1 In the single-threaded case, what value should variable v be restored to following a crash?

A. 37
B. new_value from the last logged WRITE (tid, v, old_value, new_value), or ZERO if unwritten
C. new_value from the last logged WRITE (tid, v, old_value, new_value) that is not followed by an ABORT (tid) , or ZERO if unwritten.
D. new_value from the last logged WRITE (tid, v, old_value, new_value) that is followed by a COMMIT (tid) , or ZERO otherwise.
E. Either old_value or new_value from the last logged WRITE (tid, v, old_value, new_value) , depending on whether that WRITE is followed by a COMMIT on the same tid , or ZERO if unwritten.

Gloria now tries running concurrent transactions on her system. Accesses to the log are serialized by the sequential-access tape drive.

Her first trial involves concurrent execution of these two transactions:

BEGIN (t1) t1x ← READ (x) t1y ← READ (y) WRITE (t1, x, t1x, t1y + 1) COMMIT (t1)

BEGIN (t2) t2x ← READ (x) t2y ← READ (y) WRITE (t2, y, t2y, t2x + 2) COMMIT (t2)

The initial values of x and y are ZERO , as are all uninitialized variables in her system. Here the READ primitive simply returns the most recently written value of a variable from the RAM table, ignoring COMMITs.

Q14.2 In the absence of locks or other synchronization mechanism, will the result necessarily correspond to some serial execution of the two transactions?

A. Yes.
B. No, since the execution might result in x = 3, y = 3.
C. No, since the execution might result in x = 1, y = 3.
D. No, since the execution might result in x = 1, y = 2.

Gloria is considering using locks, and automatically adding code to each transaction to guarantee before-or-after atomicity. She would like to maximize concurrency; she is, however, anxious to avoid deadlocks. For each of the following proposals, decide whether the approach (1) yields semantics consistent with before-or-after atomicity and (2) introduces potential deadlocks.

Q14.3 A single, global lock where ACQUIRE is at the start of each transaction and RELEASE occurs at COMMIT .

Q14.4 A lock for each variable. Every READ or WRITE operation is immediately surrounded by an ACQUIRE and RELEASE of that variable's lock.

Q14.5 A lock for each variable on which a transaction performs READ or WRITE , acquired immediately prior to the first reference to that variable in the transaction; all locks are released at COMMIT .

Q14.6 A lock for each variable on which a transaction performs READ or WRITE , acquired in alphabetical order, immediately following the BEGIN . All locks are released at COMMIT .

Q14.7 A lock for each variable on which a transaction performs WRITE , acquired, in alphabetical order, immediately following the BEGIN . All locks are released at COMMIT .

In the general case (concurrent transactions) Gloria would like to avoid having to read the entire log during crash recovery. She proposes periodically adding a CHECKPOINT entry to the log, and reading the log backwards from the end restoring committed values to RAM. The backwards scan should end as soon as committed values have been restored to all variables. Each CHECKPOINT entry in the log contains current values of all variables and a list of uncommitted transactions at the time of the CHECKPOINT .

Q14.8 What portion of the tape must be read to properly restore values committed at the time of the crash?

A. All of the tape; checkpoints don't help.
B. Enough to include the STARTED record from each transaction that was uncommitted at the time of the crash.
C. Enough to include the last CHECKPOINT , as well as the STARTED record from each transaction that was uncommitted at the time of the crash.
D. Enough to include the last CHECKPOINT , as well as the STARTED record from each transaction that was uncommitted at the time of the last checkpoint.

Simplicity Winns, Gloria's one-time classmate, observes that since global variable values can be reconstructed from the log their storage in RAM is redundant. She proposes eliminating the RAM as well as all of Gloria's proposed locks, and implementing a READ (tid, var) primitive which returns an appropriate value of var by examining the log.

Simplicity's plan is to implement READ so that each transaction a "sees" the values of global values at the time of BEGIN (a), as well as changes made within a . She quickly sketches an implementation of READ which she claims gives appropriate atomicity semantics:

procedure READ(tid, var) winners ← EMPTY // winners is a list. prior ← FALSE for each entry in log do if entry is STARTED (tid) then prior ← TRUE if entry is COMMITTED (Etid) and prior = TRUE then add Etid to winners if entry is WRITE (Etid, var, old_value, new_value) then if Etid = tid then return new_value if Etid is in winners then return new_value return 0

Gloria is a little dazed by Simplicity's quick synopsis, but thinks that Simplicity is likely correct. Gloria asks for your help in figuring out what Simplicity's algorithm actually does.

Q14.9 Suppose transaction t performs a READ of variable x but does not write it. Will each READ of x in t see the same value? If so, concisely describe the value returned by each READ ; if not, explain.

Q14.10 Does Simplicity's scheme REALLY offer transaction semantics yet avoid deadlocks?

A. Yup. Read it and weep, Gloria.
B. It doesn't introduce deadlocks, but doesn't guarantee before-or-after transaction semantics either.
C. It gives before-or-after transactions, but introduces possible deadlocks.
D. Simplicity's approach doesn't work even when there's no concurrency—it gives wrong answers.

Q14.11 The real motivation of the Sick Transit problem is a stupid pun. What does sic transit gloria mundi actually mean?

A. Thus passes a glorious Monday.
B. Thus passes the glory of the world.
C. Gloria threw up on the T Monday.
D. This bus for First Class and Coach.
E. This is the last straw! If I wanted to take @#%!*% Latin, I'd have gone to Oxford.

Problem Set 15: The Bank of Central Peoria, Limited

1998–3–7…14

(Chapter 3)

Ben Bitdiddle decides to go into business. He bids $1 at a Resolution Trust Corporation auction and becomes the owner of the Bank of Central Peoria, Limited (BCPL).

When he arrives at BCPL, Ben is shocked to learn that the only programmer who understood BCPL's database has left the company to work on new animation techniques for South Park. Hiring programmers is difficult in Peoria, so Ben decides to take over the database code himself.

Ben learns that an account is represented as a structure with the following information:

structure account integer account_id // account identification number integer balance // account balance

The BCPL system implements a standard transaction interface for accessing accounts:

tid ← BEGIN () // Starts a new transaction that will be identified as number tid balance ← READ (tid, account.account_id) // Returns the balance of an account WRITE (tid, account.account_id, newbalance) // Updates the balance of an account COMMIT (tid) // Makes the updates of transaction tid visible to other transactions

The BCPL system uses two disks, both accessed synchronously (i.e., GET and PUT operations on the disks won't return until the data is read from the disk or has safely been written to the disk, respectively). One disk contains nothing but the account balances, indexed by number. This disk is called the database disk. The other disk is called the log disk and exclusively stores, in chronological order, a sequence of records of the form:

structure logrecord integer op // WRITE, COMMIT, or END integer tid // transaction number integer account_id // account number integer new_balance // new balance for account “account”

where the meaning of each record is given by the op field:

op = WRITE // Update of an account to a new balance by transaction tid op = COMMITTED // Transaction tid’s updates are now visible to other transactions // and durable across crashes op = END // Transaction tid’s writes have all been installed on the database disk

For each active transaction, the BCPL system keeps a list in volatile memory called intentions containing pairs ( account_id , new_balance ). The implementation of READ is as follows:

procedure READ (tid, account_id) if account_id is in intentions of tid then pair ← last pair containing account_id from intentions of tid return pair.new_balance else GET account containing account_id from database return account.balance

A. Recovery

For this section, assume that there are no concurrent transactions.

Ben asks whether the database computer has ever crashed and learns that it crashed frequently due to intense sound vibrations from the jail next door. Ben decides he had better understand how recovery works in the BCPL system. He examines the implementation of the recovery procedure. He finds the following code:

1 procedure RECOVERY () 2 winners ← NULL 3 reading the log from oldest to newest, 4 for each record in log do 5 if record.op = COMMITTED then add record.tid to winners 6 if record.op = END then remove record.tid from winners 7 again reading the log from oldest to newest, 8 for each record in log do 9 if record.op = WRITE and record.tid is in winners then 10 INSTALL (record.new_balance in database for record.account_id 11 for each tid in winners do 12 LOG {END, tid}

Q15.1 What would happen if lines 11 and 12 were omitted?

A. The system might fail to recover correctly from the first crash that occurs.
B. The system would recover correctly from the first crash but the log would be corrupt so the system might fail to recover correctly from the second crash.
C. The system would recover correctly from multiple crashes but would have to do more work when recovering from the second and subsequent crashes.

Q15.2 For the RECOVERY and READ procedures to be correct, which of the following could be correct implementations of the COMMIT procedure?

A. procedure COMMIT (tid) {}

B. procedure COMMIT (tid) for each pair in tid.intentions do INSTALL (pair.new_balance in database for pair.account_id) tid.intentions ← NULL LOG {COMMITTED, tid}

C. procedure COMMIT (tid) LOG {END, tid} for each pair in tid.intentions do INSTALL (pair.new_balance in database for pair.account_id) tid.intentions ← NULL LOG {COMMITTED, tid}

D. procedure COMMIT (tid) LOG {COMMITTED, tid} for each pair in tid.intentions do INSTALL (pair.new_balance in database for pair.account_id) tid.intentions ← NULL LOG {END, tid}

Q15.3 For the RECOVERY and READ procedures to be correct, which of the following could be correct implementations of the WRITE procedure?

A. procedure WRITE (tid, account_id, new_balance) LOG {WRITE, tid, account_id, new_balance}

B. procedure WRITE (tid, account_id, new_balance) add the pair {account_id, new_balance} to tid.intentions LOG {WRITE, tid, account_id, new_balance}

C. procedure WRITE (tid, account_id, new_balance) LOG {WRITE, tid, account_id, new_balance} add the pair {account_id, new_balance} to tid.intentions

D. procedure WRITE (tid, account_id, new_balance) LOG {WRITE, tid, account_id, new_balance} add the pair {account_id, new_balance} to tid.intentions INSTALL new_balance in database for account_id

Ben is rather surprised to see there is no ABORT (tid) procedure that terminates a transaction and erases its database updates. He calls up the database developer, who says it should be easy to add. Ben figures he might as well add the feature now, and adds a new log record type ABORTED .

Q15.4 Which of the following could be correct implementations of the ABORT procedure? Assume that the RECOVERY procedure is changed correspondingly.

A. procedure ABORT (tid) tid.intentions ← NULL

B. procedure ABORT (tid) LOG {ABORTED, tid}

C. procedure ABORT (tid) LOG {ABORTED, tid} tid.intentions ← NULL

D. procedure ABORT (tid) tid.intentions ← NULL LOG {ABORTED, tid}

B. Buffer cache

In section B, again assume that there are no concurrent transactions.

BCPL is in intense competition with the nearby branch of Peoria Authorized Savings, Credit and Loan. BCPL's competitive edge is lower account fees. Ben decides to save the cost of upgrading the computer system hardware by adding a volatile memory buffer cache, which will make the database much more efficient on the current hardware. The buffer cache is used for GETs and PUTs to the database disk only; GETs and PUTs to the log disk remain write-through and synchronous.

The buffer cache uses an LRU replacement policy. Each account record on the database disk is cached or replaced separately. In other words, the cache block size, disk block size, and account record sizes are all identical.

Q15.5 Why will adding a buffer cache for the database disk make the system more efficient?

A. It is faster to copy from the buffer cache than to GET from the disk.
B. If common access patterns can be identified, performance can be improved by prefetching multiple account balances into the cache.
C. It reduces the total number of disk GETs when one transaction reads the same account balance multiple times without updating it.
D. It reduces the total number of disk GETs when multiple consecutive transactions read the same account balance.

Ben then makes a mistake. He reasons that the intentions list described in section A is now unnecessary, since the list just keeps in-memory copies of database data, which is the same thing done by the buffer cache. He deletes the intentions list code and modifies PUT so it updates the copy of the account balance in the buffer cache. He also modifies the system to delay writing the END record until all buffered accounts modified by that transaction have been written back to the database disk. Much to his horror, the next time the inmates next door try an escape and the resulting commotion causes the BCPL system to crash, the database does not recover to a consistent state.

Q15.6 What might have caused recovery to fail?

A. The system crashed when only some of the modifications made by a committed transaction had reached the database disk.
B. The LRU replacement policy updated the database disk with data modified by an uncommitted transaction, which later committed before the crash.
C. The LRU replacement policy updated the database disk with data modified by an uncommitted transaction, which failed to commit before the crash.
D. The LRU replacement policy updated the database disk with data modified by a committed transaction, which later completed before the crash.
E. The LRU replacement policy updated the database disk with data modified by a committed transaction, which did not complete before the crash.

C. Concurrency

Ben restores the intention-list code, deletes the buffer cache code and goes back to the simpler system described in section A. He is finally ready to investigate how the BCPL system manages concurrent transactions. He calls up the developer and she tells him that there is a lock stored in main memory for each account in the database, used by the CONCURRENT_BEGIN and CONCURRENT_COMMIT procedures. Since BCPL runs concurrent transactions, all its applications actually use these two procedures rather than the lower-level BEGIN and COMMIT procedures described earlier.

An application doing a concurrent transaction must declare the list of accounts it will use as an argument to the CONCURRENT_BEGIN procedure.

procedure CONCURRENT_BEGIN (account_list) do atomically for each account in account_list do ACQUIRE (account.lock) tid ← BEGIN () tid.account_list ← account_list return tid

procedure CONCURRENT_COMMIT (tid) COMMIT (tid) for each account in tid.account_list do RELEASE (account.lock)

Ben runs two transactions concurrently. Both transactions update account number 2:

tida ← CONCURRENT_BEGIN (MAKELIST (2)) tmpa ← READ (tida, 2) WRITE (tida, 2, tmpa + 1) CONCURRENT_COMMIT (tida)

tidb ← CONCURRENT_BEGIN (MAKELIST (2)) tmpb ← READ (tidb, 2) WRITE (tidb, 2, tmpb + 2) CONCURRENT_COMMIT (tidb)

MAKELIST creates a list from its arguments; in this case the list has just one element. The initial balance of account 2 before these transactions start is 0.

Q15.7 What possible values can account 2 have after completing these two transactions (assuming no crashes)?

A. 0
B. 1
C. 2
D. 3
E. 4

Ben is surprised by the order of the operations in CONCURRENT_COMMIT , since COMMIT is expensive (requiring synchronous writes to the log disk). It would be faster to release the locks first.

Q15.8 If the initial balance of account 2 is zero, what possible values can account 2 have after completing these two transactions (assuming no crashes) if the locks are released before the call to COMMIT ?

A. 0
B. 1
C. 2
D. 3
E. 4

Problem Set 16: Whisks$^*$

$^*$ Credit for developing this problem set goes to Eddie Kohler.

1996–3–4a…d

(Chapter 3)

The Odd Disk Company (ODC) has just invented a new kind of non-volatile storage, the Whisk. A Whisk is unlike a disk in the following ways:

Compared with disks, Whisks have very low read and write latencies.
On the other hand, the data rate when reading and writing a Whisk is much less than that of a disk.
Whisks are associative. Where disks use sector addresses, a Whisk block is named with a pair of items: an address and a tag. We write these pairs as $A/t$, where $A$ is the address and $t$ is the tag. Thus, for example, there might be three blocks on the Whisk with address 49, each with different tags: 49/1, 49/2, and 49/97.

The Whisk provides four important operations:

data ← GET (A/t): This is the normal read operation.
PUT (A/t, data): Just like a normal disk. If the system crashes during a write operation, a partially written block may result.
boolean ← EXISTS (A/t): Returns TRUE if block A/t exists on the Whisk.
CHANGE_TAG (A/m, n): Atomically changes the tag m of block A/m to n (deleting any previous block A/n in the process). The atomicity includes both all-or-nothing atomicity and before-or-after atomicity.

Ben Bitdiddle is excited about the properties of Whisks. Help him develop different storage systems using Whisks as the medium.

Q16.1 At first, Ben emulates a normal disk by writing all blocks with tag 0. But now he wants to add an ATOMIC_PUT operation. Design an ATOMIC_PUT for Ben's Whisk, and identify the step that is the commit point.

Ben has started work on a Whisk transaction system; he'd like you to help him finish it. Looking through his notes, you see that Ben's system will use no caches or logs: all writes go straight to the Whisk. One sentence particularly catches your eye: a joyfully scrawled "Transaction IDs Are Tags!!" Ben's basic idea is this. The current state of the database will be stored in blocks with tag 0. When a transaction t writes a block, the data is stored in the separate block A/t until the transaction commits.

Ben has set aside a special disk address, ComRec, to hold commit records for all running transactions. For a transaction t , the contents of ComRec/t is either committed, aborted, or pending, depending on the state of transaction t .

So far, three procedures have been implemented. In these programs, t is a transaction ID, A is a Whisk block address, and data is a data block.

procedure AA_BEGIN (t) PUT (ComRec/t, PENDING) procedure AA_READ (t, A) if EXISTS (A/t) then return GET (A/t) else // uninitialized! return GET (A/0) procedure AA_WRITE (t, A, data) PUT (A/t, data)

The following questions are concerned only with all-or-nothing atomicity; there are no concurrent transactions.

Q16.2 Write pseudocode for AA_COMMIT (t) and AA_ABORT (t) , and identify the commit point in AA_COMMIT . Assume that the variable dirty , an array with num_dirty elements, holds all the addresses to which t has written. (Don't worry about any garbage an aborted transaction might leave on disk, and assume transaction IDs are never reused.)

Q16.3 Write the pseudocode for AA_RECOVER , the program that handles recovery after a crash. Ben has already done some of the work: his code examines the ComRec blocks and determines which transactions are COMMITTED , ABORTED , or PENDING . When your pseudocode is called, he has already set 6 variables for you (you might not need them all):

num_committed // the number of committed transactions committed[i] // an array holding the committed transactions’ IDs num_aborted // the number of aborted transactions aborted[i] // an array holding the aborted transactions’ IDs num_pending // the number of transactions in progress in_progress[i] // an array holding the in-progress transactions’ IDs

Whisk addresses run from 0 to N.

Problem Set 17: ANTS (Advanced "Nonce-ensical Transcation System)$^*$

$^*$ Credit for developing this problem set goes to Hari Balakrishnan.

2002–3–6…13

(Chapter 3)

Sara Bellum, forever searching for elegance, sets out to design a new transaction system called ANTS, based on the idea of nonces. She observes that the locking schemes she learned in Chapter 3 cause transactions to wait for locks held by other transactions. She observes that it is possible for a transaction to simply abort and retry, instead of waiting for a lock. A little bit more work convinces her that this idea may allow her to design a system in which transactions don't need to use locks for before-or-after atomicity.

Sara sets out to write pseudocode for the following operations: BEGIN () , READ () , WRITE () , COMMIT () , ABORT () , and RECOVER () . She intends that, together, these operations will provide transaction semantics: before-or-after atomicity, all-or-nothing atomicity, and durability. You may assume that once any of these operations starts, it runs to completion without preemption or failure, and that no other thread is running any of the procedures at the same time. The system may interleave the execution of multiple transactions, however.

Sara's implementation assigns a transaction identifier (TID) to a transaction when it calls BEGIN () . The TIDs are integers, and ANTS assigns them in numerically increasing order. Sara's plan for the transaction system's storage is to maintain cell storage for variables, and a write-ahead log for recovery. Sara implements both the cell storage and the log using non-volatile storage. The log contains the following types of records:

STARTED TID
COMMITTED TID
ABORTED TID
UPDATED TID, Variable Name, Old Value

Sara implements BEGIN () , COMMIT () , ABORT () , and RECOVER () as follows:

BEGIN () allocates the next TID , appends a STARTED record to the log, and returns the TID.
COMMIT () appends a COMMITTED record to the log and returns.
ABORT (TID) undoes all of transaction TID 's WRITE () operations by scanning the log backwards and writing the old values from the transaction's UPDATED records back to the cell storage. After completing the undo, ABORT (TID) appends an ABORTED entry to the log, and returns.
RECOVER () is called after a crash and restart, before starting any more transactions. It scans the log backwards, undoing each WRITE record of each transaction that had neither committed nor aborted at the time of the crash. RECOVER () appends one ABORTED record to the log for each such transaction.

Sara's before-or-after intention is that the result of executing multiple transactions concurrently is the same as executing those same transactions one at a time, in increasing TID order. Sara wants her READ () and WRITE () implementations to provide before-or-after atomicity by adhering to the following rule:

Suppose a transaction with TID t executes READ (X) . Let u be the highest TID less than t that calls WRITE (X) and commits. The READ (X) executed by t should return the value that u writes.

Sara observes that this rule does not require her system to execute transactions in strict TID order. For example, the fact that two transactions call READ () on the same variable does not (by itself) constrain the order in which the transactions must execute.

To see how Sara intends ANTS to work, consider the following two transactions:

	Transaction T_A	Transaction T_B
`1`	`tid_a ← BEGIN () // returns 15`
`2`		`tid_b ← BEGIN () // returns 16`
`3`	`v_a ← READ (tid_a, X)`
`4`	`v_a ← v_a + 1`
`5`		`v_b ← READ (tid_b, X)`
`6`		`v_b ← v_b + 1`
`α`	`WRITE (tid_a, X, v_a)`	`WRITE (tid_b, X, v_b)`
`β`	`COMMIT (tid_a)`	`COMMIT (tid_b)`

Each transaction marks its start with a call to BEGIN , then reads the variable X from the cell store and stores it in a local variable, then adds one to that local variable, then writes the local variable to X in the cell store, and then commits. Each transaction passes its TID ( tid_a and tid_b respectively) to the READ , WRITE , and COMMIT procedures.

These transactions both read and write the same piece of data, X . Suppose that TA starts just before TB , and Sara's BEGIN allocates TIDs 15 and 16 to T_A and T_B, respectively. Suppose that ANTS interleaves the execution of the transactions as shown through line 6, but that ANTS has not yet executed lines α and β . No other transactions are executing, and no failures occur.

Q17.1 In this situation, which of the following actions can ANTS take in order to ensure before-or-after atomicity?

A. Force just T_A to abort, and let T_B proceed.
B. Force just T_B to abort, and let T_A proceed.
C. Force neither T_A nor T_B to abort, and let both proceed.
D. Force both T_A and T_B to abort.

To help enforce the before-or-after intention, Sara's implementation of ANTS maintains the following two pieces of information for each variable:

ReadID — the TID of the highest-numbered transaction that has successfully read this variable using READ .
WriteID — the TID of the highest-numbered transaction that has successfully written this variable using WRITE .

Sara defines the following utility procedures in her implementation of ANTS:

INPROGRESS (TID) returns FALSE if TID has committed or aborted, and otherwise TRUE . (All transactions interrupted by a crash are aborted by the RECOVER procedure.)
EXIT () terminates the current thread immediately.
LOG () appends a record to the log and waits for the write to the log to complete.
READ_DATA (x) reads cell storage and returns the corresponding value.
WRITE_DATA (x, v) writes value v into cell storage x .

Sara now sets out to write pseudocode for READ and WRITE :

1 procedure READ (tid, x) // Return the value stored in cell x 2 if tid < x.WriteID then 3 ABORT (tid) 4 EXIT () 5 if tid > x.WriteID and INPROGRESS (x.WriteID) then 6 ABORT (tid) // Last transaction to have written x is still in progress 7 EXIT () 8 v ← READ_DATA (x) // In all other cases execute the read 9 x.ReadID ← MAX (tid, x.ReadID) // Update ReadID of x 10 return v

11 procedure WRITE (tid, x, v) // Store value v in cell storage x 12 if tid < x.ReadID then 13 ABORT (tid) 14 EXIT () 15 else if tid < x.WriteID then 16 [Mystery Statement I] // See question 17.3 17 else if tid > x.WriteID and INPROGRESS(x.WriteID) then 18 ABORT (tid) 19 EXIT () 20 LOG (WRITE, tid, x, READ_DATA (x)) 21 WRITE_DATA (x, v) 22 [Mystery Statement II] // Now update ReadID of x (see question 17.5)

Help Sara complete the design above by answering the following questions.

Q17.2 Consider lines 5–7 of READ. Sara is not sure if these lines are necessary. If lines 5–7 are removed, will the implementation preserve Sara's before-or-after intention?

A. Yes, the lines can be removed. Because the previous WRITE to x , by the transaction identified by x.WriteID , cannot be affected by transaction tid , READ_DATA (x) can safely execute.
B. Yes, the lines can be removed. Suppose transaction T₁ successfully executes WRITE (x) , and then transaction T₂ executes READ (x) before T₁ commits. After this, T₁ cannot execute WRITE (x) successfully, so T₂ would have correctly read the last written value of x from T₁.
C. No, the lines cannot be removed. One reason is: The only transaction that can correctly execute READ_DATA (x) is the transaction with TID equal to x.WriteID . Therefore, the condition on line 5 of READ should simply read: "if tid > x.WriteID".
D. No, the lines cannot be removed. One reason is that before-or-after atomicity might not be preserved when transactions abort.

Q17.3 Consider Mystery Statement I on line 16 of WRITE . Which of the following operations for this statement preserves Sara's before-or-after intention?

A. ABORT (tid); EXIT ();
B. return (without aborting tid )
C. Find the higher-numbered transaction T_h corresponding to x.WriteID ; ABORT (T_h) and terminate the thread that was running T_h; perform WRITE_DATA (x, v) in transaction tid ; and return.
D. All of the above choices.

Q17.4 Consider lines 17–19 of WRITE . Sara is not sure if these lines are necessary. If lines 17–19 are removed, will Sara's implementation preserve her before-or-after intention? Why or why not?

A. Yes, the lines can be removed. We can always recover the correct values from the log.
B. Yes, the lines can be removed since this is the WRITE call; it's only on a READ call that we need to be worried about the partial results from a previous transaction being visible to another running transaction.
C. No, the lines cannot be removed. One reason is: If transaction T₁ writes to cell x and then transaction T₂ writes to cell x , then an abort of T₂ followed by an abort of T₁ may leave x in an incorrect state.
D. No, the lines cannot be removed. One reason is: If transaction T₁ writes to cell x and then transaction T₂ writes to cell x , then an abort of T₁ followed by an abort of T₂ may leave x in an incorrect state.

Q17.5 Which of these operations for Mystery Statement II on line 22 of WRITE preserves Sara's before-or-after intention?

A. (x.WriteID) ← tid
B. (x.WriteID) ← MIN(x.WriteID, tid)
C. (x.WriteID) ← MAX(x.WriteID, tid)
D. (x.WriteID) ← MAX(x.WriteID, x.ReadID)

Ben Bitdiddle looks at the READ and WRITE pseudocode shown above for Sara's system and concludes that her system is in fact nonsensical! To make his case, he constructs the following concurrent transactions:

	Transaction T₁	Transaction T₂
`1`	`tid₁ ← BEGIN ()`
`2`		`tid₂ ← BEGIN ()`
`3`	`WRITE (tid₁, A, v₁)`
`4`		`v₂ ← READ (tid₂, A)`
`5`		`WRITE (tid₂, B, v₂)`
`6`		`COMMIT (tid₂)`
`7`	`v₁ ← READ (tid₁, B)`
`8`	`COMMIT (tid₁)`

The two transactions are interleaved in the order shown above. Note that T₁ begins before T₂. Ben argues that this leads to a deadlock.

Q17.6 Why is Ben's argument incorrect?

A. Both transactions will abort, but they can both retry if they like.
B. Only T₂ will abort on line 4. So, T₁ can proceed.
C. Only T₁ will abort on line 7. So, T₂ can proceed.
D. Sara's system does not suffer from deadlocks, though concurrent transactions may repeatedly abort and never commit.

Recall that Sara uses a write-ahead log for crash recovery.

Q17.7 Which of these statements is true about log entries in Sara's ANTS implementation?

A. The order of STARTED entries in the log is in increasing TID order.
B. The order of COMMITTED entries in the log is in increasing TID order.
C. The order of ABORTED entries in the log is in increasing TID order.
D. The order of UPDATED entries in the log for any given variable is in increasing TID order.

Q17.8 The WRITE procedure appends the UPDATED record to the log before it installs in cell storage. Sara wants to improve performance by caching the non-volatile cell storage in the volatile main memory. She changes READ_DATA to read the value from the cache if it is there; if it isn't, READ_DATA reads from non-volatile cell storage. She changes WRITE_DATA to update just the cache; ANTS will install to non-volatile cell storage later. Can ANTS delay the install to non-volatile cell storage until after the COMMITTED record has been written to the log, and still ensure transaction semantics?

A. No, because if the system crashed between the COMMIT and the write to non-volatile storage, RECOVER would not recover cell storage correctly.
B. Yes, because the log contains enough information to undo uncommitted transactions after a crash.
C. Yes, because line 3 of READ won't let another transaction read the data until after the write to non-volatile storage completes.
D. None of the above.

Problem Set 18: KeyDB$^*$

$^*$ Credit for developing this problem set goes to Hari Balakrishnan.

2005–3–13

(Chapter 3)

Keys–R–Us has contracted with you to implement an in-memory key-value transactional store named KeyDB. KeyDB provides a hash table interface to store key-value bindings and to retrieve the value previously associated with a key.

You decide to use locks to provide before-or-after atomicity. Lock L_k is a lock for key k , which corresponds to the entry KeyDB[k] . A single transaction may read or write multiple KeyDB entries. Your goal is to achieve correct before-or-after atomicity for all transactions that use KeyDB. Transactions may abort. ACQUIRE (L_k) is called before the first READ or WRITE to KeyDB[k] and RELEASE (L_k) is called after the last access to KeyDB[k] .

Q18.1 For each of the following locking rules, is the rule is necessary, sufficient, or neither necessary nor sufficient to always guarantee correct before-or-after atomicity between any set of concurrent transactions?

A. An ACQUIRE (L_k) must be performed after the start of a transaction and before the first READ or WRITE of KeyDB[k] , and a RELEASE (L_k) must be performed some time after the last READ or WRITE of KeyDB[k] and before the end of the transaction.
B. ACQUIREs of every needed lock must occur after the start of a transaction and before any other operation, and there can be no RELEASE of a lock before COMMIT or ABORT if the corresponding data item was modified by the thread.
C. ACQUIREs of every needed lock must occur after the start of a transaction and before the first RELEASE , and there can be no RELEASE of a lock before COMMIT or ABORT if the corresponding data item was modified by the thread.
D. All threads that ACQUIRE more than one lock must ACQUIRE the locks in the same order, and there may be no RELEASEs of locks before COMMIT or ABORT .
E. ACQUIREs of every needed lock must occur after the start of a transaction and before the first RELEASE , and a lock may be RELEASEd at at any time after the last READ or WRITE of the corresponding data before COMMIT or ABORT .

Q18.2 Determine whether each of the following locking rules either avoids or is likely (with probability approaching 1 as time goes to infinity) to eliminate permanent deadlock between any set of concurrent transactions.

A. ACQUIREs of every needed lock must occur after the start of a transaction and before any other operation, and there can be no RELEASE of a lock before COMMIT or ABORT .
B. ACQUIREs of every needed lock must occur after the start of a transaction and before the first RELEASE , and there can be no RELEASE of a lock before COMMIT or ABORT .
C. All threads that ACQUIRE more than one lock must ACQUIRE the locks in the same order.
D. When a transaction begins, set a timer to a value longer than the transaction is expected to take. If the timer expires, ABORT the transaction and try it again with a timer set to a value chosen with random exponential backoff.

Problem Set 19: Alice's Reliable Book Store$^*$

$^*$ Credit for developing this problem set goes to Barbara Liskov

2006–3–9

(Chapter 3)

Alice has implemented a version of ALL_OR_NOTHING_PUT and ALL_OR_NOTHING_GET using only two copies, based an idea she got by reading Section 3.8.1. Her implementation appears below. In her implementation each virtual all-or-nothing sector x is stored at two disk locations, x.D0 and x.D1 , which are updated and read as follows:

// Write the bits in data at item x 1 procedure ALL_OR_NOTHING_PUT (data, x) 2 flag ← CAREFUL_GET (buffer, x.D0); // read into a temporary buffer 3 if flag = OK then 4 CAREFUL_PUT (data, x.D1); 5 CAREFUL_PUT (data, x.D0); 6 else 7 CAREFUL_PUT (data, x.D0); 8 CAREFUL_PUT (data, x.D1);

// Read the bits of item x and return them in data 1 procedure ALL_OR_NOTHING_GET (reference data, x) 2 flag ← CAREFUL_GET (data, x.D0); 3 if flag = OK then 4 return; 5 CAREFUL_GET (data, x.D1);

The CAREFUL_GET and CAREFUL_PUT procedures are as specified in Section 2.6.4.5 and Figure $2.6.1$. The property of these two procedures that is relevant is that CAREFUL_GET can detect cases when the original data is damaged by a system crash during CAREFUL_PUT .

Assume that the only failure to be considered is a fail-stop failure of the system during the execution of ALL_OR_NOTHING_GET or ALL_OR_NOTHING_PUT . After a fail-stop failure the system restarts.

Q19.1 Which of the following statements are true and which are false for Alice's implementation of ALL_OR_NOTHING_PUT and ALL_OR_NOTHING_GET ?

A. Her code obeys the rule "never overwrite the only copy."
B. ALL_OR_NOTHING_PUT and ALL_OR_NOTHING_GET ensure that if just one of the two copies is good (i.e., CAREFUL_GET will succeed for one of the two copies), the caller of ALL_OR_NOTHING_GET will see it.
C. ALL_OR_NOTHING_PUT and ALL_OR_NOTHING_GET ensure that the caller will always see the result of the last ALL_OR_NOTHING_PUT that wrote at least one copy to disk.

Q19.2 Suppose that when ALL_OR_NOTHING_PUT starts running, the copy at x.D0 is good. Which statement's completion is the commit point of ALL_OR_NOTHING_PUT ?

Q19.3 Suppose that when ALL_OR_NOTHING_PUT starts running, the copy at x.D0 is bad. Which statement's completion is the commit point of ALL_OR_NOTHING_PUT ?

Consider the following chart showing possible states that the data could be in prior running ALL_OR_NOTHING_PUT :

	State 1	State 2	State 3
`x.D0`	old	old	bad
`x.D1`	old	bad	old

For example, when the system is in state 2, x.D0 contains an old value and x.D1 contains a bad value, meaning that CAREFUL_GET will return an error if someone tries to read x.D1 .

Q19.4 Assume that ALL_OR_NOTHING_PUT is attempting to store a new value into item x and the system fails. Which of the following statements are true?

A. ( x.D0 = new, x.D1 = new) is a potential outcome of ALL_OR_NOTHING_PUT , starting in any of the three states.
B. Starting in state S1, a possible outcome is ( x.D0 = bad, x.D1 = old).
C. Starting in state S2, a possible outcome is ( x.D0 = bad, x.D1 = new).
D. Starting in state S3, a possible outcome is ( x.D0 = old, x.D1 = new).
E. Starting in state S1, a possible outcome is ( x.D0 = old, x.D1 = new).

Ben Bitdiddle proposes a simpler version of ALL_OR_NOTHING_PUT . His simpler version, named SIMPLE_PUT , would be used with the existing ALL_OR_NOTHING_GET .

procedure SIMPLE_PUT (data, x) CAREFUL_PUT (data, x.D0) CAREFUL_PUT (data, x.D1)

Q19.5 Will the system work correctly if Ben replaces ALL_OR_NOTHING_PUT with SIMPLE_PUT ? Explain.

Q19.6 Now consider failures other than system failures while running the original ALL_OR_NOTHING_PUT . Which of the following statements is true and which false?

A. Suppose x.D0 and x.D1 are stored on different disks. Then ALL_OR_NOTHING_PUT and ALL_OR_NOTHING_GET also mask a single head crash (i.e., the disk head hits the surface of a spinning platter), assuming no other failures.
B. Suppose x.D0 and x.D1 are stored as different sectors on the same track. Then ALL_OR_NOTHING_PUT and ALL_OR_NOTHING_GET also mask a single head crash, assuming no other failures.
C. Suppose that the failure is that the operating system overwrites the in-memory copy of the data being written to disk by ALL_OR_NOTHING_PUT . Nevertheless, ALL_OR_NOTHING_PUT and ALL_OR_NOTHING_GET mask this failure, assuming no other failures.

Now consider how to handle decay failures. The approach is to periodically correct them by running a SALVAGE routine. This routine checks each replicated item periodically and if one of the two copies is bad, it overwrites that copy with the good copy. The code for SALVAGE is in Figure 3.8.1.

Assume that there is a decay interval $D$ such that at least one copy of a duplicated sector will probably still be good $D$ seconds after the last execution of ALL_OR_NOTHING_PUT or SALVAGE on that duplicated sector. Further assume that the system recovers from a failure in less than $F$ seconds, where $F << D$, and that system failures happen so infrequently that it is unlikely that more than one will happen in a period of $D$ seconds.

Q19.7 Which of the following methods ensures that the approach handles decay failures with very high probability?

A. SALVAGE runs only in a background thread that cycles through the disk with the guarantee that each replicated sector is salvaged every $P$ seconds, where $P$ is less than $D - F$.
B. SALVAGE runs as the first step of ALL_OR_NOTHING_PUT , and only then.
C. SALVAGE runs as the first step of both ALL_OR_NOTHING_PUT and ALL_OR_NOTHING_GET , and only then.
D. SALVAGE runs on all duplicated sectors as part of recovering from a fail-stop failure, and only then.

Problem Set 20: Establishing Serializability$^*$

$^*$ Credit for developing this problem set goes to Barbara Liskov.

2006–3–17

(Chapter 3)

Chapter 3 explained that one technique of ensuring correctness is to serialize concurrent transactions that act on shared variables, and it offered methods such as version histories or two-phase locking to ensure serialization. Louis Reasoner has come up with his own locking scheme that does not have an easy proof of correctness, and he wants to know whether or not it actually leads to correct results. Louis implements his locking scheme, runs a particular set of three transactions two different times, and observes the order in which individual actions of the transactions occur. The observed order is known as a schedule.

Here are Louis's three transactions:

T₁: BEGIN (); WRITE (x); READ (y); WRITE (z); COMMIT ();
T₂: BEGIN (); READ (x); WRITE (z); COMMIT ();
T₃: BEGIN (); READ (z); WRITE (y); COMMIT ();

The records x , y and z are stored on disk. Louis's first run produces schedule 1 and his second run produces schedule 2:

	Schedule 1	Schedule 2
`1`	`T₁: WRITE (x)`	`T₃: READ (z)`
`2`	`T₂: READ (x)`	`T₂: READ (x)`
`3`	`T₁: READ (y)`	`T₁: WRITE (x)`
`4`	`T₃: READ (z)`	`T₃: WRITE (y)`
`5`	`T₃: WRITE (y)`	`T₁: READ (y)`
`6`	`T₂: WRITE (z)`	`T₂: WRITE (z)`
`7`	`T₁: WRITE (z)`	`T₁: WRITE (z)`

The question Louis needs to answer is whether or not these two schedules can be serialized. One way to establish serializability is to create what is called an action graph. An action graph contains one node for each transaction and an arrow (directed edge) from T_i to T_j if T_i and T_j both use the same record r in conflicting modes (that is, both transactions write r or one writes r before the other reads r ) and T_i uses r first. If for a particular schedule there is a cycle in its action graph, that schedule is not serializable. If there is no cycle, then the arrows reveal a serialization of those transactions.

Q20.1 The table below lists all of the possible arrows that might lead from one transaction to another. For schedule 1, fill in the table, showing whether or not that arrow exists, and if so list the two steps that create that arrow. To get you started, one row is filled in. Also, draw the arrows in Figure $PS.20.1$.

Arrow	Exists?	Steps
T1 → T2
T1 → T3
T2 → T1
T2 → T3
T3 → T1
T3 → T2	Yes	4 and 6

A horizontal line of 3 circles represents 3 transactions: from left to right, they are labeled T1, T2, and T3. An arrow that is labeled "4 and 6" points from T3 to T2.

Figure $\PS.20.1$: Answer template for Q20.1.

Q20.2 Is schedule 1 serializable? If not, explain briefly why not. If so, give a serial schedule for it.

Q20.3 Now fill in the table for schedule 2. This time you get to fill in the whole table yourself. Also, draw in the arrows in Figure $PS.20.2$.

Arrow	Exists?	Steps
T1 → T2
T1 → T3
T2 → T1
T2 → T3
T3 → T1
T3 → T2

A horizontal line of 3 circles represents 3 transactions: from left to right, they are labeled T1, T2, and T3.

Figure $PS.20.2$: Answer template for Q20.2.

Q20.4 Is schedule 2 serializable? If not, explain why not. If so, give a serial schedule for it.

Q20.5 Could schedule 2 have been produced by two-phase locking, in which a transaction acquires a lock on an object as the first part of the step in which it first uses that object? For example, step 3 of schedule 2 is the first time that transaction T1 uses record x , so it would start that step by acquiring a lock for x . Explain.

Louis is also concerned about recovery. When he ran the three transactions and obtained schedule 2, he found that the system generated the following log:

1 BEGIN (transaction: T1) 2 BEGIN (transaction: T2) 3 BEGIN (transaction: T3) 4 CHANGE (transaction: T1, record: x, undo: 1, redo: 2) 5 CHANGE (transaction: T3, record: y, undo: 1, redo: 2) 6 CHANGE (transaction: T2, record: z, undo: 1, redo: 2) 7 COMMIT (transaction: T3) 8 COMMIT (transaction: T2) 9 CHANGE (transaction: T1, record: z, undo: 2, redo: 3) 10 COMMIT (transaction: T1)

In a CHANGE record the undo field gives the old value before this change, and the redo field gives the new value afterwards. For example, entry 4 indicates that the old value of x was 1 and the new value is 2. The system uses the redo/undo recovery procedure of Figure 3.4.7.

Q20.6 Suppose the system crashed after record 7 of the log has made it to disk but before record 8 is written. What states do x , y , and z have after recovery is complete?

Q20.7 Suppose instead that the system crashed after record 9 of the log has made it to disk but before record 10 is written. What states do x , y , and z have after recovery is complete?

Louis's database consists of a collection of integer objects stored on disk. Each WRITE operation increments by 1 the object being modified. The system is using a write-ahead logging protocol and there is an in-memory cache that the system periodically flushes to disk, without checking to see if the cached objects belong to committed transactions.

To save space in the log, Louis's friend Ben Bitdiddle suggests that CHANGE records could just indicate the operation that was performed. For example, log entry 4 would be:

4 CHANGE (transaction: T1, record: x, operation: increment)

When the recovery manager sees this entry, it performs the specified operation: increment x by 1. Ben makes no other changes to the recovery protocol.

Q20.8 All objects are initialized to 0. Louis tries Ben's plan, but after the first system crash and recovery he discovers that it doesn't work. Explain why.