6.1: Building Reliable Transport - Stop-and-Wait
- Page ID
Retransmit-on-timeout generally requires sequence numbering for the packets, though if a network path is guaranteed not to reorder packets then it is safe to allow the sequence numbers to wrap around surprisingly quickly (for stop-and-wait, a single-bit sequence number will work; see exercise 8.5). However, as the no-reordering hypothesis does not apply to the Internet at large, we will assume conventional numbering. Data[N] will be the Nth data packet, acknowledged by ACK[N].
In the stop-and-wait version of retransmit-on-timeout, the sender sends only one outstanding packet at a time. If there is no response, the packet may be retransmitted, but the sender does not send Data[N+1] until it has received ACK[N]. Of course, the receiving side will not send ACK[N] until it has received Data[N]; each side has only one packet in play at a time. In the absence of packet loss, this leads to the following:
6.1.1 Packet Loss
Lost packets, however, are a reality. The left half of the diagram below illustrates a lost Data packet, where the sender is the host sending Data and the Receiver is the host sending ACKs. The receiver is not aware of the loss; it sees Data[N] as simply slow to arrive.
The right half of the diagram, by comparison, illustrates the case of a lost ACK. The receiver has received a duplicate Data[N]. We have assumed here that the receiver has implemented a retransmit-on-duplicate strategy, and so its response upon receipt of the duplicate Data[N] is to retransmit ACK[N].
As a final example, note that it is possible for ACK[N] to have been delayed (or, similarly, for the first Data[N] to have been delayed) longer than the timeout interval. Not every packet that times out is actually lost!
In this case we see that, after sending Data[N], receiving a delayed ACK[N] (rather than the expected ACK[N+1]) must be considered a normal event.
In principle, either side can implement retransmit-on-timeout if nothing is received. Either side can also implement retransmit-on-duplicate; this was done by the receiver in the second example above but not by the sender in the third example (the sender received a second ACK[N] but did not retransmit Data[N+1]).
At least one side must implement retransmit-on-timeout; otherwise a lost packet leads to deadlock as the sender and the receiver both wait forever. The other side must implement at least one of retransmit-on-duplicate or retransmit-on-timeout; usually the former alone. If both sides implement retransmit-on-timeout with different timeout values, generally the protocol will still work.
6.1.2 Sorcerer’s Apprentice Bug
A strange thing happens if one side implements retransmit-on-timeout but both sides implement retransmit-on-duplicate, as can happen if the implementer takes the naive view that retransmitting on duplicates is “safer”; the moral here is that too much redundancy can be the Wrong Thing. Let us imagine that an implementation uses this strategy (with the sender retransmitting on timeouts), and that the initial ACK is delayed until after Data is retransmitted on timeout. In the following diagram, the only packet retransmitted due to timeout is the second Data; all the other duplications are due to the bilateral retransmit-on-duplicate strategy.
All packets are sent twice from Data on. The transfer completes normally, but takes double the normal bandwidth. The usual fix is to have one side (usually the sender) retransmit on timeout only. TCP does this; see 12.19 TCP Timeout and Retransmission. See also exercise 1.5.
6.1.3 Flow Control
Stop-and-wait also provides a simple form of flow control to prevent data from arriving at the receiver faster than it can be handled. Assuming the time needed to process a received packet is less than one RTT, the stop-and-wait mechanism will prevent data from arriving too fast. If the processing time is slightly larger than RTT, all the receiver has to do is to wait to send ACK[N] until Data[N] has not only arrived but also been processed, and the receiver is ready for Data[N+1].
For modest per-packet processing delays this works quite well, but if the processing delays are long it introduces a new problem: Data[N] may time out and be retransmitted even though it has successfully been received; the receiver cannot send an ACK until it has finished processing. One approach is to have two kinds of ACKs: ACKWAIT[N] meaning that Data[N] has arrived but the receiver is not yet ready for Data[N+1], and ACKGO[N] meaning that the sender may now send Data[N+1]. The receiver will send ACKWAIT[N] when Data[N] arrives, and ACKGO[N] when it is done processing it.
Presumably we want the sender not to time out and retransmit Data[N] after ACKWAIT[N] is received, as a retransmission would be unnecessary. This introduces a new problem: if the subsequent ACKGO[N] is lost and neither side times out, the connection is deadlocked. The sender is waiting for ACKGO[N], which is lost, and the receiver is waiting for Data[N+1], which the sender will not send until the lost ACKGO[N] arrives. One solution is for the receiver to switch to a timeout model, perhaps until Data[N+1] is received.
TCP has a fix to the flow-control problem involving sender-side polling; see 12.17 TCP Flow Control.