6.2 Sliding Windows
Stop-and-wait is reliable but it is not very efficient (unless the path involves neither intermediate switches nor significant propagation delay; that is, the path involves a single LAN link). Most links along a multi-hop stop-and-wait path will be idle most of the time. During a file transfer, ideally we would like zero idleness (at least along the slowest link; see 6.3 Linear Bottlenecks).
We can improve overall throughput by allowing the sender to continue to transmit, sending Data[N+1] (and beyond) without waiting for ACK[N]. We cannot, however, allow the sender get too far ahead of the returning ACKs. Packets sent too fast, as we shall see, simply end up waiting in queues, or, worse, dropped from queues. If the links of the network have sufficient bandwidth, packets may also be dropped at the receiving end.
Now that, say, Data and Data may be simultaneously in transit, we have to revisit what ACK means: does it mean that the receiver has received only Data, or does it mean both Data and Data have arrived? We will assume the latter, that is, ACKs are cumulative: ACK[N] cannot be sent until Data[K] has arrived for all K≤N. With this understanding, if ACK is lost then a later-arriving ACK makes up for it; without it, if ACK is lost the only recovery is to retransmit Data.
The sender picks a window size, winsize. The basic idea of sliding windows is that the sender is allowed to send this many packets before waiting for an ACK. More specifically, the sender keeps a state variable last_ACKed, representing the last packet for which it has received an ACK from the other end; if data packets are numbered starting from 1 then initially last_ACKed = 0.
At any instant, the sender may send packets numbered last_ACKed + 1 through last_ACKed + winsize; this packet range is known as the window. Generally, if the first link in the path is not the slowest one, the sender will most of the time have sent all these.
If ACK[N] arrives with N > last_ACKed (typically N = last_ACKed+1), then the window slides forward; we set last_ACKed = N. This also increments the upper edge of the window, and frees the sender to send more packets. For example, with winsize = 4 and last_ACKed = 10, the window is [11,12,13,14]. If ACK arrives, the window slides forward to [12,13,14,15], freeing the sender to send Data. If instead ACK arrives, then the window slides forward to [14,15,16,17] (recall that ACKs are cumulative), and three more packets become eligible to be sent. If there is no packet reordering and no packet losses (and every packet is ACKed individually) then the window will slide forward in units of one packet at a time; the next arriving ACK will always be ACK[last_ACKed+1].
Note that the rate at which ACKs are returned will always be exactly equal to the rate at which the slowest link is delivering packets. That is, if the slowest link (the “bottleneck” link) is delivering a packet every 50 ms, then the receiver will receive those packets every 50 ms and the ACKs will return at a rate of one every 50 ms. Thus, new packets will be sent at an average rate exactly matching the delivery rate; this is the sliding-windows self-clocking property. Self-clocking has the effect of reducing congestion by automatically reducing the sender’s rate whenever the available fraction of the bottleneck bandwidth is reduced.
Below is a video of sliding windows in action, with winsize = 5. (A link is here, if the embedded video does not display properly, which will certainly be the case with non-html formats.) The nodes are labeled 0, 1 and 2. The second link, 1–2, has a capacity of five packets in transit either way, so one “flight” (windowful) of five packets can exactly fill this link. The 0–1 link has a capacity of one packet in transit either way. The video was prepared using the network animator, “nam”, described further in 16 Network Simulations: ns-2.
The first flight of five data packets leaves node 0 just after T=0, and leaves node 1 at around T=1 (in video time). Subsequent flights are spaced about seven seconds apart. The tiny packets moving leftwards from node 2 to node 0 represent ACKs; at the very beginning of the video one can see five returning ACKs from the previous windowful. At any moment (except those instants where packets have just been received) there are in principle five packets in transit, either being transmitted on a link as data, or being transmitted as an ACK, or sitting in a queue (this last does not happen in this video). Due to occasional video artifacts, in some frames not all the ACK packets are visible.
6.2.1 Bandwidth × Delay
As indicated previously (5.1 Packet Delay), the bandwidth × RTT product represents the amount of data that can be sent before the first response is received. It plays a large role in the analysis of transport protocols. In the literature the bandwidth×delay product is often abbreviated BDP.
The bandwidth × RTT product is generally the optimum value for the window size. There is, however, one catch: if a sender chooses winsize larger than this, then the RTT simply grows – due to queuing delays – to the point that bandwidth × RTT matches the chosen winsize. That is, a connection’s own traffic can inflate RTTactual to well above RTTnoLoad; see 220.127.116.11 Case 3: winsize = 6 below for an example. For this reason, a sender is often more interested in bandwidth × RTTnoLoad, or, at the very least, the RTT before the sender’s own packets had begun contributing to congestion.
We will sometimes refer to the bandwidth × RTTnoLoad product as the transit capacity of the route. As will become clearer below, a winsize smaller than this means underutilization of the network, while a larger winsize means each packet spends time waiting in a queue somewhere.
Below are simplified diagrams for sliding windows with window sizes of 1, 4 and 6, each with a path bandwidth of 6 packets/RTT (so bandwidth × RTT = 6 packets). The diagram shows the initial packets sent as a burst; these then would be spread out as they pass through the bottleneck link so that, after the first burst, packet spacing is uniform. (Real sliding-windows protocols such as TCP generally attempt to avoid such initial bursts.)
With winsize=1 we send 1 packet per RTT; with winsize=4 we always average 4 packets per RTT. To put this another way, the three window sizes lead to bottle-neck link utilizations of 1/6, 4/6 and 6/6 = 100%, respectively.
While it is tempting to envision setting winsize to bandwidth × RTT, in practice this can be complicated; neither bandwidth nor RTT is constant. Available bandwidth can fluctuate in the presence of competing traffic. As for RTT, if a sender sets winsize too large then the RTT is simply inflated to the point that bandwidth × RTT matches winsize; that is, a connection’s own traffic can inflate RTTactual to well above RTTnoLoad. This happens even in the absence of competing traffic.
6.2.2 The Receiver Side
Perhaps surprisingly, sliding windows can work pretty well with the receiver assuming that winsize=1, even if the sender is in fact using a much larger value. Each of the receivers in the diagrams above receives Data[N] and responds with ACK[N]; the only difference with the larger sender winsize is that the Data[N] arrive faster.
If we are using the sliding-windows algorithm over single links, we may assume packets are never reordered, and a receiver winsize of 1 works quite well. Once switches are introduced, however, life becomes more complicated (though some links may do link-level sliding-windows for per-link throughput optimization).
If packet reordering is a possibility, it is common for the receiver to use the same winsize as the sender. This means that the receiver must be prepared to buffer a full window full of packets. If the window is [11,12,13,14,15,16], for example, and Data is delayed, then the receiver may have to buffer Data through Data.
Like the sender, the receiver will also maintain the state variable last_ACKed, though it will not be completely synchronized with the sender’s version. At any instant, the receiver is willing to accept Data[last_ACKed+1] through Data[last_ACKed+winsize]. For any but the first of these, the receiver must buffer the arriving packet. If Data[last_ACKed+1] arrives, then the receiver should consult its buffers and send back the largest cumulative ACK it can for the data received; for example, if the window is [11-16] and Data, Data and Data are in the buffers, then on arrival of Data the correct response is ACK. Data fills the “gap”, and the receiver has now received everything up through Data. The new receive window is [14-19], and as soon as the ACK reaches the sender that will be the new send window as well.
6.2.3 Loss Recovery Under Sliding Windows
Suppose winsize = 4 and packet 5 is lost. It is quite possible that packets 6, 7, and 8 may have been received. However, the only (cumulative) acknowledgment that can be sent back is ACK; the sender does not know how much of the windowful made it through. Because of the possibility that only Data (or more generally Data[last_ACKed+1]) is lost, and because losses are usually associated with congestion, when we most especially do not wish to overburden the network, the sender will usually retransmit only the first lost packet, eg packet 5. If packets 6, 7, and 8 were also lost, then after retransmission of Data the sender will receive ACK, and can assume that Data now needs to be sent. However, if packets 6-8 did make it through, then after retransmission the sender will receive back ACK, and so will know 6-8 do not need retransmission and that the next packet to send is Data.
Normally Data through Data would time out shortly after Data times out. After the first timeout, however, sliding windows protocols generally suppress further timeout/retransmission responses until recovery is more-or-less complete.
Once a full timeout has occurred, usually the sliding-windows process itself has ground to a halt, in that there are usually no packets remaining in flight. This is sometimes described as pipeline drain. After recovery, the sliding-windows process will have to start up again. Most implementations of TCP, as we shall see later, implement a mechanism (“fast recovery”) for early detection of packet loss, before the pipeline has fully drained.