6.3: Linear Bottlenecks
Consider the simple network path shown below, with bandwidths shown in packets/ms. The minimum bandwidth, or path bandwidth, is 3 packets/ms.

[Figure: the path A──R1──R2──R3──R4──B; the R2–R3 and R3–R4 links have bandwidth 3 packets/ms and the other links are faster.]
The slow links are R2–R3 and R3–R4. We will refer to the slowest link as the bottleneck link; if (as here) there are ties for the slowest link, then the first such link is the bottleneck. The bottleneck link is where the queue will form. If traffic is sent at a rate of 4 packets/ms from A to B, it will pile up in an ever-increasing queue at R2. Traffic will not pile up at R3; it arrives at R3 at the same rate at which it departs.
Furthermore, if sliding windows is used (rather than a fixed-rate sender), traffic will eventually not queue up at any router other than R2: data cannot reach B faster than the 3 packets/ms rate, and so B will not return ACKs faster than this rate, and so A will eventually not send data faster than this rate. At this 3 packets/ms rate, traffic will not pile up at R1 (or R3 or R4).
There is a significant advantage in speaking in terms of winsize rather than transmission rate. If A sends to B at any rate greater than 3 packets/ms, then the situation is unstable as the bottleneck queue grows without bound and there is no convergence to a steady state. There is no analogous instability, however, if A uses sliding windows, even if the winsize chosen is quite large (although a large-enough winsize will overflow the bottleneck queue). If a sender specifies a sending window size rather than a rate, then the network will converge to a steady state in relatively short order; if a queue develops it will be steadily replenished at the same rate that packets depart, and so will be of fixed size.
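As a quick illustration of the instability (a sketch in Python with our own toy parameters, not from the text): a fixed-rate sender exceeding the 3 packets/ms bottleneck builds an unbounded queue, while a sliding-windows sender is eventually paced by ACKs down to the bottleneck rate, which is the second case below.

```python
def bottleneck_queue(send_rate, bottleneck=3, duration_ms=10):
    """Queue length at the bottleneck router after each millisecond."""
    q, history = 0, []
    for _ in range(duration_ms):
        q = max(0, q + send_rate - bottleneck)   # arrivals minus departures
        history.append(q)
    return history

print(bottleneck_queue(4))   # [1, 2, 3, ..., 10]: grows without bound
print(bottleneck_queue(3))   # [0, 0, ..., 0]: at the bottleneck rate, no queue
```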
6.3.1 Simple fixed-window-size analysis
We will analyze the effect of window size on overall throughput and on RTT. Consider the following network path, with bandwidths now labeled in packets/second.

[Figure: the path A──R1──R2──R3──R4──B; the A–R1 link is infinitely fast and each of the other four links has bandwidth 1 packet/sec.]
We will assume that in the backward B⟶A direction, all connections are infinitely fast, meaning zero delay; this is often a good approximation because ACK packets are what travel in that direction and they are negligibly small. In the A⟶B direction, we will assume that the A⟶R1 link is infinitely fast, but the other four each have a bandwidth of 1 packet/second (and no propagation-delay component). This makes the R1⟶R2 link the bottleneck link; any queue will now form at R1. The “path bandwidth” is 1 packet/second, and the RTT is 4 seconds.
As a roughly equivalent alternative example, we might use the following:

[Figure: the path C──S1──S2──D.]

with the following assumptions: the C–S1 link is infinitely fast (zero delay), S1⟶S2 and S2⟶D each have a bandwidth delay of 1.0 sec per packet (so two packets take 2.0 sec per link, etc), and ACKs also have a 1.0 sec bandwidth delay in the reverse direction.
In both scenarios, if we send one packet, it takes 4.0 seconds for the ACK to return, on an idle network. This means that the no-load delay, RTTnoLoad, is 4.0 seconds.
(These models will change significantly if we replace the 1 packet/sec bandwidth delay with a 1-second propagation delay; in the former case, 2 packets take 2 seconds, while in the latter, 2 packets take 1 second. See exercise 4.0.)
We assume a single connection is made; ie there is no competition. Bandwidth × delay here is 4 packets (1 packet/sec × 4 sec RTT).
6.3.1.1 Case 1: winsize = 2
In this case winsize < bandwidth×delay (where delay = RTT). The table below shows what is sent by A and each of R1-R4 for each second. Every packet is acknowledged 4 seconds after it is sent; that is, RTTactual = 4 sec, equal to RTTnoLoad; this will remain true as the winsize changes by small amounts (eg to 1 or 3). Throughput is proportional to winsize: when winsize = 2, throughput is 2 packets in 4 seconds, or 2/4 = 1/2 packet/sec. During each second, two of the routers R1-R4 are idle. The overall path will have less than 100% utilization.
| T | A sends | R1 queues | R1 sends | R2 sends | R3 sends | R4 sends | B ACKs |
|---|---------|-----------|----------|----------|----------|----------|--------|
| 0 | 1,2 | 2 | 1 | | | | |
| 1 | | | 2 | 1 | | | |
| 2 | | | | 2 | 1 | | |
| 3 | | | | | 2 | 1 | |
| 4 | 3 | | 3 | | | 2 | 1 |
| 5 | 4 | | 4 | 3 | | | 2 |
| 6 | | | | 4 | 3 | | |
| 7 | | | | | 4 | 3 | |
| 8 | 5 | | 5 | | | 4 | 3 |
| 9 | 6 | | 6 | 5 | | | 4 |
Note the brief pile-up at R1 (the bottleneck link!) on startup. However, in the steady state, there is no queuing. Real sliding-windows protocols generally have some way of minimizing this “initial pileup”.
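The startup transient and the steady state can both be checked with a short simulation. The sketch below (Python; the function name and structure are ours, not from the text) models the A──R1──…──B path above, with an infinitely fast A–R1 link, instantaneous ACKs, and a queue only at R1. Calling simulate(2) reproduces the table above; simulate(4) and simulate(6) reproduce the tables in the next two cases.

```python
from collections import deque

def simulate(winsize, links=4, seconds=11):
    """Discrete-time sketch of the path above: each of the 'links' slow
    links carries 1 packet/sec, and ACKs return instantly.  Prints one
    row per second: what A sends, R1's queue, the packet crossing each
    link (R1..R4 sends), and what B ACKs."""
    queue = deque()               # R1's queue
    on_link = [None] * links      # packet crossing link i this second
    next_seq, last_acked = 1, 0
    for t in range(seconds):
        acked = on_link[-1]       # B ACKs what just crossed the last link
        if acked:
            last_acked = acked
        for i in range(links - 1, 0, -1):   # packets advance one link
            on_link[i] = on_link[i - 1]
        sent = []
        while next_seq <= last_acked + winsize:   # window opens: A sends
            queue.append(next_seq)
            sent.append(next_seq)
            next_seq += 1
        on_link[0] = queue.popleft() if queue else None
        print(t, sent, list(queue), on_link, acked)

simulate(2)    # winsize = 2: two idle links each second, RTT stays 4
```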
6.3.1.2 Case 2: winsize = 4
When winsize=4, at each second all four slow links are busy. There is again an initial burst leading to a brief surge in the queue; RTTactual for Data[4] is 7 seconds. However, RTTactual for every subsequent packet is 4 seconds, and there are no queuing delays (and nothing in the queue) after T=2. The steady-state connection throughput is 4 packets in 4 seconds, ie 1 packet/second. Note that overall path throughput now equals the bottleneck-link bandwidth, so this is the best possible throughput.
| T | A sends | R1 queues | R1 sends | R2 sends | R3 sends | R4 sends | B ACKs |
|---|---------|-----------|----------|----------|----------|----------|--------|
| 0 | 1,2,3,4 | 2,3,4 | 1 | | | | |
| 1 | | 3,4 | 2 | 1 | | | |
| 2 | | 4 | 3 | 2 | 1 | | |
| 3 | | | 4 | 3 | 2 | 1 | |
| 4 | 5 | | 5 | 4 | 3 | 2 | 1 |
| 5 | 6 | | 6 | 5 | 4 | 3 | 2 |
| 6 | 7 | | 7 | 6 | 5 | 4 | 3 |
| 7 | 8 | | 8 | 7 | 6 | 5 | 4 |
| 8 | 9 | | 9 | 8 | 7 | 6 | 5 |
At T=4, R1 has just finished sending Data[4] as Data[5] arrives from A; R1 can begin sending packet 5 immediately. No queue will develop.
Case 2 is the “congestion knee” of Chiu and Jain [CJ89], defined here in 1.7 Congestion.
6.3.1.3 Case 3: winsize = 6
| T | A sends | R1 queues | R1 sends | R2 sends | R3 sends | R4 sends | B ACKs |
|---|---------|-----------|----------|----------|----------|----------|--------|
| 0 | 1,2,3,4,5,6 | 2,3,4,5,6 | 1 | | | | |
| 1 | | 3,4,5,6 | 2 | 1 | | | |
| 2 | | 4,5,6 | 3 | 2 | 1 | | |
| 3 | | 5,6 | 4 | 3 | 2 | 1 | |
| 4 | 7 | 6,7 | 5 | 4 | 3 | 2 | 1 |
| 5 | 8 | 7,8 | 6 | 5 | 4 | 3 | 2 |
| 6 | 9 | 8,9 | 7 | 6 | 5 | 4 | 3 |
| 7 | 10 | 9,10 | 8 | 7 | 6 | 5 | 4 |
| 8 | 11 | 10,11 | 9 | 8 | 7 | 6 | 5 |
| 9 | 12 | 11,12 | 10 | 9 | 8 | 7 | 6 |
| 10 | 13 | 12,13 | 11 | 10 | 9 | 8 | 7 |
Note that packet 7 is sent at T=4 and the acknowledgment is received at T=10, for an RTT of 6.0 seconds. All later packets have the same RTTactual. That is, the RTT has risen from RTTnoLoad = 4 seconds to 6 seconds. Note that we continue to send one windowful each RTT; that is, the throughput is still winsize/RTT, but RTT is now 6 seconds.
One might initially conjecture that if winsize is greater than the bandwidth×RTTnoLoad product, then the entire window cannot be in transit at one time. In fact this is not the case; the sender does usually have the entire window sent and in transit, but RTT has been inflated so it appears to the sender that winsize equals the bandwidth×RTT product.
In general, whenever winsize > bandwidth×RTTnoLoad, what happens is that the extra packets pile up at a router somewhere along the path (specifically, at the router in front of the bottleneck link). RTTactual is inflated by queuing delay to winsize/bandwidth, where bandwidth is that of the bottleneck link; this means winsize = bandwidth×RTTactual. Total throughput is equal to that bandwidth. Of the 6 seconds of RTTactual in the example here, a packet spends 4 of those seconds being transmitted on one link or another because RTTnoLoad=4. The other two seconds, therefore, must be spent in a queue; there is no other place for packets to wait. Looking at the table, we see that each second there are indeed two packets in the queue at R1.
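A quick check of these relationships against the winsize = 6 example (a sketch; the variable names are ours):

```python
bandwidth  = 1.0   # bottleneck bandwidth, packets/sec
rtt_noload = 4.0   # seconds
winsize    = 6

rtt_actual = winsize / bandwidth               # 6.0 sec: RTT inflated by queuing
queue_time = rtt_actual - rtt_noload           # 2.0 sec spent waiting in the queue
queue_len  = winsize - bandwidth * rtt_noload  # 2 packets in R1's queue
throughput = winsize / rtt_actual              # 1.0 packet/sec: the bottleneck rate
```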
If the bottleneck link is the very first link, packets may begin returning before the sender has sent the entire windowful. In this case we may argue that the full windowful has at least been queued by the sender, and thus has in this sense been “sent”. Suppose the network, for example, is

[Figure: the path A──R1──R2──R3──B, with each link carrying 1 packet/sec from A to B.]
where, as before, each link transports 1 packet/sec from A to B and is infinitely fast in the reverse direction. Then, if A sets winsize = 6, a queue of 2 packets will form at A.
6.3.2 RTT Calculations
We can make some quantitative observations of sliding windows behavior, and about queue utilization. First, we note that RTTnoLoad is the physical “travel” time (subject to the limitations addressed in 5.2 Packet Delay Variability); any time in excess of RTTnoLoad is spent waiting in a queue somewhere. Therefore, the following holds regardless of competing traffic, and even for individual packets:
1. queue_time = RTTactual − RTTnoLoad
When the bottleneck link is saturated, that is, is always busy, the number of packets actually in transit (not queued) somewhere along the path will always be bandwidth × RTTnoLoad.
Second, we always send one windowful per actual RTT, assuming no losses and that each packet is individually acknowledged. This is perhaps best understood by consulting the diagrams above, but here is a simple non-visual argument: if we send Data[N] at time TD, and ACK[N] arrives at time TA, then RTT = TA − TD, by definition. At time TA the sender is allowed to send Data[N+winsize], so during the RTT interval TD ≤ T < TA the sender must have sent Data[N] through Data[N+winsize−1]; that is, winsize packets in time RTT. Therefore (whether or not there is competing traffic) we always have
2. throughput = winsize/RTTactual
where “throughput” is the rate at which the connection is sending packets.
This relationship holds even if winsize or the bottleneck bandwidth changes suddenly, though in that case RTTactual might change from one packet to the next, and the throughput here must be seen as a measurement averaged over the RTT of one specific packet. If the sender doubles its winsize, those extra packets will immediately end up in a queue somewhere (perhaps a queue at the sender itself, though this is why in examples it is often clearer if the first link has infinite bandwidth so as to prevent this). If the bottleneck bandwidth is cut in half without changing winsize, eventually the RTT must rise due to queuing. See exercise 12.0.
In the sliding windows steady state, where throughput and RTTactual are reasonably constant, the average number of packets in the queue is just throughput×queue_time (where throughput is measured in packets/sec):
3. queue_usage = throughput × (RTTactual − RTTnoLoad) = winsize × (1 − RTTnoLoad/RTTactual)
To make the averaging a little more explicit: each packet spends time (RTTactual − RTTnoLoad) in the queue, from equation 1 above. The total time spent by a windowful of packets is winsize × (RTTactual − RTTnoLoad), and dividing this by RTTactual thus gives the average number of packets in the queue over the RTT interval in question.
In the presence of competing traffic, the throughput referred to above is simply the connection’s current share of the total bandwidth. It is the value we get if we measure the rate of returning ACKs. If there is no competing traffic and winsize is below the congestion knee – winsize < bandwidth × RTTnoLoad – then winsize is the limiting factor in throughput. Finally, if there is no competition and winsize ≥ bandwidth × RTTnoLoad then the connection is using 100% of the capacity of the bottleneck link and throughput is equal to the bottleneck-link physical bandwidth. To put this another way,
4. RTTactual = winsize/bottleneck_bandwidth
5. queue_usage = winsize − bandwidth × RTTnoLoad
Dividing equation 4 by RTTnoLoad, and noting that bandwidth × RTTnoLoad = winsize − queue_usage = transit_capacity, we get

6. RTTactual/RTTnoLoad = winsize/transit_capacity = (transit_capacity + queue_usage)/transit_capacity
Regardless of the value of winsize, in the steady state the sender never sends faster than the bottleneck bandwidth. This is because the bottleneck bandwidth determines the rate of packets arriving at the far end, which in turn determines the rate of ACKs arriving back at the sender, which in turn determines the continued sending rate. This illustrates the self-clocking nature of sliding windows.
We will return in 14 Dynamics of TCP to the issue of bandwidth in the presence of competing traffic. For now, suppose a sliding-windows sender has winsize > bandwidth × RTTnoLoad, leading as above to a fixed amount of queue usage, and no competition. Then another connection starts up and competes for the bottleneck link. The first connection’s effective bandwidth will thus decrease. This means that bandwidth × RTTnoLoad will decrease, and hence the connection’s queue usage will increase.
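To make this concrete with the earlier numbers (a hypothetical split, for illustration only): if the first connection’s share of the 1 packet/sec bottleneck falls to 0.5 packets/sec while winsize stays at 6, then by equation 5 its queue usage rises from 6 − 1.0×4 = 2 packets to 6 − 0.5×4 = 4 packets.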
6.3.3 Graphs at the Congestion Knee
Consider the following graphs of winsize versus
- throughput
- delay
- queue utilization
The critical winsize value is equal to bandwidth × RTTnoLoad; this is known as the congestion knee. For winsize below this, we have:
- throughput is proportional to winsize
- delay is constant
- queue utilization in the steady state is zero
For winsize larger than the knee, we have
- throughput is constant (equal to the bottleneck bandwidth)
- delay increases linearly with winsize
- queue utilization increases linearly with winsize
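These piecewise relationships can be written out directly. Here is a sketch in Python; the function name is ours and the default numbers are those of the running example above:

```python
def steady_state(winsize, bandwidth=1.0, rtt_noload=4.0):
    """Throughput, RTT, and steady-state queue as functions of winsize,
    for a single connection with no competing traffic."""
    knee = bandwidth * rtt_noload          # the congestion knee, in packets
    if winsize <= knee:
        return dict(throughput=winsize / rtt_noload,   # proportional to winsize
                    rtt=rtt_noload,                    # constant
                    queue=0)                           # no steady-state queue
    return dict(throughput=bandwidth,                  # capped at the bottleneck
                rtt=winsize / bandwidth,               # grows linearly
                queue=winsize - knee)                  # grows linearly

print(steady_state(2))   # below the knee: throughput 0.5, rtt 4, queue 0
print(steady_state(6))   # above the knee: throughput 1.0, rtt 6, queue 2
```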
Ideally, winsize will be at the critical knee. However, the exact value varies with time: available bandwidth changes due to the starting and stopping of competing traffic, and RTT changes due to queuing. Standard TCP makes an effort to stay well above the knee much of the time, presumably on the theory that maximizing throughput is more important than minimizing queue use.
The power of a connection is defined to be throughput/RTT. For sliding windows below the knee, RTT is constant and power is proportional to the window size. For sliding windows above the knee, throughput is constant and delay is proportional to winsize; power is thus proportional to 1/winsize. Here is a graph, akin to those above, of winsize versus power:

[Graph: winsize versus power, rising linearly up to the knee and falling off as 1/winsize beyond it.]
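In terms of the steady_state() sketch above, power can be computed directly; it peaks exactly at the knee:

```python
def power(winsize, bandwidth=1.0, rtt_noload=4.0):
    s = steady_state(winsize, bandwidth, rtt_noload)
    return s["throughput"] / s["rtt"]

# Power rises linearly up to the knee (winsize 4), then falls as 1/winsize:
print([round(power(w), 3) for w in range(1, 9)])
# [0.062, 0.125, 0.188, 0.25, 0.2, 0.167, 0.143, 0.125]
```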
6.3.4 Simple Packet-Based Sliding-Windows Implementation
Here is a pseudocode outline of the receiver side of a sliding-windows implementation, ignoring lost packets and timeouts. We abbreviate as follows:
W: winsize
LA: last_ACKed
Thus, the next packet expected is LA+1 and the window is [LA+1, …, LA+W]. We have a data structure EarlyArrivals in which we can place packets that cannot yet be delivered to the receiving application.
Upon arrival of Data[M]:
    if M≤LA or M>LA+W, ignore the packet
    if M>LA+1, put the packet into EarlyArrivals
    if M==LA+1:
        deliver the packet (that is, Data[LA+1]) to the application
        LA = LA+1 (slide window forward by 1)
        while (Data[LA+1] is in EarlyArrivals) {
            output Data[LA+1]
            LA = LA+1
        }
        send ACK[LA]
A possible implementation of EarlyArrivals is as an array of packet objects, of size W. We always put packet Data[M] into position M % W.
At any point between packet arrivals, Data[LA+1] is not in EarlyArrivals, but some later packets may be present.
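Here is a runnable Python sketch of this receiver (the class and callback names are ours, not part of the text), using the size-W array for EarlyArrivals with Data[M] stored at position M % W:

```python
class Receiver:
    def __init__(self, W, deliver, send_ack):
        self.W = W                    # winsize
        self.LA = 0                   # last ACKed; window is [LA+1, LA+W]
        self.early = [None] * W       # EarlyArrivals: Data[M] at slot M % W
        self.deliver = deliver        # callback: hand a packet to the application
        self.send_ack = send_ack      # callback: transmit ACK[LA]

    def on_data(self, M, packet):
        if M <= self.LA or M > self.LA + self.W:
            return                            # outside the window: ignore
        if M > self.LA + 1:
            self.early[M % self.W] = packet   # early arrival: hold it
            return
        self.deliver(packet)                  # M == LA+1: deliver in order
        self.LA += 1
        while self.early[(self.LA + 1) % self.W] is not None:
            self.LA += 1
            slot = self.LA % self.W
            self.deliver(self.early[slot])    # drain consecutive early arrivals
            self.early[slot] = None
        self.send_ack(self.LA)
```

For example, with W=4, calling on_data(2, …) and then on_data(1, …) delivers packets 1 and 2 to the application in order and sends ACK[2].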
For the sender side, we begin by sending a full windowful of packets Data[1] through Data[W], and setting LA=0. When ACK[M] arrives, with LA<M≤LA+W, the window slides forward from [LA+1…LA+W] to [M+1…M+W], and we are now allowed to send Data[LA+W+1] through Data[M+W]. The simplest case is M=LA+1.
Upon arrival of ACK[M]:
    if M≤LA or M>LA+W, ignore the packet
    otherwise:
        set K = LA+W+1, the first packet just above the old window
        set LA = M, just below the bottom of the new window
        for (i=K; i≤LA+W; i++) send Data[i]
Note that new ACKs may arrive while we are in the loop at the last line. We assume here that the sender stolidly sends what it may send and only after that does it start to process additional arriving ACKs. Some implementations may take a more asynchronous approach, perhaps with one thread processing arriving ACKs and incrementing LA and another thread sending everything it is allowed to send.
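A matching sender-side sketch in Python (again with names of our own choosing; transmit() is an assumed callback that puts Data[i] on the wire):

```python
class Sender:
    def __init__(self, W, transmit):
        self.W = W
        self.LA = 0                   # last ACKed
        self.transmit = transmit     # callback: put Data[i] on the wire
        for i in range(1, W + 1):    # send the initial windowful Data[1]..Data[W]
            self.transmit(i)

    def on_ack(self, M):
        if M <= self.LA or M > self.LA + self.W:
            return                    # duplicate or out-of-range ACK: ignore
        K = self.LA + self.W + 1      # first packet just above the old window
        self.LA = M                   # window slides to [M+1, M+W]
        for i in range(K, M + self.W + 1):
            self.transmit(i)          # send Data[K] through Data[M+W]
```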
To add support for timeout and retransmission, each transmitted packet would need to be stored, together with the time it was sent. Periodically this collection of stored packets must then be scanned, looking for packets for which send_time + timeout_interval ≤ current_time; those packets get retransmitted. When a packet Data[N] is acknowledged (perhaps by an ACK[M] for M>N), it can be deleted.
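A minimal sketch of that scan (assuming the sender keeps unacknowledged packets in a dict, here called outstanding, mapping N to (packet, send_time); the names are ours):

```python
import time

def scan_for_timeouts(outstanding, transmit, timeout_interval=1.0):
    """Retransmit every stored packet whose timer has expired."""
    now = time.time()
    for N, (packet, send_time) in list(outstanding.items()):
        if send_time + timeout_interval <= now:
            transmit(N)                        # retransmit Data[N]
            outstanding[N] = (packet, now)     # restart its timer

def on_ack(outstanding, M):
    """Treat ACK[M] as acknowledging every Data[N] with N ≤ M."""
    for N in [n for n in outstanding if n <= M]:
        del outstanding[N]
```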