This section describes the flow control method used by TCP, one of the few mechanisms that has been preserved almost without change since 1981. We will follow the original RFC 793 TCP throughout this section.
References: Jacobson's SIGCOMM'88 paper, Section 1 (especially figure 1); RFC 793, page ?; RFC 1323, page 2
Stop-and-Wait protocols have some desirable properties:
However, these protocols become rather inefficient if the path connecting the sender and the receiver has a large Bandwidth x Delay product (that is, the maximum number of bits that can be in transit on the link at any time). This issue is discussed further at the end of this section.
A possible solution to this problem is to give the source a credit of packets that it may send without waiting for acknowledgements. This allows the source to fill the pipe; afterwards, the arrival of acknowledgements maintains and adjusts a continuous flow of data. A Stop-and-Wait protocol is then just the particular case in which the source is given a credit of one packet.
These issues are illustrated in applets 1.1-1.4.
TCP implements a flow control algorithm called Sliding Window. The reader is probably familiar with this kind of algorithm, which is also used for flow control at the data link layer of some protocols, so only a short explanation is provided here.
The "window" is the maximum amount of data we can send without having to wait for ACKs. In summary, the algorithm operates as follows:
If the acknowledgement for a packet does not arrive within a certain time, the packet is retransmitted. When an acknowledgement arrives, it repositions (slides) the window and transmission continues from the packet following the last one transmitted.
 [0 1 2 3] 4 5 6 7 8 9  --------------->  ...3 2 1 0

(slid window)
 0 [1 2 3 4] 5 6 7 8 9  --------------->  ...4
                        <---------------  ACK4 ...

(slid window)
 0 1 2 3 4 [5 6 7 8] 9  --------------->  ...8 7 6 5
                        <---------------  ACK7, set WIN=2 ...

(slid window)
 0 1 2 3 4 5 6 7 [8 9]  --------------->  ...9
(timeout)
 0 1 2 3 4 5 6 7 [8 9]  --------------->  ...9 8
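The sender-side bookkeeping shown in the diagram above can be sketched in a few lines of Python. This is a hypothetical, simplified model (segment-level sequence numbers, cumulative ACKs, a window the receiver may resize); the class and method names are ours, not part of any TCP implementation.

```python
# Hypothetical sketch of the sender-side window bookkeeping: segment-level
# sequence numbers, cumulative ACKs, and a window the receiver may resize.

class SlidingWindowSender:
    def __init__(self, window_size, num_segments):
        self.window = window_size   # credit: max unacknowledged segments
        self.base = 0               # oldest unacknowledged segment
        self.next_seq = 0           # next segment to transmit
        self.total = num_segments

    def sendable(self):
        """Transmit every segment the current window allows; return them."""
        upper = min(self.base + self.window, self.total)
        segments = list(range(self.next_seq, upper))
        self.next_seq = upper
        return segments

    def ack(self, highest_received, new_window=None):
        """Cumulative ACK: all segments up to highest_received arrived."""
        self.base = max(self.base, highest_received + 1)
        if new_window is not None:  # receiver advertises a new window size
            self.window = new_window

sender = SlidingWindowSender(window_size=4, num_segments=10)
print(sender.sendable())            # [0, 1, 2, 3] -- window full, wait
sender.ack(0)                       # window slides by one
print(sender.sendable())            # [4]
sender.ack(4)
print(sender.sendable())            # [5, 6, 7, 8]
sender.ack(7, new_window=2)         # like the "ACK7, set WIN=2" step
print(sender.sendable())            # [9]
```

Each `ack` call reproduces one "(slid window)" step of the diagram; note how shrinking the advertised window to 2 leaves only segment 9 sendable.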
TCP achieves the following objectives by using the sliding window algorithm:
The first two issues can easily be visualized with the following example:
Applet n. 1.5
This animation file: gzipped.
The script used: ack_clock.tcl
Parameters for this animation: ns ack_clock.tcl 0.2Mb 40ms 100 20 2
Links n0-n1 and n2-n3 have 1Mbit/s bandwidth and 10ms delay. Link n1-n2 has 0.2Mbit/s bandwidth, 40ms delay. Queue size at n0 is 100 segments and TCP uses a 20-segment window size.
For the sake of simplicity, the TCP in n0 models the first TCP specification, which sends as much data as the send window allows at the beginning of the transmission. Current TCP implementations use a less aggressive start (slow start).
After the connection establishment, we start sending as much data as the send window allows. Note how quickly the queue grows at the bottleneck at the beginning. Around t=3.8 we have sent the first 20 segments the window allows; then we run out of window and must wait for the first acknowledgement to open the window again. From then on, the arrival of each new acknowledgement triggers the transmission of a new segment, the TCP source adapts to the effective capacity of the path, and we can see how the queue at the bottleneck remains stable.
Self-clocking is an interesting property of TCP that automatically adjusts the transmission speed to the bandwidth and delay of the path, making it possible for TCP to operate over links with very different speeds. As an example, we can make link 1-2 in the previous applet more than 700 times faster and TCP will still work without any problems (Applet 1.6).
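The self-clocking effect can be illustrated with a toy calculation (all parameters below are made-up examples, and `ack_spacing` is our own helper, not part of any TCP code): whatever the bottleneck speed, the ACKs come back spaced by the bottleneck service time, and each ACK releases exactly one new segment.

```python
# Toy model of self-clocking: a window-sized burst queues at the bottleneck,
# which forwards one segment every `service` seconds, so the ACKs return to
# the sender spaced by exactly the bottleneck service time.

def ack_spacing(window, service, delay):
    """Times at which the ACKs reach the sender (transmission times on the
    fast links and ACK sizes are ignored for simplicity)."""
    # Segment i leaves the bottleneck at (i + 1) * service, then needs one
    # propagation delay to reach the receiver and one more for the ACK.
    return [(i + 1) * service + 2 * delay for i in range(window)]

for service in (0.001, 1.0):            # very fast and very slow bottleneck
    times = ack_spacing(window=5, service=service, delay=0.05)
    gaps = [round(b - a, 9) for a, b in zip(times, times[1:])]
    print(gaps)                          # gaps equal the bottleneck service time
```

However fast or slow the bottleneck is made, the ACK gaps track its service time, so the sender is automatically paced at the rate the path can sustain.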
We can also make the bottleneck slower, and TCP will still work (Applet 1.7). However, in this case we discover the effect of the 'data in transit' limitation discussed in (2): as long as the window size is smaller than the size of the pipe, we send bursts of data rather than a continuous flow, and therefore we cannot use the link efficiently.
TCP uses cumulative acknowledgements; that is, a single ACK can acknowledge all data up to a certain sequence number. So far we have seen the receiver acknowledge each segment individually, but it may also wait a certain amount of time before sending the acknowledgement, so that several segments are acknowledged in the same ACK. Applet 1.8 shows this.
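A minimal sketch of this delayed, cumulative acknowledgement behaviour, assuming a hypothetical "one ACK per two in-order segments" policy (the function is ours, purely illustrative):

```python
# Hypothetical receiver policy: send one cumulative ACK for every two
# in-order segments (plus one final ACK for whatever remains).

def cumulative_acks(received, every=2):
    """Return the ACK numbers generated for an in-order segment stream."""
    acks = []
    for count, seq in enumerate(received, start=1):
        if count % every == 0 or count == len(received):
            acks.append(seq)   # cumulative: acknowledges everything up to seq
    return acks

print(cumulative_acks([0, 1, 2, 3, 4]))   # [1, 3, 4]: 3 ACKs for 5 segments
```

Each ACK number covers every segment received up to it, so fewer ACKs carry the same information as one ACK per segment.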
For a link with bandwidth B (bits/s) and delay D (seconds), the product B*D (bits) is called the size of the pipe measured in bits.
Assuming that segments are acknowledged immediately and that their transmission time is nearly zero, we must wait approximately 2*D seconds from the moment we finish putting a segment on the wire until we see its acknowledgement arrive; that is, the so-called Round Trip Time (RTT) is approximately 2*D. During that time we can put approximately 2*B*D bits on the link. If our send window is smaller, we are sending less than the path can carry, and therefore we make inefficient use of the link, as shown in Applet 1.3; in fact, we are limiting our bandwidth to WINDOW_SIZE_in_bits/RTT bits/s.
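These figures are easy to check numerically. The bandwidth and delay below are made-up example values, not taken from the applets:

```python
# Window-limited throughput: with a window smaller than the pipe, the
# sender can only push window_bits every RTT.

def max_throughput(window_bytes, rtt_s):
    return window_bytes * 8 / rtt_s        # bit/s

B = 1_000_000    # example link: 1 Mbit/s
D = 0.05         # example one-way delay: 50 ms
rtt = 2 * D
print(B * rtt)                             # pipe size: 100000.0 bits per RTT

# An 8 KByte window carries only 65536 bits per RTT:
print(max_throughput(8 * 1024, rtt))       # 655360.0 bit/s < 1 Mbit/s
```

With an 8 KByte window this example path runs at about two thirds of its 1 Mbit/s capacity, exactly the WINDOW_SIZE_in_bits/RTT limit.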
Paths with a high bandwidth x delay product are called long, fat pipes; networks containing such paths are called long, fat networks (LFNs). These were a problem for the first TCP implementations, whose maximum send window was 64 KBytes (the window size was encoded in plain 16 bits, so the maximum allowed size was 2^16 bytes, and the effective bandwidth was therefore limited to 2^16/RTT bytes/s). Imagine a 155 Mbit/s satellite link: the RTT in this scenario can be as high as 300 ms, so a window size of about 5 MBytes is required for each connection! A 64 KByte window would perform poorly in such a scenario. These problems have been solved in recent TCP implementations with the TCP window scale option, documented in RFC 1323.
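Working out the satellite example numerically (only the 155 Mbit/s bandwidth and 300 ms RTT come from the text; the shift-count calculation assumes the RFC 1323 scheme of left-shifting the 16-bit window field):

```python
import math

bandwidth = 155_000_000      # bit/s, the satellite link from the example
rtt = 0.300                  # seconds

needed_bytes = bandwidth * rtt / 8
print(round(needed_bytes / 2**20, 1))     # ~5.5 MBytes of window needed

# A plain 16-bit window caps the effective bandwidth at 2^16/RTT bytes/s:
print(2**16 * 8 / rtt)                    # ~1.75 Mbit/s, far below 155 Mbit/s

# Smallest RFC 1323 shift count so that (2^16 - 1) << shift covers it:
shift = math.ceil(math.log2(needed_bytes / (2**16 - 1)))
print(shift)                              # 7
```

So the classic window wastes over 98% of this link, while a window scale shift of 7 (window values multiplied by 128) comfortably covers the required ~5.5 MBytes.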
To be added