Chalmers -    
Computer Engineering

Karn's Algorithm


Motivation

TCP adjusts the timeouts in agreement with the Round-Trip-Time (RTT) estimation for the current end-to-end path. RTT estimation is often called Smoothed-Round-Trip-Time (SRTT) and is obtained by averaging the measured RTT for the packets sent so far (the details will be described in the next section). Then, in the original RFC 793 TCP, the retransmit timeout (RTO) is set as twice the SRTT .

A problem arises from the way samples are taken for the calculation of the average: TCP is unable to distinguish two separate acknowledgements for the same sequence number. Therefore, we have the following problem:

Left: sender, right: receiver. A timeout occurs before an ACK is received, and PKT1 is retransmitted. The ack for the first PKT1 arrives a bit later and the source measures a wrong value for the RTT.

That is, sender's TCP has the risk of making the mistake to think that the acknowledgment of a recently retransmitted packet is one sent for the packet transmitted earlier. Applet 2.1 shows this effect.

Applet 2.1

This animation file: gzipped.

The script used: karn.tcl

Parameters for this animation: ns karn.tcl false false

Description:

Link n0-n1 is 1Mbit/s, but has an incredibly high delay (10s).

We start trying to stablish a connection at t=0.1s. After six seconds we have not seen the acknowledgement for the first packet sent, so we retransmit it. This will happen several times; the fourth one is done at t=18.1s and we receive the acknowledgement for the first one at t=20.1s; the source thinks then that the round-trip time is 20.1-18.1=2.0s and updates the SRTT with that RTT sample. Both measured RTT and estimated SRTT are plotted below. Note that as the SRTT has been updated to 2s and we are using a RFC 793 TCP, the RTO is set to 2*2s=4s.

Algorithm Description

Phil Karn, radio-packet enthusiast (KA9Q :-) proposed the following modification to TCP:

  1. Do not take into account the RTT sampled from packets that have been retransmitted.
  2. On sucesive retransmissions, set each timeout to twice the previous one.

Justification

The reason for the first point is to face the previously stated problem of the uncertainy about matching the retransmitted packets and their acknowledgements. But if we are discarding RTT samples from those packets that were retransmitted, an additional mechanism is needed to feed the TimeOut estimation when that happens; this is explained with the help of the following applet:

Applet 2.2

This animation file: gzipped.

The script used: karn.tcl

Parameters for this animation: ns karn.tcl true false

Description:

The scenario is the same as in Applet 2.1. This time the source discards RTT values sampled from packets that have been retransmitted. We begin sending the connection establishment packet at t=0.1s; 6 seconds later (the initial value chosen for the Retransmit Timeout), a timeout forces its retransmission. Around t=20s we receive the acknowledgement for that first packet; this time we do not take it into account, as the packet was retransmitted once. Therefore, the TimeOut value remains the same (that is, too low, but at least we are not lowering it even more in a mistake, as it was done in Applet 2.1). We send a complete window of packets (RFC 793) and after six seconds, timeout expires for the first packet of the window; we resend it, two times. Finally, at t=40s, a burst of acknowledgements arrive and we send another window of packets.

In the previous applet we saw that it was necessary to make the TimeOut grow, but we could not trust RTT values sampled from retransmitted packets, so we cannot use them to update the SRTT. This problem is solved with the use of exponential binary backoff on successive TimeOuts. Refer to [Jacobson88] for a discussion on why it was chosen to be exponential, and not slower.

Improvements Obtained

Applet 2.3

This animation file: gzipped.

The script used: karn.tcl

Parameters for this animation: ns karn.tcl true true

Description:

The scenario is again the same as in Applet 2.1. This time we implement both RTT sample discard on retransmissions and exponential binary backoff on timeout.

We send the first packet (connection establishment) at t=0.1s; a timeout expires at t=6.1s and we retransmit it. We will discard RTT values sampled from all the acknowledgements for this packet. The new timeout interval is set to twice the previous one, that is, 12 seconds, and to expire at t=18.1s. We retransmit it again and set the timeout interval to 24 seconds. At t=20.1s, the first acknowledgement for the first packet arrives; we do not use it to update the SRTT and send a window of packets (RFC 793). The timeout interval is large enough to allow the transmission of that window of packets without any unnecessary retransmission. A burst of acknowledgements arrives around t=40s. They are acks from not retransmitted packets, so we can use the measured RTT for them to update the SRTT (and so the retransmission timeout); the following graph shows the evolution of both the measured RTT and the estimated SRTT.