1. Introduction
Why does the speed at which data is transferred over networks
seem not to keep up with the touted improvements in the hardware
through which that data travels?
We have optical networks limited only by the speed of light, 10GbE switches,
and multiple hyperthreaded GHz processors inside our computers.
Yet today's applications have trouble delivering the needed throughput
even as network bandwidth increases and hardware--both inside and outside
our own computers--grows in capacity and power.
Here at Oak Ridge National Laboratory we are particularly interested in
improving bulk transfers over high-bandwidth, high-latency networks because of our
involvement in storage and in the transfer of data for cutting-edge scientific
applications. It is important to remember, however, that aggressive
tools such as the ones described in this review, including parallel TCP,
are intended for high-speed, dedicated links or links over which quality of
service is available--not for general use over the broader Internet.
2. Background
2.1. Tuning TCP
ORNL has looked at many ways to address this problem. Extensive tests
have been conducted to pinpoint problems and to suggest possible solutions for
the "fixable" problems found.
We are partners in the NET100/WEB100 project, an effort
to look "inside" the kernel by expanding the kernel instrumentation
set and making these kernel variables available for reading and tuning
through the Linux /proc file interface.
We have written and modified a number of tools that make use of this newly accessible
information--the WAD, iperf100, ttcp100, WEBD, TRACED,
and a web100-enabled bandwidth tester.
We have also simulated TCP using the TCP-over-UDP test harness, atou, so that we
might observe the effects of changes to TCP proposed by Sally Floyd and others.
The claim is that TCP, which has been the workhorse of the past, is part
of the problem in today's high-bandwidth, high-latency networks, primarily because
some of the very algorithms that have served it well in the past are unable
to keep the pipe full when the bandwidth is high, round-trip times are long,
and there is often reordering or loss in the path.
Slow Start is the way TCP initiates data flow across a connection; Congestion
Avoidance is the way TCP deals with lost packets. Together they require that
two variables be kept for each connection--a congestion window (cwnd) and a
slow start threshold (ssthresh).
Slow Start operates on the observation that the rate at which new packets
should be injected into the network is the rate at which acknowledgments are
returned by the other end. Because of this, when round-trip times are long,
much more time elapses before the capacity of the path is reached. TCP
increases cwnd by one segment for each ACK received--doubling it every round
trip--until a timeout occurs or duplicate ACKs indicate a lost packet.
Upon receipt of 3 duplicate ACKs, cwnd is halved and that value is stored
in ssthresh. If there is a timeout, cwnd is reset to 1 or 2 segments.
If cwnd is less than or equal to ssthresh, TCP is in Slow Start; otherwise
Congestion Avoidance takes over and cwnd is incremented by 1/cwnd for each ACK.
This is an additive increase, compared to Slow Start's exponential increase.
Again, a flow is penalized when round-trip times are long, and the large
capacity of the path may never be reached.
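To make the bookkeeping concrete, the following is a minimal sketch of the textbook cwnd/ssthresh behavior just described--not kernel code; the structure and function names are illustrative and the units are segments.

```c
/* Minimal sketch of the cwnd/ssthresh bookkeeping described above.
 * Units are segments; names and structure are illustrative, not kernel code. */
struct tcp_cc {
    double cwnd;      /* congestion window */
    double ssthresh;  /* slow start threshold */
};

/* Called for each ACK that acknowledges new data. */
static void on_ack(struct tcp_cc *cc)
{
    if (cc->cwnd <= cc->ssthresh)
        cc->cwnd += 1.0;            /* Slow Start: +1 segment per ACK, doubling each RTT */
    else
        cc->cwnd += 1.0 / cc->cwnd; /* Congestion Avoidance: roughly +1 segment per RTT */
}

/* Called when three duplicate ACKs indicate a lost packet. */
static void on_triple_dupack(struct tcp_cc *cc)
{
    cc->ssthresh = cc->cwnd / 2.0;  /* halve cwnd and remember it in ssthresh */
    cc->cwnd = cc->ssthresh;
}

/* Called when the retransmission timer expires. */
static void on_timeout(struct tcp_cc *cc)
{
    cc->ssthresh = cc->cwnd / 2.0;
    cc->cwnd = 1.0;                 /* reset to 1 (or 2) segments */
}
```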
Some researchers have concluded that the solution requires a new, UDP-like
protocol. NETBLT, a transport-level protocol, was proposed to permit the
transfer of very large amounts of data between two clients. NETBLT is designed
to have features that minimize the effects of packet loss, delays over
satellite links, and network congestion.
2.2. NETBLT
NETBLT is a bulk data transfer protocol described in RFC 998 and
proposed in the 1985-87 timeframe as a transport-level protocol;
it was assigned the official protocol number 30. NETBLT
is included in the background section because at least two of the
newer UDP protocols have acknowledged their debt to the
design of NETBLT, and the design of a third is clearly based on
NETBLT's ideas. However, the resistance to new
kernel-level protocols, plus the lengthy approval process, seems to have
influenced the authors of the new UDP protocols to implement their
designs at the application level.
NETBLT was designed specifically for high-bandwidth, high-latency
networks including satellite channels. NETBLT differs from TCP
in that it uses a rate-based flow control scheme rather than TCP's
window-based flow control. The rate control parameters are negotiated
during the connection initialization and periodically throughout
the connection. The sender uses timers rather than ACKs to maintain
the negotiated rate. Since the overhead of timing mechanisms
on a per-packet basis can lower performance, NETBLT's rate control
consists of a burst size and a burst rate, with burst_size/burst_rate
equal to the average transmission time per packet. Both size and rate
should be based on a combination of the capacities of the end points
as well as those of the intermediate gateways and networks.
NETBLT separates error control from flow control
so that losses and retransmissions do not affect the flow rate.
NETBLT uses a system of timers to ensure reliable delivery
of control messages, and both sender and receiver send and receive
control messages.
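As a rough illustration of this style of rate control, the sketch below paces transmission by bursts, assuming the negotiated parameters have already been reduced to a burst size and a burst interval; the names and the send_packet() routine are hypothetical and are not taken from the NETBLT specification.

```c
/*
 * Illustrative timer-driven burst pacing: send `burst_size` packets back to
 * back, then wait until the next burst interval rather than waiting for ACKs.
 * send_packet() is a hypothetical send routine.
 */
#include <time.h>

extern int send_packet(int idx);

void send_buffer(int npackets, int burst_size, long burst_interval_nsec)
{
    struct timespec next;
    clock_gettime(CLOCK_MONOTONIC, &next);

    for (int i = 0; i < npackets; ) {
        /* one burst, sent back to back */
        for (int j = 0; j < burst_size && i < npackets; j++, i++)
            send_packet(i);

        /* schedule the next burst one interval later and sleep until then */
        next.tv_nsec += burst_interval_nsec;
        while (next.tv_nsec >= 1000000000L) {
            next.tv_nsec -= 1000000000L;
            next.tv_sec++;
        }
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
    }
}
```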
RFC 998 gives the following explanation of the protocol:
"the sending client loads a buffer of data and calls down to the
NETBLT layer to transfer it. The NETBLT layer breaks the buffer
up into packets and sends these packets across the network in
datagrams. The receiving NETBLT layer loads these packets into a
matching buffer provided by the receiving client. When the last
packet in the buffer has arrived, the receiving NETBLT checks to
see that all packets in that buffer have been correctly received.
If some packets are missing, the receiving NETBLT requests that
they be resent. When the buffer has been completely
transmitted, the receiving client is notified by its NETBLT layer.
The receiving client disposes of the buffer and provides a new
buffer to receive more data. The receiving NETBLT notifies
the sender that the new buffer is ready and the sender prepares and
sends the next buffer in the same manner."
As described, the NETBLT protocol is "lock-step". However, a
multiple-buffering capability, together with various timeout/retransmit
algorithms, gives rise to the claim that NETBLT achieves good
performance over long-delay channels without impairing performance
over high-speed LANs. NETBLT, however, is not widely implemented.
3. Recent UDP application level protocols
As far as could be determined, the following protocols have not yet been
described in an RFC, so the overview of each is taken from papers or
documentation produced by the protocols' authors.
In some cases, where descriptions did not seem clear or documentation
was sketchy, we looked at the code for further clarification.
These protocols are still in transition, which means, among
other things, that the details which follow are a snapshot in
time and not a final picture.
SABUL, RBUDP, FOBS, and TSUNAMI all address the problem
of limited throughput over high-bandwidth, high-latency networks with proposals
that are similar in concept but slightly different in implementation.
RBUDP (QUANTA) will not be evaluated at this time. QUANTA has some
very interesting ideas, such as forward error correction, but has
not yet implemented them and does not yet do file transfers, as it is still in the
very early stages of development.
3.1. Use of TCP/UDP channel(s)
All three proposals use one or more TCP connections for sending and receiving
control information and one UDP connection for sending and receiving the actual
data. Once the connections are established, control packets are sent from
receiver to sender for SABUL and TSUNAMI, while control packets are sent both
ways in FOBS. For all three, the data is sent only one way--from sender to
receiver.
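As a rough sketch of this shared layout (not code from any of the three implementations), a sender might open the two channels as follows; the port numbers are placeholders and error handling is minimal.

```c
/*
 * Minimal sketch of the shared connection layout: a TCP socket for control
 * traffic plus a UDP socket for data.  Ports are illustrative placeholders.
 */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

int open_channels(const char *receiver_ip, int *ctrl_fd, int *data_fd)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    inet_pton(AF_INET, receiver_ip, &addr.sin_addr);

    /* TCP control channel: loss reports, rate updates, restart requests */
    *ctrl_fd = socket(AF_INET, SOCK_STREAM, 0);
    addr.sin_port = htons(5000);                 /* illustrative port */
    if (connect(*ctrl_fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        return -1;

    /* UDP data channel: one-way flow of data packets, sender to receiver */
    *data_fd = socket(AF_INET, SOCK_DGRAM, 0);
    addr.sin_port = htons(5001);                 /* illustrative port */
    if (connect(*data_fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        return -1;

    return 0;
}
```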
3.2. Rate-Control Algorithm
Rate control is seen as a way to control the burstiness so
often observed in TCP flows. This burstiness may cause losses as
router queues suddenly fill up and packets are dropped or
network interface cards cannot keep up. Also, rate control allows
a flow to more quickly fill a pipe without going through the
initial ramping-up process characteristic of TCP.
Rate control, or inter-packet delay, which adjusts to packet loss and/or
network congestion as reported by the receiver, has been added in some form
to all of the UDP protocols to counter charges of unfair use of capacity
and of the potential to create network problems.
rate calculation
TSUNAMI gives the user the ability to initialize many parameters, including
UDP buffer size, tolerated error rate, sending rate, and slowdown/speedup
factors, with the 'set' command. If the user does not set the sending rate,
however, it starts out at 1000 Mbps with a default tolerated loss rate
of 7.9%. Since a block of file data (default 32768 bytes) is read and
handed to UDP/IP, the rate control is actually
implemented per block rather than per packet.
The receiver uses the combination of the number of packets received
(a multiple of 50) and a timed interval (more than 350 ms) since the last
update to determine when to send
a REQUEST_ERROR_RATE packet containing a smoothed error rate.
If the error rate is greater than the maximum tolerated rate, the
sending rate is decreased; if it is less, the sending rate is
increased.
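A minimal sketch of this receiver-driven adjustment is shown below; only the 7.9% default tolerated rate comes from the text, while the slowdown/speedup factors are placeholders standing in for TSUNAMI's user-settable values.

```c
/*
 * Sketch of the receiver-driven rate adjustment described above.  The factor
 * values are placeholders, not TSUNAMI defaults.
 */
struct tsunami_like_state {
    double send_rate_mbps;    /* current sending rate */
    double tolerated_loss;    /* e.g. 0.079 for the 7.9% default */
    double slowdown_factor;   /* placeholder, e.g. 0.9 */
    double speedup_factor;    /* placeholder, e.g. 1.1 */
};

/* Called when a REQUEST_ERROR_RATE report arrives from the receiver. */
void on_error_rate_report(struct tsunami_like_state *s, double smoothed_error_rate)
{
    if (smoothed_error_rate > s->tolerated_loss)
        s->send_rate_mbps *= s->slowdown_factor;   /* losing too much: slow down */
    else
        s->send_rate_mbps *= s->speedup_factor;    /* under the limit: speed up */
}
```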
FOBS asks for the local and remote network interface card speeds, which
it uses to determine a maximum beginning rate. The default tolerated
loss rate is 1%. During sender initialization, FOBS calculates a table
of rates, linking these rates to a network state machine.
After a segment of data (about 10000 1466-byte packets) has been
transferred, the sender requests an update from the receiver. The
packet loss reported by the receiver is then used to calculate the
current bandwidth. The current bandwidth is compared against the
pre-calculated table values to determine the current state of the
network and to pull a corresponding rate from the table.
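The table-driven selection can be sketched as follows; the number of states, the thresholds, and the rates are invented for illustration, since FOBS derives its table from the NIC speeds supplied at startup.

```c
/*
 * Sketch of table-driven rate selection.  States, thresholds, and rates are
 * placeholders; FOBS builds its own table from the supplied NIC speeds.
 */
#define NSTATES 5

static const double rate_table_mbps[NSTATES] = {
    900.0, 700.0, 500.0, 300.0, 100.0   /* placeholder rates, best to worst */
};

static const double bw_threshold_mbps[NSTATES] = {
    800.0, 600.0, 400.0, 200.0, 0.0     /* placeholder state boundaries */
};

/* Map the measured bandwidth onto a network state and return its rate. */
double select_rate(double measured_bw_mbps)
{
    for (int state = 0; state < NSTATES; state++)
        if (measured_bw_mbps >= bw_threshold_mbps[state])
            return rate_table_mbps[state];
    return rate_table_mbps[NSTATES - 1];
}
```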
SABUL begins with a preset IPD (inter-packet delay) of 10 usec, which it
converts to CPU cycles. The receiver generates a SYN packet on a timed
interval (200 ms), which signals the sender to use both the number
of lost packets and the number of packets--including retransmits--sent
since the last SYN time to calculate a current loss rate. This
loss rate is then fed into a weighted moving-average formula to
give a history rate. If the history rate is greater than a preset
limit (0.001), the IPD is increased; if it is less than the limit, the IPD
is decreased; if it is equal, 0.1 is added to the IPD. In a former release,
SABUL attempted to keep the loss rate between an upper and a lower limit.
The latest implementation is similar in concept to TSUNAMI's in that both
keep the delay between blocks/packets between an upper and lower limit.
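A sketch of this loss-driven IPD adjustment follows; the 0.001 threshold and the 10 usec starting value come from the text, while the smoothing weight and step sizes are placeholders.

```c
/*
 * Sketch of the loss-driven IPD adjustment described above.  The smoothing
 * weight and the multiplicative steps are placeholders.
 */
struct sabul_like_state {
    double ipd_usec;       /* inter-packet delay, starts at 10 usec */
    double history_rate;   /* weighted moving average of the loss rate */
};

void on_syn(struct sabul_like_state *s, long lost, long sent_since_last_syn)
{
    const double limit  = 0.001;       /* preset loss-rate limit from the text */
    const double weight = 0.125;       /* placeholder smoothing weight */
    double loss_rate = sent_since_last_syn > 0
                     ? (double)lost / (double)sent_since_last_syn : 0.0;

    /* weighted moving average of recent loss rates */
    s->history_rate = (1.0 - weight) * s->history_rate + weight * loss_rate;

    if (s->history_rate > limit)
        s->ipd_usec *= 1.1;            /* placeholder step: back off */
    else if (s->history_rate < limit)
        s->ipd_usec *= 0.9;            /* placeholder step: speed up */
    else
        s->ipd_usec += 0.1;            /* the text's "0.1 is added" case */
}
```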
rate implementation
SABUL is the only one of the three to implement the IPD (inter-packet
delay) between individual packets as opposed to groups of packets.
The delay is implemented by repeated calls to rdtsc() until the
requisite number of clock cycles have passed. FOBS checks the
sending rate after a burst of packets (25) and implements the delay
with a gettimeofday calculation until it is time to send the
next burst. TSUNAMI uses a timed select() to implement the delay
between blocks of data.
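For illustration, a cycle-counting busy-wait of the kind used for SABUL's per-packet delay might look like the sketch below (x86/gcc inline assembly); the calibration of cycles_per_usec is assumed to happen elsewhere.

```c
/*
 * Sketch of a cycle-counting busy-wait.  The rdtsc wrapper is x86/gcc
 * specific; cycles_per_usec must be calibrated at startup (not shown).
 */
#include <stdint.h>

static inline uint64_t read_tsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

/* Spin until `ipd_usec` microseconds' worth of cycles have elapsed. */
void busy_wait_ipd(double ipd_usec, double cycles_per_usec)
{
    uint64_t start  = read_tsc();
    uint64_t target = (uint64_t)(ipd_usec * cycles_per_usec);

    while (read_tsc() - start < target)
        ;   /* burn cycles; this is why per-packet rate control costs CPU */
}
```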
4. Test Results and Analysis
conditions of the tests
The tests were run morning and afternoon over a period of several months.
The results varied widely, so those included here represent best-case and
worst-case scenarios. Tests were also run using NISTNET as a sanity check,
since we did not have access to simulator code for the protocols.
characteristics of the computers at each end
On the ORNL side, firebird is a Dell Precision 330 single-processor host
running a 2.4.20-web100 Linux kernel, with a SysKonnect Gigabit Ethernet
network interface card connected to a Fast Ethernet network (CCS 192 LAN).
Firebird has a 1.4 GHz Intel Pentium IV processor, a 24 GB Maxtor hard
drive, and 512 MB of memory.
Tests indicate speeds of approximately 4400 Mbs+ writing to and
12000 Mbs+ reading from disk.
net100.lbl.gov is a single-processor host
running a 2.4.10-web100 Linux kernel, with a
NetGear GA630 Gigabit Ethernet interface card connected
to a Fast Ethernet network. net100.lbl has an
AMD Athlon 4 processor (stepping 02) running at 1.4 GHz
and 256 MB of memory.
Tests indicate speeds of approximately 73.276 Mbps+ writing to and
47.9 Mbps+ reading from disk.
characteristics of the network in between
The results of ping tests:

ping -c 10 131.243.2.93
round-trip min/avg/max/mdev = 67.209/67.490/67.657/0.324 ms
ffowler@net100(145)> ping -c 10 160.91.192.165
rtt min/avg/max/mdev = 67.287/67.748/68.219/0.333 ms

Running traceroute shows use of the ESnet OC48 backbone, with basic route symmetry, although variations can occur.
ORNL to LBL (best case):

Application | Rate (Mbs) | Loss | Retrans | PktsSent | PktsRecv | Dups
---|---|---|---|---|---|---
*QUANTA | 358.28 | 73 | 73 | 137742 | 137669 | 0
FOBS | 317.17 | 2816 | 6720 | 143147 | 138436 | 2009
SABUL | 343.49 | 15 | 15 | 140015 | 140005 | 5
**TSUNAMI | 307.23 | 55 | 259 | 13361 | 13102 | 204
***iperf | 489.0 | 0 | 0 | 145499 | 145499 | 0
iperf100 -P 3 | 279.7 | 6 | 6 | 171303 | 171299 | 2

LBL to ORNL (worst case):

Application | Rate (Mbs) | Loss | Retrans | PktsSent | PktsRecv | Dups
---|---|---|---|---|---|---
*QUANTA | 310.37 | 4415 | 4415 | 142157 | 137742 | 0
FOBS | 256.17 | 2170 | 6103 | 142530 | 138484 | 2057
SABUL | 151.76 | 47838 | 44895 | 184895 | 142441 | 2441
**TSUNAMI | 243.92 | 1689 | 1689 | 14963 | 13330 | 228
***iperf | 420.0 | 3 | 0 | 142846 | 142843 | 0
iperf100 -P 3 | 229.8 | 5 | 7 | 145491 | 145484 | 2

*QUANTA, as shipped, does not adjust the rate even if there are losses!
**TSUNAMI was giving 16384 byte blocks to IP and letting IP fragment.
***iperf does not buffer nor retransmit losses and so should be faster.
TSUNAMI does not provide a memory-to-memory transfer option so the file accesses--fseek, fread, fwrite--were commented out for this test.
File transfer results:

Best case:

Application | Rate (Mbs) | Lost | NewDataSent | PktsRcvd | Dups
---|---|---|---|---|---
FOBS+ | 45.01 | 2668 | 146421 | 151624 | 5203
SABUL | 52.14 | 5463 | 146220 | 178297 | 32077
TSUNAMI* | 47.17 | 44400 | 146419 | 181840 | 35421
GRIDFTP (3 streams) | 33.79 (4224.00 KB/sec) | | | |

Worst case:

Application | Rate (Mbs) | Lost | NewDataSent | PktsRcvd | Dups
---|---|---|---|---|---
FOBS | 28.52 | 3503 | 146421 | 147787 | 1366
SABUL | 37.25 | 4200 | 146220 | 216482 | 70262
TSUNAMI* | 8.2 (had to Ctrl/C) | | | |
GRIDFTP (1 stream) | 13.12 (1638.31 KB/sec) | | | |

*TSUNAMI was sending 1466 byte blocks.
+FOBS would have achieved 243.4 Mbs if not counting the wait for file writes!
QUANTA does not have a file transfer option at this time.

File transfers to pcgiga at CERN give a longer RTT, with the following results:
Best case:

Application | Rate (Mbs) | Lost | Retrans | PktsSent | PktsRcvd | Dups
---|---|---|---|---|---|---
iperf | 469.0 | 0 | 0 | 147563 | 147563 | 0
FOBS | 247.78 | 78 | 14188 | 160609 | 155792 | 9371
SABUL | 250.88 | 3 | 3 | 146223 | 146222 | 2
TSUNAMI | 270.74 | 0 | 4980 | 146419 | 146901 | 482

Worst case:

Application | Rate (Mbs) | Lost | Retrans | PktsSent | PktsRcvd | Dups
---|---|---|---|---|---|---
FOBS | 153.03 | 23 | 14245 | 160666 | 156133 | 9712
*SABUL | 139.52 | 1708 | 91856 | 238076 | 191202 | 44982
TSUNAMI | 4.30 (had to Ctrl/C) | | | | |

*SABUL had 4 EXP events in which all packets "in flight" are assumed lost.
4.3. Tests using a LAN, a private network and NISTNET
To validate results obtained over the broader Internet, where conditions
are unpredictable and constantly changing, tests were performed under
more controlled conditions using NISTNET.
NISTNET is running on an old, slow Gateway machine, viper, with 64 MB of
memory and two 100 Mbs network interface cards. One NIC is connected to
a NETGEAR Fast Ethernet switch and one to a local area network. The
other two machines involved are dual-processor PCs previously
used in a cluster; each has 512 MB of memory and a 100 Mbs network interface
card. Pinto is connected to the same local area network, and pinto10
is connected to the NETGEAR switch. Conditions are not completely
controlled in the local area network, but are observed to be mostly
stable, with rare exceptions.
With the NISTNET configuration:

pinto.ccs.ornl.gov pinto10 --delay 35.000
pinto10 pinto.ccs.ornl.gov --delay 35.000

pings from pinto to pinto10 gave the following:

[ffowler@pinto ffowler]$ ping -c 3 10.10.10.2
round-trip min/avg/max = 70.5/70.7/71.0 ms
[ffowler@pinto10 ffowler]$ ping -c 3 160.91.192.110
round-trip min/avg/max = 70.6/70.6/70.6 ms

4.3.1. NISTNET Memory-to-Memory Transfers
Application | Rate | Lost | Retrans | PktsSent | Dups
---|---|---|---|---|---
iperf (TCP) -P 3 | 78.8 Mbits/sec | 7 | 7 | 150571 | 0
iperf (UDP) | 90.5 Mbits/sec | 0 | 0 | 153849 | 0
FOBS | 80.8 Mbits/sec | 1 | 143 | 136570 | 142
QUANTA | 89.9 Mbits/sec | 0 | 0 | 137742 | 0
SABUL | 87.1 Mbits/sec | 37480 | 17843 | 157823 | 2832
*TSUNAMI | 83.1 Mbits/sec | 0 | 1 | 13103 | 1

*1 16K block is about 12 packets.

The results of 200 MByte (+/-) memory-to-memory tests with the NISTNET configuration:
pinto.ccs.ornl.gov pinto10 --delay 35.000 --drop 0.0107
pinto10 pinto.ccs.ornl.gov --delay 35.000 --drop 0.0107
Application | Rate | Lost | Retrans | PktsSent | Dups
---|---|---|---|---|---
iperf (TCP) -P 3 | 44.9 Mbits/sec | 20 | 20 | 159662 | 0
iperf (UDP) | 90.4 Mbits/sec | 17 | 0 | 153849 | 0
FOBS | 79.3 Mbits/sec | 16 | 1215 | 137642 | 1199
QUANTA | 87.5 Mbits/sec | 15 | 15 | 137727 | 0
SABUL | 83.5 Mbits/sec | 47744 | 14172 | 154172 | 4230
*TSUNAMI | 82.5 Mbits/sec | 18 | 18 | 13151 | 49

*18 16K blocks are about 12*18 = 216 packets; 13151 16K blocks are about 12*13151 = 157812 packets; 49 16K blocks are about 12*49 = 588 packets.

4.3.2. NISTNET File Transfers
For real file transfers over the NISTNET testbed from pinto to pinto10, NISTNET was configured as above. The file transferred is 214649928 bytes. The drop rate--a little over .01%--consistently results in 16 to 20 packets being dropped. This is confirmed by the test results: FOBS reported 17 lost packets, iperf-UDP reported 16, and iperf-TCP reported 20.
A summary of the results from file transfers using NISTNET is shown in the table below:
Application | Rate | Lost | Retrans | PktsSent | PktsRecd | Dups
---|---|---|---|---|---|---
FOBS* | 71.4 | 17 | 4754 | 151175 | 150537 | 4116
SABUL** | 87.2 | 11544 | 19911 | 166131 | 148317 | 2097
TSUNAMI*** | 33.2 | 181210 | 207317 | 354961 | 176184 | 29765

*FOBS sends 1466-byte data packets.
**SABUL sends 1468-byte data packets.
***TSUNAMI was configured for a blocksize of 1466 bytes so that comparisons could be made with the other two protocols, but as shown in the tables above, TSUNAMI gets better throughput when using a larger block size and letting IP fragment--partly because TSUNAMI is written so that "blocksize" controls not only the size of the blocks sent but also the size of the file reads and writes.

4.3.3. Analysis of LAN test results
The inter-block delay is computed with a calculation of the form:

delay = ((sending_time + 50) < ipd_current) ? (ipd_current - sending_time - 50) : 0;

A select() call is used to implement the delay.
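A minimal sketch of how such a delay can be realized with a timed select(), as described; the function name and the unit handling are illustrative, not taken from the protocol sources.

```c
/*
 * Sketch of a timed select() used purely as a sleep: no file descriptors are
 * watched, so the call simply returns when the timeout expires.  The delay
 * value (in microseconds) would come from a calculation like the one above.
 */
#include <sys/select.h>

void wait_interblock_delay(long delay_usec)
{
    if (delay_usec <= 0)
        return;                        /* already behind schedule: send immediately */

    struct timeval tv;
    tv.tv_sec  = delay_usec / 1000000;
    tv.tv_usec = delay_usec % 1000000;
    select(0, NULL, NULL, NULL, &tv);  /* sleeps for the requested interval */
}
```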
5. Summary Comparison of Protocol Features
PROTOCOL COMPARISONS

Features | Sabul | Tsunami | Fobs
---|---|---|---
TCP Control Port | Yes--control packets are sent from receiver to sender; the sender can also generate and process a pseudo control packet upon the expiration of a timer | Yes--control packets are sent from receiver to sender | Yes--2 control ports are used; control packets are sent both ways
UDP Data Port | Yes--data is sent one way, from sender to receiver | Yes--data is sent one way, from sender to receiver | Yes--data is sent one way, from sender to receiver
Threaded Application | Main thread does file I/O; a 2nd thread keeps track of timers and sends/receives packets | Server forks a process to handle the receiver's request; the receiver creates a thread for disk I/O | The sender and receiver are NOT threaded; fobsd is, but was not used for these tests
Rate Control | Yes--inter-packet delay implemented by continuous calls to rdtsc() | Yes--inter-block delay implemented by calls to gettimeofday() and select() | Yes--inter-block delay implemented by continuous calls to gettimeofday()
Tolerated Loss | 0.1% | User-settable; 7.9% default | 1.0%
Authentication | No | Yes--via a shared secret | No
Packet Size | 1468-byte data packets | User-settable block size (default 32768 bytes) | 1466-byte data packets
Socket Buffers | | User-settable UDP buffer size |
Congestion Control | | |
Reorder resilience | | |
Duplicates | | |
Diagnostics | | |
6. Observations
TSUNAMI, SABUL, and FOBS all do well as long as there is little reordering or
loss. Reordering and/or loss seem to cause them all to transmit
unneeded duplicates. FOBS gives priority to new data and transmits a
CHUNK (default 100 MB) of data before doing retransmits. TSUNAMI and SABUL
give priority to requests for retransmits. The SABUL client reports any missing
packets immediately as well as periodically, every 20 ms. The TSUNAMI client
reports missing packets after the requirements for number of packets
received (a multiple of 50) and amount of time since the last report
(more than 350 ms) have been satisfied.
In addition, TSUNAMI continues to retransmit the last
block/packet until it receives a REQUEST_STOP from the receiver. In one
short, loss-free transfer from ORNL to LBL of 8,135 blocks/packets of size
1466, the last block was observed being transmitted 11,876 times. The
client actually received 7,835 of the 11,876 before it quit.
REQUEST_RESTART requests seem to cause instability. When
REQUEST_RESTARTs are sent by the receiver, both sender and
receiver often have to be stopped manually, as both somehow seem
to get confused and the rate falls below 4 Mbps. This happened regularly on
transfers from ORNL to LBL with files of 100 MB or more. The
message "FULL! -- ring_reserve() blocking" also appeared at the client
regularly during the transfer of large files. In order to complete the
necessary file transfers with TSUNAMI, the retransmit table was enlarged
and a block size of 16384 was used in an attempt to eliminate
REQUEST_RESTART requests.
FOBS also transmits many unnecessary packets. In a file
transfer involving 146,421 data packets (214649928 bytes) and no losses,
FOBS actually sent 150,100 packets. These packets are apparently sent while
the sender is awaiting instructions from the receiver telling it what needs
to be done next. In this case, after sending the first chunk,
packets 70000-71229 were retransmitted. Similar retransmissions occurred
after the second and third chunks. The receiver read all
71527 packets in the first chunk, sent a COMPLETEDPKT, and got ready
for the next chunk. Before reading the first packet in the next chunk,
packets 70000-71229 were read and thrown away.
A 'scaleFactor' is used to keep the packets in sync with the correct
iteration; otherwise the receiver might assume these retransmits are
part of the next chunk.
SABUL also has problems when there is significant reordering/loss. Since
the client reports every perceived loss immediately, this can mean a
lot of control packets. In one case, the sender was observed having to
deal with one control packet for almost every data packet it was sending.
Since a tabulation of lost packets is also sent every 20 ms, the same loss
may be reported more than once, and the sender will count it again for
rate-control purposes, which gives rise to an interesting phenomenon: if the
tabulation of losses for the last period is greater than the number
of packets sent during the same period, the sender solves the problem by
assuming 100% loss for that period.
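A sketch of the accounting rule just described, with illustrative names:

```c
/*
 * Sketch of the accounting quirk described above: if more losses were
 * tabulated in the interval than packets were sent, treat it as total loss.
 */
double interval_loss_rate(long losses_reported, long packets_sent)
{
    if (packets_sent == 0 || losses_reported > packets_sent)
        return 1.0;                    /* assume 100% loss for the period */
    return (double)losses_reported / (double)packets_sent;
}
```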
With a shorter RTT--this scenario was with 150 usecs--perhaps not
as many unnecessary packets would be sent, but it is assumed these UDP
protocols are meant for transferring large files on high-bandwidth,
high-delay networks.
After much testing and study,
it is still not clear why SABUL and TSUNAMI report so many losses.
One clue may be that the protocols that wait 10000 packets (FOBS) or
more (iperf and QUANTA) before reporting losses do much better. That would
seem to indicate that packets may not be received in strict order,
but more study needs to be done on this problem.
7. Conclusions
fairness
TCP/Internet friendly? Possibly... possibly not. SABUL is the only one
of the protocols that makes any real claims of fairness.
The efficiency and fairness of SABUL's rate-control algorithm are
discussed in
Rate Based Congestion Control over High Bandwidth/Delay Links.
There, TCP friendliness is defined as follows: "A protocol P is TCP friendly
if the coexisting TCP connections share approximately the same amount of
bandwidth as if P is a TCP connection." The paper presents charts
of experiments done to show that SABUL is TCP friendly and therefore
fair.
The paper An Analysis of SABUL Data Transfer Protocol presents the results of
NS2 simulations involving multiple SABUL connections, as well as multiple
SABUL and TCP connections. The conclusions reached using their model are that:
limitations
The UDP protocols were basically designed to work over networks with
high bandwidth, long delay, and little or no loss or reordering. With the
possible exception of SABUL, they were not meant to share bandwidth
fairly with other streams; they were designed to run in reserved slots where
they can run as fast as the hardware and routers involved allow.
As the results show, at present they do not have the "smarts" for dealing
effectively with loss or reordering.
TCP rising to the challenge?
As mentioned, there is a lot of work going on in an effort to "spruce up"
TCP to meet today's demands. One of the topics under consideration
is the size of the maximum transmission unit (MTU). Matt Mathis at the
Pittsburgh Supercomputing Center is providing proposals for discussion.
Also, Stanislav Shalunov discusses the virtual MTU.
Using rate-based rather than window-based transmission of packets means
the UDP protocols do not have to "ramp up" at the beginning or "cut
the window in half" when a loss is perceived. There is ongoing work,
however, to experiment with modifying these aspects of TCP, including
Sally Floyd's proposals for TCP with large congestion windows
and Tom Kelly's Scalable TCP algorithm.
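For contrast with the standard halving behavior, the sketch below shows the per-ACK and on-loss update rules commonly cited for Kelly's Scalable TCP; the 0.01 and 0.125 constants are the published parameters but are shown here only for illustration.

```c
/* Scalable TCP style updates, cwnd in segments (illustrative sketch). */
void scalable_tcp_on_ack(double *cwnd)
{
    *cwnd += 0.01;            /* fixed additive step per ACK, independent of cwnd */
}

void scalable_tcp_on_loss(double *cwnd)
{
    *cwnd -= 0.125 * *cwnd;   /* back off by 1/8 instead of halving */
}
```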
future planned tests
Since these tests were done, new versions of SABUL and its derivative, the
UDP-based Data Transport Protocol (UDT), have been developed; a new version of
QUANTA/RBUDP has been made available; and implementations of the Datagram
Congestion Control Protocol (DCCP), in both user space and the Linux kernel,
are available for download. We hope to look at these new implementations under
a variety of conditions, and particularly with the tests planned on the new ???