Faster Bulk Transfer Starring: *UDP*
Study and Evaluation of 3 UDP protocols
3.1. SABUL
PSockets, an earlier library designed by the authors of Sabul,
was entered in the Network Challenge at SC2000. According to
the paper,
Simple Available Bandwidth Utilization Library for High-Speed Wide Area Networks , the performance of PSockets in this Challenge
was disappointing due to a number of factors including a
misconfigured router and background packet loss and jitter.
One of the main conclusions of the subsequent research
conducted to try
and understand the problems encountered at SC2000, was that rate control
is necessary to achieve good throughput. This research was the
impetus for the design of Sabul.
SABUL is an application library implemented in C++ and designed to:
- Provide reliable data transfer
- Be implemented and tuned at the application layer
- Minimize packet and computation overhead
- Fill available bandwidth
- Share the bandwidth with other connections fairly
- Adapt to changes in the network rapidly and reliably
- Be efficient with memory use through memory copy avoidance
1. use of TCP/UDP channel(s)
SABUL uses two connections:
Data is sent from sender to receiver over the UDP channel while control
information, containing the current state of the transfer, is sent
over the TCP channel from receiver to sender. The setsockopt() call
is used to set the UDP socket SO_RCVBUF(256000), the SO_SNDBUF(256000),
and SO_RCVTIMEO(100us). The TCP channel is polled.
The sender and receiver each use 2 threads--1 thread to keep track
of timers and send/receive control and data packets while the
main thread does the file I/O. While control packets are being
processed, sending/receiving of data is stopped until the processing
is finished.
The TCP control channel makes possible reliable, continuous
updating of state information which enables the rate-control algorithm
to be adaptive to the present state of the network. The expiration of
timers generates control events. The receiver keeps three timers:
m_iACKInterval(100000)
m_iERRInterval(20000)
m_iSYNInterval(200000)
and the sender keeps one:
m_iEXPInterval(1000000)
TCP control packets:
SC_ACK: notifies the sender that all packets prior to the seqno in
m_lAttr have been successfully received
m_lAttr = the highest seqno received so far if there is
no loss
m_lAttr = the smallest lost seqno if there is loss
SC_ERR: notifies the sender of packets missing at the receiver
to be added to a CLossList and retransmitted
m_lAttr = the total number of lost packets (max 367)
m_plData = the loss list
SC_EXP: pseudo packet generated and processed by the sender upon
expiration of a timer if no ACK or ERR packet
has been received during a specified interval(1000000)
SC_SYN: signal for the sender to recalculate the inter-packet delay
m_lAttr = number of packets received since the last SC_SYN
packet
2. rate-control algorithm
The interval between packets starts out at a pre-set value of 10 us.
Upon the receipt of each SYN packet from the receiver, the IPD
(inter-packet delay) is calculated by functions roughly
similar to those described in section 3.6 of
SABUL: A High Performance Data Transfer Protocol .
The calculation uses the number of packets reported as lost since the
last SYN time and the the number of packets sent since the last SYN
event.
currlossrate = m_lLocalERR / double(m_lLocalSend)
This current loss rate is then used as input to a weighted moving
average formula to calculate a new sending rate.
Upper and lower bounds are pre-set and SABUL attempts to keep the
IPD between these two boundaries. (This is a change from the
previous release which attempted to keep the history loss rate between
upper and lower boundaries)
Initial values:
m_dWeight=0.1
m_dLossRateLimit=0.001
m_dInterval = 10.0
m_dLowerLimit = 1000.0
m_dUpperLimit = 10.0
m_dHistoryRate = 0.0
m_dHistoryRate = m_dHistoryRate * m_dWeight + currlossrate * (1 - m_dWeight);
if (m_dHistoryRate > m_dLossRateLimit)
m_dInterval = m_dInterval * (1. + 0.1 * (m_dHistoryRate - m_dLossRateLimit)) + 0.5;
else if (m_dHistoryRate < m_dLossRateLimit)
m_dInterval = m_dInterval * (1. + 10.0 * (m_dHistoryRate - m_dLossRateLimit));
else
m_dInterval += 0.1;
/* make sure the m_dInterval is still in range */
if (m_dInterval > m_dLowerLimit)
m_dInterval = m_dLowerLimit;
else if (m_dInterval < m_dUpperLimit)
m_dInterval = m_dUpperLimit;
The sending interval is also increased if ERR packets report what
SABUL considers "too much loss".
The following sending/receiving algorithms are contained in
SABUL: A High Performance Data Transfer Protocol.
3. data sending algorithm
The sender manages a data buffer(40MB) and a retransmission queue
which is a linked list of the sequence numbers of lost packets from
ERR feedbacks and EXP events.
The main thread reads the file into block-sized(7.2MB) buffers and
inserts each buffer into a linked list ready to be sent out.
The sender's two main functions are to send out data packets waiting
an IPD(inter-packet delay) between each packet and to process the
feedback from the receiver.
The sending algorithm is:
- initialize
- get current time
- poll the tcp channel
- receive a control packet and process it, if any
- otherwise, if the interval since the last ACK or ERR packet
was received is greater than an expiration threshold, generate
an EXP packet and process it (an EXP event means the sender
assumes all packets since the last ACK are lost and adds
these sequence numbers to the loss queue)
- if the loss queue is not empty, retransmit the proper packet,
remove the sequence number from the queue and then go to step 6
- if there is more data to send, send a new packet and update snd_nxt
- spin until time to send the next packet then go to step 2
4. data receiving algorithm
The receiver manages a buffer of size 40 MB containing received packets, a linked
list of lost sequence numbers, and a flag array kept for packet
reordering/loss detection.
The receiver uses "readv" to receive packets, putting them according
to sequence number into its own buffer or an application buffer.
The three types of control packets generated by the receiver--ACK,
SYN, and ERR-- are sent based on their own timers or conditions as
noted above.
The receiving algorithm is:
- initialize
- generate a control packet
- get current time
- generate an ACK packet if an "ack interval" has passed since the last
ACK event or the receiving buffer is full
- generate an ERR packet if an "err interval" has passed since the last
ERR event and the loss list is not empty
- generate a SYN packet if a "syn interval" has passed since the last
SYN event
- start a timed recv()--on expiration, go to step 2
- read the sequence number in the data packet and calculate the offset
since the last acked sequence number
- if the offset is greater than or equal to the size of the sequence
number array, generate an ACK packet and go to step 2
- if the offset is less than 0, go to step 2
- if the offset is greater than that expected, update the loss list and
generate an ERR packet
- update the next expected offset number
- update the largest received sequence number
- update the flag array and go to step 2
Graphs illustrating a representative data transfer over NistNet using SABUL
can be found here
5. unique features
***Use of rdtsc() for timing:
Rather than always calling gettimeofday(), Sabul converts
time into cpu cycles and uses the time-stamp counter to
implement rate control for data packets and the generation of
control events/packets by the sender and receiver on a periodic basis.
***Control packets are generated by timers
In most of the UDP protocols, control packets are generated after
sending/receiving a specified amount of data. In Sabul, ACK, ERR
and SYN packets are generated each according to its own timer.
In addition, the receiver generates an ERR packet as soon as a missing
packet is detected and an ACK packet can be generated when the end of
of a user buffer is reached. This allows for a TCP-like response to
network conditions.
***sender generation of its own expiration event(EXP)
The sender generates and then processes an EXP event if it has not
received an ACK or ERR message from the receiver within a preset
interval of time. The sender assumes all packets since the last
ACK have been lost and inserts these sequence numbers in its loss
queue. The sending rate is NOT recalculated at this time however.