Parallel TCP

Parallel advantages

slow start is faster, as if the flow had a virtual MSS of k x MSS
can work around OS limits on the maximum TCP buffer size, since the aggregate buffer is k x the per-stream buffer size
recovery is faster compared to single stream with giant window, because
- target recovery point is k x smaller, even if all k streams experience loss
- only one stream may experience loss, so multiplicative decrease is effectively 1/(2k) rather than 1/2.
- only one stream may experience a timeout
k control loops instead of one
for a single stream, throughput is proportional to 1/sqrt(p), where p is the loss probability; for the same aggregate throughput, k streams can tolerate a per-stream loss rate of roughly k**2 x p under this model
may avoid the slow-start burst losses of a single giant flow, and may distribute packets more evenly over an RTT (unconfirmed)
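The loss-rate argument in the list above can be sketched with the widely cited Mathis et al. approximation, in which steady-state throughput scales as MSS/(RTT x sqrt(p)). This is a rough model, not a measurement: the function names, the constant sqrt(3/2) from the periodic-loss form of the model, and the example MSS, RTT, and loss values are all assumptions chosen for illustration.

```python
import math

# Constant from the periodic-loss form of the Mathis et al. model (assumption:
# no delayed ACKs, loss is the only limit on the window).
MATHIS_C = math.sqrt(1.5)


def mathis_throughput_bps(mss_bytes, rtt_s, loss_prob):
    """Approximate steady-state throughput of one TCP stream, in bits/s."""
    return MATHIS_C * mss_bytes * 8 / (rtt_s * math.sqrt(loss_prob))


def aggregate_throughput_bps(k, mss_bytes, rtt_s, loss_prob):
    """Aggregate of k independent streams that each see loss_prob."""
    return k * mathis_throughput_bps(mss_bytes, rtt_s, loss_prob)


def tolerable_loss(k, loss_prob):
    """Per-stream loss rate at which k streams match one stream at loss_prob."""
    return k * k * loss_prob


# Hypothetical path: 1460-byte MSS, 70 ms RTT, loss probability 1e-4.
single = mathis_throughput_bps(1460, 0.07, 1e-4)
four = aggregate_throughput_bps(4, 1460, 0.07, 1e-4)
print(f"1 stream: {single / 1e6:.1f} Mb/s, 4 streams: {four / 1e6:.1f} Mb/s")
```

The model ignores buffer limits, timeouts, and the extra congestion the additional streams may cause, so it gives an upper bound on the benefit rather than a prediction.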

Parallel disadvantages

requires changes to the application to support parallel streams
may perform worse if loss is due to congestion
may add to congestion
selecting the buffer size and number of streams is problematic
exploits TCP's fairness, so may be unfair to other flows

Case studies

Parallel Flows: slow start and recovery
Parallel flows can speed up both slow start and recovery. The effect of k parallel flows is almost like having a single flow with a virtual MSS of k times the default MSS, so the initial slow start begins with k segments instead of one. After a loss, the aggregate recovers at k segments per RTT rather than one (if all the streams are in linear recovery). If only one flow experiences a loss, the multiplicative decrease of the aggregate is 1/(2k) rather than 1/2. The following plot illustrates the congestion window for each of 4 parallel streams as well as the aggregate.

All 4 streams experience a loss in slow start in the first few seconds. The effective multiplicative decrease for the aggregate is 1/2, but the additive increase for the aggregate is 4x that of standard TCP. At time 23, one of the streams experiences a loss, which cuts the aggregate window not by 1/2 but only by 1/8.
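The arithmetic behind the 1/2 and 1/8 figures can be checked with a small sketch. It assumes idealized AIMD (halve on loss, grow by one segment per RTT) and equal per-stream windows at the moment of loss; the helper names and the 20-segment starting window are hypothetical.

```python
def step(cwnds, losers=()):
    """Advance one RTT: halve streams that lost a packet, grow the rest by one segment."""
    return [w / 2.0 if i in losers else w + 1.0 for i, w in enumerate(cwnds)]


def drop_fraction(cwnds, losers):
    """Fraction of the aggregate window removed when the given streams halve."""
    before = sum(cwnds)
    after = sum(w / 2.0 if i in losers else w for i, w in enumerate(cwnds))
    return (before - after) / before


cwnds = [20.0] * 4                          # 4 streams, 20 segments each (assumed)
print(drop_fraction(cwnds, {0}))            # one stream loses: 0.125, i.e. 1/(2k)
print(drop_fraction(cwnds, {0, 1, 2, 3}))   # all streams lose: 0.5
print(sum(step(cwnds)) - sum(cwnds))        # loss-free growth per RTT: 4.0 segments
```

If the windows are unequal, the drop caused by a single loss depends on which stream is hit, so 1/(2k) is an average rather than a guarantee.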

Parallel Flows and fairness
TCP is very good about sharing the bandwidth between users. The following graph (ns-2 simulation) shows Bob transferring data at the link bandwidth speed. After 10 seconds, Alice starts her transfer, and TCP splits the bandwidth between Alice and Bob. After 50 seconds, Alice is done, and TCP lets Bob have all the bandwidth.

The next two plots show what happens once Alice understands that TCP splits the bandwidth among all the flows. In the first, Alice starts up 3 parallel TCP streams, and Bob ends up with only 25% of the bandwidth. The second shows an even greedier Alice starting 9 flows. Poor Bob is left with under 10% of the bandwidth, but Alice is happy.
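The shares in these plots follow from per-flow fairness: if every flow gets an equal slice of the bottleneck, a user's share is simply their fraction of the flows. A minimal sketch, assuming all flows share one bottleneck with equal RTTs (the function name is hypothetical):

```python
def bandwidth_share(my_flows, other_flows):
    """Idealized share of a bottleneck for a user running my_flows of the total flows."""
    return my_flows / (my_flows + other_flows)


# Bob always runs a single flow; Alice runs 1, 3, or 9.
for alice_flows in (1, 3, 9):
    bob = bandwidth_share(1, alice_flows)
    print(f"Alice {alice_flows} flows -> Bob gets {bob:.0%}")
```

The idealized value for 9 flows is exactly 10%; a simulated or measured share can fall somewhat below that.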

The following plot shows a series of tests using 8 parallel streams and one WAD-tuned AIMD (0.06,8) stream from ORNL to LBNL. The single stream uses 4MB buffers, and its AIMD values are selected to be "equivalent" to 8 parallel streams (see multcp). A different buffer size is used for each of the 8-stream tests. We are using our WAD to explore various ways of tuning the buffer sizes for parallel streams. The optimum number of streams and their buffer sizes are an open research question.

streams   Mb/s   buffer   cong events   rexmits   notes
1         140    4M       8             412       AIMD (0.06,8)
8         224    250K     8             38
8         272    500K     20            201
8         288    1M       22            148
8         141    2M       30            159
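One way to read the "equivalent" AIMD (0.06,8) setting is the rule already used in the advantages list: N streams behave roughly like one flow that adds N segments per RTT and backs off by 1/(2N) on a loss. The sketch below assumes that rule and that the pair is written as (decrease, increase); the source does not spell out either, and the function name is hypothetical.

```python
def equivalent_aimd(n_streams):
    """Return (multiplicative decrease, additive increase) emulating n_streams flows.

    Assumes the MulTCP-style rule: increase by n segments per RTT,
    reduce the window by 1/(2n) on a loss.
    """
    decrease = 1.0 / (2 * n_streams)
    increase = float(n_streams)
    return decrease, increase


decrease, increase = equivalent_aimd(8)
print(decrease, increase)   # 0.0625 8.0, close to the (0.06,8) used in the test
```

For comparison, standard TCP corresponds to `equivalent_aimd(1)`, that is, a decrease of 0.5 and an increase of one segment per RTT.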

SLAC's Parallel Flow tests
SLAC's Les Cottrell has some Perl scripts that drive iperf, testing window sizes with various numbers of parallel streams. Les's 9/26/01 results are from SLAC (OC3) to sunbirdj at ORNL. Below is a similar plot from later the same day from sunbird to NERSC.

Here are some earlier parallel streams results.
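A sweep of this kind can be sketched as a short driver that builds one iperf client command per (window, stream count) pair. This is not the SLAC script, only an illustration: the host name, the window and stream lists, and the function name are hypothetical. The flags used are the standard iperf client options -c (server), -w (window size), -P (parallel streams), and -t (duration in seconds).

```python
import itertools


def iperf_commands(host, windows, stream_counts, seconds=10):
    """Build one iperf client command line per (window, stream count) pair."""
    commands = []
    for window, streams in itertools.product(windows, stream_counts):
        commands.append(
            ["iperf", "-c", host, "-w", window, "-P", str(streams), "-t", str(seconds)]
        )
    return commands


# Hypothetical sweep; the receiver must already be running "iperf -s".
for cmd in iperf_commands("receiver.example.org", ["64K", "256K", "1M"], [1, 2, 4, 8]):
    print(" ".join(cmd))
    # To actually run a test: subprocess.run(cmd, capture_output=True, text=True)
```

Building the commands separately from running them makes it easy to inspect the sweep before putting load on a shared link.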

parallel flows papers and links

Last Modified thd@ornl.gov