ORNL WAD summary  8/11/03

Production version:
  written in C
  uses Web100 API
     web100_var_find() web100_attach() web100_group_find()
     web100_connection_lookup()
     walk /proc/web100
  operates in polling mode if not run as root (0.1 second period),
     otherwise uses netlink kernel event notification on connection
     open/close
  utilizes a config file (see below)
  option to split buffers among flows (parallel streams)
  has builtin stop-list of ports to ignore (e.g., ssh)
  has polled dynamic tuning option for Floyd AIMD, uses thread/flow to
     tune AIMD based on cwnd (deprecated by kernel implementation)
  no logging (see other tools below)
  earlier version would periodically fetch path bandwidth variables from
     NTAF and use them for tuning (buggy, no longer included)
  distributed to about 6 other researchers

Toy version:
  based on LBL's python
  poll or su-event
  walks /proc/web100/
  config file

Other tools:
  tracer.py and traced.c -- polling, collect plaintext, space-separated
     web100 variables for plots etc.

-------- config file --------

# Configuration file for WAD
#
#
# [Destination Hostname]     (separates entries)
#   src_addr: 0.0.0.0/32     (localhost/any; could bind to a particular
#                             interface)
#   src_port: 0              (any)
#   dst_addr: 128.55.128.74/32   (destination address--can be a host
#                             as shown--or a network: 160.91.194.0/24)
#       *note: destinations are kept in config file order so, in general,
#              list hosts before networks (or connections will be matched
#              with their network before reaching their host entry)
#   dst_port: 0              (any)
#   mode: 1                  (0-wad does not tune--
#                               web100 auto-tuning turned on for all flows to this
#                               destination: X_SBufMode & X_RBufMode set to 1
#                             1-wad tunes using buffer sizes from config file--
#                               web100 auto-tuning turned off for all flows to this
#                               destination: X_SBufMode & X_RBufMode set to 0
#                             2-wad tunes using dynamic buffer sizes from NTAF
#                               database as available--(no longer used)
#                               web100 auto-tuning turned off )
#   sndbuf: 2000000          (in bytes, send buffer to adjust to)
#   rcvbuf: 2000000          (in bytes, receive buffer to
#                             adjust to)
#   wadai: 6                 (additive increase factor)
#   wadmd: .3                (multiplicative decrease factor)
#   maxssth: 100             (slow start change by Sally Floyd)
#   kai: 0.0                 (Scalable TCP--additive increase)
#   wad_ifq: 1               (1=disable enter_cwr when IF queue is full
#                             0=do not disable)
#   divide: 1                (1 yes, 0 no, divide buffer amt between all
#                             flows to this destination/allocate total
#                             buffer amount to each flow)
#   floyd: 2                 (0 set aimd from values in config file(if > 0)
#                               in wad at start of flow only--no calculation
#                             1 do floyd aimd in kernel(set WAD_FloydAIMD=1)
#                             2 calc aimd in wad continuously based on cwnd
#                             3 calc aimd in wad at beginning of flow only--
#                               recalc only if bufferspace reallocated)

[firebird.ccs.ornl.gov]
src_addr: 0.0.0.0/32
src_port: 0
dst_addr: 160.91.192.165/24
dst_port: 0
mode: 1
sndbuf: 94894680
rcvbuf: 8365440
wadai: 6
wadmd: .3
maxssth: 100
kai: 0.02
wad_ifq: 1
divide: 1
floyd: 2

[kirana.psc.edu]
src_addr: 0.0.0.0/32
dst_addr: 192.88.115.178/24
dst_port: 0
mode: 1
sndbuf: 2938380
rcvbuf: 5085880
wadai: 6
wadmd: .3
maxssth: 127
kai: 0.02
wad_ifq: 1
divide: 1
floyd: 2

[net100.lbl.gov]
src_addr: 0.0.0.0/32
src_port: 0
dst_addr: 131.243.2.93/32
dst_port: 0
mode: 1
sndbuf: 2813940
rcvbuf: 4052159
wadai: 6
wadmd: .3
maxssth: 100
kai: 0.02
wad_ifq: 1
divide: 1
floyd: 2

--------- HOW WAD WORKS -----------------

wad is a network daemon - when wad starts up, it
  * reads a configuration file--wad.conf which contains destinations
    to be monitored along with their corresponding optimum buffer sizes
    and other tuning variables--and stores this information in a
    linked list of Dest structures.
    (Send the SIGHUP signal to the process to make wad re-read the file
    and update the information--adds/changes)
  * sets up web100 structures--web100_agent, web100_group, web100_var--
    in order to gain access to the web100 variables to be tuned
  * if any of the destinations in wad.conf has floyd set to 2, creates a
    thread that periodically (every .1 second) checks the linked list
    of destinations. For each active flow to a destination with floyd
    set to 2, the thread reads CurrentCwnd from the
    /proc/web100/cid/read file and, if the value has changed so that a
    new additive increase/multiplicative decrease is needed, retrieves
    values for WAD_AI and WAD_MD from a table and writes them to the
    variables in the /proc/web100/cid/tune file.
  * (deprecated) if any of the destinations in wad.conf has mode set
    to 2, wad creates a thread which alternates sleeping naptime seconds
    and sending xmlrpc requests to the net100 xmlrpc servers to obtain
    the most recent values of the optimum buffer sizes for the paths
    between those hosts and localhost. Due to instability of the xmlrpc
    servers, fault tolerance was built in as follows: if an error is
    returned from an xmlrpc call, that host is marked as down and a
    counter is started. During a predetermined period of time, no calls
    are made to that host. Once the wait time has elapsed, xmlrpc calls
    to that host are again attempted.

wad is notified of beginning and ending connections in 2 ways:
  - notification via netlink when run as root:
    * the daemon issues a blocking read on a netlink socket, waiting
      for an event. Each event carries a type
          0 means a tcp connection has started up
          1 means a tcp connection has closed
      and the cid (connection id) of that connection
  - notification via /proc/web100 when NOT run as root:
    * periodically the daemon walks the directory tree comparing
      connections found with those in a linked list of active cids.
      Those not found in the list are added as new connections.
      Those found are marked as still active and, after the walk, those
      not marked are assumed closed and are removed. The "period" may be
      set as a command line argument but otherwise has a default of
      .1 second.

Upon notification that a new tcp connection has started up, wad:
  - sets up a new web100_connection structure
  - adds the connection to a linked list of active cids
  - checks this connection against the destinations it read from the
    configuration file (wad.conf) and, if there is a match and the
    local/remote port is not on a "STOP LIST" (list includes well known
    ports such as 22-ssh, 80-http, etc.), links it into a queue for the
    matched destination.
  * Then: if a match was found and if the connection is still in an
    established state and if mode > 0:
      - linux auto-tuning is set off
      - wad tunes the connection using the values stored in the matching
        Dest structure and the web100 interface. If the Dest structure
        has "divide" set on, the buffer space from the config file is
        divided up between flows as they start and reallocated among
        the remaining flows as connections close.
  * Otherwise, linux auto-tuning is set on and wad does not tune the
    flows to this destination

Upon notification that a tcp connection has ended, wad:
  * locates the active_cid for this connection and removes it from
    the linked list
  * if needed, removes it from the Dest linked list
  * frees the active_cid structure associated with this cid

WAD structures

CidRoot
struct active_cid
 _______________________________
| int cid;                      |__
| unsigned int stage;           |  |__
| unsigned int sbuf, rbuf;      |  |  |__
| int up;                       |  |  |  |
| int cwnd_ndx;                 |  |  |  |
| struct active_cid *next_cid;  --->linked list of all active cids
| struct active_cid *same_dest; --->linked list of cids to same destination
| struct Dest *cid_dest;        --->pointer to Dest struct for this cid
|_______________________________|  |  |  |
   |_______________________________|  |  |
      |_______________________________|  |
         |_______________________________|

DestRoot
struct Dest
 __________________________________
| u_int32_t localadd, remoteadd;   |__
| u_int32_t localmsk, remotemsk;   |  |__
| u_int16_t localport, remoteport; |  |  |__
| unsigned int sbuf, rbuf;         |  |  |  |
| int mode;                        |  |  |  |
| int wad_ai;                      |  |  |  |
| int wad_md;                      |  |  |  |
| int wad_maxssthresh;             |  |  |  |
| int wad_kaicnt;                  |  |  |  |
| int wad_ifq;                     |  |  |  |
| int divide;                      |  |  |  |
| int floyd;                       |  |  |  |
| int nconnects;                   |  |  |  |
| struct active_cid *qfront;       --->pointer to cid queue front
| struct active_cid *qback;        --->and cid queue back for this destination
| struct Dest *next_dest;          --->linked list of connections as read
|                                  |  |  |  |      from wad.conf
|__________________________________|  |  |  |
   |__________________________________|  |  |
      |__________________________________|  |
         |__________________________________|
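
The destination matching and buffer division described above can be
sketched in C roughly as follows. This is an illustration only, not the
wad source: struct dest_sketch is a hypothetical, trimmed-down stand-in
for struct Dest, the helper names (prefix_to_mask, match_dest,
per_flow_buf) are invented here, addresses are assumed to be host-order
integers, and "divide" is assumed to mean an equal split among the
active flows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of struct Dest: only the fields this sketch uses. */
struct dest_sketch {
    uint32_t remoteadd, remotemsk;   /* dst_addr and its prefix mask */
    uint16_t remoteport;             /* dst_port, 0 = any */
    unsigned int sbuf, rbuf;         /* configured buffer sizes, bytes */
    int divide;                      /* 1 = share buffers among flows */
    int nconnects;                   /* active flows to this destination */
    struct dest_sketch *next_dest;   /* next entry, in wad.conf order */
};

/* Convert a prefix length such as the 24 in "/24" to a netmask. */
static uint32_t prefix_to_mask(int prefix_len)
{
    if (prefix_len <= 0)
        return 0;
    if (prefix_len >= 32)
        return 0xFFFFFFFFu;
    return 0xFFFFFFFFu << (32 - prefix_len);
}

/* Return the first entry, in list order, that matches the remote
 * address and port. Because the first match wins, host entries must
 * precede the networks that contain them. */
static struct dest_sketch *match_dest(struct dest_sketch *root,
                                      uint32_t addr, uint16_t port)
{
    struct dest_sketch *d;

    for (d = root; d != NULL; d = d->next_dest) {
        if ((addr & d->remotemsk) != (d->remoteadd & d->remotemsk))
            continue;                /* outside this host/network */
        if (d->remoteport != 0 && d->remoteport != port)
            continue;                /* port given and different */
        return d;
    }
    return NULL;
}

/* Buffer size for each flow: an equal share of the total when divide
 * is on (assumed policy), otherwise the whole amount per flow. */
static unsigned int per_flow_buf(unsigned int total, int divide,
                                 int nconnects)
{
    assert(nconnects >= 0);
    if (divide && nconnects > 1)
        return total / (unsigned int)nconnects;
    return total;
}
```

With a /32 host entry listed ahead of its /24 network, match_dest
returns the host entry for the host's own address and the network entry
for any other address in that /24. Under the equal-split assumption,
per_flow_buf would be re-evaluated for every remaining flow each time
nconnects changes, which mirrors the reallocation on connection close
described above.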