Re: URL cycling with staggered URLs
From: Robert I. <cor...@gm...> - 2007-06-24 11:54:35
Hi BuraphaLinux Server,

Thank you for using the PRF.

On 6/24/07, BuraphaLinux Server <bur...@gm...> wrote:
> CURL-LOADER VERSION: 0.32, released 21/06/2007
>
> HW DETAILS: CPU/S and memory are a must:
> processor : 0
> MemTotal: 1030596 kB
>
> LINUX DISTRIBUTION and KERNEL (uname -r):
> BLS 1.0.072 (http://www.buraphalinux.org/)
> 2.6.21.5

Interesting, I'll look into this distro.

> GCC VERSION (gcc -v):
> gcc version 4.0.4
>
> COMPILATION AND MAKING OPTIONS (if defaults changed):
> I had to apply this patch:
> -LIBS= -ldl -lpthread -lrt -lidn -lcurl -levent -lz -lssl -lcrypto #-lcares
> +LIBS= -ldl -lpthread -lrt -lcurl -levent -lz -lssl -lcrypto #-lcares -lidn

OK.

> curl-loader -f monster.conf -v -u
>
> CONFIGURATION-FILE (The most common source of problems):
>
> Place the file inline here:
>
> ########### GENERAL SECTION ################################
>
> BATCH_NAME=monster
> CLIENTS_NUM_MAX=50  # Same as CLIENTS_NUM
> CLIENTS_NUM_START=10
> CLIENTS_RAMPUP_INC=10
> INTERFACE=eth0
> NETMASK=32
> IP_ADDR_MIN=10.16.68.197
> IP_ADDR_MAX=10.16.68.197
> CYCLES_NUM=-1
> URLS_NUM=6
>
> ########### URL SECTION ####################################
>
> URL=http://10.16.68.186/ftp/openoffice/stable/2.2.1/OOo_2.2.1_Win32Intel_install_wJRE_en-US.exe
> FRESH_CONNECT=1
> URL_SHORT_NAME="url 1"
> REQUEST_TYPE=GET
> TIMER_URL_COMPLETION=0  # In msec. When positive, it is now enforced by cancelling url fetch on timeout
> TIMER_AFTER_URL_SLEEP=1000
> TIMER_TCP_CONN_SETUP=50
>
> URL=ftp://anonymous:joe%040@10.16.68.186/debian/pool/main/g/gimp/gimp_2.2.15.orig.tar.gz
> FRESH_CONNECT=1
> URL_SHORT_NAME="url 2"
> TIMER_URL_COMPLETION=0  # In msec. When positive, it is now enforced by cancelling url fetch on timeout
> TIMER_AFTER_URL_SLEEP=1000
> TIMER_TCP_CONN_SETUP=50
>
> URL=http://10.16.68.186/ftp/ruby/1.8/ruby-1.8.6.tar.bz2
> FRESH_CONNECT=1
> URL_SHORT_NAME="url 3"
> REQUEST_TYPE=GET
> TIMER_URL_COMPLETION=0  # In msec. When positive, it is now enforced by cancelling url fetch on timeout
> TIMER_AFTER_URL_SLEEP=1000
> TIMER_TCP_CONN_SETUP=50
>
> URL=ftp://anonymous:joe%040@10.16.68.186/apache/ant/binaries/apache-ant-1.7.0-bin.tar.bz2
> FRESH_CONNECT=1
> URL_SHORT_NAME="url 4"
> TIMER_URL_COMPLETION=0  # In msec. When positive, it is now enforced by cancelling url fetch on timeout
> TIMER_AFTER_URL_SLEEP=1000
> TIMER_TCP_CONN_SETUP=50
>
> URL=http://10.16.68.186/ftp/ftp.postgresql.org/postgresql-8.2.4.tar.bz2
> FRESH_CONNECT=1
> URL_SHORT_NAME="url 5"
> REQUEST_TYPE=GET
> TIMER_URL_COMPLETION=0  # In msec. When positive, it is now enforced by cancelling url fetch on timeout
> TIMER_AFTER_URL_SLEEP=1000
> TIMER_TCP_CONN_SETUP=50

I do not believe that the TCP handshake and resolving will take up to 50
seconds, so this is OK.

> URL=ftp://anonymous:joe%040@10.16.68.186/apache/httpd/httpd-2.2.4.tar.bz2
> FRESH_CONNECT=1
> URL_SHORT_NAME="url 6"
> TIMER_URL_COMPLETION=0  # In msec. When positive, it is now enforced by cancelling url fetch on timeout
> TIMER_AFTER_URL_SLEEP=1000
> TIMER_TCP_CONN_SETUP=50
>
> DESCRIPTION:
> I have noticed the disk drive on my server is not very active during
> testing with curl-loader. I looked at the curl-loader log file and I
> think I know what is happening, but not how to change it. Let me
> describe what I think it is doing, and then what I would like it to do.
>
> What do I think it is doing now?
> If I cycle through N URLs with 100 clients, curl-loader will set up all
> 100 clients to process the first URL, then have them all do the second
> URL, then the third URL, etc. This means that all clients are normally
> fetching the same file (I am using 100MB files for testing). So I am
> testing networking, but since all clients are pulling the same file,
> all but one of them are just pulling from the cached copy. It also
> stresses either http or ftp (whichever the current URL is), but not
> both. Am I wrong?
Correct.

> QUESTION/SUGGESTION/PATCH:
>
> What I want:
> If I have N URLs and many clients, I would like curl-loader to do:
>
>     if (process % N) == 0 then start on URL 0
>     if (process % N) == 1 then start on URL 1
>     if (process % N) == 2 then start on URL 2
>
> (and then, if I have more processes than URLs, wrap back to URL 0 when
> I reach URL N-1)
>
> Why do I want this?
> This means that if I have a large set of URLs (too big for the server
> file cache), I can force the server to work hard at loading files from
> disk and get a more realistic load for my server (which will be a
> mirror archive that I expect many people to use as their mirror
> source). This means that normally the file a client wants is probably
> NOT in cache, and with a collection of ISO images the filesystem cache
> will not be able to hold everything, so the disk will be busy.
>
> Can curl-loader do this already?

You can create 3 conf files with different BATCH_NAME values and run 3
loads from 3 consoles, each with a different ordering of the URLs. Not a
very convenient way, I agree.

> My workaround is to run many separate instances of curl-loader at once
> instead of one large, combined load, but then all the statistics and
> logs are separate.

Exactly. We have the two features in our RoadMap/TODO list:
http://curl-loader.svn.sourceforge.net/viewvc/curl-loader/trunk/curl-loader/TODO?view=markup

2. An option to download a url not only once per cycle, but according to
   its "weight" (probability). A weight can be less than 1, e.g. 0.3,
   which means there is a 30% probability that a client will load this
   url. A weight of 3 means a 300% probability, which in practice means
   that a client fetches the url several times before going to the next
   one, 3 times on average.

11. Usage of random time intervals, e.g. 100-200 (from 100 to 200 msec).
    Thus the clients will be less synchronized, and after a couple of
    cycles they will be de-focused.
By the way, you can also achieve this in part by starting from a single
client and adding one client per second, to de-focus them:

CLIENTS_NUM_START=1
CLIENTS_RAMPUP_INC=1

It looks like with your 50 clients it could be done well (or well-done).
I do not know when we'll get to adding these 2 features, which are not
complex. If you wish to volunteer and add them, please let me know and I
will guide you through our code.

--
Sincerely,
Robert Iakobashvili,
coroberti %x40 gmail %x2e com
...........................................................
http://curl-loader.sourceforge.net
A web testing and traffic generation tool.