[OpenSTA-users] Severe load ramp-up limitation encountered
From: Dan D. <ddo...@me...> - 2007-07-12 18:04:35
Hi opensta gurus (you know who you are):

After several years of using opensta successfully on dozens of customer load tests, in the last 2 weeks I have attempted two projects with aggressive load ramp-ups where opensta flattened my hefty load server and reported *unrealistically high* response times even at low load -- and thus failed miserably (to great embarrassment with the customer -- we eventually accomplished this using LR). I seek your advice as to whether (a) there are tool or system config settings I need to make, (b) there is something silly I have simply missed, (c) there is a bug lurking I should document and report, (d) I have hit an architectural limitation of the tool, or (e) other. Your advice is appreciated.

First failed project specifics (the second one is similar):

The Goal: Test the response rate of Apache on each of two Sun servers (V440 - 16GB, 2 CPUs & V240 - 8GB, 2 CPUs).

The Script: Could not be simpler -- a single page with nothing more than the customer's logo. Here it is:

  PRIMARY GET URI "http://syn1.sellpoint.net/QA/smloadtest.html HTTP/1.0" ON 1
    HEADER DEFAULT_HEADERS
    ,WITH {"the usual stuff here"}

Four innocuous GET URIs follow.

The Test: Two load servers (dual 1GHz P4s, 1 GB memory). On each, the test had 20 Task Groups with this one script, in v-user "burst" groups of 50, 100, 150, ... 1000, doing a single iteration. Each group was to be launched 3 minutes apart (allowing the web server to settle from the previous burst). Ramp rate of 300 users per minute (batch settings 1/5/1: interval between batches / v-users per batch / batch ramp time).
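As a sanity check on my own settings, here is the arithmetic behind that ramp rate as I understand it (assuming the "interval btw batches" and "batch ramp time" fields are in seconds -- if they mean something else, please correct me):

```python
# Sanity check: do batch settings 1/5/1 really give 300 v-users/minute?
# Assumption: 'interval btw batches' is in seconds.
interval_between_batches_s = 1   # time between successive batches
vusers_per_batch = 5             # v-users started per batch

batches_per_minute = 60 / interval_between_batches_s
ramp_rate = vusers_per_batch * batches_per_minute
print(ramp_rate)  # 300.0 v-users per minute
```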
The Result: The test appeared to start up and run smoothly until Group 8 (the 400-user burst) launched -- at which point the Error Log reports "Failed processing for TOF record for script line 59" (the PRIMARY GET). What's worse, the Timer shows times at 50 users of about 2 seconds, rapidly climbing to 3.5 seconds at 100, 10 seconds at 200, 20 seconds at 200, 30 seconds at 350, and so forth all the way to 60 seconds. These timer results were proven "false" when (a) the customer (and we) could hit the page manually and get <5 second response at something under 1000 v-users, and (b) a subsequent test with LR showed these times to range from 0.7 seconds at just under 150 users, to 5 seconds at 1000 users, and up to 9 seconds at 2000 users. Furthermore, perfmon showed that our dual-CPU load server shoots up to 100% utilization when Task Group 8 (400 users) starts -- and stays there; and memory pages/sec shoots up from an average of 8 to around 50, with spikes to 400 (I have a screen shot that shows this if you want to see it).

Questions: (1) Why is the memory paging rate so strongly affected? (2) What other perfmon counters should I monitor to further diagnose the issue? (I looked at many, including System/Threads & Processor Queue Length, Memory/MB Available, Physical Disk/% Disk Time & Avg. Queue Length, Network Interface/Total Bytes/sec, ... and could not find any other counters "out of kilter".) (3) Is there a known limitation of opensta's overhead in allocating and assigning threads that puts an upper limit on the ramp rate it can handle?

Thoughts: I thought of implementing a Rendezvous (as per the FAQ) to initialize all v-users before launching them to the main app URL -- thinking that this may overcome the 'startup overhead' that may be infecting the response results -- but I have not tried this yet. Any insights into this apparent limitation, suggestions about what else to do to further diagnose, or requests for more detail, are greatly appreciated.
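One more diagnostic idea I can offer for discussion: a bare-bones concurrent GET script, outside any load tool, to separate injector overhead from real server latency. The sketch below is Python and hits a throwaway local server standing in for the static logo page (the real test URL would be substituted in practice; none of this is opensta itself). If a plain script like this stays fast under a burst while the tool reports tens of seconds, the bottleneck is the injector, not the web server.

```python
# Sketch: tool-independent latency check under a concurrent "burst".
# A trivial local HTTP server stands in for the static logo page.
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import urlopen

class LogoPage(BaseHTTPRequestHandler):
    """Serves a tiny static page, like the one-logo test page."""
    def do_GET(self):
        body = b"<html><body>logo</body></html>"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):
        pass  # keep the console quiet

def timed_get(url):
    """Fetch url once; return (HTTP status, elapsed seconds)."""
    start = time.perf_counter()
    with urlopen(url) as resp:
        resp.read()
        status = resp.status
    return status, time.perf_counter() - start

def burst(url, n_users):
    """Fire n_users concurrent GETs, one per worker thread."""
    with ThreadPoolExecutor(max_workers=n_users) as pool:
        return list(pool.map(timed_get, [url] * n_users))

if __name__ == "__main__":
    server = ThreadingHTTPServer(("127.0.0.1", 0), LogoPage)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    results = burst(f"http://127.0.0.1:{port}/", 20)
    worst = max(elapsed for _, elapsed in results)
    print(f"{len(results)} requests, worst latency {worst:.3f}s")
    server.shutdown()
```

Scaling n_users up in steps (50, 100, 200, ...) and watching whether the worst-case latency tracks the tool's Timer numbers would tell us which side the delay lives on.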
I believe that understanding and/or resolving this apparent 'load-driving bottleneck' is of pivotal interest to all serious opensta users.

...Dan

Dan Downing
www.mentora.com