[OpenSTA-users] Severe load ramp-up limitation encountered
From: Dan D. <ddo...@me...> - 2007-07-12 18:04:35
Hi opensta gurus (you know who you are):

After several years of using opensta successfully on dozens of customer load tests, in the last 2 weeks I have attempted two projects with aggressive load ramp-ups where opensta flattened my hefty load server and reported *unrealistically high* response times even at low load -- and thus failed miserably (to great embarrassment with the customer -- we eventually accomplished this using LR). I seek your advice as to whether (a) there are tool or system config settings I need to make, (b) there is something silly I have simply missed, (c) there is a bug lurking I should document and report, (d) I have hit an architectural limitation of the tool, or (e) other. Your advice is appreciated.

First failed project specifics (the second one is similar):

The Goal: Test the response rate of Apache on each of two Sun servers (V440 - 16GB, 2 CPUs & V240 - 8GB, 2 CPUs).

The Script: Could not be simpler -- a single page with nothing more than the customer's logo. Here it is:

  PRIMARY GET URI "http://syn1.sellpoint.net/QA/smloadtest.html HTTP/1.0" ON 1
    HEADER DEFAULT_HEADERS
    ,WITH {"the usual stuff here"}

Four innocuous GET URIs follow.

The Test: Two load servers (dual 1GHz P4s, 1 GB memory). On each, the test had 20 Task Groups with this one script, in v-user "burst" groups of 50, 100, 150, ... 1000, doing a single iteration. Each group was to be launched 3 minutes apart (allowing the web server to settle from the previous burst). Ramp rate of 300 users per minute (batch settings 1/5/1: interval between batches / v-users per batch / batch ramp time).
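As a sanity check on my own settings, here is the arithmetic behind that ramp rate as I understand it (assuming the "interval btw batches" and "batch ramp time" fields are in seconds -- if they mean something else, please correct me):

```python
# Sanity check: do batch settings 1/5/1 really give 300 v-users/minute?
# Assumption: 'interval btw batches' is in seconds.
interval_between_batches_s = 1   # time between successive batches
vusers_per_batch = 5             # v-users started per batch

batches_per_minute = 60 / interval_between_batches_s
ramp_rate = vusers_per_batch * batches_per_minute
print(ramp_rate)  # 300.0 v-users per minute
```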
The Result: The test appeared to start up and run smoothly until Group 8 (the 400-user burst) launched -- at which point the Error Log reports "Failed processing for TOF record for script line 59" (the PRIMARY GET). What's worse, the Timer shows times at 50 users of about 2 seconds, rapidly climbing to 3.5 seconds at 100, 10 seconds at 200, 20 seconds at 200, 30 seconds at 350, and so forth all the way to 60 seconds. These timer results were proven "false" when (a) the customer (and we) could hit the page manually and get <5 second response at something under 1000 v-users, and (b) a subsequent test with LR showed these times to range from 0.7 seconds at just under 150 users, to 5 seconds at 1000 users, and up to 9 seconds at 2000 users. Furthermore, perfmon showed that our dual-CPU load server shoots up to 100% utilization when Task Group 8 (400 users) starts -- and stays there; and memory pages/sec shoots up from an average of 8 to around 50, with spikes to 400 (I have a screen shot that shows this if you want to see it).

Questions: (1) Why is the memory paging rate so strongly affected? (2) What other perfmon counters should I monitor to further diagnose the issue? (I looked at many, including System/Threads & Processor Queue Length, Memory/MB Available, Physical Disk/% Disk Time & Avg. Queue Length, Network Interface/Total Bytes/sec, ... and could not find any other counters "out of kilter".) (3) Is there a known limitation of opensta's overhead in allocating and assigning threads that puts an upper limit on the ramp rate it can handle?

Thoughts: I thought of implementing a Rendezvous (as per the FAQ) to initialize all v-users before launching them to the main app URL -- thinking that this may overcome the 'startup overhead' that may be infecting the response results -- but I have not tried this yet. Any insights into this apparent limitation, suggestions about what else to do to further diagnose, or requests for more detail, are greatly appreciated.
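One more diagnostic idea I can offer for discussion: a bare-bones concurrent GET script, outside any load tool, to separate injector overhead from real server latency. The sketch below is Python and hits a throwaway local server standing in for the static logo page (the real test URL would be substituted in practice; none of this is opensta itself). If a plain script like this stays fast under a burst while the tool reports tens of seconds, the bottleneck is the injector, not the web server.

```python
# Sketch: tool-independent latency check under a concurrent "burst".
# A trivial local HTTP server stands in for the static logo page.
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import urlopen

class LogoPage(BaseHTTPRequestHandler):
    """Serves a tiny static page, like the one-logo test page."""
    def do_GET(self):
        body = b"<html><body>logo</body></html>"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):
        pass  # keep the console quiet

def timed_get(url):
    """Fetch url once; return (HTTP status, elapsed seconds)."""
    start = time.perf_counter()
    with urlopen(url) as resp:
        resp.read()
        status = resp.status
    return status, time.perf_counter() - start

def burst(url, n_users):
    """Fire n_users concurrent GETs, one per worker thread."""
    with ThreadPoolExecutor(max_workers=n_users) as pool:
        return list(pool.map(timed_get, [url] * n_users))

if __name__ == "__main__":
    server = ThreadingHTTPServer(("127.0.0.1", 0), LogoPage)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    results = burst(f"http://127.0.0.1:{port}/", 20)
    worst = max(elapsed for _, elapsed in results)
    print(f"{len(results)} requests, worst latency {worst:.3f}s")
    server.shutdown()
```

Scaling n_users up in steps (50, 100, 200, ...) and watching whether the worst-case latency tracks the tool's Timer numbers would tell us which side the delay lives on.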
I believe that understanding and/or resolving this apparent 'load-driving bottleneck' is of pivotal interest to all serious opensta users.

...Dan

Dan Downing
www.mentora.com