Re: [OpenSTA-users] Performance testing with Open STA
From: Bernie V. <Ber...@iP...> - 2007-03-08 17:48:22
> <Chris writes>
>
> Hi there,
>
> Thanks for your response, so does this mean that running the test for 10
> users simultaneously should not cause time out errors? The developers
> think it may be because it is an unrealistic test, i.e. in the real
> world, I don't think we will have 10 users running the same test and
> performing the same action at the same time for an hour. I think it is
> probably best to ramp the test up such that 1 user is added every 30
> seconds with a 10 second delay until I get to the maximum number of
> users. Do you think this is a more realistic test?
>
> Although running the same test now with 10 simultaneous users, even
> though it displays timeout errors, it still creates records in the
> database.

Chris,

We've certainly left the realm of OpenSTA-related questions and moved into a discussion of performance testing. It's a slow day, so I'll bite.

There are a few major areas of performance testing. Different people use different terminology, so you'll have to put up with mine, understanding that it might not jibe completely with what others say. Still, it's the goal of the testing that is important, not what you call it.

If your goal is to do CAPACITY PLANNING, then you should create a "realistic" workload: a mix of the most popular transactions, plus those deemed critical, presented to the server(s) under test in a realistic fashion. This is easy to say, and I've seen 3-day seminars and countless books dedicated to how to do this "correctly". For the most part it boils down to picking a manageable (in terms of time to develop vs. budget, goals, etc.) set of transactions to emulate, determining the % probability of executing each transaction, the overall arrival rate, and the "success criteria" for the transactions (i.e. response time limits, throughput goals, etc.). Collectively, I'll refer to these attributes as the "workload definition".

One way to implement a given workload definition is to create a master script which is assigned to each VU, have it generate random numbers, and then call other scripts (that model the workload transactions) based on a table of probabilities (a rough sketch follows below). The scripts should be modeled with think times consistent with the way your users will interact with the system. This varies greatly from one app to another and, unless you are mining logs from an application already in use, is somewhat subjective. The best advice I can give is to be conservative, but not so much so that the sum of all your conservative decisions is pathological.

Once you have a workload with pacing (think times) you are comfortable with, increase the number of users and monitor how response times, server resource utilization (CPU, I/O rate, network, and memory), and throughput (number of tasks completed system-wide) vary with the increased load. You might set up your test so you ramp up to a specific number of users, let them run for a while, and repeat as necessary. This way, you capture the behavior of the system at various steady states. The length of time to allow a particular number of users to run varies with a number of factors, including how different the transactions are from one another in terms of resource utilization and response time. If you can't get repeatable results, your steady-state interval might be too small. I've seen intervals as small as 10 minutes work, and other workloads that require an interval of hours to be useful.
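To make that concrete, here's roughly what I mean, sketched in Python-style pseudocode rather than OpenSTA's SCL (the transaction names, probabilities, think times, and the run_transaction() helper are all made up for illustration -- your workload definition supplies the real numbers):

    import random
    import time

    # Workload definition: (transaction name, probability, think time in seconds).
    # Probabilities should sum to 1.0.
    WORKLOAD = [
        ("browse_catalog", 0.60, 8),
        ("search",         0.25, 5),
        ("place_order",    0.15, 15),
    ]

    def run_transaction(name):
        # Placeholder for the recorded transaction script (HTTP calls, timers, etc.)
        print("running", name)

    def virtual_user(duration_s):
        # One VU: repeatedly pick a transaction by probability, run it, then "think".
        end = time.time() + duration_s
        while time.time() < end:
            r = random.random()
            cumulative = 0.0
            for name, prob, think in WORKLOAD:
                cumulative += prob
                if r <= cumulative:
                    run_transaction(name)
                    time.sleep(think)
                    break

The important part is that the transaction mix and the think times come from your workload definition, not from whatever happens to be convenient to script.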
That's a rough outline of one approach to capacity planning, which in summary is an attempt to load up the system with VUs in a way that a VU is indistinguishable from a "real user". Again, much easier said than done. Pick the wrong workload, and your results might be worthless. The end game here is to increase load until response times become excessive (whatever that means to you, but it needs to be defined; again, there's tons of material to read about this), at which point you have found a limit to system capacity. This limit will be due to either a hardware or software bottleneck. Now, if you are on a tuning expedition, then analyze the performance metrics captured, do some tuning, code optimization, or add some hardware resources, and repeat as necessary until you either meet throughput goals, find the limits of the architecture, or run out of time (which happens more often than most performance engineers would like).

The same scripts can be used for SOAK TESTING, where you load up the system at close to its maximum capacity and let it run for hours, days, etc. This is a great way to spot stability problems that only occur after the system has been running a long time (memory leaks are a good example of things you will find).

Run a long test and start failing components (servers, routers, etc.) to see how response times are affected and how long the system takes to return to a steady state, and you are on your way towards FAILOVER TESTING. You can find reams of material to read about failover testing and high availability as well.

If your goal is to determine where or how the system will fail, then you are doing STRESS TESTING. One way to do this is to comment out the think times and increase VUs until something (hopefully not your emulator!) breaks (see the sketch in the P.S. below). This is just one form of stress testing, a valuable aspect of performance testing, but not the same as capacity planning. How the VUs compare to "real users" may be irrelevant, as you are trying to determine how the system behaves when pushed past its limits.

So I guess only you can answer your question. Decide what your goals are (capacity planning, stability testing, failover testing, or stress testing) and then see whether your script and test behavior are aligned with the goal(s).

-Bernie
www.iPerformax.com
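P.S. The stress-test variant of the same idea, sketched the same way: drop the think times and keep adding VUs until something gives. As before, this is illustrative pseudocode, not OpenSTA -- the run_transaction() body, the error_rate() monitor, and the numbers are placeholders you would replace with your own measurements and thresholds.

    import threading
    import time

    def run_transaction(name):
        pass                             # placeholder transaction body

    def error_rate():
        return 0.0                       # placeholder: fraction of recent requests that failed

    def virtual_user(stop_event):
        while not stop_event.is_set():
            run_transaction("place_order")   # no sleep() -- think times removed

    def stress_ramp(step=5, interval_s=30, max_vus=200):
        stop = threading.Event()
        vus = []
        while len(vus) < max_vus and error_rate() < 0.05:   # stop at 5% errors
            for _ in range(step):
                t = threading.Thread(target=virtual_user, args=(stop,), daemon=True)
                t.start()
                vus.append(t)
            time.sleep(interval_s)       # let the new load settle before adding more
        stop.set()
        return len(vus)                  # rough idea of where the system gave out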