From: Harry M. <hj...@ta...> - 2006-01-10 16:11:59
I'll defer to Rick, but I think that the PerfSuite tools use statistical sampling, and thus, especially over short runs, there will be significant differences in reporting just because of the sampling interval. For infrequently used routines in some profiling I've done recently, the values varied by over 30% across identical runs (of ~10 s of CPU time).

The following is perhaps slightly off-topic, but it does a lot to explain what profilers can give you, what they cannot, and why. Rob Fowler <rj...@ri...> kindly wrote in response to a similar query from me:

=====
I haven't tried to digest all of the details, but things like this are the reason why we tend to emphasize loop-level analysis and to take statement-level numbers with a grain of salt. There are three primary contributors to phenomena like this:

* First, optimizing compilers aggressively rearrange the code. The generated code is a shuffle of instructions from different statements.

* Second, remember that the performance of modern CPUs depends on a lot of instruction-level parallelism: there can be dozens of instructions "in flight" at a time, and instructions can be issued and completed out of order.

* Third, when an event occurs, processors tend to be sloppy w.r.t. the attribution of the event to a specific instruction. The reported program counter for any one event is subject to "skew and smear", i.e. it is likely to be attributed to some nearby instruction that is currently in the pipeline (for example, the most recent instruction to enter the pipeline). Thus, if you look at the instruction level, you can see seemingly nonsensical stuff like "loads" being charged to floating-point instructions, and vice versa.

These three components all contribute to making instruction- and statement-level counts imprecise. On the other hand, averaged over hundreds of instructions, the aggregate numbers are very stable and reliable.

** A brief religious statement: on encountering the attribution problem for deeply pipelined, out-of-order processors, some architects have simply chosen to ignore it. Others, e.g. the Alpha architects, abandoned conventional event counts in favor of other mechanisms. Still others have risen to the challenge and implemented clever and relatively expensive mechanisms to restore precise attribution, e.g. the Power 5. Since performance issues on high-ILP machines are not a matter of any single instruction, but rather of "how they play with a few dozen of their closest friends", I believe that while precise attribution may be an admirable goal, it is not necessary and not worth paying a lot for, at least not for the kinds of analyses we do and certainly not for tools that use coarse-grain calipers for measurement.

A recurring scenario we've run into is a loop in which, say, 80% of the cost (or other measures) has been attributed to a single statement. The developer sees a big potential for improvement and rearranges the code to try to get a big win by reducing the cost of that one statement. The confusing result is that the overall cost is unchanged, but now a different statement gets charged the 80%.

I hope this helps.
-- Rob
=====
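As a back-of-the-envelope illustration of the run-to-run noise I mentioned at the top (the sampling rate, run length, and routine share below are assumptions made up for the example, not PerfSuite's actual configuration): if a profiler samples at a fixed rate, the number of samples that land in a given routine is roughly binomial, so a routine that collects only a handful of samples will swing a lot between identical runs.

/* Hypothetical illustration only, not PerfSuite code: estimate the expected
 * run-to-run variation of a sampled profile for a rarely executed routine. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    double rate_hz  = 100.0;  /* assumed sampling rate (pick your own)  */
    double run_sec  = 10.0;   /* length of the run, ~10 s of CPU time   */
    double fraction = 0.01;   /* share of time spent in the routine     */

    double n_samples = rate_hz * run_sec;    /* ~1000 samples in total  */
    double expected  = n_samples * fraction; /* ~10 land in the routine */

    /* relative standard deviation of a binomial count */
    double rel_err = sqrt((1.0 - fraction) / (n_samples * fraction));

    printf("expected samples in routine: %.1f\n", expected);
    printf("relative run-to-run error:   ~%.0f%%\n", 100.0 * rel_err);
    return 0;
}

With those made-up but plausible numbers (1000 samples, a routine using 1% of the time), the expected variation works out to roughly 30%, which is about what I saw.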
On Tuesday 10 January 2006 05:58, Rick Kufrin wrote:
> On Tue, 10 Jan 2006, Giuseppe Grieco wrote:
> > I would like to know why it happens that monitoring the same process,
> > the number of total floating point operations changes.
> > I guess it should remain the same. Is it?
> > Thanks, Giuseppe
>
> Giuseppe,
>
> I think it depends on a number of factors.
>
> If you are using default configuration files for PerfSuite, remember that
> multiplexing (timesharing of the available performance counter registers)
> will be occurring. By nature this results in estimates of the true number
> of event occurrences and will be inexact, so the counts can be expected to
> vary from run to run.
>
> Also, depending on the CPU type and the underlying access method (Perfmon
> or PAPI), the "floating point operation" count may also include vector
> operations (e.g. x86 SSE), which are not all strictly floating point
> operations. Some things are compiler-dependent.
>
> Finally, things can vary to some extent just due to different runtime
> conditions, although it's hard to say since you don't indicate to what
> degree of variation you are seeing.
>
> Rick

--
Cheers, Harry
Harry J Mangalam - 949 856 2847 (vox; email for fax) - hj...@ta...
<<plain text preferred>>
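P.S. For anyone who wants to poke at Rick's point about what gets folded into the "floating point operation" count, here is a minimal sketch using PAPI's classic high-level counter calls (PAPI_start_counters/PAPI_stop_counters). Treat it as an illustration rather than a recipe: whether the PAPI_FP_OPS and PAPI_VEC_INS presets exist, what exactly they count, and whether they fit on the hardware counters at the same time all depend on your CPU and PAPI build, and the little dot-product kernel and array size are just placeholders. If the events don't fit together, you are back to multiplexing and the scaled-estimate behaviour Rick describes.

/* Sketch only: compare the FP-operation preset against the vector-instruction
 * preset around a small kernel, using PAPI's classic high-level API.
 * Event availability and meaning are CPU- and PAPI-version-dependent. */
#include <stdio.h>
#include <papi.h>

#define N 1000000

int main(void)
{
    int events[2] = { PAPI_FP_OPS, PAPI_VEC_INS };
    long long counts[2];
    static double a[N], b[N];
    double sum = 0.0;
    int i;

    if (PAPI_start_counters(events, 2) != PAPI_OK) {
        fprintf(stderr, "could not start counters (preset unavailable?)\n");
        return 1;
    }

    for (i = 0; i < N; i++)      /* the kernel being measured */
        sum += a[i] * b[i];

    if (PAPI_stop_counters(counts, 2) != PAPI_OK) {
        fprintf(stderr, "could not stop counters\n");
        return 1;
    }

    printf("PAPI_FP_OPS : %lld\n", counts[0]);
    printf("PAPI_VEC_INS: %lld\n", counts[1]);
    printf("(checksum %g, just to keep the loop alive)\n", sum);
    return 0;
}

Running it a few times and comparing the two counts (and their run-to-run jitter) usually tells you more than any single value.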