From: Harry M. <hj...@ta...> - 2006-01-10 16:11:59
I'll defer to Rick, but I think that the PerfSuite tools use statistical sampling, and thus, especially over short runs, there will be significant differences in reporting just because of the sampling interval. For infrequently used routines in some profiling I've done recently, the values varied by over 30% across identical runs (of ~10 s of CPU time).

The following is perhaps slightly off-topic, but it does a lot to explain what profilers can give you, what they cannot, and why. Rob Fowler <rj...@ri...> kindly wrote in response to a similar query from me:

=====
I haven't tried to digest all of the details, but things like this are the reason why we tend to emphasize loop-level analysis and to take statement-level numbers with a grain of salt. There are three primary contributors to phenomena like this:

* First, optimizing compilers aggressively rearrange the code. The generated code is a shuffle of instructions from different statements.

* Second, remember that the performance of modern CPUs depends on a lot of instruction-level parallelism: there can be dozens of instructions "in flight" at a time, and instructions can be issued and completed out of order.

* Third, when an event occurs, processors tend to be sloppy w.r.t. the attribution of the event to a specific instruction. The reported program counter for any one event is subject to "skew and smear", i.e. it is likely to be attributed to some nearby instruction that is currently in the pipeline (for example, the most recent instruction to enter the pipeline). Thus, if you look at the instruction level, you can see seemingly nonsensical stuff like "loads" being charged to floating-point instructions, and vice versa.

These three components all contribute to making instruction- and statement-level counts imprecise. On the other hand, averaged over hundreds of instructions, the aggregate numbers are very stable and reliable.

** A brief religious statement: on encountering the attribution problem for deeply pipelined, out-of-order processors, some architects have simply chosen to ignore it. Others, e.g. the Alpha architects, abandoned conventional event counts in favor of other mechanisms. Still others have risen to the challenge and implemented clever and relatively expensive mechanisms to restore precise attribution, e.g. the Power 5. Since performance issues on high-ILP machines are not a matter of any single instruction, but rather of "how they play with a few dozen of their closest friends", I believe that while precise attribution may be an admirable goal, it is not necessary and not worth paying a lot for, at least not for the kinds of analyses we do and certainly not for tools that use coarse-grain calipers for measurement.

A recurring scenario we've run into is a loop in which, say, 80% of the cost (or other measures) has been attributed to a single statement. The developer sees a big potential for improvement and rearranges the code to try to get a big win by reducing the cost of that one statement. The confusing result is that the overall cost is unchanged, but now a different statement gets charged the 80%.

I hope this helps.
-- Rob
=====
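As a back-of-the-envelope illustration of the run-to-run noise I mentioned at the top (the sampling rate, run length, and routine share below are assumptions made up for the example, not PerfSuite's actual configuration): if a profiler samples at a fixed rate, the number of samples that land in a given routine is roughly binomial, so a routine that collects only a handful of samples will swing a lot between identical runs.

/* Hypothetical illustration only, not PerfSuite code: estimate the expected
 * run-to-run variation of a sampled profile for a rarely executed routine. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    double rate_hz  = 100.0;  /* assumed sampling rate (pick your own)  */
    double run_sec  = 10.0;   /* length of the run, ~10 s of CPU time   */
    double fraction = 0.01;   /* share of time spent in the routine     */

    double n_samples = rate_hz * run_sec;    /* ~1000 samples in total  */
    double expected  = n_samples * fraction; /* ~10 land in the routine */

    /* relative standard deviation of a binomial count */
    double rel_err = sqrt((1.0 - fraction) / (n_samples * fraction));

    printf("expected samples in routine: %.1f\n", expected);
    printf("relative run-to-run error:   ~%.0f%%\n", 100.0 * rel_err);
    return 0;
}

With those made-up but plausible numbers (1000 samples, a routine using 1% of the time), the expected variation works out to roughly 30%, which is about what I saw.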
On Tuesday 10 January 2006 05:58, Rick Kufrin wrote:
> On Tue, 10 Jan 2006, Giuseppe Grieco wrote:
> > I would like to know why it happens that monitoring the same process,
> > the number of total floating point operations changes.
> > I guess it should remain the same. Is it?
> > Thanks, Giuseppe
>
> Giuseppe,
>
> I think it depends on a number of factors.
>
> If you are using default configuration files for PerfSuite, remember that
> multiplexing (timesharing of the available performance counter registers)
> will be occurring. By nature this results in estimates of the true number
> of event occurrences and will be inexact, so the counts can be expected to
> vary from run to run.
>
> Also, depending on the CPU type and the underlying access method (Perfmon
> or PAPI), the "floating point operation" count may also include vector
> operations (e.g. x86 SSE), which are not all strictly floating point
> operations. Some things are compiler-dependent.
>
> Finally, things can vary to some extent just due to different runtime
> conditions, although it's hard to say since you don't indicate to what
> degree of variation you are seeing.
>
> Rick

--
Cheers, Harry
Harry J Mangalam - 949 856 2847 (vox; email for fax) - hj...@ta...
<<plain text preferred>>
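P.S. For anyone who wants to poke at Rick's point about what gets folded into the "floating point operation" count, here is a minimal sketch using PAPI's classic high-level counter calls (PAPI_start_counters/PAPI_stop_counters). Treat it as an illustration rather than a recipe: whether the PAPI_FP_OPS and PAPI_VEC_INS presets exist, what exactly they count, and whether they fit on the hardware counters at the same time all depend on your CPU and PAPI build, and the little dot-product kernel and array size are just placeholders. If the events don't fit together, you are back to multiplexing and the scaled-estimate behaviour Rick describes.

/* Sketch only: compare the FP-operation preset against the vector-instruction
 * preset around a small kernel, using PAPI's classic high-level API.
 * Event availability and meaning are CPU- and PAPI-version-dependent. */
#include <stdio.h>
#include <papi.h>

#define N 1000000

int main(void)
{
    int events[2] = { PAPI_FP_OPS, PAPI_VEC_INS };
    long long counts[2];
    static double a[N], b[N];
    double sum = 0.0;
    int i;

    if (PAPI_start_counters(events, 2) != PAPI_OK) {
        fprintf(stderr, "could not start counters (preset unavailable?)\n");
        return 1;
    }

    for (i = 0; i < N; i++)      /* the kernel being measured */
        sum += a[i] * b[i];

    if (PAPI_stop_counters(counts, 2) != PAPI_OK) {
        fprintf(stderr, "could not stop counters\n");
        return 1;
    }

    printf("PAPI_FP_OPS : %lld\n", counts[0]);
    printf("PAPI_VEC_INS: %lld\n", counts[1]);
    printf("(checksum %g, just to keep the loop alive)\n", sum);
    return 0;
}

Running it a few times and comparing the two counts (and their run-to-run jitter) usually tells you more than any single value.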