From: Rick K. <rk...@nc...> - 2007-01-12 12:08:28
Dawei, I'm glad to hear you are making a little progress. Regarding the reason for the zeroes appearing in your later tests: if you examine the output of psprocess you will notice that it indicates that multiplexing was enabled for the run with zeroes, but not for the run that returned actual event counts. I believe that this is the underlying reason that zeroes were reported.

On an Itanium 2 system such as you are using, there are only four actual registers available for the performance counters, which limits the number of events that can be counted at any given time. When you supply a configuration file that requires more events than the underlying hardware supports, PerfSuite switches into multiplexed operation (through PAPI). What that means is that the available registers are "timeshared" among the events: first one event is counted for a short period of time, then the next, then the third, and so on. At the end of the measurement, the events that were accumulated are scaled up accordingly and that is what is reported as the final result. This is an approximation, and for longer-running programs is usually quite good, but for shorter-running programs it can yield unexpected results. In the extreme, it is possible that the program being measured completed before one or more of the specified events had a chance to be "made active" even once.

I believe the current implementation of multiplexing in PAPI only allows for one event being active in any timeslice, even if the processor actually supports more than one.

So my guess is that what is occurring here is that your program is not running for long enough to accumulate any event counts when multiplexing is enabled. The run that did not report zeroes only ran for ~36,000 cycles, which is much less than a second on your system.
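To make the timesharing and scaling concrete, here is a small toy model in Python. It is only a sketch of the idea described above, not PAPI's or PerfSuite's actual code: the event names, the per-cycle event rates, and the 100,000-cycle timeslice are all made-up values, and it assumes exactly one event is active per timeslice.

```python
def multiplexed_counts(rates, run_cycles, slice_cycles):
    """Toy model: rotate one active event per timeslice, then scale up.

    rates        -- hypothetical true events-per-cycle for each event
    run_cycles   -- how long the measured program runs, in cycles
    slice_cycles -- assumed length of one multiplexing timeslice
    """
    names = list(rates)
    raw = {name: 0.0 for name in names}    # counts seen while active
    active = {name: 0 for name in names}   # cycles each event was active
    elapsed = 0
    index = 0
    while elapsed < run_cycles:
        span = min(slice_cycles, run_cycles - elapsed)
        name = names[index % len(names)]   # round-robin over the events
        raw[name] += rates[name] * span
        active[name] += span
        elapsed += span
        index += 1

    estimates = {}
    for name in names:
        if active[name] == 0:
            # The program ended before this event was ever made active.
            estimates[name] = 0
        else:
            # Scale the partial count up to the full run length.
            estimates[name] = round(raw[name] * run_cycles / active[name])
    return estimates


# Five events on a machine that (in this model) counts one at a time.
rates = {"A": 1.0, "B": 0.5, "C": 0.25, "D": 0.125, "E": 0.0625}

short = multiplexed_counts(rates, run_cycles=36_000, slice_cycles=100_000)
long_run = multiplexed_counts(rates, run_cycles=10_000_000, slice_cycles=100_000)

print("short run:", short)
print("long run: ", long_run)
```

In this model the 36,000-cycle run ends inside the first timeslice, so only event A is ever active and B through E come out as zero, while the 10,000,000-cycle run gives every event an equal share of slices and the scaled estimates match the true totals. Real counts vary over time, so the long-run estimate would be approximate rather than exact.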
My recommendation is to limit your runs to those that use configurations for which multiplexing is not required. This should at least give results that are greater than zero. Whether or not runs of these lengths provide information that is useful for analysis is another matter, but that's a judgement to be made by the end user.

There is currently no command-line utility in PerfSuite that allows one to query a given configuration file to learn if it would require multiplexing (this would be a useful addition, I think), so the easiest way is to experiment with commands like "ls" in combination with test configuration files and examine the output to see if multiplexing was used. Of course, it's good to know how many registers are available on your system in advance, as that lets you know the maximum, but you still have to do test runs because some PAPI events ("derived events") are actually composed of more than one underlying native event.

Rick

On Fri, 12 Jan 2007, liudawei wrote:

[ ... ]
> What is the problem happen ? Waiting your reply.
>
>
> Best Reagard
> Dawei Liu
> Renmin University of China
> 100872 Beijing,China