From: Rick K. <rk...@nc...> - 2009-01-05 22:40:39
Naveen,

Thanks for digging into this more. I hope this message gets out to the
mailing list (along with yours), as your experiments are interesting and
useful.

I think the tests you report below are a slightly different matter than
the earlier one (CPU > wall clock), although the two are related. These
more recent results point to overhead introduced in libpshwpc itself,
especially when measuring at fine granularity, as it sounds like you are
doing in your tests. There is definitely additional work introduced when
using the libpshwpc API (above and beyond any work going on in PAPI).

A second point that we have been discussing following your earlier
reports arises when one measures two things simultaneously (as libpshwpc
does): the hardware counters and the time stamp counter. These have to
be ordered in some way, and the choice - to paraphrase Luiz DeRose of
Cray - is whether one wants to "time the counters" or "count the
timers". That is, do you first read the counter values and then the time
stamp, or vice versa? Reviewing the relevant code in libpshwpc (the
relevant source file, by the way, is src/libpshwpc/hwpc.c), the current
implementation chooses to "count the timers". For most purposes this
gives reasonable results, but I think you are pushing the envelope here.
I would be interested in your viewpoint on whether the reverse should be
done, which may affect the CPU vs. wall clock issue.

On the more general overhead issue: we will try to reproduce it by
writing test cases here. If you could send me a copy of the small test
program you are using, that would help us work with you on this.

Thanks again for following up on this,

Rick

Naveen Parihar wrote:
> Rick,
>
> Thanks for the quick response and suggestions.
>
> It seems like a bug in PerfSuite and not PAPI. Here are the numbers
> for PAPI_TOT_CYC from a small test program:
> Using PerfSuite libpshwpc <http://perfsuite.ncsa.uiuc.edu/libpshwpc/>: 31,257,943,365
> Using PAPI high-level interface: 8,628,425,548
>
> Please note that ps_hwpc_start() and ps_hwpc_suspend() are called
> thousands of times within a large program to benchmark a specific
> function. The time granularity of this function is very small, and
> hence I have been using rdtsc to get independent timing measurements.
> The timings from the independent rdtsc measurements are closer to the
> one measured directly using PAPI.
>
> I also observed similar behavior with PAPI_TOT_INS, so this bug is
> likely influencing other events as well.
>
> I do not mind investing some time in finding and fixing this bug in
> the PerfSuite code base, and would appreciate it if you could point me
> to the places where I should be looking.
>
> -Naveen
>
> On Mon, Jan 5, 2009 at 3:30 PM, Rick Kufrin <rk...@nc...
> <mailto:rk...@nc...>> wrote:
>
> > Naveen,
> >
> > Thanks for sending the document. We (of course) get the same results
> > from post-processing it here as you do, but cannot easily reproduce
> > this behavior with our own executables/tests on the systems we have
> > available.
> >
> > There are two suggestions I can think of to try offhand:
> >
> > 1. Add the "time" command to your run, i.e. "time psrun a.out", to
> > get a third-party opinion of the wall clock and CPU time.
> >
> > 2. PerfSuite supports using the "gettimeofday" system call for
> > wall-clock timing (by default, on a machine like yours, it uses the
> > "rdtsc" asm instruction). To use gettimeofday, reconfigure PerfSuite
> > with the option "--enable-rtc=gettimeofday", then "make clean" and
> > remake as normal.
> >
> > If you come across any more info while looking into this, we are of
> > course very interested to address it. Thanks for reporting.
> >
> > Rick
> >
> > Naveen Parihar wrote:
> > > Rick,
> > >
> > > The XML document is attached with this email. I'm trying to debug
> > > the problem by directly using the high-level PAPI interface.
> > > Will let you know my findings later.
> > >
> > > -Naveen
> > >
> > > On Mon, Jan 5, 2009 at 2:00 PM, Rick Kufrin <rk...@nc...
> > > <mailto:rk...@nc...>> wrote:
> > >
> > > > Naveen,
> > > >
> > > > That certainly does sound like a bug, or at least unexpected
> > > > behavior...
> > > >
> > > > Would you please send a copy of the XML document that you used
> > > > to obtain this output? It may help to look closer at its
> > > > contents to track down what is going on.
> > > >
> > > > Rick
> > > >
> > > > Naveen Parihar wrote:
> > > > > Dear PerfSuite users,
> > > > >
> > > > > I'm a new user of PerfSuite/PAPI and would appreciate comments
> > > > > on my query below.
> > > > >
> > > > > On a quad core Intel running Fedora Core 6 (kernel 2.6.18), I
> > > > > get the following numbers while running a *single* threaded
> > > > > program:
> > > > >
> > > > > CPU time (seconds)        47.966
> > > > > Wall clock time (seconds) 30.582
> > > > >
> > > > > Since the CPU time > wall clock time, one conclusion might be
> > > > > a bug in PerfSuite/PAPI or somewhere else. I double checked
> > > > > the CPU time by dividing the PAPI_TOT_CYC count by the CPU
> > > > > frequency, and I arrive at the same number. Any ideas on what
> > > > > might be going on, or on the best approach to debugging this
> > > > > problem, would be appreciated.
> > > > >
> > > > > Thanks,
> > > > > -Naveen
>
> _______________________________________________
> PerfSuite-users mailing list
> Per...@li... <mailto:Per...@li...>
> https://lists.sourceforge.net/lists/listinfo/perfsuite-users