From: Rick K. <rk...@nc...> - 2005-10-14 21:36:14
Frederique,

> I am trying to use PerfSuite to profile a multi-threaded program on a
> quad-processor machine. I am particularly interested in knowing how
> much time a thread is spending on each processor. I believe that
> PerfSuite can do that, but I can't install PAPI (I am not root on the
> machine, so I can't patch the kernel with perfctr).

You're right in thinking that PerfSuite (psrun and libpshwpc) should be
able to do this; however, you are also right in thinking that it requires
some mechanism for sampling other than the basic support that is
available in the absence of access to the performance counters. I am
guessing that you are on an x86 system since you mention perfctr.

PerfSuite (version 0.6.2) can access performance counters using either
PAPI or Perfmon, the latter being relevant to Itanium (IA-64) systems.
So if you're on an Itanium system, you don't need PAPI; you can get by
with using Perfmon directly. On x86, though, you need the kernel patched
with perfctr and PAPI installed (I am guessing you know this, but just
clarifying).

Without any counter support, you can only do profiling, and that only in
one of two ways: either by the profil() routine available in glibc (this
would be the default in such an installation, using a configuration file
that is called "profil.xml"), or through interval timers and interrupt
handlers that PerfSuite installs. The itimer approach is more flexible
and can be selected using a sample XML configuration file called
"itimer.xml" that should be installed in PREFIX/share/perfsuite/xml/pshwpc

These two methods use signals to trigger the interrupt, though, and
signals are a process-wide mechanism, so they are really not useful in
the case of a multithreaded application (at least not in the present
versions of PerfSuite).
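
To make the signal limitation concrete, here is a minimal sketch of
interval-timer sampling in Python. It is not PerfSuite code, and the
names on_sigprof, burn_cpu and profile_workers are made up for
illustration. It assumes a Unix system where SIGPROF and setitimer are
available. CPython always runs signal handlers in the main thread, which
exaggerates the effect compared with a C program, but the point is the
same: the timer is armed for the whole process, so the samples are not
attributed to the worker threads that are actually consuming CPU time.

```python
import signal
import threading
import time
from collections import Counter

# Thread name -> number of SIGPROF samples whose handler ran in that thread.
samples = Counter()


def on_sigprof(signum, frame):
    # Record the thread the handler is executing in when the timer fires.
    samples[threading.current_thread().name] += 1


def burn_cpu(seconds):
    # Busy loop so the process accumulates CPU time and the timer keeps firing.
    deadline = time.monotonic() + seconds
    counter = 0
    while time.monotonic() < deadline:
        counter += 1


def profile_workers(n_threads=2, seconds=0.5, interval=0.01):
    """Sample with a process-wide ITIMER_PROF while worker threads run."""
    samples.clear()
    old_handler = signal.signal(signal.SIGPROF, on_sigprof)
    # ITIMER_PROF counts CPU time used by the whole process, not one thread.
    signal.setitimer(signal.ITIMER_PROF, interval, interval)
    try:
        workers = [
            threading.Thread(target=burn_cpu, args=(seconds,), name=f"worker-{i}")
            for i in range(n_threads)
        ]
        for w in workers:
            w.start()
        # Poll instead of blocking in join() so pending handlers get to run.
        while any(w.is_alive() for w in workers):
            time.sleep(0.01)
    finally:
        signal.setitimer(signal.ITIMER_PROF, 0, 0)  # disarm the timer
        signal.signal(signal.SIGPROF, old_handler)
    return dict(samples)


if __name__ == "__main__":
    print(profile_workers())
```

Although the worker threads do all of the computation here, no sample
carries a worker's name, so a per-thread profile cannot be built from
this mechanism alone.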
The counter-based techniques interrupt on counter overflow, and that can
be kept isolated within a particular thread since the underlying drivers
support this.

I am guessing that this is the major problem you are running into, and
unfortunately I don't have a solution apart from having to patch your
kernel.

The other issue (not getting file/line information) I think might be
related to not compiling the application with symbols enabled (-g option).

Finally, about resource collection: that is done, when using the -r option
to psrun, for the main thread only, and does not track time spent on a
per-CPU basis. So I am afraid that again, it is not what you might
have been hoping for.

Sorry for the bad news - please let me know if I have misunderstood the
question....

Rick

>
> I then installed perfsuite without PAPI. And I am getting some results
> running:
>
> $ psrun -r -p myprogram
>
> The resulting files are for example myprogram.*.19416.mymachine.xml and
> also myprogram.19416.mymachine.res.xml.
>
> When I watch the results with psprocess
>
> $ psprocess -e myprogram myprogram.*.19416.mymachine.xml
>
> I get this:
>
>
> File Summary
> --------------------------------------------------------------------------------
> Samples   Self %   Total %   File
>
>    1519  100.00%   100.00%   ??
>
> Function Summary
> --------------------------------------------------------------------------------
> Samples   Self %   Total %   Function
>
>    1519  100.00%   100.00%   ??
>
> Function:File:Line Summary
> --------------------------------------------------------------------------------
> Samples   Self %   Total %   Function:File:Line
>
>    1519  100.00%   100.00%   ??:??:0
>
> So there is no detail.
>
> And when I watched the resource utilization, I get this:
>
> $ psprocess myprogram.19416.mymachine.res.xml
> CPU time (user, seconds) : 3.69
> CPU time (system, seconds) : 0.10
>
> but there is no distinction between CPUs and no distinction between
> threads.
>
> Is there any way for me to get more information using perfsuite without
> PAPI than what I already have now? Especially with the names of files,
> functions and the function lines?
>
> Is there a way to get the time spent by each thread on each processor?
>
> Thank you,
> Frédérique.