From: Rick K. <rk...@nc...> - 2008-01-29 17:12:11
Sanjay,

PerfSuite uses a self-monitoring (first-person) model, rather than one that monitors external processes/threads. I'm afraid the style of measurement you're describing is not likely to be available through PerfSuite. There may be other tools out there that can do the sort of thing you're interested in, however. There are certainly tools that can "attach" to external processes, but the tricky part is obtaining the periodic sampling as the process runs.

Rick

Sanjay H A wrote:
> Hello Sir
>
> I tested the sampler program; it worked.
>
> 1. One thing I noticed is that if I insert a printf statement or a sleep statement inside busyloop(), the performance data is not printed on standard output. May I know the reason?
>
> 2. If I run some external program instead of the busy loop (using the system command), the performance data is not printed. Sample code is given below:
>
> int main(int argc, char **argv)
> {
>     process_options(argc, argv);
>
>     setup_pshwpc();
>
>     setup_alarm(seconds);
>
>     // busyloop();
>     system("../loop 10000 >> temp ");  // calling my program
>     shutdown_pshwpc(write_results);
>
>     exit(0);
> }
>
> What I assumed is that the PID of the process will be different from the parent process (that's why it is not printing the performance data). Is that true? Is there any solution for this type of problem?
>
> My point is that I don't want to alter the code of my application. Instead, I want to call my application from the sampler program. I want to periodically measure the performance parameters of my application.
>
> Thanx
> Regards
>
> Sanjay
> > > Sanjay, > > This type of periodic sampling is not supported directly by the psrun > command. It can be done via the PerfSuite API, however output format is > up to your own code. There is an example contained in the PerfSuite > distribution called "sampler" that shows how to set up interval timers > that trigger a read of the current performance data values. You can find > the example in the directory $PREFIX/share/perfsuite/examples/sampler > (where $PREFIX is the value supplied to the configure option --prefix, or > /usr/local if nothing was supplied). > > Rick > > On Sat, 26 Jan 2008, Sanjay H A wrote: > > >> Hello Sir >> Thanx for informative reply. >> After commenting the events in the XML file >> that are not supported on my system, psrun worked. >> >> I want to measure >> performance events periodicaly for an long running application. Means If >> I have application runs for 1 hour, I want to measure these performace >> parameters for every 5 minutes.Is there any options in perfsuite? What I >> noticed is, if I run application with psrun, it creates xml file when >> the application finishes. >> >> Thanx again >> >> Regards >> Sanjay >> >> >> >> >> > > ------------------------------------------------------------------------- > This SF.net email is sponsored by: Microsoft > Defy all challenges. Microsoft(R) Visual Studio 2008. > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ > _______________________________________________ > PerfSuite-users mailing list > Per...@li... > https://lists.sourceforge.net/lists/listinfo/perfsuite-users > > |
From: Rick K. <rk...@nc...> - 2008-01-26 16:26:53
Sanjay,

This type of periodic sampling is not supported directly by the psrun command. It can be done via the PerfSuite API; however, the output format is up to your own code. There is an example contained in the PerfSuite distribution called "sampler" that shows how to set up interval timers that trigger a read of the current performance data values. You can find the example in the directory $PREFIX/share/perfsuite/examples/sampler (where $PREFIX is the value supplied to the configure option --prefix, or /usr/local if nothing was supplied).

Rick

On Sat, 26 Jan 2008, Sanjay H A wrote:
> Hello Sir
> Thanx for the informative reply.
> After commenting out the events in the XML file that are not supported on my system, psrun worked.
>
> I want to measure performance events periodically for a long-running application. That is, if my application runs for 1 hour, I want to measure these performance parameters every 5 minutes. Are there any options for this in PerfSuite? What I noticed is that if I run the application with psrun, it creates the XML file only when the application finishes.
>
> Thanx again
>
> Regards
> Sanjay
From: Rick K. <rk...@nc...> - 2008-01-25 15:27:20
Sanjay,

Good to hear you are making progress. This latest issue you report is somewhat common, but should be easy to fix.

The error message "hardware event not available" is fairly self-explanatory: it means that one or more of the events listed in the configuration file that psrun is using are not defined for your CPU. As PAPI evolves and improves, its standard, or "preset", events can change over time: new ones may be added or existing ones removed.

To address this in your installation, you should edit the default configuration file. This is an XML document that is listed by running the command "psrun -i". There should be a message in the output similar to this:

  Standard configuration file directory for psrun on this computer:
  /usr/local/share/perfsuite/xml/pshwpc
  Default configuration file for this computer: papi3_p4.xml

...which means that /usr/local/share/perfsuite/xml/pshwpc/papi3_p4.xml is the XML document that lists the events psrun is to count. You will have to edit this file and delete or comment out one or more events that are not available on your system/installation. The command "psinv -p" provides a list of events that *are* available on your system and can be helpful in this instance.

Rick

On Fri, 25 Jan 2008, Sanjay H A wrote:
> Hello Sir
> Following your directions, I downloaded papi.tar.gz from the CVS directory and installed PAPI. This time there was no warning while configuring, and the installation went fine. Then I tested the example programs given in the PAPI repository. All executed without error. Then I tested PAPI_ipc.c outside the test suite, compiling it dynamically:
>
>   cc -o PAPI_ipc -L/usr/local/papi/lib -lpapi -I/usr/local/papi/include PAPI_ipc.c
>
> and it executed well. Then I reinstalled PerfSuite with the new version of PAPI. Now if I run
>
>   /usr/local/perfsuite/bin/psrun gunzip demo.c
>
> it gives
>
>   libpsrun fatal error: hardware event not available
>
> The perfctr version is perfctr-2.6.x (which comes with papi-3.5.0).
>
> System info:
>   CentOS release 4.3
>   CPU: AuthenticAMD
>   cpu family: 15
>   model: 67
>   model name: Dual-Core AMD Opteron(tm) Processor 1214
>
> perfex -i gives:
>   PerfCtr Info:
>   abi_version      0x05010501
>   driver_version   2.6.25 DEBUG
>   cpu_type         15 (AMD K8 Revision C)
>   cpu_features     0x7 (rdpmc,rdtsc,pcint)
>   cpu_khz          2211393
>   tsc_to_cpu_mult  1
>   cpu_nrctrs       4
>   cpus             [0,1], total: 2
>   cpus_forbidden   [], total: 0
>
> Thanx
> Regards
> Sanjay
From: Rick K. <rk...@nc...> - 2008-01-23 13:21:39
Sanjay -

This is, I think, a situation that has been reported previously; see the following earlier post to the perfsuite-users mailing list:

http://sourceforge.net/mailarchive/forum.php?thread_name=Pine.LNX.4.64.0701171838120.18837%40osage.ncsa.uiuc.edu&forum_name=perfsuite-users

Briefly, one workaround is to add the option --disable-binutils to the "configure" command line and rebuild PerfSuite.

An alternate approach is to create a symbolic link named "libbfd.so" that points to the real libbfd-XXXXX.so (where XXXXX will differ, depending on your Linux installation). BFD, for reasons that I don't know, doesn't follow the usual naming scheme for its shared library, and therefore links against -lbfd attempt to use the static library, which is where the conflict arises. Note: you would need root privileges on your system to be able to create this link.

Rick

On Wed, 23 Jan 2008, Sanjay H A wrote:
> Hello Sir
> Thanx. configure worked after adding the paths to LD_LIBRARY_PATH, but when I run make, I encounter these errors:
>
> /bin/sh ../../../../libtool --tag=CC --mode=link gcc -g -O2 -o libpsbfd.la -rpath /usr/local/perfsuite/lib/psbfd0.2 -lbfd -liberty -version-info 1:0:0 libpsbfd_la-Bfd_control.lo libpsbfd_la-Bfd_init.lo libpsbfd_la-Bfd_inquire.lo libpsbfd_la-Bfd_lookup.lo
> gcc -shared .libs/libpsbfd_la-Bfd_control.o .libs/libpsbfd_la-Bfd_init.o .libs/libpsbfd_la-Bfd_inquire.o .libs/libpsbfd_la-Bfd_lookup.o -lbfd -liberty -Wl,-soname -Wl,libpsbfd.so.1 -o .libs/libpsbfd.so.1.0.0
> /usr/bin/ld: /usr/lib/gcc/x86_64-redhat-linux/3.4.5/../../../../lib64/libbfd.a(bfd.o): relocation R_X86_64_32 against `a local symbol' can not be used when making a shared object; recompile with -fPIC
> /usr/lib/gcc/x86_64-redhat-linux/3.4.5/../../../../lib64/libbfd.a: could not read symbols: Bad value
> collect2: ld returned 1 exit status
> make[5]: *** [libpsbfd.la] Error 1
> make[4]: *** [all-recursive] Error 1
> make[3]: *** [all-recursive] Error 1
> make[2]: *** [all-recursive] Error 1
> make[1]: *** [all-recursive] Error 1
> make: *** [all] Error 2
>
> I am attaching the Makefile.
>
> Thanx
> Regards
> Sanjay
From: Rick K. <rk...@nc...> - 2008-01-22 23:54:32
Sanjay,

It looks to me as though the perfctr user library is not being picked up at runtime by the linker. As I recall, PAPI's test suite uses static linking and therefore would not need to locate libperfctr.so; however, this is not the case with PerfSuite.

You might try adding the directory or directories that contain your builds and installs of PAPI's libraries and the perfctr library (these may be the same "lib" or "lib64" subdirectory) to your LD_LIBRARY_PATH, and try again. If you still experience problems, please advise and we can investigate further.

Rick

--------------------------
From: Sanjay H A <san...@ly...>
To: per...@li...
Subject: Perfsuite configuration error - PAPI libraries

Hi,
I tried to install PerfSuite (perfsuite-0.6.2a6) on 64-bit AMD CentOS.

Kernel version: 2.6.9-34.0.2.EL

I downloaded kernel-2.6.9-34.0.2.EL.src.rpm and perfctr-2.6.25 and followed the instructions given in the perfctr INSTALL file to apply the patch. (Before running make mrproper, I changed EXTRAVERSION to -34.0.2.ELcustom.)

Then I rebooted the system.
Then I installed perfctr; it went fine.
Then I installed papi-3.5.0, specifying the perfctr directory. That also went fine.
Then I installed tDOM; that also went fine.
Then I configured PerfSuite with this command:

  ./configure --with-papi=/usr/local/papi-3.5.0 --prefix=/usr/local/perfsuite --enable-mpi MPICPPFLAGS="-I/usr/local/mpich-1.2.6/ch_p4/include" --with-tdom=/usr/lib/tdom0.8.2

It ends with an error. My config.log file says:

  configure:25730: configuring PAPI support
  configure:25753: checking for PAPI_library_init in -lpapi
  configure:25796: gcc -o conftest -g -O2 conftest.c -lpapi -L/usr/local/papi-3.5.0/lib >&5
  /usr/bin/ld: warning: libperfctr.so.6, needed by /usr/local/papi-3.5.0/lib/libpapi.so, not found (try using -rpath or -rpath-link)
  /usr/local/papi-3.5.0/lib/libpapi.so: undefined reference to `vperfctr_open'
  /usr/local/papi-3.5.0/lib/libpapi.so: undefined reference to `vperfctr_stop'
  /usr/local/papi-3.5.0/lib/libpapi.so: undefined reference to `vperfctr_read_pmc'
  /usr/local/papi-3.5.0/lib/libpapi.so: undefined reference to `_vperfctr_open'
  /usr/local/papi-3.5.0/lib/libpapi.so: undefined reference to `vperfctr_read_ctrs'
  /usr/local/papi-3.5.0/lib/libpapi.so: undefined reference to `perfctr_info_cpu_name'
  /usr/local/papi-3.5.0/lib/libpapi.so: undefined reference to `vperfctr_iresume'
  /usr/local/papi-3.5.0/lib/libpapi.so: undefined reference to `vperfctr_control'
  /usr/local/papi-3.5.0/lib/libpapi.so: undefined reference to
  /usr/local/papi-3.5.0/lib/libpapi.so: undefined reference to `perfctr_info'
  /usr/local/papi-3.5.0/lib/libpapi.so: undefined reference to `vperfctr_read_tsc'
  /usr/local/papi-3.5.0/lib/libpapi.so: undefined reference to `rvperfctr_close'
  /usr/local/papi-3.5.0/lib/libpapi.so: undefined reference to `rvperfctr_stop'
  /usr/local/papi-3.5.0/lib/libpapi.so: undefined reference to `rvperfctr_read_ctrs'
  /usr/local/papi-3.5.0/lib/libpapi.so: undefined reference to `vperfctr_unlink'
  collect2: ld returned 1 exit status
  configure:25802: $? = 1
  configure: failed program was:

The path is correct.

Can you tell me where I went wrong? Did I make any mistakes while applying the patch?

Thanx in advance
Sanjay
From: Rick K. <rk...@nc...> - 2007-11-06 16:48:29
Greetings all,

This is a note to let you know of a BOF (Birds of a Feather) session to be held at the upcoming Supercomputing conference in Reno, NV that may be of interest to list subscribers. This is a session to update and discuss developments in several related performance tool efforts, among them:

- PerfSuite
- PAPI (Innovative Computing Lab, Univ. Tenn)
- TAU (Performance Research Lab, Univ. Oregon)
- KOJAK/Scalasca (Research Centre Juelich, Germany)

It is not intended to be just a "show and tell", but instead to learn more about current and prospective user experiences, input, requirements, and so on. All are invited and welcome to attend if you will be at the conference. Hope to see you there!

http://sc07.supercomputing.org/schedule/event_detail.php?evid=11299

Rick
From: Rick K. <rk...@nc...> - 2007-09-04 21:30:28
Daniel,

SourceForge email control again ungraciously bounced your message, so I will forward it on to the list, as again there's a lot of interesting stuff here.

It sounds as though you've done some good investigative work and modifications, thanks very much. The results you report in memory consumption savings are indeed impressive: by my calculations, you report a reduction in incremental memory usage for profiling of over 95% for your test case. Apart from the specific results of the profiling with your patch that you report, it certainly motivates investigating a more memory-efficient approach.

As an aside, I'll comment that the flat address space model for the profile buffers used in PerfSuite has some historical background from its development. For profil() and PAPI_sprofil() implementations, the underlying library routines expect a flat address space that they manage/modify, and those were the original mechanisms used. For itimer and IA-64/Perfmon implementations added later, there is no assumption of flat profiling buffers, but reusing the existing infrastructure simplified things. But as you demonstrate, nothing prevents a different approach. I should also mention that PAPI provides a way for the caller to install an overflow handler, but that has never been used in PerfSuite.

I think you are right that there are several different issues your experiments demonstrate. The issue regarding unallocated thread-specific data is surprising; however, I'm a bit suspicious of the behavior of both the setitimer() and profil() approaches in a multithreaded situation, the problem being that signals are not guaranteed to be delivered to any specific thread (in POSIX). I'm wondering if that might have something to do with it, although it could be any of a number of things.

I'll take a look at the modified source you've sent and try a few things out with it on this end (it may take some time, unfortunately, but I'm very interested), and get back to you directly.

I'm not familiar with Ansys; it sounds as though it's hybrid MPI/threads, is that correct? Or is that related to the MPI implementation?

Thanks again for the experiments, work, and info - very nice stuff.

Rick

On Tue, 4 Sep 2007, Daniel Thomas wrote:
> Hi Rick,
>
> Attached are the modified files implementing a B-tree mapping for profiling with PerfSuite.
>
> With a 4-CPU MPI Ansys job:
>   No profiling:            total RSS memory 1.746 GB
>   With current PerfSuite:  2.569 GB
>   With new PerfSuite:      1.768 GB
>
> BTW, Ansys exposes a serious psrun weakness in handling Ansys's complicated thread launch/exit. _ps_hwpc_profhandler is entered and there is no TSD data. I guess Ansys works not only with the MPI main process but also with some ghost Pthreads. I guess an itimer interrupt from the main process is actually taken by the ghost thread before it has set up its thread-specific data. I fixed it by just ignoring the interrupt (return instead of exit). Later, even the ghost thread finally got its TSD data.
> This allows the run to complete and produce reports. Unfortunately, for some reason I haven't found, the reports miss the interesting data. So to profile Ansys I just wrote a "light_psrun" that just calls ps_hwpc_init and ps_hwpc_start at MPI_Init and calls ps_hwpc_stop at MPI_Finalize. Then I got what I needed. But even though it is serious, that is another issue, so back to memory usage.
>
> I made the dynamic mapping the default, as PerfSuite "as is" cannot simply be used with the kind of jobs we run at SGI. To return to direct mapping, things must be compiled with -DPS_DIRECT_SAMPLES_ACCESS (so no Makefile change is needed to get the dynamic mapping).
>
> ----------- hwpc.h:
>   adds some structures to handle the B-tree mapping
> ----------- hwpc.c:
>   mallocs space only for a tree head, instead of the full mapping space, in values->samples[map] (for non-PAPI profiling).
>   Note that ps_hwpc_sample_t is not changed, and casts are made when it is used in the new context, so this is compatible with PAPI profiling, which still uses the full mapping.
> ----------- profile.c:
>   defines an inc_samples routine that does the B-tree update, and calls
>     inc_samples((ps_hwpc_short_sample_head_t *) samples[map], pcoff);
>   instead of
>     samples[map][pcoff]++;
>   in _ps_hwpc_prof_handler and _ps_perfmon_handler
> ----------- hpc-xml.c and hpc-text.c:
>   adds routines and calls them (for non-PAPI profiling) in order to walk the samples in address order, instead of reading the mapping sequentially, and to retain only non-zero samples.
>
> Hope this helps. Please come to me if you want further explanations.
>
> Thanks
> Daniel
From: Rick K. <rk...@nc...> - 2007-08-29 16:22:27
Daniel - thanks for the message. I think you are not a subscriber and SourceForge bounced the email, but I am forwarding it on to the list because I think you raise some good points. Comments follow your message. I also add some information about the general status and future of PerfSuite in case that is of interest to the list and users.

On Wed, 29 Aug 2007, Daniel Thomas wrote:
> Hi,
>
> I am trying to profile a 128-core MPI application with psrun on a Linux SGI x86_64 machine. The system is not patched with perfctr or with perfmon, so I am limiting my usage to the /usr/local/share/perfsuite/xml/pshwpc/itmer.xml config file.
> My problem is not related to the number of CPUs but to the additional memory PerfSuite uses on each node of the cluster. This application uses a lot of memory for data and also uses several xx.so libraries that are themselves big. With the PerfSuite itimer, the application swaps. In short, it cannot be used.
> Is there a way to get information equivalent to what the itimer.xml profiling provides, with another config file on such an unpatched cluster, that would use far less memory?
>
> As I suspect I have little chance of getting a positive answer to that last question, I had a look at the PerfSuite source. Looking at the get_pc and bin_pc routines in profile.c, it seems to me that a memory location is reserved for every possible PC address of the program. With my large program text, too much memory is allocated. This is a pity, as in the end only a few counters are non-zero (say, some hundreds).
> It seems to me that it would not be an "Everest" of work to manage the PC locations dynamically (hashing or B-tree) instead of performing the direct samples[map][pcoff]++ increment. Then it would not be too difficult to change the report routines to agree with this new format.
> This would introduce a little overhead to manage this indirect structure, but experience with other profilers has taught us that this overhead is acceptable, especially with HPC applications (my domain), where the probability of being interrupted inside the same inner loop is very high (and so of retrieving the corresponding counter immediately).
> I might implement it if I can get support from the PerfSuite developers.
>
> Thanks,
> Daniel

You are absolutely correct that memory consumption when profiling is not as efficient as it could be, not only for the reason you point out (total mapping of program text and shared libraries), but for additional reasons as well. One solution, as you point out, could be a different arrangement of data structures internally. The primary reason it is as simple as it is in PerfSuite is clarity and ease of understanding (also to avoid overhead, as you mention, but that's a secondary reason).

Another approach that I have considered is to allow the user to selectively include or exclude particular shared libraries from the profile. This would save the memory currently always used for all libraries that are in the load map at program startup. I've had requests from both sides of the fence ("I want more/less profiling of shared libraries"), each for legitimate reasons. I'd be interested to learn people's opinions on possible approaches in the future.

Regarding the more general state of PerfSuite's "future": to date, general development has been largely unfunded; however, we've had a need for these sorts of capabilities at NCSA (an HPC center at Illinois), and so there has been informal internal support for the work. Recently, however, it looks promising that outside funding through a grant may be coming to help support continued development/improvement of PerfSuite. This is welcome news, of course, and would assure enhancements/maintenance for some time to come. I will update the list as things progress.

In the meantime, to answer your direct question, please feel free to try your approach out if you have the time, and also feel free to assume that any support/questions you might have will be welcomed... happy to help where possible (and I appreciate your willingness to dig in to help improve PerfSuite).

It's been a while since there has been a new release of PerfSuite, and work is currently underway to update it, with the most notable enhancement being support for Intel Core/Core 2 platforms.

Hope this helps,
Rick
From: Rick K. <rk...@nc...> - 2007-08-15 19:40:16
Arindam,

Sorry for the delay in responding... it may be that your LD_LIBRARY_PATH environment variable does not include the location where the necessary shared libraries are installed (possibly /usr0/lib/lib). There are "setup" scripts installed in the PerfSuite "bin" directory called psenv.sh and psenv.csh. The purpose of these scripts is to set various things in your environment properly. Before running psrun, try "source"-ing one of these scripts (.sh for bash/Bourne, .csh for tcsh/csh) and see if you continue to receive this error.

Rick

On Thu, 9 Aug 2007, Arindam Mallik wrote:
> Hi,
> I have installed PerfSuite as per the documentation. However, when I try to run it, it gives me the following error. Any help would be greatly appreciated.
>
> /usr0/local/bin/psrun grep "perf" *
> ERROR: ld.so: object 'libpsrun.so.0' from LD_PRELOAD cannot be preloaded: ignored.
>
> Thanks,
> Arindam
>
> --
> " I hear and I forget. I see and I remember. I do and I understand."
> - Confucius
From: Arindam M. <ari...@gm...> - 2007-08-09 16:47:17
Hi,

I have installed PerfSuite as per the documentation. However, when I try to run it, it gives me the following error. Any help would be greatly appreciated.

/usr0/local/bin/psrun grep "perf" *
ERROR: ld.so: object 'libpsrun.so.0' from LD_PRELOAD cannot be preloaded: ignored.

Thanks,
Arindam

--
" I hear and I forget. I see and I remember. I do and I understand."
- Confucius
From: Rick K. <rk...@nc...> - 2007-01-18 00:50:38
Steve,

Coincidentally, another user reported this to the perfsuite-users mailing list on SourceForge, so I will answer and hopefully move towards solving two problems at once...

This error is similar to one that has occurred on IA-64 machines where 32-bit libraries are mixed with 64-bit ones. It comes into play when the BFD extension that is used by psprocess is being built (libbfd.a). I had difficulty resolving this on an x86-64 machine as well, and as I recall, the solution I ended up using was to avoid attempting to build that particular extension for PerfSuite. This is done by adding --disable-binutils to the configure command line and rebuilding the package. It wouldn't hurt to do "make distclean" first, to get everything back to its original state.

There should be no loss of functionality when configured with --disable-binutils, but what will happen is that PC->source line mapping will be done by an external "addr2line" process rather than purely through psprocess/Tcl.

It may also help to explicitly request the "bitness" of the build, which I believe is done through the gcc option "-m32" or "-m64", depending on which you want. This is also done at configure time, through the CFLAGS variable, e.g.:

  ./configure [options] CFLAGS="-m32 <others>"

Rick

On Wed, 17 Jan 2007, Steven R Brandt wrote:
> I'm having some trouble getting PerfSuite to build on an x86_64 machine. I get the following error:
>
> gcc -shared .libs/libpsbfd_la-Bfd_control.o .libs/libpsbfd_la-Bfd_init.o .libs/libpsbfd_la-Bfd_inquire.o .libs/libpsbfd_la-Bfd_lookup.o -lbfd -liberty -Wl,-soname -Wl,libpsbfd.so.1 -o .libs/libpsbfd.so.1.0.0
> /usr/bin/ld: /usr/lib/gcc/x86_64-redhat-linux/3.4.6/../../../../lib64/libbfd.a(bfd.o): relocation R_X86_64_32 against `a local symbol' can not be used when making a shared object; recompile with -fPIC
> /usr/lib/gcc/x86_64-redhat-linux/3.4.6/../../../../lib64/libbfd.a: could not read symbols: Bad value
>
> My config:
> export PATH="/home/packages/ActiveTcl-8.4/bin:$PATH"
> ./configure \
>     --with-wish=/home/packages/ActiveTcl-8.4/bin/wish \
>     --with-tclinclude=/home/packages/ActiveTcl-8.4/include \
>     --with-tdom=/home/packages/ActiveTcl-8.4/lib/tdom0.8.1/ \
>     --with-papi=/home/packages/papi-3.5.0 \
>     -prefix=/home/packages/perfsuite-0.6.2a6
>
> $ uname -a
> Linux celeritas.cct.lsu.edu 2.6.9-prep #1 SMP Thu Dec 7 20:32:47 CST 2006 x86_64 x86_64 x86_64 GNU/Linux
>
> I tried to get around this by configuring with --disable-shared. Unfortunately, while this compiles, it does not work. I get this error when I attempt to run:
>
> $ psrun ls
> ERROR: ld.so: object 'libpsrun.so.0' from LD_PRELOAD cannot be preloaded: ignored.
>
> $ find /home/packages/perfsuite-0.6.2a6/ -name libpsrun.\*
> /home/packages/perfsuite-0.6.2a6/lib/libpsrun.la
> /home/packages/perfsuite-0.6.2a6/lib/libpsrun.a
>
> Any ideas?
>
> Thanks,
> Steve
From: liudawei <dbm...@ya...> - 2007-01-17 10:10:20
|
When I try to build PerfSuite on an AMD-based machine, I get the following error message: ----------- Making all in bfd /bin/sh ../../../../libtool --tag=CC --mode=link gcc -g -O2 -o libpsbfd.la -rpath /usr/local/lib/psbfd0.2 -lbfd -liberty -version-info 1:0:0 libpsbfd_la-Bfd_control.lo libpsbfd_la-Bfd_init.lo libpsbfd_la-Bfd_inquire.lo libpsbfd_la-Bfd_lookup.lo gcc -shared .libs/libpsbfd_la-Bfd_control.o .libs/libpsbfd_la-Bfd_init.o .libs/libpsbfd_la-Bfd_inquire.o .libs/libpsbfd_la-Bfd_lookup.o -lbfd -liberty -Wl,-soname -Wl,libpsbfd.so.1 -o .libs/libpsbfd.so.1.0.0 /usr/bin/ld: /usr/lib/gcc-lib/x86_64-redhat-linux/3.2.3/../../../../lib64/libbfd.a(bfd.o): relocation R_X86_64_32 can not be used when making a shared object; recompile with -fPIC /usr/lib/gcc-lib/x86_64-redhat-linux/3.2.3/../../../../lib64/libbfd.a: could not read symbols: Bad value ---------------- I do not know how to resolve it. |
From: Rick K. <rk...@nc...> - 2007-01-16 14:37:03
|
Dawei, I can point out a few things that may help you think about your experiments: - The wall clock time reported is just that: the amount of time elapsed from when the measurement started until it stopped. There is not necessarily a correspondence between the wall clock time and the amount of "work" done (whatever you define "work" to be). - By default, PerfSuite measures a process in "user" mode, and does not account for "system" or "kernel" time, that is, time spent executing in system calls on behalf of the user process. If a process spends a lot of time in requests for system services (for example, I/O), you may get a large underestimate in the resulting XML documents. One can request a different counting mode through either the psrun option "-d" or the environment variable PS_HWPC_DOMAIN. See the psrun man page or the output of "psrun -h" for syntax. - It helps to know a little bit about the CPU you are working with and what events mean. Usually, one has to refer to the CPU vendor documentation that describes the implemented events. In the case of the Itanium 2, one unusual thing is that some data bypasses the level 1 cache altogether. This has been discussed on this list a few times in the past, when people wonder why there are more level 2 cache misses than level 1 cache misses for runs on Itanium systems. I hesitate to post a URL for the reference to performance events on Itanium 2 because it seems to me that it has been moved around in the past, but some searching at Intel's web site should eventually turn it up. Rick On Sun, 14 Jan 2007, liudawei wrote: > Dear Rick: > > I am still confused by my monitoring work. This time it is not a simple test program but a real scenario. > 1. "psrun -f ttDaemonAdmin -start" // start the TimesTen database server. Here ttDaemonAdmin is a script file; its contents are listed at the end of this letter. > 2. "ps -A" // to show you that once the database server is started, five processes are running.
> ------ > 13366 ? 00:00:00 timestend > 13369 ? 00:00:00 timestensubd > 13370 ? 00:00:00 timestensubd > 13371 ? 00:00:00 timestensubd > 13372 ? 00:00:00 timestensubd > ------ > 3. "tpcb" // This starts a test program that connects to the database server and sends a number of queries to the database; the connection request is served by a process (13372), which fetches data from the database and > returns the results to the requester. I used the command "top" to take a snapshot while the tpcb program was running. > 4. "top" > -------------- > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > 13400 root 25 0 65680 46m 36m R 49.8 1.2 0:01.49 tpcb > 13372 root 25 0 63712 14m 12m S 5.4 0.4 0:00.17 timestensub > 269 root 15 0 0 0 0 D 2.0 0.0 0:03.82 kjournald > 13366 root 25 0 26672 7248 3856 S 0.2 0.2 0:00.01 timestend > -------------- > This snapshot clearly shows that it is process 13372 that serves tpcb's accesses to the data stored in the database. I just want to show you that this process (13372) does much more work than the others. > 5. "ttdaemonAdmin -stop" // This stops the database server, and all five processes exit. So my monitoring work is finished, and I got the following result files: > --------------- > -rw-r--r-- 1 root root 2466 Jan 14 20:32 sh.13361.Node4.xml > -rw-r--r-- 1 root root 2464 Jan 14 20:32 sh.13362.Node4.xml > -rw------- 1 root root 2463 Jan 14 20:32 timestend.13364.Node4.xml > -rw------- 1 root root 2470 Jan 14 20:32 timestend.13365.Node4.xml > -rw-rw-rw- 1 root root 2482 Jan 14 20:43 timestensubd.13369.Node4.xml > -rw-rw-rw- 1 root root 2482 Jan 14 20:43 timestensubd.13370.Node4.xml > -rw-rw-rw- 1 root root 2482 Jan 14 20:43 timestensubd.13371.Node4.xml > -rw-rw-rw- 1 root root 2482 Jan 14 20:43 timestensubd.13372.Node4.xml > -rw-r--r-- 1 root root 2455 Jan 14 20:32 tr.13363.Node4.xml > -rw-r--r-- 1 root root 2471 Jan 14 20:32 ttDaemonAdmin.13360.Node4.xml > --------------- > 7.
Here, do you think there should be a large difference between "timestensubd.13372.Node4.xml" and the others? > I think so, because it is this process (13372) that really does the data access work, while the others do not. But when > I use "psprocess" to view the results, I find the difference is indistinguishable. I do not know why. > Is the method I used to monitor a database server right? It seems there is something wrong with > the wall clock time of "13372". It has done so much work, so why is there no difference from the other processes? > It is impossible that this process (13372) consumes so few resources (I mean L1_ICM and L1_DCM counts) and is also close > to the other processes. The results are attached below. > 8. Could I add up all the L1_DCM counts caused by these processes, say N? Could I believe the total number of L1 cache misses > caused by the database server is N? > ---------------------------------- > PerfSuite Hardware Performance Summary Report > Version : 1.0 > Created : Sun Jan 14 08:48:42 PM CST 2007 > Generator : psprocess 0.3 > XML Source : timestensubd.13369.Node4.xml > [......] > Index Description Counter Value > ============================================================================================ > 1 Level 1 instruction cache misses................................. 6158 > 2 Level 1 data cache misses........................................ 13377 > 3 Level 2 instruction cache misses................................. 4452 > Event Index > ============================================================================================ > 1: PAPI_L1_ICM 2: PAPI_L1_DCM 3: PAPI_L2_ICM > Statistics > ============================================================================================ > Counting domain........................................................ user > Multiplexed............................................................ no > Wall clock time (seconds)..............................................
635.387 > > ----------------------------------- > PerfSuite Hardware Performance Summary Report > Version : 1.0 > Created : Sun Jan 14 08:48:43 PM CST 2007 > Generator : psprocess 0.3 > XML Source : timestensubd.13370.Node4.xml > Index Description Counter Value > ============================================================================================ > 1 Level 1 instruction cache misses................................. 6219 > 2 Level 1 data cache misses........................................ 13464 > 3 Level 2 instruction cache misses................................. 4465 > [.....] > Event Index > ============================================================================================ > 1: PAPI_L1_ICM 2: PAPI_L1_DCM 3: PAPI_L2_ICM > Statistics > ============================================================================================ > Counting domain........................................................ user > Multiplexed............................................................ no > Wall clock time (seconds).............................................. 635.411 > ----------------------------------------------------------------------------------------------- > PerfSuite Hardware Performance Summary Report > Version : 1.0 > Created : Sun Jan 14 08:48:43 PM CST 2007 > Generator : psprocess 0.3 > XML Source : timestensubd.13371.Node4.xml > [......] > Index Description Counter Value > ============================================================================================ > 1 Level 1 instruction cache misses................................. 5901 > 2 Level 1 data cache misses........................................ 13461 > 3 Level 2 instruction cache misses................................. 
4430 > Event Index > ============================================================================================ > 1: PAPI_L1_ICM 2: PAPI_L1_DCM 3: PAPI_L2_ICM > Statistics > ============================================================================================ > Counting domain........................................................ user > Multiplexed............................................................ no > Wall clock time (seconds).............................................. 635.403 > ------------------------------------------------------------------------------------------------ > PerfSuite Hardware Performance Summary Report > Version : 1.0 > Created : Sun Jan 14 08:48:43 PM CST 2007 > Generator : psprocess 0.3 > XML Source : timestensubd.13372.Node4.xml > [......] > Index Description Counter Value > ============================================================================================ > 1 Level 1 instruction cache misses................................. 5962 > 2 Level 1 data cache misses........................................ 13524 > 3 Level 2 instruction cache misses................................. 4474 > Event Index > ============================================================================================ > 1: PAPI_L1_ICM 2: PAPI_L1_DCM 3: PAPI_L2_ICM > Statistics > ============================================================================================ > Counting domain........................................................ user > Multiplexed............................................................ no > Wall clock time (seconds).............................................. 635.394 > --------------------------------------------------------------------------------------------- > ---------ttDaemonAdmin.sh-------- > #!/bin/sh > # Copyright (C) 1999, 2006, Oracle. All rights reserved. 
> # > # Set the shared library search path environment variable > # > TIMESTEN_DIR=/opt/TimesTen/tt60ty > LD_LIBRARY_PATH=$TIMESTEN_DIR/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH} > export LD_LIBRARY_PATH > > cmd=`echo $0 | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz` > case $cmd in > *ttadmin) > cmd=$TIMESTEN_DIR/bin/ttAdminCmd > ;; > *ttrepadmin) > cmd=$TIMESTEN_DIR/bin/ttRepAdminCmd > ;; > *ttstatus) > cmd=$TIMESTEN_DIR/bin/ttStatusCmd > ;; > *tttail) > cmd=$TIMESTEN_DIR/bin/ttTailCmd > ;; > *tttracemon) > cmd=$TIMESTEN_DIR/bin/ttTraceMonCmd > ;; > *ttisql) > cmd=$TIMESTEN_DIR/bin/ttIsqlCmd > ;; > *ttcheck) > cmd=$TIMESTEN_DIR/bin/ttCheckCmd > ;; > *ttxactlog) > cmd=$TIMESTEN_DIR/bin/ttXactLogCmd > ;; > *ttxactadmin) > cmd=$TIMESTEN_DIR/bin/ttXactAdminCmd > ;; > *ttbackup) > cmd=$TIMESTEN_DIR/bin/ttBackupCmd > ;; > *ttrestore) > cmd=$TIMESTEN_DIR/bin/ttRestoreCmd > ;; > *ttdestroy) > cmd=$TIMESTEN_DIR/bin/ttDestroyCmd > ;; > *ttschema) > cmd=$TIMESTEN_DIR/bin/ttSchemaCmd > ;; > *ttsize) > cmd=$TIMESTEN_DIR/bin/ttSizeCmd > ;; > *ttbulkcp) > cmd=$TIMESTEN_DIR/bin/ttBulkCpCmd > ;; > *ttdaemonlog) > cmd=$TIMESTEN_DIR/bin/ttDaemonLogCmd > ;; > *ttdaemonadmin) > cmd=$TIMESTEN_DIR/bin/ttDaemonAdminCmd > ;; > *ttmigrate) > cmd=$TIMESTEN_DIR/bin/ttMigrateCmd > ;; > *ttisqlcs) > cmd=$TIMESTEN_DIR/bin/ttIsqlCSCmd > ;; > *ttbulkcpcs) > cmd=$TIMESTEN_DIR/bin/ttBulkCpCSCmd > ;; > *ttmigratecs) > cmd=$TIMESTEN_DIR/bin/ttMigrateCSCmd > ;; > *ttschemacs) > cmd=$TIMESTEN_DIR/bin/ttSchemaCSCmd > ;; > *ttsyslogcheck) > cmd=$TIMESTEN_DIR/bin/ttSyslogCheckCmd > ;; > *ttthunk) > echo "ttThunk sets up the environment for TimesTen utility programs." > echo "It is not intended to be run directly." > exit 1 > ;; > *) > echo "$cmd is not a TimesTen utility program." 
> exit 1 > ;; > esac > exec $cmd "$@" > ------------------------------------------------------------------------------------------------------- > > Best regards > Dawei Liu > Renmin University of China > 100872 Beijing,China > > > --------------------------------- |
From: liudawei <dbm...@ya...> - 2007-01-14 13:26:57
|
Dear Rick: I am still confused by my monitoring work. This time it is not a simple test program but a real scenario. 1. "psrun -f ttDaemonAdmin -start" // start the TimesTen database server. Here ttDaemonAdmin is a script file; its contents are listed at the end of this letter. 2. "ps -A" // to show you that once the database server is started, five processes are running. ------ 13366 ? 00:00:00 timestend 13369 ? 00:00:00 timestensubd 13370 ? 00:00:00 timestensubd 13371 ? 00:00:00 timestensubd 13372 ? 00:00:00 timestensubd ------ 3. "tpcb" // This starts a test program that connects to the database server and sends a number of queries to the database; the connection request is served by a process (13372), which fetches data from the database and returns the results to the requester. I used the command "top" to take a snapshot while the tpcb program was running. 4. "top" -------------- PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 13400 root 25 0 65680 46m 36m R 49.8 1.2 0:01.49 tpcb 13372 root 25 0 63712 14m 12m S 5.4 0.4 0:00.17 timestensub 269 root 15 0 0 0 0 D 2.0 0.0 0:03.82 kjournald 13366 root 25 0 26672 7248 3856 S 0.2 0.2 0:00.01 timestend -------------- This snapshot clearly shows that it is process 13372 that serves tpcb's accesses to the data stored in the database. I just want to show you that this process (13372) does much more work than the others. 5. "ttdaemonAdmin -stop" // This stops the database server, and all five processes exit.
So my monitoring work is finished, and I got the following result files: --------------- -rw-r--r-- 1 root root 2466 Jan 14 20:32 sh.13361.Node4.xml -rw-r--r-- 1 root root 2464 Jan 14 20:32 sh.13362.Node4.xml -rw------- 1 root root 2463 Jan 14 20:32 timestend.13364.Node4.xml -rw------- 1 root root 2470 Jan 14 20:32 timestend.13365.Node4.xml -rw-rw-rw- 1 root root 2482 Jan 14 20:43 timestensubd.13369.Node4.xml -rw-rw-rw- 1 root root 2482 Jan 14 20:43 timestensubd.13370.Node4.xml -rw-rw-rw- 1 root root 2482 Jan 14 20:43 timestensubd.13371.Node4.xml -rw-rw-rw- 1 root root 2482 Jan 14 20:43 timestensubd.13372.Node4.xml -rw-r--r-- 1 root root 2455 Jan 14 20:32 tr.13363.Node4.xml -rw-r--r-- 1 root root 2471 Jan 14 20:32 ttDaemonAdmin.13360.Node4.xml --------------- 7. Here, do you think there should be a large difference between "timestensubd.13372.Node4.xml" and the others? I think so, because it is this process (13372) that really does the data access work, while the others do not. But when I use "psprocess" to view the results, I find the difference is indistinguishable. I do not know why. Is the method I used to monitor a database server right? It seems there is something wrong with the wall clock time of "13372". It has done so much work, so why is there no difference from the other processes? It is impossible that this process (13372) consumes so few resources (I mean L1_ICM and L1_DCM counts) and is also close to the other processes. The results are attached below. 8. Could I add up all the L1_DCM counts caused by these processes, say N? Could I believe the total number of L1 cache misses caused by the database server is N? ---------------------------------- PerfSuite Hardware Performance Summary Report Version : 1.0 Created : Sun Jan 14 08:48:42 PM CST 2007 Generator : psprocess 0.3 XML Source : timestensubd.13369.Node4.xml [......]
Index Description Counter Value ============================================================================================ 1 Level 1 instruction cache misses................................. 6158 2 Level 1 data cache misses........................................ 13377 3 Level 2 instruction cache misses................................. 4452 Event Index ============================================================================================ 1: PAPI_L1_ICM 2: PAPI_L1_DCM 3: PAPI_L2_ICM Statistics ============================================================================================ Counting domain........................................................ user Multiplexed............................................................ no Wall clock time (seconds).............................................. 635.387 ----------------------------------- PerfSuite Hardware Performance Summary Report Version : 1.0 Created : Sun Jan 14 08:48:43 PM CST 2007 Generator : psprocess 0.3 XML Source : timestensubd.13370.Node4.xml Index Description Counter Value ============================================================================================ 1 Level 1 instruction cache misses................................. 6219 2 Level 1 data cache misses........................................ 13464 3 Level 2 instruction cache misses................................. 4465 [.....] Event Index ============================================================================================ 1: PAPI_L1_ICM 2: PAPI_L1_DCM 3: PAPI_L2_ICM Statistics ============================================================================================ Counting domain........................................................ user Multiplexed............................................................ no Wall clock time (seconds).............................................. 
635.411 ----------------------------------------------------------------------------------------------- PerfSuite Hardware Performance Summary Report Version : 1.0 Created : Sun Jan 14 08:48:43 PM CST 2007 Generator : psprocess 0.3 XML Source : timestensubd.13371.Node4.xml [......] Index Description Counter Value ============================================================================================ 1 Level 1 instruction cache misses................................. 5901 2 Level 1 data cache misses........................................ 13461 3 Level 2 instruction cache misses................................. 4430 Event Index ============================================================================================ 1: PAPI_L1_ICM 2: PAPI_L1_DCM 3: PAPI_L2_ICM Statistics ============================================================================================ Counting domain........................................................ user Multiplexed............................................................ no Wall clock time (seconds).............................................. 635.403 ------------------------------------------------------------------------------------------------ PerfSuite Hardware Performance Summary Report Version : 1.0 Created : Sun Jan 14 08:48:43 PM CST 2007 Generator : psprocess 0.3 XML Source : timestensubd.13372.Node4.xml [......] Index Description Counter Value ============================================================================================ 1 Level 1 instruction cache misses................................. 5962 2 Level 1 data cache misses........................................ 13524 3 Level 2 instruction cache misses................................. 
4474 Event Index ============================================================================================ 1: PAPI_L1_ICM 2: PAPI_L1_DCM 3: PAPI_L2_ICM Statistics ============================================================================================ Counting domain........................................................ user Multiplexed............................................................ no Wall clock time (seconds).............................................. 635.394 --------------------------------------------------------------------------------------------- ---------ttDaemonAdmin.sh-------- #!/bin/sh # Copyright (C) 1999, 2006, Oracle. All rights reserved. # # Set the shared library search path environment variable # TIMESTEN_DIR=/opt/TimesTen/tt60ty LD_LIBRARY_PATH=$TIMESTEN_DIR/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH} export LD_LIBRARY_PATH cmd=`echo $0 | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz` case $cmd in *ttadmin) cmd=$TIMESTEN_DIR/bin/ttAdminCmd ;; *ttrepadmin) cmd=$TIMESTEN_DIR/bin/ttRepAdminCmd ;; *ttstatus) cmd=$TIMESTEN_DIR/bin/ttStatusCmd ;; *tttail) cmd=$TIMESTEN_DIR/bin/ttTailCmd ;; *tttracemon) cmd=$TIMESTEN_DIR/bin/ttTraceMonCmd ;; *ttisql) cmd=$TIMESTEN_DIR/bin/ttIsqlCmd ;; *ttcheck) cmd=$TIMESTEN_DIR/bin/ttCheckCmd ;; *ttxactlog) cmd=$TIMESTEN_DIR/bin/ttXactLogCmd ;; *ttxactadmin) cmd=$TIMESTEN_DIR/bin/ttXactAdminCmd ;; *ttbackup) cmd=$TIMESTEN_DIR/bin/ttBackupCmd ;; *ttrestore) cmd=$TIMESTEN_DIR/bin/ttRestoreCmd ;; *ttdestroy) cmd=$TIMESTEN_DIR/bin/ttDestroyCmd ;; *ttschema) cmd=$TIMESTEN_DIR/bin/ttSchemaCmd ;; *ttsize) cmd=$TIMESTEN_DIR/bin/ttSizeCmd ;; *ttbulkcp) cmd=$TIMESTEN_DIR/bin/ttBulkCpCmd ;; *ttdaemonlog) cmd=$TIMESTEN_DIR/bin/ttDaemonLogCmd ;; *ttdaemonadmin) cmd=$TIMESTEN_DIR/bin/ttDaemonAdminCmd ;; *ttmigrate) cmd=$TIMESTEN_DIR/bin/ttMigrateCmd ;; *ttisqlcs) cmd=$TIMESTEN_DIR/bin/ttIsqlCSCmd ;; *ttbulkcpcs) cmd=$TIMESTEN_DIR/bin/ttBulkCpCSCmd ;; *ttmigratecs) cmd=$TIMESTEN_DIR/bin/ttMigrateCSCmd 
;; *ttschemacs) cmd=$TIMESTEN_DIR/bin/ttSchemaCSCmd ;; *ttsyslogcheck) cmd=$TIMESTEN_DIR/bin/ttSyslogCheckCmd ;; *ttthunk) echo "ttThunk sets up the environment for TimesTen utility programs." echo "It is not intended to be run directly." exit 1 ;; *) echo "$cmd is not a TimesTen utility program." exit 1 ;; esac exec $cmd "$@" ------------------------------------------------------------------------------------------------------- Best regards Dawei Liu Renmin University of China 100872 Beijing,China |
Rick Kufrin: Thank you very much for your kindness. I understand what you said. I will keep you posted on my work. Best regards Dawei Liu Rick Kufrin <rk...@nc...> wrote: Dawei, I'm glad to hear you are making a little progress. Regarding the reason for the zeroes appearing in your later tests: if you examine the output of psprocess you will notice that it indicates that multiplexing was enabled for the run with zeroes, but not for the run that returned actual event counts. I believe that this is the underlying reason that zeroes were reported. On an Itanium 2 system such as you are using, there are only four actual registers available for the performance counters, which limits the number of events that can be counted at any given time. When you supply a configuration file that requires more events than the underlying hardware supports, PerfSuite switches into multiplexed operation (through PAPI). What that means is that the available registers are "timeshared" among the events: first one event is counted for a short period of time, then the next, then the third, and so on. At the end of the measurement, the events that were accumulated are scaled up accordingly and that is what is reported as the final result. This is an approximation, and for longer-running programs is usually quite good, but for shorter-running programs it can yield unexpected results. In the extreme, it is possible that the program being measured completed before one or more of the specified events had a chance to be "made active" even once. I believe the current implementation of multiplexing in PAPI only allows for one event being active in any timeslice, even if the processor actually supports more than one. So my guess is that what is occurring here is that your program is not running for long enough to accumulate any event counts when multiplexing is enabled. The run that did not report zeroes only ran for ~36,000 cycles, which is much less than a second on your system.
My recommendation is to limit your runs to those that use configurations for which multiplexing is not required. This should at least give results that are greater than zero. Whether or not runs of these lengths provide information that is useful for analysis is another matter, but that's a judgement to be made by the end user. There is currently no command-line utility in PerfSuite that allows one to query a given configuration file to learn if it would require multiplexing (this would be a useful addition, I think), so the easiest way is to experiment with commands like "ls" in combination with test configuration files and examine the output to see if multiplexing was used. Of course, it's good to know how many registers are available on your system in advance, as that lets you know the maximum, but you still have to do test runs because some PAPI events ("derived events") are actually composed of more than one underlying native event. Rick On Fri, 12 Jan 2007, liudawei wrote: [ ... ] > What is the problem? Waiting for your reply. > > > Best regards > Dawei Liu > Renmin University of China > 100872 Beijing, China |
From: Rick K. <rk...@nc...> - 2007-01-12 12:08:28
|
Dawei, I'm glad to hear you are making a little progress. Regarding the reason for the zeroes appearing in your later tests: if you examine the output of psprocess you will notice that it indicates that multiplexing was enabled for the run with zeroes, but not for the run that returned actual event counts. I believe that this is the underlying reason that zeroes were reported. On an Itanium 2 system such as you are using, there are only four actual registers available for the performance counters, which limits the number of events that can be counted at any given time. When you supply a configuration file that requires more events than the underlying hardware supports, PerfSuite switches into multiplexed operation (through PAPI). What that means is that the available registers are "timeshared" among the events: first one event is counted for a short period of time, then the next, then the third, and so on. At the end of the measurement, the events that were accumulated are scaled up accordingly and that is what is reported as the final result. This is an approximation, and for longer-running programs is usually quite good, but for shorter-running programs it can yield unexpected results. In the extreme, it is possible that the program being measured completed before one or more of the specified events had a chance to be "made active" even once. I believe the current implementation of multiplexing in PAPI only allows for one event being active in any timeslice, even if the processor actually supports more than one. So my guess is that what is occurring here is that your program is not running for long enough to accumulate any event counts when multiplexing is enabled. The run that did not report zeroes only ran for ~36,000 cycles, which is much less than a second on your system.
My recommendation is to limit your runs to those that use configurations for which multiplexing is not required. This should at least give results that are greater than zero. Whether or not runs of these lengths provide information that is useful for analysis is another matter, but that's a judgement to be made by the end user. There is currently no command-line utility in PerfSuite that allows one to query a given configuration file to learn if it would require multiplexing (this would be a useful addition, I think), so the easiest way is to experiment with commands like "ls" in combination with test configuration files and examine the output to see if multiplexing was used. Of course, it's good to know how many registers are available on your system in advance, as that lets you know the maximum, but you still have to do test runs because some PAPI events ("derived events") are actually composed of more than one underlying native event. Rick On Fri, 12 Jan 2007, liudawei wrote: [ ... ] > What is the problem? Waiting for your reply. > > > Best regards > Dawei Liu > Renmin University of China > 100872 Beijing, China |
I have done a simple test, step by step, according to your suggestion, but unfortunately I encountered another problem. Let me describe my test here. step 1. I edited the configuration file "papi3_itanium2.xml" so that just two entries are left. papi3_itanium2.xml is my default configuration file; following your suggestion, I let this file include only two entries. The contents are here. ------------------------------------------------------------------------------------------------------------- <?xml version="1.0" encoding="UTF-8"?> <ps_hwpc_eventlist class="PAPI" generator="psconfig"> <ps_hwpc_event type="preset" name="PAPI_TOT_CYC" /> <ps_hwpc_event type="preset" name="PAPI_TOT_INS" /> </ps_hwpc_eventlist> ------------------------------------------------------------------------------------------------------------- step 2. I wrote a simple C program named "a.c", then compiled it into an executable named "a.out". step 3. I started my test with the command "psrun a.out", and the expected output file was produced. step 4. I used psprocess to process this file with the command "psprocess a.out.27000.Node4.xml". I list the results as follows. The results are what I expected: just as you said, "Total cycles" and "Instructions completed" are both non-zero. I continued my test work; please see step 5.
--------------------------------------------------------------------------------------------------------------- PerfSuite Hardware Performance Summary Report Version : 1.0 Created : Fri Jan 12 10:57:12 AM CST 2007 Generator : psprocess 0.3 XML Source : a.out.25984.Node4.xml Execution Information ============================================================================================ Collector : libpshwpc Date : Fri Jan 12 10:56:49 2007 Host : Node4 User : root Command : a.out Processor and System Information ============================================================================================ Node CPUs : 2 Vendor : Intel Family : Itanium 2 CPU Revision : 2 Clock (MHz) : 1600.030 Memory (MB) : 4032.42 Pagesize (KB) : 16 Cache Information ============================================================================================ Cache levels : 3 -------------------------------- Level 1 Type : data Size (KB) : 16 Linesize (B) : 64 Assoc : 4 Type : instruction Size (KB) : 16 Linesize (B) : 64 Assoc : 4 -------------------------------- Level 2 Type : unified Size (KB) : 256 Linesize (B) : 128 Assoc : 8 -------------------------------- Level 3 Type : unified Size (KB) : 3072 Linesize (B) : 128 Assoc : 6 Index Description Counter Value ============================================================================================ 1 Total cycles..................................................... 36270 2 Instructions completed........................................... 25558 Event Index ============================================================================================ 1: PAPI_TOT_CYC 2: PAPI_TOT_INS Statistics ============================================================================================ Counting domain........................................................ user Multiplexed............................................................ no Graduated instructions per cycle....................................... 
0.705 MIPS (cycles).......................................................... 1127.476 MIPS (wall clock)...................................................... 747.215 CPU time (seconds)..................................................... 0.000 Wall clock time (seconds).............................................. 0.000 % CPU utilization...................................................... 66.273 ------------------------------------------------------------------------------------------------------------------ step 5. I edited "papi3_itanium2.xml" and added another entry (say, I want to know the total number of L1 data cache misses). The configuration file is listed here again: --------------------------------------- <?xml version="1.0" encoding="UTF-8"?> <ps_hwpc_eventlist class="PAPI" generator="psconfig"> <ps_hwpc_event type="preset" name="PAPI_L1_DCM" /> <ps_hwpc_event type="preset" name="PAPI_TOT_CYC" /> <ps_hwpc_event type="preset" name="PAPI_TOT_INS" /> </ps_hwpc_eventlist> ------------------------------------------------------------------------ step 6: psrun a.out step 7: psprocess a.out.27020.Node4.xml I got a result once more; this time I got the total number of L1 data cache misses caused by a.out. This result is also what I expected, so I continued my work. step 8: "vi papi3_itanium2.xml" to add another entry, an event I am interested in, so the configuration file contents look like this: -------------------------------------------------------- <?xml version="1.0" encoding="UTF-8"?> <ps_hwpc_eventlist class="PAPI" generator="psconfig"> <ps_hwpc_event type="preset" name="PAPI_L1_DCM" /> <ps_hwpc_event type="preset" name="PAPI_L2_ICM" /> <ps_hwpc_event type="preset" name="PAPI_TOT_CYC" /> <ps_hwpc_event type="preset" name="PAPI_TOT_INS" /> </ps_hwpc_eventlist> ----------------------------------------------------- step 9: psprocess a.out.27038.Node4.xml But this time I did not get the results I expected. The results are listed here. What is wrong with my work?
The zero result are not expected. ---------------------------------------------------------------------------------------- PerfSuite Hardware Performance Summary Report Version : 1.0 Created : Fri Jan 12 03:39:48 PM CST 2007 Generator : psprocess 0.3 XML Source : a.out.27038.Node4.xml Execution Information =========================================================================================== Collector : libpshwpc Date : Fri Jan 12 15:39:35 2007 Host : Node4 User : telnet Command : a.out Processor and System Information =========================================================================================== Node CPUs : 2 Vendor : Intel Family : Itanium 2 CPU Revision : 2 Clock (MHz) : 1600.030 Memory (MB) : 4032.42 Pagesize (KB) : 16 Cache Information =========================================================================================== Cache levels : 3 -------------------------------- Level 1 Type : data Size (KB) : 16 Linesize (B) : 64 Assoc : 4 Type : instruction Size (KB) : 16 Linesize (B) : 64 Assoc : 4 -------------------------------- Level 2 Type : unified Size (KB) : 256 Linesize (B) : 128 Assoc : 8 -------------------------------- Level 3 Type : unified Size (KB) : 3072 Linesize (B) : 128 Assoc : 6 Index Description ounter Value =========================================================================================== 1 Level 1 data cache misses........................................0 2 Level 2 instruction cache misses.................................0 3 Total cycles.....................................................0 4 Instructions completed...........................................0 Event Index =================================================================================== 1: PAPI_L1_DCM 2: PAPI_L2_ICM 3: PAPI_TOT_CYC 4: PAP_TOT_INS Statistics =========================================================================================== Counting domain........................................................ 
user Multiplexed............................................................ yes MIPS (wall clock)...................................................... 0.000 CPU time (seconds)..................................................... 0.000 Wall clock time (seconds).............................................. 0.000 % CPU utilization...................................................... 0.000 ------------------------------------------------------------------------------------------------------------------step 10: I use "psinv -p " to see the availble events of my machine. PAPI Standard Event Details - Non-derived: PAPI_BR_INS: Branch instructions PAPI_BR_PRC: Conditional branch instructions correctly predicted PAPI_CA_SNP: Requests for a snoop PAPI_FP_OPS: Floating point operations PAPI_FP_STAL: Cycles the FP unit(s) are stalled PAPI_L1_DCA: Level 1 data cache accesses PAPI_L1_DCM: Level 1 data cache misses PAPI_L1_DCR: Level 1 data cache reads PAPI_L1_ICM: Level 1 instruction cache misses PAPI_L2_DCA: Level 2 data cache accesses PAPI_L2_DCR: Level 2 data cache reads PAPI_L2_DCW: Level 2 data cache writes PAPI_L2_ICA: Level 2 instruction cache accesses PAPI_L2_ICM: Level 2 instruction cache misses PAPI_L2_LDM: Level 2 load misses PAPI_L2_STM: Level 2 store misses PAPI_L2_TCA: Level 2 total cache accesses PAPI_L2_TCM: Level 2 cache misses PAPI_L2_TCW: Level 2 total cache writes PAPI_L3_DCA: Level 3 data cache accesses PAPI_L3_DCR: Level 3 data cache reads PAPI_L3_DCW: Level 3 data cache writes PAPI_L3_ICA: Level 3 instruction cache accesses PAPI_L3_ICH: Level 3 instruction cache hits PAPI_L3_ICM: Level 3 instruction cache misses PAPI_L3_ICR: Level 3 instruction cache reads PAPI_L3_LDM: Level 3 load misses PAPI_L3_STM: Level 3 store misses PAPI_L3_TCA: Level 3 total cache accesses PAPI_L3_TCM: Level 3 cache misses PAPI_L3_TCR: Level 3 total cache reads PAPI_L3_TCW: Level 3 total cache writes PAPI_LD_INS: Load instructions PAPI_RES_STL: Cycles stalled on any resource 
PAPI_SR_INS: Store instructions PAPI_STL_CCY: Cycles with no instructions completed PAPI_STL_ICY: Cycles with no instruction issue PAPI_TLB_DM: Data translation lookaside buffer misses PAPI_TLB_IM: Instruction translation lookaside buffer misses PAPI_TOT_CYC: Total cycles PAPI_TOT_IIS: Instructions issued Derived: PAPI_BR_MSP: Conditional branch instructions mispredicted PAPI_CA_INV: Requests for cache line invalidation PAPI_L1_DCH: Level 1 data cache hits PAPI_L1_ICA: Level 1 instruction cache accesses PAPI_L1_ICR: Level 1 instruction cache reads PAPI_L1_LDM: Level 1 load misses PAPI_L1_TCA: Level 1 total cache accesses PAPI_L1_TCM: Level 1 cache misses PAPI_L1_TCR: Level 1 total cache reads PAPI_L2_DCH: Level 2 data cache hits PAPI_L2_DCM: Level 2 data cache misses PAPI_L2_ICR: Level 2 instruction cache reads PAPI_L2_TCH: Level 2 total cache hits PAPI_L2_TCR: Level 2 total cache reads PAPI_L3_DCH: Level 3 data cache hits PAPI_L3_DCM: Level 3 data cache misses PAPI_L3_TCH: Level 3 total cache hits PAPI_TLB_TL: Total translation lookaside buffer misses PAPI_TOT_INS: Instructions completed ---------------------------------------------------------------------------------------------------------------- step 11: I tried to add other events, but I got the same results. all the results are zero step 12: Finished. What is the problem happen ? Waiting your reply. Best Reagard Dawei Liu Renmin University of China 100872 Beijing,China --------------------------------- 抢注雅虎免费邮箱-3.5G容量,20M附件! |
From: Rick K. <rk...@nc...> - 2007-01-11 13:25:16
|
Dawei,

You may be interested in a paper that was given at the 2005 Linux Clusters Institute conference in which the step-by-step operation of the psrun command is described. An electronic copy of the paper can be found here:

http://perfsuite.ncsa.uiuc.edu/publications/LCI-2005.pdf

One of the most important points is that the psrun command itself does not participate in the monitoring, nor does it write out any of the resulting XML documents. psrun's only purpose is to interpret command-line arguments and set up the runtime environment for the command that is being monitored. Strictly speaking, the psrun command does not need to be used at all to do performance monitoring if one sets up the environment "by hand" properly.

Regarding the XML documents attached to your message, I can see that they are monitoring several different executables: timestensubd, ttcserver, and ttDaemonAdmin. It would appear that they correspond to the parent process as well as spawned processes/threads, as you describe.

However, they appear to have consumed very little CPU time, which could certainly be the case for daemon-type threads that may primarily be waiting on I/O. Additionally, they are not profiles but documents gathered in aggregate event count mode, which by default uses counter multiplexing from PAPI. If a process does not use much CPU time, there will not be enough time for counts to accumulate across all the events selected, and the resulting numbers will be zero. You can also see this behavior if you use psrun with a simple, short-running command like "ls":

psrun ls

One way to test this is to run psrun with a configuration file that does not require multiplexing. You can do this by making a copy of the default configuration file ("psrun -h" will show you its location) and removing all but one of the events listed (e.g. PAPI_TOT_CYC), then rerunning psrun to see if the event counts are non-zero:

setenv PS_HWPC_CONFIG your_new_configuration.xml
psrun ls

On a separate matter: if you are indeed interested in gathering a profile and not aggregate events, you may instead want to use a profiling configuration, such as "papi_profile_cycles.xml" (contained in the PerfSuite distribution). Again, however, if a process does not use much CPU time during its lifetime, you may not get meaningful results from profiling, since there would not be enough time to gather samples with the statistical profiling method that PerfSuite uses. Keep in mind also that CPU time is different from wall-clock time: a process may exist for a significant amount of wall-clock time but still use very little CPU time.

Rick

On Tue, 9 Jan 2007, liudawei wrote:

> Thank you for your reply. I have tried your suggestion, but the problem is not resolved. I think I did not describe the problem clearly; I do not think it is because the program does not consume enough CPU time. The reason the profiled results are zero may be explained by the example here:
>
> Consider the following command:
>
> psrun -f p1.exe
>
> In fact, psrun will create a thread (say, t1) to execute p1.exe.
> But what happens if p1 is a daemon-type process?
> t1 will create a new thread (say, t2). When t2 has been created successfully, t2 kills t1, and t2 is the daemon. (This is how a daemon-type process is created.)
> But when t1 is killed, psrun finishes collecting the CPU counters right away.
> Is this right?
>
> --------------------------------------------------------------------------------
> Real application scenario:
>
> ttDaemonAdmin is a daemon-type process that starts a database server. I just want to profile it via psrun, so I use the following command:
>
> psrun -f ttDaemonAdmin -start
>
> Here, ttDaemonAdmin is a true daemon process. When it has started, I connect to it from a remote client and send some queries to the server. ttDaemonAdmin receives my requests from the client and spawns a sub-daemon process (also a true daemon process) to do some work in the server. When I have finished my connection to the server, I stop ttDaemonAdmin, so psrun finishes too. Then I get several XML files, but after processing them with psprocess, all the results are zero.
>
> Attached are the profile results; there are six files.
> ttDaemonAdmin.15261.Node1.xml is the result corresponding to the ttDaemonAdmin process. The other five XML files are for the child processes produced by ttDaemonAdmin; they are all true daemon processes too.
>
> I am anxious to know how to profile these types of processes.
>
> Best regards
>
> Rick Kufrin <rk...@nc...> wrote:
> Dawei,
>
> One thing you might consider trying is to replace the executable that
> corresponds to the daemon program with a wrapper script that invokes psrun
> on the true daemon program. For example, rename Child.exe to
> Child.exe.orig and create a new file named Child.exe that is a shell
> script that runs the command "psrun Child.exe.orig", passing along any
> arguments necessary for Child.exe. This would be the quickest way to see
> if things might work as a first cut.
>
> Also, psrun does not actually do any performance monitoring itself, nor
> does it write out the XML result documents; this is done through libraries
> that are "inserted" into the program that is being monitored.
>
> The usual reason for zeroes being reported is that the program does not
> consume enough CPU time to collect any data, and it sounds like this is
> what you may be seeing here.
>
> Rick
>
> On Sun, 7 Jan 2007, liudawei wrote:
>
>> Hi:
>>
>> I have encountered a problem when I use psrun to profile a daemon process whose child processes are also daemon processes. The example is here:
>>
>> psrun -f Daemon.exe
>>
>> Let me describe how this daemon works. I want to profile the daemon and its child processes; here, all the child processes are daemon processes too. The problem I encountered is that all the results (the XML files produced by psrun) are zero. I tried to read the source code of psrun. I guess that psrun forks a process to execute the program being profiled (here, Daemon.exe). But as we know, if the program being profiled is a daemon process, then once the daemon process has been created successfully it kills its parent process; however, psrun is just waiting for this signal to finish the profile. So, as soon as the daemon starts, psrun stops right away, and all the results are zero.
>>
>> Can anyone give some suggestions, or a patch to psrun so that psrun can monitor daemon-type programs?
>>
>> You can download Oracle TimesTen, then use psrun to verify the problem I encountered.
>>
>> Waiting for a reply.
|
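Rick's single-event test can be scripted. The POSIX sh sketch below writes a configuration that lists only one event, so no multiplexing should be needed. It reuses the XML layout from the papi3_itanium2.xml excerpt quoted earlier in this thread. The function name write_single_event_config and the file name single_event.xml are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write a PerfSuite configuration that lists one PAPI
# preset event, so counting does not require multiplexing.
# The XML layout copies the papi3_itanium2.xml excerpt quoted on this list.
write_single_event_config() {
    event="$1"     # e.g. PAPI_TOT_CYC
    outfile="$2"   # e.g. single_event.xml (overwritten if it exists)
    cat > "$outfile" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<ps_hwpc_eventlist class="PAPI" generator="psconfig">
  <ps_hwpc_event type="preset" name="$event" />
</ps_hwpc_eventlist>
EOF
}

write_single_event_config PAPI_TOT_CYC single_event.xml
```

The sh equivalent of Rick's csh commands would then be "PS_HWPC_CONFIG=$PWD/single_event.xml; export PS_HWPC_CONFIG" followed by "psrun ls". A non-zero PAPI_TOT_CYC count in the resulting XML would be consistent with his explanation that multiplexing over too little CPU time produces the zeros.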
From: Rick K. <rk...@nc...> - 2007-01-08 17:08:36
|
Dawei,

One thing you might consider trying is to replace the executable that corresponds to the daemon program with a wrapper script that invokes psrun on the true daemon program. For example, rename Child.exe to Child.exe.orig and create a new file named Child.exe that is a shell script that runs the command "psrun Child.exe.orig", passing along any arguments necessary for Child.exe. This would be the quickest way to see if things might work as a first cut.

Also, psrun does not actually do any performance monitoring itself, nor does it write out the XML result documents; this is done through libraries that are "inserted" into the program that is being monitored.

The usual reason for zeroes being reported is that the program does not consume enough CPU time to collect any data, and it sounds like this is what you may be seeing here.

Rick

On Sun, 7 Jan 2007, liudawei wrote:

> Hi:
>
> I have encountered a problem when I use psrun to profile a daemon process whose child processes are also daemon processes. The example is here:
>
> psrun -f Daemon.exe
>
> Let me describe how this daemon works. I want to profile the daemon and its child processes; here, all the child processes are daemon processes too. The problem I encountered is that all the results (the XML files produced by psrun) are zero. I tried to read the source code of psrun. I guess that psrun forks a process to execute the program being profiled (here, Daemon.exe). But as we know, if the program being profiled is a daemon process, then once the daemon process has been created successfully it kills its parent process; however, psrun is just waiting for this signal to finish the profile. So, as soon as the daemon starts, psrun stops right away, and all the results are zero.
>
> Can anyone give some suggestions, or a patch to psrun so that psrun can monitor daemon-type programs?
>
> You can download Oracle TimesTen, then use psrun to verify the problem I encountered.
>
> Waiting for a reply.
|
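Rick's wrapper suggestion could look roughly like the POSIX sh sketch below. It assumes the daemon binary is a regular file you are allowed to rename. install_psrun_wrapper is a hypothetical name, and Child.exe is the placeholder name used in the thread. The generated script uses exec, so the wrapper shell is replaced by psrun instead of staying around as an extra process.

```shell
#!/bin/sh
# Hypothetical helper for the wrapper approach: move the real executable aside
# as <name>.orig and install a small script under the original name that runs
# the original through psrun, forwarding every argument it receives.
install_psrun_wrapper() {
    target="$1"    # e.g. ./Child.exe (placeholder)
    if [ -e "$target.orig" ]; then
        echo "refusing to overwrite $target.orig" >&2
        return 1
    fi
    # Resolve an absolute path so the wrapper works from any directory.
    dir=$(cd "$(dirname "$target")" && pwd) || return 1
    abs="$dir/$(basename "$target")"
    mv "$abs" "$abs.orig" || return 1
    # Unquoted EOF: $abs is expanded now, while \$@ is written out literally.
    cat > "$abs" <<EOF
#!/bin/sh
exec psrun "$abs.orig" "\$@"
EOF
    chmod +x "$abs"
}
```

For example, "install_psrun_wrapper ./Child.exe" leaves whatever normally starts Child.exe running it under psrun. "mv Child.exe.orig Child.exe" undoes the change.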
From: liudawei <dbm...@ya...> - 2007-01-07 12:04:18
|
Hi:

I have encountered a problem when I use psrun to profile a daemon process whose child processes are also daemon processes. The example is here:

psrun -f Daemon.exe

Let me describe how this daemon works. I want to profile the daemon and its child processes; here, all the child processes are daemon processes too. The problem I encountered is that all the results (the XML files produced by psrun) are zero. I tried to read the source code of psrun. I guess that psrun forks a process to execute the program being profiled (here, Daemon.exe). But as we know, if the program being profiled is a daemon process, then once the daemon process has been created successfully it kills its parent process; however, psrun is just waiting for this signal to finish the profile. So, as soon as the daemon starts, psrun stops right away, and all the results are zero.

Can anyone give some suggestions, or a patch to psrun so that psrun can monitor daemon-type programs?

You can download Oracle TimesTen, then use psrun to verify the problem I encountered.

Waiting for a reply.
|
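Dawei's reasoning depends on how a daemon detaches from the process that launched it. The POSIX sh sketch below imitates that pattern without any PerfSuite involvement. A short-lived launcher starts a worker in the background and returns at once. The worker keeps running and still sees the environment it inherited. start_fake_daemon is a hypothetical name, and PS_HWPC_CONFIG is used purely as an example variable. The sketch shows generic process behavior, not what psrun does internally.

```shell
#!/bin/sh
# Hypothetical imitation of a daemon start-up: a short-lived launcher starts a
# worker in the background and exits immediately. The worker is re-parented
# and keeps running after the launcher has returned to its caller.
start_fake_daemon() {
    pidfile="$1"   # where the launcher records the worker's PID
    envfile="$2"   # where the worker records the variable it inherited
    (
        # Worker: detach from the caller's stdio, record the inherited
        # PS_HWPC_CONFIG value, then stay alive for a while.
        sh -c 'printf "%s\n" "$PS_HWPC_CONFIG" > "$1"; sleep 20' sh "$envfile" \
            < /dev/null > /dev/null 2>&1 &
        echo $! > "$pidfile"
    )   # the subshell, playing the launcher, exits here
}
```

The inherited environment is consistent with Rick's replies in this thread, which say psrun only sets up the runtime environment and does not collect data itself. It is also consistent with Dawei receiving separate XML files for the child daemons.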
From: liudawei <dbm...@ya...> - 2007-01-07 07:03:22
|
I used PerfSuite to profile a database server (Oracle TimesTen) process via the following command:

psrun -f ttDaemonAdmin -start

When the server stopped, I got a few result files (XML files), which include the child process profiles, but all the counter results are zero. ttDaemonAdmin is a command that starts the database server; it starts a daemon process, which is a single multithreaded process.
|
From: Rick K. <rk...@nc...> - 2007-01-04 16:38:45
|
Dawei,

On Wed, 3 Jan 2007, liudawei wrote:

> I have encountered the same problem. I am now using AMD too.
> The conflicting events are PAPI_BR_MSP and PAPI_L1_TCM;
> the two events cannot be configured in a single XML configuration file.
> Any comments?

There were no updates to the AMD K8 default configuration file included in the most recent (0.6.2a6) release of PerfSuite, so the behavior reported earlier still exists, as you've discovered. At present, the best way to address this is to manually edit the default configuration file and remove one of the offending events. If you run the command:

psrun -h

(after installation of PerfSuite), it should list the directory and name of the default configuration file on your system, which can be tailored as needed.

Rick
|
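Rick's fix, removing one of the conflicting events, can also be applied to a private copy so the installed default stays untouched. This POSIX sh sketch assumes each ps_hwpc_event element sits on its own line, as in the configuration excerpts quoted on this list. drop_event and my_k8.xml are hypothetical names, and which of the two events to drop is up to you.

```shell
#!/bin/sh
# Hypothetical helper: copy a PerfSuite XML configuration while dropping every
# line that names the given event. Assumes one <ps_hwpc_event .../> per line.
drop_event() {
    src="$1"     # default configuration (its location is printed by "psrun -h")
    event="$2"   # event to remove, e.g. PAPI_L1_TCM
    dst="$3"     # private copy to write
    # -F matches the literal text name="EVENT"; the closing quote keeps a
    # longer event name with the same prefix from being removed as well.
    grep -F -v "name=\"$event\"" "$src" > "$dst"
}
```

For the K8 file mentioned elsewhere on this list, that would be "drop_event $PREFIX/share/perfsuite/xml/pshwpc/papi3_k8.xml PAPI_L1_TCM my_k8.xml". After that, point PS_HWPC_CONFIG at my_k8.xml.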
From: Rick K. <rk...@nc...> - 2007-01-04 16:32:31
|
Chunhua,

> I was trying to use some perfmon configuration files (e.g. pfm_ipc.xml)
> with the libpshwpc library on cobalt, using the environment variable
> PS_HWPC_CONFIG. But no result xml files were generated. The same
> procedure works for regular PAPI configuration files (class=PAPI) though.
>
> I tested it using "psrun -c pfm_ipc.xml a.out" and got the following message
> "libpsrun fatal error: a required software component not installed"
>
> How can I fix it?

You may have figured this out already, but in case not: "Cobalt" is an SGI Altix operated at NCSA that runs SGI ProPack 4, which uses a 2.6-based Linux kernel. The current release of PerfSuite, while supporting 2.6 kernels, does not support direct use of the Perfmon driver and user library, which is necessary for XML configuration files of class "perfmon".

There is a workaround that has been mentioned before, though. For reference, the thread at the SF mailing lists is:

http://sourceforge.net/mailarchive/forum.php?thread_id=30932390&forum_id=39162

Rick
|
From: liudawei <dbm...@ya...> - 2007-01-03 03:54:10
|
I have encountered the same problem. I am now using AMD too. The conflicting events are PAPI_BR_MSP and PAPI_L1_TCM; the two events cannot be configured in a single XML configuration file. Any comments?

per...@li... wrote:

Today's Topics:

   1. Re: psrun and 2.6 kernel (Niall Moran)
   2. Re: psrun and 2.6 kernel (Niall Moran)
   3. Re: psrun and 2.6 kernel (Rick Kufrin)
   4. Re: psrun and 2.6 kernel (Niall Moran)

----------------------------------------------------------------------

Message: 1
Date: Mon, 20 Nov 2006 12:46:17 +0000
From: Niall Moran
Subject: Re: [PerfSuite-users] psrun and 2.6 kernel
To: Rick Kufrin
Cc: per...@li...

Hi Rick,

> >I have just built and installed perfsuite 0.6.2 alpha5. I built it with papi support and built it against papi 3.2.1. I am running on an amd opteron with a perfctr patched SLES 2.6.5-7 kernel. Once built all the perfsuite tests completed correctly. However when I try to run psrun I get the following error:
> >
> >  libpsrun fatal error: error reported by performance software layer
> >
> >I found on an earlier thread that this error was a result of using a 2.6 kernel when there was no support for it. However version 0.6.2 alpha 5 lists the 2.6 kernel. Any help or suggestions would be much appreciated.
>
> You're correct, the 2.6 kernel is now supported. My guess about what's happening is that the events being selected by default are including one that isn't available in PAPI 3.2.1. For your installation, the defaults should be in the file:
>
> $PREFIX/share/perfsuite/xml/pshwpc/papi3_k8.xml
>
> I would suggest running "psinv -p" to verify which PAPI events are available on your system and compare those with the events listed in the above file. If there is one that does not match, just comment it out in the file and try again. If that fixes the problem, then please let me know and I'll make the appropriate change in the distribution for the next release. If it doesn't then advise of that as well and we can look a little closer to try and get you up and running.

Thanks for getting back to me. I have checked the list of supported events from psinv -p against the events listed in the papi3_k8.xml configuration file, and all the events in the config file are supported. I also ran psrun -i to ensure that it is using the papi3_k8.xml config file. Attached is the output of psinv -p and also the config file papi3_k8.xml.

Niall.

-------------- next part --------------
System Information -
Node Name: h1cu36
OS Name: Linux
OS Release: 2.6.5-7.282-smp_perfctr
OS Build/Version: #5 SMP Wed Nov 8 14:13:10 GMT 2006
OS Machine: x86_64
Processors: 2
Total Memory (MB): 3959.38
System Page Size (KB): 4.00

Processor Information -
Vendor: AMD
Processor family: K8
Brand: AMD Opteron(tm) Processor 250
Model: Opteron ()
Revision: 10
Clock Speed: 997.06 MHz

Cache and TLB Information -
Cache levels: 2
Caches/TLBs: 7

Cache Details -
Level 1:
  Type: Data
  Size: 64 KB
  Line size: 64 bytes
  Associativity: 2-way set associative
  Type: Instruction
  Size: 64 KB
  Line size: 64 bytes
  Associativity: 2-way set associative
Level 2:
  Type: Unified
  Size: 1.00 MB
  Line size: 64 bytes
  Associativity: 16-way set associative

TLB Details -
Level 1:
  Type: Data
  Entries: 32
  Pagesize (KB): 4
  Associativity: Fully associative
  Type: Instruction
  Entries: 32
  Pagesize (KB): 4
  Associativity: Fully associative
Level 2:
  Type: Instruction
  Entries: 512
  Pagesize (KB): 4
  Associativity: 4-way set associative
  Type: Data
  Entries: 512
  Pagesize (KB): 4
  Associativity: 4-way set associative

PAPI Version Information -
Major version: 3
Minor version: 2
Revision: 1

PAPI Standard Event Information -
Standard events: 42
Non-derived events: 29
Derived events: 13

PAPI Standard Event Details -
Non-derived:
PAPI_BR_INS: Branch instructions
PAPI_BR_MSP: Conditional branch instructions mispredicted
PAPI_BR_TKN: Conditional branch instructions taken
PAPI_FAD_INS: Floating point add instructions
PAPI_FML_INS: Floating point multiply instructions
PAPI_FPU_IDL: Cycles floating point units are idle
PAPI_FP_INS: Floating point instructions
PAPI_FP_OPS: Floating point operations
PAPI_HW_INT: Hardware interrupts
PAPI_L1_DCA: Level 1 data cache accesses
PAPI_L1_ICA: Level 1 instruction cache accesses
PAPI_L1_ICR: Level 1 instruction cache reads
PAPI_L1_LDM: Level 1 load misses
PAPI_L1_STM: Level 1 store misses
PAPI_L2_DCH: Level 2 data cache hits
PAPI_L2_DCM: Level 2 data cache misses
PAPI_L2_DCR: Level 2 data cache reads
PAPI_L2_DCW: Level 2 data cache writes
PAPI_L2_ICH: Level 2 instruction cache hits
PAPI_L2_ICM: Level 2 instruction cache misses
PAPI_L2_LDM: Level 2 load misses
PAPI_L2_STM: Level 2 store misses
PAPI_RES_STL: Cycles stalled on any resource
PAPI_STL_ICY: Cycles with no instruction issue
PAPI_TLB_DM: Data translation lookaside buffer misses
PAPI_TLB_IM: Instruction translation lookaside buffer misses
PAPI_TOT_CYC: Total cycles
PAPI_TOT_INS: Instructions completed
PAPI_VEC_INS: Vector/SIMD instructions
Derived:
PAPI_L1_DCH: Level 1 data cache hits
PAPI_L1_DCM: Level 1 data cache misses
PAPI_L1_ICH: Level 1 instruction cache hits
PAPI_L1_ICM: Level 1 instruction cache misses
PAPI_L1_TCA: Level 1 total cache accesses
PAPI_L1_TCH: Level 1 total cache hits
PAPI_L1_TCM: Level 1 cache misses
PAPI_L2_DCA: Level 2 data cache accesses
PAPI_L2_ICA: Level 2 instruction cache accesses
PAPI_L2_TCA: Level 2 total cache accesses
PAPI_L2_TCH: Level 2 total cache hits
PAPI_L2_TCM: Level 2 cache misses
PAPI_TLB_TL: Total translation lookaside buffer misses
-------------- next part --------------
A non-text attachment was scrubbed...
Name: papi3_k8.xml
Type: application/xml
Size: 2769 bytes
Url : http://sourceforge.net/mailarchive/forum.php?forum=perfsuite-users/attachments/20061120/cc5c39d8/attachment.rdf

------------------------------

Message: 2
Date: Mon, 20 Nov 2006 13:11:44 +0000
From: Niall Moran
Subject: Re: [PerfSuite-users] psrun and 2.6 kernel
To: Rick Kufrin
Cc: per...@li...

> > >I have just built and installed perfsuite 0.6.2 alpha5. I built it with papi support and built it against papi 3.2.1. I am running on an amd opteron with a perfctr patched SLES 2.6.5-7 kernel. Once built all the perfsuite tests completed correctly. However when I try to run psrun I get the following error:
> > >
> > >  libpsrun fatal error: error reported by performance software layer
> > >
> > >I found on an earlier thread that this error was a result of using a 2.6 kernel when there was no support for it. However version 0.6.2 alpha 5 lists the 2.6 kernel. Any help or suggestions would be much appreciated.
> >
> > You're correct, the 2.6 kernel is now supported. My guess about what's happening is that the events being selected by default are including one that isn't available in PAPI 3.2.1. For your installation, the defaults should be in the file:
> >
> > $PREFIX/share/perfsuite/xml/pshwpc/papi3_k8.xml
> >
> > I would suggest running "psinv -p" to verify which PAPI events are available on your system and compare those with the events listed in the above file. If there is one that does not match, just comment it out in the file and try again. If that fixes the problem, then please let me know and I'll make the appropriate change in the distribution for the next release. If it doesn't then advise of that as well and we can look a little closer to try and get you up and running.
>
> Thanks for getting back to me. I have checked the list of supported events from psinv -p against the events listed in the papi3_k8.xml configuration file, and all the events in the config file are supported. I also ran psrun -i to ensure that it is using the papi3_k8.xml config file. Attached is the output of psinv -p and also the config file papi3_k8.xml.

I removed all events from the papi3_k8.xml config file and started adding them back in one by one. Everything works fine with all the events except for PAPI_L1_TCM and PAPI_L2_TCA, even though both these events appear in the psinv -p output.

Niall.

------------------------------

Message: 3
Date: Mon, 20 Nov 2006 07:39:03 -0600 (CST)
From: Rick Kufrin
Subject: Re: [PerfSuite-users] psrun and 2.6 kernel
To: Niall Moran
Cc: per...@li...

Niall,

>> Thanks for getting back to me. I have checked the list of supported events from psinv -p against the events listed in the papi3_k8.xml configuration file, and all the events in the config file are supported. I also ran psrun -i to ensure that it is using the papi3_k8.xml config file. Attached is the output of psinv -p and also the config file papi3_k8.xml.
>
> I removed all events from the papi3_k8.xml config file and started adding them back in one by one. Everything works fine with all the events except for PAPI_L1_TCM and PAPI_L2_TCA, even though both these events appear in the psinv -p output.

Sounds like some good detective work on your end; thanks for checking into things and reporting. Now, as to why this is going on, I have to admit being stumped at the moment. This would seem at first glance to be something going wrong in either the PAPI or perfctr layers. Within the PAPI 3.2.1 distribution there is a utility called "papi_avail" that can print out information about mappings of PAPI events to the native counters that may provide a clue. It's invoked as "papi_avail -e EVENTNAME", where EVENTNAME is the name of the PAPI event one wants more information about. You might try that utility with the events above and see what the output is to get started. Unfortunately, I do not have access to an Opteron with the kernel support required to test this out, but if this limitation is a problem for you we can try to look further into it.

Rick

p.s. For detailed debugging information, PerfSuite's configure script supports an option --enable-debug, and the software recognizes an environment variable PS_DEBUG which, if set to 3 or higher, will dump a lot of information about what's going on underneath to help locate sources of errors.

------------------------------

Message: 4
Date: Mon, 20 Nov 2006 15:11:50 +0000
From: Niall Moran
Subject: Re: [PerfSuite-users] psrun and 2.6 kernel
To: Rick Kufrin
Cc: per...@li...

> >I removed all events from the papi3_k8.xml config file and started adding them back in one by one. Everything works fine with all the events except for PAPI_L1_TCM and PAPI_L2_TCA, even though both these events appear in the psinv -p output.
>
> Sounds like some good detective work on your end; thanks for checking into things and reporting. Now, as to why this is going on, I have to admit being stumped at the moment. This would seem at first glance to be something going wrong in either the PAPI or perfctr layers. Within the PAPI 3.2.1 distribution there is a utility called "papi_avail" that can print out information about mappings of PAPI events to the native counters that may provide a clue. It's invoked as "papi_avail -e EVENTNAME", where EVENTNAME is the name of the PAPI event one wants more information about. You might try that utility with the events above and see what the output is to get started. Unfortunately, I do not have access to an Opteron with the kernel support required to test this out, but if this limitation is a problem for you we can try to look further into it.
>
> p.s. For detailed debugging information, PerfSuite's configure script supports an option --enable-debug, and the software recognizes an environment variable PS_DEBUG which, if set to 3 or higher, will dump a lot of information about what's going on underneath to help locate sources of errors.

Running papi_avail for both these events works fine and says that they pass. I then tried adding both these events, and these alone, to the config file and it worked fine. It appears that there are conflicts between these events and some of the others. I ran into a similar problem using papi with itc on an itanium2. I will recompile perfsuite with debugging enabled and will experiment with different configs to try to get an idea of which events are conflicting with which.

Niall.

------------------------------

End of PerfSuite-users Digest, Vol 4, Issue 5
*********************************************
|
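Rick's first suggestion in the digest was to compare the events in papi3_k8.xml with the "psinv -p" listing. That comparison can be automated roughly as in the POSIX sh sketch below. It assumes the psinv output has been saved to a file and lists each event as "NAME: description", as in the output Niall attached. missing_events is a hypothetical name. As Niall's follow-up shows, an empty result does not rule out events that fail only in combination. Those still have to be narrowed down by adding events back one at a time.

```shell
#!/bin/sh
# Hypothetical helper: print each event named in a PerfSuite XML configuration
# that does not appear in saved "psinv -p" output. Returns 1 if any are missing.
missing_events() {
    config="$1"   # e.g. papi3_k8.xml
    inv="$2"      # e.g. the result of: psinv -p > psinv.txt
    status=0
    # Pull the value of each name="..." attribute out of the configuration.
    for ev in $(sed -n 's/.*name="\([A-Za-z0-9_]*\)".*/\1/p' "$config"); do
        # psinv lists available events as "PAPI_XXX: description".
        if ! grep -F -q -- "$ev:" "$inv"; then
            echo "$ev"
            status=1
        fi
    done
    return "$status"
}
```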
From: Chunhua L. <li...@cs...> - 2006-12-27 17:34:31
|
Hi,

I was trying to use some perfmon configuration files (e.g. pfm_ipc.xml) with the libpshwpc library on cobalt, using the environment variable PS_HWPC_CONFIG. But no result xml files were generated. The same procedure works for regular PAPI configuration files (class=PAPI) though.

I tested it using "psrun -c pfm_ipc.xml a.out" and got the following message:

"libpsrun fatal error: a required software component not installed"

How can I fix it?

Thanks in advance!

Chunhua Liao
|