From: George M. <ge...@ma...> - 2009-06-11 20:08:22
|
Dear Rick,

First of all, thanks for the help. It was a simple problem, but because I am trying a lot of things I had forgotten about it. The problem is solved.

About my second question: basically I want to measure only two hardware counters, PAPI_FP_OPS and PAPI_TOT_CYC, in order to compute Mflops. I ask how accurate this is because I compared results from profiling a matrix multiplication (C, MPI, ScaLAPACK, run with psrun) against another profiling tool: with PerfSuite I got 960 Mflops per CPU, but with the other tool almost 1020 Mflops. Multiplexing was enabled, so is it possible to lose so many flops? Also, does PerfSuite measure the flops of MPI calls (all_reduce etc.) as well? I was trying to figure out why there was such a difference. If PerfSuite uses statistical sampling, is it possible to lose some data?

Best regards,
George Markomanolis
|
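For readers following the thread, the arithmetic behind "Mflops from PAPI_FP_OPS and PAPI_TOT_CYC" can be sketched as below. This is only an illustration of the formula, not code from PerfSuite or PAPI; the function names and the idea of passing the clock rate in MHz are assumptions made for the example.

```c
/* Illustrative only: these helpers are hypothetical, not part of PerfSuite. */

/* Mflops from raw counter totals.
 * Elapsed microseconds = cycles / clock rate in MHz (cycles per microsecond);
 * operations per microsecond is numerically the same as Mflops. */
double mflops(long long fp_ops, long long tot_cyc, double clock_mhz)
{
    if (tot_cyc <= 0 || clock_mhz <= 0.0)
        return 0.0;
    double usec = (double)tot_cyc / clock_mhz;
    return (double)fp_ops / usec;
}

/* Gap between two tools' results, as a percentage of the larger value. */
double relative_gap_percent(double a, double b)
{
    double hi = a > b ? a : b;
    double lo = a > b ? b : a;
    return hi > 0.0 ? 100.0 * (hi - lo) / hi : 0.0;
}
```

With the figures quoted above, relative_gap_percent(960.0, 1020.0) comes to a little under 6%, which gives a sense of the size of the discrepancy being asked about.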
From: Rick K. <rk...@il...> - 2009-06-11 18:17:35
|
George,

Hopefully this is a simple problem to fix: my guess is that you configured and built PerfSuite with a different Fortran compiler than the one you are using to compile the NAS benchmarks. For example, you configured using the default Fortran compiler (typically GNU g77 or gfortran) but built the benchmarks with Intel's ifort compiler. If this is the case, then try reconfiguring and building PerfSuite, this time specifying a compiler explicitly with the variable F77. E.g.,

$ ./configure --prefix=XXX --with-papi=XXX F77=ifort

If this doesn't solve the linking problem, please let us know.

Regarding the second question: I think you may be referring to two different modes of operation here. When doing profiling, PerfSuite does indeed use statistical sampling. This usually requires a user-selected configuration file, though, one with the document type <ps_hwpc_profile>.

When simply counting the total number of event occurrences (which is where the term "multiplexing" applies), you have some control over whether multiplexing is used based on the number of events you've requested. If you only request a few events, you are more likely to get a non-multiplexed run. If it doesn't detect that it needs to, PerfSuite will avoid enabling multiplexing. You can always check whether multiplexing was enabled by looking at the output of "psprocess" or viewing the output XML document directly.

Probably the shortest route to what you need is to use a configuration file that only requests PAPI_FP_OPS or PAPI_FP_INS. There are a number of example configuration files installed in the directory $PREFIX/share/perfsuite/xml/pshwpc that may help in writing a new one that fits your needs.

Hope that helps clear things up.

Rick
|
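As a rough illustration of why a multiplexed count is an estimate rather than an exact total, here is a toy model in C. It assumes the run is divided into equal time slices, that a given event is only on a counter in every "groups"-th slice, and that the raw count is then scaled up to the full run. This is a simplification for discussion, not the algorithm PAPI or PerfSuite actually uses, and all names are hypothetical.

```c
#include <stddef.h>

/* Exact total: the event's true count summed over every time slice. */
long long true_total(const long long *per_slice, size_t n)
{
    long long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += per_slice[i];
    return sum;
}

/* Toy multiplexed estimate: the event is counted only in slices where
 * (i % groups) == slot, and the raw count is scaled by n / slices_counted. */
double multiplexed_estimate(const long long *per_slice, size_t n,
                            size_t groups, size_t slot)
{
    long long raw = 0;
    size_t counted = 0;

    if (groups == 0)
        return 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i % groups == slot) {
            raw += per_slice[i];
            counted++;
        }
    }
    if (counted == 0)
        return 0.0;
    return (double)raw * (double)n / (double)counted;
}
```

In this model a steady workload is estimated well, while a bursty one can be over- or under-estimated depending on which slices the event happened to occupy. Requesting only a few events, so that no multiplexing is needed, avoids the scaling step altogether.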
From: George M. <ge...@ma...> - 2009-06-11 14:02:37
|
Dear all,

I am writing because I have a problem profiling a Fortran program with libpshwpc. I can profile the program with psrun, but because I want to profile only specific blocks of the code, I would like to use PSF_hwpc_init(ierr) etc. The program is one of the NAS Parallel Benchmarks (FT), and when I compile it I see these errors:

ft.f:(.text+0x24): undefined reference to `psf_hwpc_init_'
ft.f:(.text+0x76): undefined reference to `psf_hwpc_start_'
ft.f:(.text+0x2b7): undefined reference to `psf_hwpc_stop_'
ft.f:(.text+0x471): undefined reference to `psf_hwpc_shutdown_'

I have included fperfsuite.h and linked with -lpshwpc -lperfsuite.

Could you suggest something to solve my problem?

Also a simple question: I am confused, does PerfSuite measure with statistical sampling? I know that measuring with multiplexing is less accurate. I ask because I want to count flops and I don't want to lose any information.

Thanks in advance,
Best regards,
George Markomanolis
|
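One common cause of "undefined reference" errors like the ones above is that Fortran compilers decorate external names differently. The sketch below models two conventions: a single trailing underscore (the usual default for gfortran and for ifort on Linux) and the g77 default, which appends a second underscore when the name already contains one. Whether this is what happened in this particular build is an assumption; the enum and function names are made up for the example.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Two name-decoration conventions, modeled for illustration. */
enum fortran_abi {
    ABI_ONE_UNDERSCORE,  /* lowercase name plus "_" */
    ABI_G77_DEFAULT      /* as above, plus a second "_" if the name contains "_" */
};

/* Writes the linker symbol for a Fortran routine name into out.
 * Returns 0 on success, -1 if the buffer is too small. */
int fortran_symbol(const char *name, enum fortran_abi abi,
                   char *out, size_t outsz)
{
    size_t len = strlen(name);
    int has_underscore = strchr(name, '_') != NULL;
    size_t extra = 1 + ((abi == ABI_G77_DEFAULT && has_underscore) ? 1 : 0);

    if (len + extra + 1 > outsz)
        return -1;
    for (size_t i = 0; i < len; i++)
        out[i] = (char)tolower((unsigned char)name[i]);
    memset(out + len, '_', extra);
    out[len + extra] = '\0';
    return 0;
}
```

Under this model, a call to PSF_hwpc_init compiled with a single-underscore compiler looks for psf_hwpc_init_, while wrappers generated for the g77 default would be named psf_hwpc_init__, so the linker would not find a match.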
From: Rick K. <rk...@il...> - 2009-05-27 15:40:25
|
Jean-Guillaume,

There are two aspects to the question you asked: first relates to VProf compilation and the error you received, while the second has to do with PerfSuite status on the Cray XT system.

The error message you are receiving during compilation of VProf is related to the version of the BFD library (part of binutils) that you have installed on your system. I believe "bfd_get_section_size_before_reloc" is defined as a macro in earlier versions of BFD. A substitute that may work is to replace the offending line in exec.cc with "bfd_section_size(abfd, section)".

The VProf patch that was posted at the NCSA PerfSuite website is rather dated, and does not address the above issue. It appears that VProf itself is no longer in active development, with the last release at SourceForge dated June 2006. Due to that, it is likely that VProf-related work in PerfSuite will be removed in the future. Another reason for doing this is that more recent tools such as ParaProf from TAU and Cube from Scalasca can work with data generated by PerfSuite.

Regarding the other question (Cray XT): PerfSuite has not been ported to these systems, and it is likely that it will not work unchanged. Although I am not an expert on that series of machine, I believe the OS lacks necessary support required for the psrun command to work properly - specifically, the ability to preload shared libraries via LD_PRELOAD. It may be possible to provide the PerfSuite C support libraries (libperfsuite and libpshwpc) on that platform, but this has not been tried or tested.

Rick
|
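For context on the LD_PRELOAD point above, the general technique can be sketched as a small shared object whose constructor runs before the host program's main() and whose destructor runs at exit. This is a generic illustration using the GCC/Clang constructor and destructor attributes; it is not taken from PerfSuite, and the function names are hypothetical.

```c
#include <stdio.h>

/* Set by the constructor so callers can check that the hook executed. */
static int hook_ran = 0;

/* Runs when the object is loaded, before main() of the host program. */
__attribute__((constructor))
static void preload_init(void)
{
    hook_ran = 1;
    fprintf(stderr, "[preload] measurement would start here\n");
}

/* Runs when the process exits normally. */
__attribute__((destructor))
static void preload_fini(void)
{
    fprintf(stderr, "[preload] measurement would stop here\n");
}

/* Accessor used only to observe the hook from ordinary code. */
int preload_hook_ran(void)
{
    return hook_ran;
}
```

On a system with a dynamic loader that honors LD_PRELOAD, a file like this could be built as a shared library with -shared -fPIC and injected into an unmodified program by setting LD_PRELOAD to its path. A platform that does not support this kind of preloading would give such a library no opportunity to run.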
From: jgp <jg...@cs...> - 2009-05-27 14:00:26
|
Hi,

I would like to install vprof on our CRAY XT5 but I get the following error message:

* exec.cc:144: error: 'bfd_get_section_size_before_reloc' was not declared in this scope

My configuration is:

* gcc/4.3.2
* xt-papi/3.6.2
* /sys/kernel/perfmon/version # => 2.3
* uname -a
  o Linux gele2 2.6.16.60-0.33_1.0102.4270.2.2.21A-ss #1 SMP Thu Apr 23 12:05:13 PDT 2009 x86_64 x86_64 x86_64 GNU/Linux

I've got the sources for vprof from http://aros.ca.sandia.gov/~cljanss/perf/vprof using the getsoftware script.
I also tried the sources from SourceForge: http://switch.dl.sourceforge.net/sourceforge/vprof/vprof1.2.tar.gz

Both compilations failed. Is there a way to install PerfSuite on CRAY XT?

Thanks for your help,
jg.

--
Jean-Guillaume Piccinali
High Performance Scientific Computing Group
CSCS - Swiss National Supercomputing Centre
jg...@cs... - www.cscs.ch - +41 (0) 91 610 8278
|
From: Rick K. <rk...@il...> - 2009-05-08 15:55:04
|
Jie,

The things you are describing are slightly confusing, I admit. Let me see if I can clear it up a little:

The most important thing to understand is that native events are supported when using PAPI (as you are). The syntax in the configuration document (supplied via -c option to psrun or through PS_HWPC_CONFIG environment variable) is slightly different, though. You can find an example in

$PREFIX/share/perfsuite/xml/pshwpc/papi3_ipc_itanium2.xml

However, as you note, the psinv command will not report native events on your platform; for that, an interface to the underlying performance monitor library has to be present (this would be either perfctr or perfmon). Even though it does not report them, they are still usable as described above. You probably want to obtain the native event names through the papi_native_avail command, which apparently works on your platform.

Your system (and many people's) is actually using a combination of perfctr and perfmon2 software. Perfctr is the actual driver in the kernel, but the perfmon2 user library (libpfm) is used by PAPI to map event names to actual register settings.

The error you see when configuring using the "--with-pfm" option is a known bug that will be fixed in an upcoming release. Perfmon2 support does not currently exist in PerfSuite; it is only for a 2.4 IA-64 kernel that this may be used. "configure" should have failed with an appropriate informational message for you (but did not, that is the bug).

I hope this clears things up - please let us know if this does not answer your questions fully.

Rick
|
From: <jj...@nu...> - 2009-05-08 14:46:59
|
Hi Kufrin,

Recently I have been trying to build and install perfsuite-0.6.2. My platform is a Thinkpad T43. Following is the CPU information dumped by the /proc filesystem.

jiejiang@UT43:~/Perfsuite$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 13
model name : Intel(R) Pentium(R) M processor 2.00GHz
stepping : 8
cpu MHz : 800.000
cache size : 2048 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 ss tm pbe nx up bts est tm2
bogomips : 1596.08
clflush size : 64
power management:

The operating system is Linux-2.6.27 with the appropriate perfctr patch. Note that here I use a separate perfctr patch downloaded from the perfctr website, not the one included in the PAPI release.

We can confirm that perfctr works well using the "perfex -i" command:

jiejiang@UT43:~/Perfsuite$ perfex -i
PerfCtr Info:
abi_version 0x05020501
driver_version 2.6.37 DEBUG
cpu_type 14 (Intel Pentium M)
cpu_features 0x7 (rdpmc,rdtsc,pcint)
cpu_khz 1995105
tsc_to_cpu_mult 1
cpu_nrctrs 2
cpus [0], total: 1
cpus_forbidden [], total: 0

After the above work, I built and installed PAPI-3.6.2. The "papi_avail" and "papi_native_avail" commands output the corresponding preset and native event lists respectively.

After installing all other necessary enabling packages, such as TCL/TK/Expat/tDOM, I managed to build and install perfsuite-0.6.2. Basically, the psinv/psrun/psprocess/psconfig commands work as expected.

However, there are still some questions confusing me. When issuing psinv with the -e option, it says:

jiejiang@UT43:~/Perfsuite/perfsuite-0.6.2$ psinv -e
System Information -
  Node Name: UT43
  OS Name: Linux
  OS Release: 2.6.27-perfctr
  OS Build/Version: #2 SMP Tue Apr 28 20:29:12 CST 2009
  OS Machine: i686
  Processors: 1
  Total Memory (MB): 1007.81
  System Page Size (KB): 4.00

Processor Information -
  Vendor: Intel
  Processor family: Pentium Pro (P6)
  Brand: Intel(R) Pentium(R) M processor 2.00GHz
  Model (Type): (unknown)
  Revision: 8
  Clock Speed: 800.00 MHz

Cache and TLB Information -
  Cache levels: 2

Cache Details -
  Level 1:
    Type: Data
    Size: 32 KB
    Line size: 64 bytes
    Associativity: 8-way set associative

    Type: Instruction
    Size: 32 KB
    Line size: 64 bytes
    Associativity: 8-way set associative

  Level 2:
    Type: Unified
    Size: 2.00 MB
    Line size: 64 bytes
    Associativity: 8-way set associative

TLB Details -
  Level 1:
    Type: Data
    Entries: 8
    Pagesize (KB): 4096
    Associativity: 4-way set associative

    Type: Instruction
    Entries: 2
    Pagesize (KB): 4096
    Associativity: Fully associative

    Type: Data
    Entries: 128
    Pagesize (KB): 4
    Associativity: 4-way set associative

    Type: Instruction
    Entries: 128
    Pagesize (KB): 4
    Associativity: 4-way set associative

The "-e" (or "--events") option is not supported on this system

The last line says that the native event list is currently not available. However, "papi_native_avail" can give the native event list as expected. It seems that the source code of psinv relies on the libpfm API to retrieve the native event list.

Does this mean that libpfm is a MUST requirement to get support for NATIVE events?

Then I tried to configure and build perfsuite with the "-with-pfm PFM_CPPFLAGS=-I/usr/local/papi-3.6.2/include FPM_LDFLAGS=-L/usr/local/papi-3.6.2/lib" options, which specify the libpfm built with PAPI. (That is, PAPI built a libpfm internally.) However, the build failed with the following message:

....
gcc -DHAVE_CONFIG_H -I. -I../.. -I../../src/libperfsuite -DPS_DATADIR=\"/usr/local/perfsuite-0.6.2/share/perfsuite/xml/pshwpc\" -DPS_DTDDIR=\"/usr/local/perfsuite-0.6.2/share/perfsuite/dtds/pshwpc\" -I/usr/local/papi-3.6.2/include -I/usr/local/papi-3.6.2/include -g -O2 -MT hwpc.lo -MD -MP -MF .deps/hwpc.Tpo -c hwpc.c -fPIC -DPIC -o .libs/hwpc.o
hwpc.c:138:28: error: hwpc-perfmon2.h: No such file or directory
hwpc.c: In function 'set_package':
hwpc.c:2192: error: 'get_num_counters_perfmon' undeclared (first use in this function)
hwpc.c:2192: error: (Each undeclared identifier is reported only once
hwpc.c:2192: error: for each function it appears in.)
hwpc.c:2193: error: 'get_max_counters_perfmon' undeclared (first use in this function)
hwpc.c:2194: error: 'init_counters_perfmon' undeclared (first use in this function)
hwpc.c:2195: error: 'start_counters_perfmon' undeclared (first use in this function)
hwpc.c:2196: error: 'read_counters_perfmon' undeclared (first use in this function)
hwpc.c:2197: error: 'stop_counters_perfmon' undeclared (first use in this function)
hwpc.c:2198: error: 'destroy_counters_perfmon' undeclared (first use in this function)
hwpc.c:2199: error: 'init_profiling_perfmon' undeclared (first use in this function)
hwpc.c:2200: error: 'start_profiling_perfmon' undeclared (first use in this function)
hwpc.c:2201: error: 'suspend_profiling_perfmon' undeclared (first use in this function)
hwpc.c:2202: error: 'stop_profiling_perfmon' undeclared (first use in this function)
make[4]: *** [hwpc.lo] Error 1
make[3]: *** [all-recursive] Error 1
make[2]: *** [all-recursive] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2

It seems that hwpc-perfmon2.h doesn't exist. How can I solve this?

Regards,
Jie Jiang
|
From: George M. <ge...@ma...> - 2009-05-02 16:38:56
|
I finally solved the problem: there were some errors related to binutils and tk (Debian lenny operating system). Now everything is OK! Yes, it counts Mflops :) Now I just have to find out how to run psrun on MPI programs.

Thanks a lot,
Best regards,
George Markomanolis
|
From: George M. <ge...@ma...> - 2009-05-02 15:16:36
|
Sorry for the delay, but I had to install it again and there is an error now. My CPU is (from papi_avail):

Vendor string and code : AuthenticAMD (2)
Model string and code : AMD K8 Revision C (15)
CPU Revision : 3.000000
CPU Megahertz : 2613.395996
CPU Clock Megahertz : 2613
CPU's in this Node : 4
Nodes in this System : 1
Total CPU's : 4
Number Hardware Counters : 4
Max Multiplex Counters : 32

Yesterday I was using another CPU, but there are no free nodes on that cluster for the weekend, so I tried to install it on another cluster (both AMD 64-bit). I deploy my own image and upload it to the nodes. So I have PAPI 3.6.2 and MPICH 1.2.7. To configure tDOM I give:

../configure --prefix=/usr/local/ --enable-64bit --with-tcl=/usr/lib/tcl8.4/

and the results from the tests are:

Tests ended at Sat May 02 17:10:59 CEST 2009
all.tcl: Total 1271 Passed 1258 Skipped 13 Failed 0
Sourced 0 Test Files.
Number of tests skipped for each constraint:
3 knownBug
10 need_uri

and the installation continues. Now for PerfSuite:

./configure --prefix=/usr/local --with-papi=/usr/local --with-tdom=/usr/local/lib --enable-mpi MPICPPFLAGS="-I/usr/local/include" --with-pfm=/usr/local

and the error from make:

gcc -g -O2 -o .libs/cpi-pshwpc cpi_pshwpc-cpi-pthreads.o -L/tmp/perfsuite-0.6.2/src/libperfsuite -L/tmp/perfsuite-0.6.2/src/libpshwpc -L/usr/local/lib /tmp/perfsuite-0.6.2/src/libperfsuite/.libs/libperfsuite.so -lpshwpc -lpapi
/usr/bin/ld: cannot find -lpshwpc
collect2: ld returned 1 exit status
make[5]: *** [cpi-pshwpc] Error 1
make[4]: *** [all-recursive] Error 1
make[3]: *** [all-recursive] Error 1
make[2]: *** [all-recursive] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2

Any idea, please?

Best regards,
George Markomanolis
|
From: Rick K. <rk...@il...> - 2009-05-01 22:51:30
|
George,

It's a little hard to guess what is going on without a little more information. Can you say what type of machine you are using (the output of psinv -p would be useful)? Also, some details about your test C program: in particular, how many seconds does it take to run?

That should get us started; hopefully we can diagnose what is leading to these effects.

Rick

----- Original Message -----
From: "George Markomanolis" <ge...@ma...>
To: per...@li...
Sent: Friday, May 1, 2009 3:50:01 PM GMT -06:00 US/Canada Central
Subject: [PerfSuite-users] problem with configuration?

Dear all,

I have just installed PerfSuite and I have a question. Maybe I did something wrong in the configuration, because I can't measure the Mflops of a program. When I run psrun with papi3_mflops, only PAPI_TOT_CYC has a value; mflops is zero. I know that PAPI_FP_OPS is supported by my CPU, as I have used this event before. Moreover, when I use the standard XML file, every value is zero (except for the info about the CPU), although multiplexing is enabled. The psinv command shows me that my CPU supports 36 events, the same as papi_avail, but I get no values with psrun. It's a simple serial test program written in C. For a parallel program I have to recompile, because I saw that I can't run psrun with ch_p4.

At ./configure I declared prefix, with-papi, with-tdom, enable-mpi and MPICPPFLAGS.

Thanks a lot,
Best regards,
George Markomanolis
|
From: George M. <ge...@ma...> - 2009-05-01 21:11:07
|
Dear all,

I have just installed PerfSuite and I have a question. Maybe I did something wrong in the configuration, because I can't measure the Mflops of a program. When I run psrun with papi3_mflops, only PAPI_TOT_CYC has a value; mflops is zero. I know that PAPI_FP_OPS is supported by my CPU, as I have used this event before. Moreover, when I use the standard XML file, every value is zero (except for the info about the CPU), although multiplexing is enabled. The psinv command shows me that my CPU supports 36 events, the same as papi_avail, but I get no values with psrun. It's a simple serial test program written in C. For a parallel program I have to recompile, because I saw that I can't run psrun with ch_p4.

At ./configure I declared prefix, with-papi, with-tdom, enable-mpi and MPICPPFLAGS.

Thanks a lot,
Best regards,
George Markomanolis
|
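For readers following the question above, here is a minimal sketch of one common way to derive an Mflops figure from the two counters mentioned (PAPI_FP_OPS and PAPI_TOT_CYC) plus the CPU clock. It is not taken from PerfSuite's papi3_mflops configuration, whose exact formula is not shown in this thread; the function name `mflops` and the counter totals are hypothetical, chosen only to illustrate the arithmetic.

```python
def mflops(fp_ops: int, tot_cyc: int, clock_mhz: float) -> float:
    """Estimate Mflops from two counter totals and the CPU clock in MHz."""
    if tot_cyc <= 0:
        raise ValueError("tot_cyc must be positive")
    seconds = tot_cyc / (clock_mhz * 1e6)  # cycles -> CPU seconds
    return fp_ops / seconds / 1e6          # operations per second -> millions


# Hypothetical totals (not from the thread), used only to show the calculation.
example = mflops(fp_ops=2_000_000_000, tot_cyc=5_000_000_000, clock_mhz=2500.0)
print(f"{example:.1f} Mflops")

# If the floating-point counter reads zero, the derived rate is zero as well,
# even when the cycle counter is non-zero.
print(mflops(fp_ops=0, tot_cyc=5_000_000_000, clock_mhz=2500.0))
```

Under this reading, a zero Mflops value alongside a non-zero PAPI_TOT_CYC would point at the floating-point counter itself returning zero rather than at the arithmetic.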
From: Jean-Christophe P. <jea...@ci...> - 2009-04-09 07:29:57
|
Hello,

I am trying to install perfsuite-1.0.0a1, but I am blocked by a problem in the directory libpshwpc. I get this message:

hwpc.c(138): catastrophic error: could not open source file "hwpc-perfmon2.h"
# include "hwpc-perfmon2.h"

I use pfmon-3.8 (perfmon2 v3.8), built from sources. In the Makefile, there are some comments:

# @@ Not yet implemented
# libpshwpc_la_SOURCES += hwpc-perfmon2.c

and

# @@ Not yet implemented
#noinst_HEADERS += hwpc-perfmon2.h

but in the file hwpc.c(138):

#if HAVE_PERFMON2
# include "hwpc-perfmon2.h"
#endif

So there is no file hwpc-perfmon2.h and no file hwpc-perfmon2.c! Where can I find these files?

Thank you.

######################################################################
Jean-Christophe Penalva
Centre Informatique National de l'Enseignement Superieur (CINES)
Montpellier, FRANCE
Tel : 33 4 67 141 414
Fax : 33 4 67 523 763
http://www.cines.fr/
|
From: Rick K. <rk...@nc...> - 2009-03-06 20:26:22
|
Subscribers to this list may be interested in an upcoming full-day tutorial involving PerfSuite (as part of the NSF SDCI POINT project and in conjunction with the European VI-HPS Virtual Institute for High-Productivity Supercomputing project) that will be offered on Sunday, May 24th at the International Conference on Computational Science (ICCS '09). ICCS '09 is hosted by the Center for Computation and Technology at Louisiana State University in Baton Rouge, Louisiana, USA. Registration is now open for the conference and tutorial sessions. For more information, see:

http://www.iccs-meeting.org/
http://www.iccs-meeting.org/iccs2009/Tutorial1.html

Rick
|
From: Rick K. <rk...@nc...> - 2009-03-04 22:17:26
|
PerfSuite versions 0.6.2 final and 1.0.0 alpha 1 are now available.

Version 0.6.2 final contains no changes except for versioning information from the 0.6.2 beta 3 release, but it is now the official stable release of PerfSuite.

Version 1.0.0 alpha 1 is the first alpha release of the 1.0 series of PerfSuite. During the alpha cycle, new capabilities and features will be prototyped and added. Features may experience significant changes as the alpha series progresses, and we caution that new features may be removed if necessary during the cycle. Highlights of this release include:

- A new Java API for parsing PerfSuite-generated XML files.
- Fundamental change in data gathering, based on user feedback. The new approach reverses the order in which the time stamp counter and other measurements (e.g. hardware performance counters) are sampled.
- Preliminary support for creating files that may be visualized with the Cube visualizer from Scalasca.

There are a number of other changes, enhancements, and bug fixes in this release. See the CHANGES file for more information.

URL: http://perfsuite.sourceforge.net/

Rick
|
From: Naveen P. <nav...@gm...> - 2009-01-17 17:36:47
|
Rui,

Thanks for generating these comprehensive benchmarks. I appreciate your good work.

Naveen

On Fri, Jan 16, 2009 at 4:10 PM, Rui Liu <ru...@nc...> wrote:
> hi Naveen,
>
> Thanks a lot for reporting this!
> [...]
|
From: Rui L. <ru...@nc...> - 2009-01-17 00:34:41
|
hi Naveen,

Thanks a lot for reporting this!

Rick wrote,
> I understand. We have been able to reproduce the behavior you reported
> with a small test case and are in the process of running a number of
> experiments to summarize the behavior under different conditions, using
> both the PerfSuite library as well as the PAPI high- and low-level
> APIs. Results will be posted to this mailing list when complete.

We reproduced the issue, and measured the costs of different layers on an NCSA Intel x86_64 HPC system.

- Summary of the investigation results:

1. When the PerfSuite HWPC (later referred to as PSHWPC) API is used, as with any measurement tool, the granularity of the measured function should be large enough to minimize perturbation by the tool/API. Typically, when the granularity of a measured function is 10-100 times or more the cost of the PSHWPC API itself, the measurement is relatively accurate. Otherwise, the measured values could present a picture significantly distorted by the cost of the measurement itself.

2. An alternative light-weight measurement method using the PSHWPC API, instead of using ps_hwpc_start() and ps_hwpc_suspend(), is using ps_hwpc_read(). The cost of the function is relatively small, but the flip side is that the user of this function needs to manage the counter values him/herself. On an NCSA HPC system with the perfctr patch, it was found that with 1 PAPI event, rdpmc (which is used by perfctr) costs 53 cycles, PAPI_read (which calls perfctr) costs 164 cycles, and ps_hwpc_read (which calls PAPI_read) costs 184 cycles.

- Details: (Please note that these values are specific to this particular hardware/software combination, but they can be used as a guideline.)

1. Measured cost of PSHWPC API calls:

Approximate cost for 1 PAPI event (PAPI_TOT_CYC):

ps_hwpc_init():                7.92 M cycles
ps_hwpc_stop():                4.03 M cycles
ps_hwpc_shutdown():            446 K cycles
first ps_hwpc_start():         19 K cycles
first ps_hwpc_suspend():       1750 cycles
subsequent ps_hwpc_start():    473 cycles
subsequent ps_hwpc_suspend():  376 cycles
func(NUM_OPS):                 7 * NUM_OPS cycles

2. Comparison of measured numbers of "wall ticks" and "PAPI_TOT_CYC" in PSHWPC generated XML files and direct PAPI high and low level API measurements:

-------------------------------------------------------------------------
COUNT   NUM_OPS  rtc_delta  PS_ticks  PS_TOT_CYC  PAPI h    PAPI l
-------------------------------------------------------------------------
100 K   1        80 M       11 M      46 M        2.4 M     2.3 M
100 K   10       98 M       22 M      56 M        8.1 M     8.1 M
100 K   100      152 M      83 M      118 M       73.7 M    73.9 M
100 K   1 K      784 M      716 M     748 M       704 M     704 M
100 K   10 K     7295 M     7223 M    7052 M      7005 M    7005 M
100 K   100 K    71.809 G   71.723 G  70.063 G    70.019 G  69.974 G
-------------------------------------------------------------------------

In the above table,
- COUNT is how many times the loop (including ps_hwpc_start, func(NUM_OPS), and suspend) is called,
- NUM_OPS is the granularity of the measured function (which includes a loop running integer multiplication and summation),
- rtc_delta is the number of cycles of the program from beginning to finish, which includes ps_hwpc_init, stop, shutdown, etc.
- PS_ticks is the number of wall ticks appearing in PSHWPC generated XML,
- PS_TOT_CYC is the value of PAPI_TOT_CYC in PSHWPC generated XML,
- PAPI h is the value of PAPI_TOT_CYC using PAPI high level API,
- PAPI l is the value of PAPI_TOT_CYC using PAPI low level API.

From the table, one can observe:

1) from the PS_ticks and PS_TOT_CYC columns, that indeed there is an issue where PS_TOT_CYC > PS_ticks when the granularity of the measured function is small (NUM_OPS <= 1 K). This will result in the "CPU time > wall clock time" issue in psprocess output, as observed by Naveen. This is due to the behavior of the implementation in the released version of PerfSuite, where "count the timer" is done instead of "time the counter". It is being addressed, and the fix will be available in a later PerfSuite version.

2) when the granularity of the measured function is relatively small (NUM_OPS <= 100), the costs of ps_hwpc_start() + ps_hwpc_suspend() on top of PAPI will significantly distort the PAPI_TOT_CYC values, as compared with direct PAPI measurement. When the granularity is extremely small (NUM_OPS = 1, which means 7 cycles), the costs of ps_hwpc_start() + ps_hwpc_suspend() dominate, and PS_TOT_CYC is almost 20 times the value of direct PAPI measurement. When the granularity is large enough (NUM_OPS = 10 K, which means 70 K cycles or 30 microseconds), PS_TOT_CYC and PAPI values differ by less than 0.7%. This is similar to what Naveen reported.

3. Comparison of the costs of rdpmc, PAPI_read, and ps_hwpc_read: (unit is CPU cycles)

-------------------------------------------
# counter  rdpmc  PAPI_read  ps_hwpc_read
-------------------------------------------
1          53*    164        184
2          106*   224        248
-------------------------------------------

*: rdpmc cost was obtained from the perfctr init test output in the Linux kernel boot-up messages, as in "/var/log/messages" or dmesg output. The number 106 was extrapolated by using 53 * 2. The values of PAPI_read and ps_hwpc_read were measured in this investigation.

4. Test setup details:

1) Hardware/software:
A login node of the NCSA Intel x86_64 HPC system (Abe) -- honest3.ncsa.uiuc.edu.
Linux 2.6.18 patched with perfctr 2.6.37
Intel Xeon E5345, 2.33 GHz (2327.506 MHz), L1 I cache: 32K, L1 D cache: 32K, L2 Unified cache 4 MB
8 CPUs
gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)
no compile time optimization (-O0 was in effect at compile time).
PAPI version is 3.6.2.
PerfSuite version is 0.6.2b1.

2) How the PSHWPC API call costs were measured:
A program loops around "ps_hwpc_start(); func(NUM_OPS); ps_hwpc_suspend()" for many iterations. ps_rtc() is used to measure the API calls.

3) How the PS_ticks and PS_TOT_CYC numbers were obtained:
Using almost the same program as above, but with the ps_rtc() calls for measuring the PSHWPC calls removed, leaving only the ps_rtc() calls at the beginning and end of the program to measure the entire duration of the program (rtc_delta in the table). The "wallticks" and "PAPI_TOT_CYC" values were then obtained from the generated XML files.

4) How the PAPI high and low level API numbers were obtained:
Based on papi-3.6.2/src/examples/high-level.c, wrote a program with a loop calling func(NUM_OPS), used PAPI_start_counters() and PAPI_read_counters() to wrap it for the high level API measurement, and used PAPI_start() and PAPI_stop() to wrap it for the low level API measurement.

5) How the rdpmc, PAPI_read, ps_hwpc_read costs were obtained:
rdpmc cost was obtained from the perfctr self test output in the Linux kernel boot-up messages. Based on perfctr-2.6.37/linux/drivers/perfctr/x86_tests.c, wrote a program that used the rdtscll() assembly language call to measure PAPI_read and ps_hwpc_read for 1024000 iterations, and averaged the results.

The files used to measure PAPI_read and ps_hwpc_read costs (measure-cost.c, Makefile, papi_1event.xml, papi_2events.xml) are attached. Please note that when measuring 2 PAPI events, the source code (measure-cost.c) needs to be changed (by uncommenting 2 lines in setup_papi()) and recompiled for the PAPI API, while it does not need to be recompiled for the PSHWPC API.

Please let me know if you have any questions or comments. Thanks!

Thanks,
Rui Liu
NCSA / PerfSuite team
|
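The granularity guideline in the message above can be checked with a little arithmetic. The sketch below uses only numbers quoted in that message (473 and 376 cycles for the steady-state start/suspend calls, about 7 cycles per operation, and two rows of the comparison table); the function names are hypothetical, and the model is deliberately rough, since not all of the call cost necessarily falls inside the counted interval.

```python
# Costs reported in the message above (cycles, one PAPI event, steady state).
START_CYCLES = 473      # subsequent ps_hwpc_start()
SUSPEND_CYCLES = 376    # subsequent ps_hwpc_suspend()
CYCLES_PER_OP = 7       # func(NUM_OPS) costs about 7 * NUM_OPS cycles


def work_to_overhead_ratio(num_ops: int) -> float:
    """Cycles of measured work divided by cycles spent in start + suspend."""
    return (CYCLES_PER_OP * num_ops) / (START_CYCLES + SUSPEND_CYCLES)


def inflation(ps_tot_cyc: float, papi_tot_cyc: float) -> float:
    """How many times larger the PerfSuite count is than the direct PAPI count."""
    return ps_tot_cyc / papi_tot_cyc


for num_ops in (1, 10, 100, 1_000, 10_000):
    ratio = work_to_overhead_ratio(num_ops)
    print(f"NUM_OPS={num_ops:>6}: work/overhead = {ratio:8.2f}")

# Two table rows (PS_TOT_CYC vs. PAPI high-level), in millions of cycles.
print(f"NUM_OPS = 1:    {inflation(46, 2.4):.1f}x")
print(f"NUM_OPS = 10 K: {(inflation(7052, 7005) - 1) * 100:.2f}% higher")
```

In this rough model the work only exceeds the start/suspend cost by a factor of ten somewhere above NUM_OPS = 1 K, which lines up with the observation that the two measurements converge at NUM_OPS = 10 K.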
From: Rick K. <rk...@nc...> - 2009-01-07 17:32:13
|
Naveen, Thanks again for the update (and for raising this issue in the first place). Just a couple of additional comments: Naveen Parihar wrote: > The second point involving the difference between Perfsuite CPU time > and Wall-clock time, as you stated, can be explained by "time the > counters" or "count the timers". As you said, choosing one approach > depends on the objective. For my case, in either approach, the most > accurate measurements, will still have some overhead but I can imagine > scenarios, where changing from "count the timers" to "time the > counters" might be helpful. > I appreciate your input on this and we will be updating the relevant library to make this change for the next major release. Going forward, we will "time the counters" (or whatever is being measured), not the reverse. > Please note that the numbers in this email are generated on a > full-blown voice recognition system. Earlier when I said "small test > program" I meant "small test example", sorry for the confusion that > this might have created. Unfortunately, due to license issues > involving the speech database, I can't share the experimental setup. I understand. We have been able to reproduce the behavior you reported with a small test case and are in the process of running a number of experiments to summarize the behavior under different conditions, using both the PerfSuite library as well as the PAPI high- and low-level APIs. Results will be posted to this mailing list when complete. Thanks once again for your feedback, Rick |
From: Rick K. <rk...@nc...> - 2009-01-07 16:58:35
|
Subscribers to this mailing list may be interested in an upcoming full-day tutorial involving PerfSuite (as part of the NSF SDCI POINT project) that will be offered on Monday, March 9, 2009 at the 10th LCI International Conference on High-Performance Clustered Computing (LCI 09). LCI 09 will be held at the National Center for Atmospheric Research at Boulder, Colorado, USA. Registration is now open for both the conference and tutorial sessions. For more information on LCI 09, see: http://www.linuxclustersinstitute.org/conferences/ Rick |
From: Naveen P. <nav...@gm...> - 2009-01-06 17:47:17
|
Rick,

Thanks for the informative reply. Small time granularity of the function seems to be the key issue. The table below shows the important numbers. All the numbers were generated on the same test example and a few were quoted in my previous mail.

                   | Perfsuite      | PAPI High-level | PAPI Low-level |
-----------------------------------------------------------------------
PAPI_TOT_CYC       | 31,257,943,365 | 8,628,425,548   | 7,712,025,538  |
-----------------------------------------------------------------------
Corresp. Time (s)  | 13.02          | 3.60            | 3.21           |
-----------------------------------------------------------------------

PerfSuite Wall-clock time (s): 2.33
Time measured through independent rdtsc (s): 2.02

The overhead shows up in the PAPI_TOT_CYC measurements. The PAPI High-level interface also has some overhead over the Low-level interface. For my purposes, where the time granularity is very small, I am inclined to write a custom interface for measurement of events to improve accuracy. Note that the independent time measurements (and hence, the TOT_CYC) through the use of independent rdtsc is the lowest with minimal overhead.

The second point involving the difference between Perfsuite CPU time and Wall-clock time, as you stated, can be explained by "time the counters" or "count the timers". As you said, choosing one approach depends on the objective. For my case, in either approach, the most accurate measurements, will still have some overhead but I can imagine scenarios, where changing from "count the timers" to "time the counters" might be helpful.

Please note that the numbers in this email are generated on a full-blown voice recognition system. Earlier when I said "small test program" I meant "small test example", sorry for the confusion that this might have created. Unfortunately, due to license issues involving the speech database, I can't share the experimental setup.

-Naveen

On Mon, Jan 5, 2009 at 4:40 PM, Rick Kufrin <rk...@nc...> wrote:
> Naveen,
>
> Thanks for digging into this more.
> [...]
|
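The "Corresp. Time (s)" row in Naveen's table follows from dividing each PAPI_TOT_CYC value by the CPU clock frequency. The message does not state that frequency; the sketch below assumes 2.4 GHz, a value inferred here only because it reproduces the three times in the table, so treat it as an assumption rather than a reported fact.

```python
# Assumption: the clock rate is not given in the message; 2.4 GHz is inferred
# from the table because it reproduces the quoted times.
ASSUMED_CLOCK_HZ = 2.4e9


def cycles_to_seconds(cycles: int, clock_hz: float = ASSUMED_CLOCK_HZ) -> float:
    """Convert a cycle count to seconds at a fixed clock frequency."""
    return cycles / clock_hz


# PAPI_TOT_CYC values quoted in the table above.
counts = {
    "PerfSuite": 31_257_943_365,
    "PAPI high-level": 8_628_425_548,
    "PAPI low-level": 7_712_025_538,
}

for name, cycles in counts.items():
    print(f"{name:<16} {cycles_to_seconds(cycles):6.2f} s")
```

The same conversion is what makes a "CPU time" larger than the wall-clock time possible: the CPU time is derived from a counted cycle total, so any extra cycles counted during measurement inflate it directly.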
From: Rick K. <rk...@nc...> - 2009-01-05 22:40:39
|
Naveen, Thanks for digging into this more. I hope this message gets out to the mailing list (along with yours), as your experiments are interesting and useful. I think your tests reported below are a slightly different matter than the earlier one (CPU > wall clock), but the two are related. These more recent test results seem to point more to overhead introduced in libpshwpc, especially when measuring at fine granularity, which it sounds like you have in your tests. There is definitely additional work introduced when using the libpshwpc API (above and beyond any work going on in PAPI). A second point that we have been discussing following your earlier reports arises when one is measuring two things simultaneously (as libpshwpc does): the hardware counters and the time stamp counter. These have to be ordered in some way and the choice is made - I will paraphrase Luiz DeRose of Cray here - whether one wants to "time the counters" or "count the timers". That is, do you first read the counter values and then the time stamp or vice versa? Reviewing the relevant code in libpshwpc (the source file, by the way, is src/libpshwpc/hwpc.c), the current implementation chooses to "count the timers". For most purposes, this gives reasonable results but I think you are pushing the envelope here. I would be interested in hearing your viewpoint on whether the reverse should be done, which may affect the CPU vs. wall clock issue. On the more general overhead issue: we will try to reproduce by writing test cases here. If it would be possible to send me a copy of the small test program you are using, that would help us work with you on this. Thanks again for following up on this, Rick |
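The "time the counters" versus "count the timers" choice discussed above can be illustrated with a toy model. This is only a sketch under one plausible reading of the two phrases: whichever measurement is started first and stopped last also pays for the two reads of the other one. The function name `measure` and all cycle costs are made up for illustration; the actual logic in src/libpshwpc/hwpc.c may differ.

```python
def measure(work_cycles, timer_read_cycles, counter_read_cycles, order, calls=1):
    """Return (counted_cycles, timed_cycles) summed over `calls` start/suspend pairs.

    Toy model (an assumption, not PerfSuite code): the outer measurement
    encloses the inner one, so it also absorbs the cost of the two inner reads.
    """
    if order == "count_the_timers":
        # Counters are the outer measurement: they also count the timer reads.
        counted = work_cycles + 2 * timer_read_cycles
        timed = work_cycles
    elif order == "time_the_counters":
        # Timer is the outer measurement: it also times the counter reads.
        counted = work_cycles
        timed = work_cycles + 2 * counter_read_cycles
    else:
        raise ValueError(f"unknown order: {order}")
    return counted * calls, timed * calls


# Hypothetical costs: 200 cycles of real work, 30 per timer read, 500 per counter read.
for order in ("count_the_timers", "time_the_counters"):
    counted, timed = measure(200, 30, 500, order, calls=1000)
    print(order, counted, timed)
```

In this model, "count the timers" makes the cycle-derived CPU time exceed the timed interval, and the gap grows with the number of start/suspend pairs, which is at least consistent with the CPU > wall clock symptom at fine granularity.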
From: Rick K. <rk...@nc...> - 2009-01-05 22:22:56
|
Naveen, Thanks for sending the document. We (of course) get the same results from post-processing it here as you do, but cannot easily reproduce this behavior with our own executables/tests on systems we have available. There are two suggestions I can think of to try offhand: 1. Add the "time" command to your run, i.e. "time psrun a.out", to get a 3rd-party opinion of the wall clock and CPU time. 2. PerfSuite supports using the "gettimeofday" system call for wall-clock timing (by default, on a machine like yours, it uses the "rdtsc" asm instruction). To use gettimeofday, one has to reconfigure PerfSuite with the option "--enable-rtc=gettimeofday". Then make clean and remake as normal. If you come across any more info looking into this, we are of course very interested in addressing it. Thanks for reporting. Rick Naveen Parihar wrote: > Rick, > > The XML document is attached with this email. I'm trying to debug the > problem by directly using high-level PAPI interface. Will let you know > my findings later. > > -Naveen |
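The first suggestion above, getting an independent opinion of wall clock and CPU time, can also be scripted. The following is a minimal sketch that behaves roughly like the shell's "time" command, assuming a Unix-like system where Python's `resource` module is available. The helper name `time_child` is hypothetical, and the workload is a stand-in to be replaced with the real command, for example ["psrun", "a.out"].

```python
import resource
import subprocess
import sys
import time


def time_child(argv):
    """Run argv and return (wall, user, system) seconds for the child process."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(argv, check=True)  # waits for the child to finish
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    user = after.ru_utime - before.ru_utime
    system = after.ru_stime - before.ru_stime
    return wall, user, system


# Stand-in workload for illustration only.
wall, user, system = time_child([sys.executable, "-c", "sum(range(2_000_000))"])
print(f"wall={wall:.3f}s cpu={user + system:.3f}s")
```

For a single-threaded program, a CPU figure well above the wall figure here would point at the operating system's accounting rather than at PerfSuite or PAPI.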
From: Naveen P. <nav...@gm...> - 2009-01-05 21:52:34
|
Rick, Thanks for the quick response and suggestions. It seems like a bug in PerfSuite and not PAPI. Here are the numbers for PAPI_TOT_CYC from a small test program. Using PerfSuite libpshwpc <http://perfsuite.ncsa.uiuc.edu/libpshwpc/> : 31,257,943,365 Using PAPI high-level interface: 8,628,425,548 Please note that ps_hwpc_start() and ps_hwpc_suspend() are called thousands of times within a large program to benchmark a specific function. The time granularity of this function is very small and hence, I have been using rdtsc to get independent timing measurements. The timing measurements using independent rdtsc are closer to those measured directly using PAPI. I also observed similar behavior with PAPI_TOT_INS, so it is likely that this bug is influencing other events as well. I do not mind investing some time in finding this bug and fixing it in the PerfSuite code base, and would appreciate it if you could tell me the potential places where I should be looking. -Naveen |
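To put the two PAPI_TOT_CYC figures above side by side, here is a small worked calculation. The two counts are taken from the message. The call count is hypothetical, since the message only says "thousands of times", and the helper name `overhead_summary` is made up for this sketch.

```python
PSHWPC_CYCLES = 31_257_943_365   # PAPI_TOT_CYC via libpshwpc (from the message)
PAPI_HL_CYCLES = 8_628_425_548   # PAPI_TOT_CYC via the PAPI high-level interface


def overhead_summary(instrumented, baseline, calls):
    """Compare an instrumented count against a baseline count."""
    extra = instrumented - baseline
    return {
        "ratio": instrumented / baseline,
        "extra_cycles": extra,
        "extra_per_call": extra / calls,
    }


# HYPOTHETICAL call count; substitute the real number of start/suspend pairs.
summary = overhead_summary(PSHWPC_CYCLES, PAPI_HL_CYCLES, calls=10_000)
print(round(summary["ratio"], 2))   # 3.62
print(summary["extra_cycles"])      # 22629517817
print(summary["extra_per_call"])
```

Dividing the extra cycles by the real number of start/suspend pairs gives a per-call cost that can be compared against what a single counter read is expected to take.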
From: Rick K. <rk...@nc...> - 2009-01-05 20:00:35
|
Naveen, That certainly does sound like a bug, or at least unexpected behavior... Would you please send a copy of the XML document that you used to obtain this output? It may help to look at its contents closer to track down what is going on. Rick |
From: Naveen P. <nav...@gm...> - 2009-01-05 17:32:04
|
Dear PerfSuite users, I'm a new user of PerfSuite/PAPI and would appreciate comments on my query below. On a quad-core Intel machine running Fedora Core 6 (kernel 2.6.18), I get the following numbers while running a *single*-threaded program: CPU time (seconds) 47.966 Wall clock time (seconds) 30.582 Since the CPU time > Wall clock time, one conclusion might be a bug in PerfSuite/PAPI or somewhere else. I double-checked the CPU time by dividing the PAPI_TOT_CYC count by the CPU frequency, and I arrive at the same number. Any ideas on what might be going on, or what might be the best approach to debugging this problem, are appreciated. Thanks, -Naveen |
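The cross-check described above, dividing PAPI_TOT_CYC by the CPU frequency, can be written out as a short sketch. It assumes a fixed clock rate (no frequency scaling). Both function names are hypothetical, and the cycle count and clock rate in the last line are illustration values, not figures from the report.

```python
def cpu_seconds_from_cycles(total_cycles, cpu_hz):
    """Convert a cycle count (e.g. PAPI_TOT_CYC) to seconds at a fixed clock rate."""
    if cpu_hz <= 0:
        raise ValueError("cpu_hz must be positive")
    return total_cycles / cpu_hz


def exceeds_wall_clock(cpu_seconds, wall_seconds, threads=1):
    """True when reported CPU time is more than `threads` cores could supply."""
    return cpu_seconds > wall_seconds * threads


# Times reported in the message above.
print(exceeds_wall_clock(47.966, 30.582))             # single thread -> True
print(exceeds_wall_clock(47.966, 30.582, threads=4))  # -> False
# Hypothetical cycle count and clock rate, for illustration only.
print(cpu_seconds_from_cycles(6_000_000_000, 3.0e9))  # -> 2.0
```

The first check is what makes the report suspicious: a single-threaded run should not accumulate more CPU time than wall clock time, while the same numbers would be unremarkable for a program that kept four cores busy.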
From: Rick K. <rk...@nc...> - 2008-12-01 18:04:23
|
Thomas, Excellent news, I am glad that the problem was resolved. I will copy the mailing list to record the status. Your problem report also showed an unused variable in one of the test cases for Fortran, and that has been addressed for future releases. Thanks again. Rick It appeared that Thomas Tanaka wrote: > Hello Rick, > > Thanks for your prompt reply again. > > I think my problem was I missed the g77 compiler. Once the g77 is > installed, everything just works fine and expected and successfully > compile. > > I appreciate your prompt help. I am sorry to have missed that since > configure did not complain so I just assume my system has everything > that is needed to compile the perfsuite. > > Once again thank you very much for your kind attention. > > Regards, > > Thomas > > On 12/1/08, Rick Kufrin <rk...@nc...> wrote: > >> Thomas, >> >> We have been attempting to reproduce your error locally, but unfortunately >> cannot seem to duplicate it. What we are currently testing with is: Ubuntu >> 8.04, gcc 4.2, g77 3.4.6 (g77-3.4, installed this morning) >> >> Can you provide more information about your configuration, compiler versions >> (gcc/f77)? What would be especially helpful would be the complete contents >> of your "config.log" file, which should appear in the directory in which you >> configured PerfSuite, and your complete "make" output, if available. >> >> A further piece of information that would be useful would be the output of >> running this command from within the src/libperfsuite/tests directory. This >> command adds the "-v" flag to the compilation step in order to display the >> different phases of compilation: >> >> f77 -I../../../src/libperfsuite -v -g -O2 -c -o psf_rtc-test.o >> psf_rtc-test.f >> >> >> I am copying my colleague and co-developer of PerfSuite, who has been >> looking at this issue with me. >> >> Rick >> >> >> Rick Kufrin wrote: >> >>> Thomas, >>> >>> Your suspicion is probably correct. 
This particular issue was not caught >>> >> because we did not have the g77 compiler installed on the Ubuntu machine >> that is most similar to yours. I am guessing there is probably some >> language dialect option that will cause the compiler to recognize that >> construct, but do not know what it is offhand. We will look into it. >> >>> Any suggestions from list subscribers who may have a more recent Fortran >>> >> compiler (GNU 4.2.x ideal) on their system are welcome! >> >>> Thanks again for reporting these problems - will update as more is known. >>> >>> Rick >>> >>> >>> >>> >>> >> ------------------------------------------------------------------------ >> >>> Subject: >>> Re: Compilation Error on perfsuite-0.6.2b2 >>> From: >>> "Thomas Tanaka" <tho...@gm...> >>> Date: >>> Wed, 26 Nov 2008 09:48:33 -0800 >>> >>> To: >>> per...@li... >>> >>> >>> I suspect it doesn't like integer*8 start declaration on line 57 on >>> >>> >> perfsuite-0.6.2b2/src/libperfsuite/tests/psf_rtc-test.f >> >>> Thanks, >>> >>> Thomas >>> >>> On Wed, Nov 26, 2008 at 9:40 AM, Thomas Tanaka <tho...@gm...> >>> >> wrote: >> >>> >>>>> Hello again, >>>>> >>>>> I have tried to make the new perfsuite-0.6.2b2 and the following is >>>>> >> the error: >> >>>>> f77 -I../../../src/libperfsuite -g -O2 -o psf_error-test >>>>> psf_error-test.o >>>>> >>>>> >> -L/home/tanaka/Desktop/ALL/perfsuite-0.6.2b2/src/libperfsuite >> >>>>> -lperfsuite >>>>> >>>>> >> /usr/lib/gcc/i486-linux-gnu/4.2.3/../../../../lib/libf2c.so: >> undefined >> >>>>> reference to `MAIN__' >>>>> collect2: ld returned 1 exit status >>>>> f77 -I../../../src/libperfsuite -g -O2 -c -o psf_rtc-test.o >>>>> >> psf_rtc-test.f >> >>>>> MAIN rtctest: >>>>> Warning on line 104: local variable sleepsecs never used >>>>> dowork: >>>>> psf_rtc-test.f: In function 'MAIN__': >>>>> psf_rtc-test.f:74: error: expected '=', ',', ';', 'asm' or >>>>> '__attribute__' before 'stop' >>>>> psf_rtc-test.f:74: error: 'stop' undeclared (first use in this >>>>> >> 
function) >> >>>>> psf_rtc-test.f:74: error: (Each undeclared identifier is reported only >>>>> >> once >> >>>>> psf_rtc-test.f:74: error: for each function it appears in.) >>>>> psf_rtc-test.f:76: error: expected '=', ',', ';', 'asm' or >>>>> '__attribute__' before 'start' >>>>> psf_rtc-test.f:76: error: 'start' undeclared (first use in this >>>>> >> function) >> >>>>> psf_rtc-test.f:80: error: expected '=', ',', ';', 'asm' or >>>>> '__attribute__' before 'elapsed' >>>>> psf_rtc-test.f:80: error: 'elapsed' undeclared (first use in this >>>>> >> function) >> >>>>> psf_rtc-test.f:82: error: expected ')' before '*' token >>>>> /usr/bin/f77: aborting compilation >>>>> make[4]: *** [psf_rtc-test.o] Error 25 >>>>> make[3]: *** [all-recursive] Error 1 >>>>> make[2]: *** [all-recursive] Error 1 >>>>> make[1]: *** [all-recursive] Error 1 >>>>> make: *** [all] Error 2 >>>>> >>>>> My platform is Linux 2.6.27-rc5-mm1 on ubuntu 8.04 >>>>> >>>>> Any direction is much appreciated. Thanks in advance. >>>>> >>>>> Regards, >>>>> >>>>> Thomas >>>>> >>>>> >>>> >>> >>> >> > > |