From: liudawei <dbm...@ya...> - 2006-09-17 07:28:31
I have installed PerfSuite successfully, and "make -s check" shows that all the tests pass. When I switch to the install directory /usr/local/bin, I find only 4 files: psenv.csh, psenv.sh, psinv and psrun. psinv runs successfully and reports the machine info, but when I try to run psrun I get the following message:

    ERROR: ld.so: object 'libpsrun.so.0' from LD_PRELOAD cannot be preloaded: ignored.

Can anyone help me?

Best regards,
Dawei
From: liudawei <dbm...@ya...> - 2006-09-17 03:53:28
I am using an IA-64 server. During make, I encountered the following error messages:

    psinv.o(.text+0x2df2): In function `print_native_events':
    /home/ftpuser/perfsuite-0.6.2a5/tools/psinv/psinv.c:839: undefined reference to `pfm_get_first_event'
    psinv.o(.text+0x2e42):/home/ftpuser/perfsuite-0.6.2a5/tools/psinv/psinv.c:842: undefined reference to `pfm_get_next_event'
    psinv.o(.text+0x2e82):/home/ftpuser/perfsuite-0.6.2a5/tools/psinv/psinv.c:854: undefined reference to `pfm_get_first_event'
    psinv.o(.text+0x2f32):/home/ftpuser/perfsuite-0.6.2a5/tools/psinv/psinv.c:854: undefined reference to `pfm_get_next_event'
    psinv.o(.text+0x3102):/home/ftpuser/perfsuite-0.6.2a5/tools/psinv/psinv.c:839: undefined reference to `pfm_get_first_event'
    psinv.o(.text+0x31b2):/home/ftpuser/perfsuite-0.6.2a5/tools/psinv/psinv.c:854: undefined reference to `pfm_get_next_event'

I have patched libpfm 2.x, and my Linux kernel is built with CONFIG_PERFMON=y. Can anyone help me?

Best regards,
Dawei Liu
From: Rick K. <rk...@nc...> - 2006-07-10 23:05:01
Ryszard,

> I have looked at the XML document, but unfortunately there are no more
> libraries of interest in it. Setting the variable PSPROCESS_MAPPER to
> 'addr2line' before running psprocess has significantly reduced the number
> of unknown functions, which were marked with a question mark. Thanks again
> for your help.

Thanks again for the follow-up information. I'm glad to hear that things have improved for you as far as the unknown functions (question marks) go with psprocess.

I thought I would add that the latest release of PerfSuite (0.6.2a5) has changed its use of the BFD library in a way that should more closely approximate the results one gets with PSPROCESS_MAPPER set to addr2line (however, if one prefers the latter, that's OK too!).

Rick
From: Ryszard J. <rys...@ce...> - 2006-07-10 15:30:46
Hi Rick,

Thank you for your last reply. Here are my answers to your questions.

> So in summary, you might want to a) examine a representative XML document
> to see if the libraries of interest are present in a <module> element, and
> b) try "export PSPROCESS_MAPPER=addr2line" before running psprocess to see
> if this improves results. In any case, I would be interested to learn of
> the outcome.

I have looked at the XML document, but unfortunately there are no more libraries of interest in it. Setting the variable PSPROCESS_MAPPER to 'addr2line' before running psprocess has significantly reduced the number of unknown functions, which were marked with a question mark. Thanks again for your help.

Regards,
Ryszard.
From: Rick K. <rk...@nc...> - 2006-06-30 19:35:03
Appended is a message from Bron Nelson of SGI regarding a problem that can occur on large single-system-image Linux systems such as the SGI Altix. Note that this problem is not specific to the Altix; it is a general limitation of the kernel, as mentioned below. I'm passing this along to the list to alert people to potential problems and also to mention that a workaround should be available in the future.

I believe that the problem is related to the timeslice interval used for multiplexing within the PAPI library. By default it is 10 ms, an interval that is currently hardcoded into PAPI. The PAPI developers indicated today that a feature addition allowing user-selectable timeslices should appear in a later release. This in turn would allow PerfSuite to use longer timeslices to avoid pummeling the kernel with interrupts from large numbers of threads and causing system lockup as reported below.

Many thanks to Bron Nelson of SGI and Phil Mucci/Dan Terpstra of PAPI/ICL for reporting and response!

Rick

---------- Forwarded message ----------
Date: Thu, 29 Jun 2006 16:26:12 -0700
From: Bron Nelson <br...@br...>
Subject: perfsuite locking up a 512 Altix

We recently saw an instance of the use of "perfsuite" locking up one of the 512p Altix machines at NASA Ames. The problem appears to be a generic limitation of the Linux community kernel: interrupts can only be delivered at a certain rate, and attempts to deliver signals at rates exceeding that limit cause the kernel to lock up. We have generally found that doing 10ms sampling on a 64p job is pretty much right on the edge of what Linux is capable of handling.

I would like to suggest/request that perfsuite automatically "dial down" the rate at which it samples for larger user jobs (e.g. normal rate up to 32p, half the normal rate for 33-64p, a quarter the rate for 65-128p, etc.). While admittedly this is a bandage, not a cure, the likelihood of a major overhaul of the Linux signal handling code to fix the underlying problem seems vanishingly small.

--
Bron Campbell Nelson
br...@sg...
These statements are my own, not those of Silicon Graphics.
"I tremble for my country when I reflect that God is just." Thomas Jefferson
From: Rick K. <rk...@nc...> - 2006-05-24 12:52:07
Ryszard,

Thank you for the follow-up message; again, good information from you.

> I usually do the profiling job and the post-processing on the same machine
> with the same set of environment variables, just one after another, so it
> should work okay.

OK, that's the safest way to do things. For some people this is not possible or convenient, so it's worth mentioning.

> But I have another question about libraries. If our program is linked
> against some libraries, and these libraries are linked against others,
> will psrun and psprocess report all of these libraries? I ask because I
> still see question marks even when using less complicated examples than
> a full simulation.

psrun (actually libpshwpc, which is what psrun uses) takes a snapshot of the entries in /proc/PID/maps to determine which libraries are currently loaded by the application. This snapshot occurs when a routine named ps_hwpc_init() is called, which (in the case of psrun) is when the application enters main(). Libraries that have not yet been loaded will not be present, and any samples taken within them will not be accounted for. The output XML document lists all the modules that were accounted for, even those in which no samples were taken, so that's one way to see which libraries were under consideration at all (versus ignored completely).

I don't recall if we mentioned this in this thread, but a while ago Tirath Ramdas reported a problem using psprocess on an x86 machine that manifested itself as inconsistent sample attributions, showing up as a larger-than-expected number of unaccounted samples (i.e., question marks). Some testing indicated that this is most likely due to a failure in psprocess' BFD extension. As a workaround, a new environment variable called PSPROCESS_MAPPER was introduced in psprocess, in version 0.6.2a3 of PerfSuite. This variable can be set to the string "addr2line", which causes the binutils utility addr2line to be used rather than the internal PerfSuite BFD Tcl extension. Tirath reported more consistent results using this, and you might want to see if it makes any difference for you. This only has an effect on x86 machines, not Itanium.

So in summary, you might want to a) examine a representative XML document to see if the libraries of interest are present in a <module> element, and b) try "export PSPROCESS_MAPPER=addr2line" before running psprocess to see if this improves results. In any case, I would be interested to learn of the outcome.

Thanks again,
Rick
From: Ryszard J. <rys...@ce...> - 2006-05-24 11:15:13
Hello Rick,

Thank you for your last email.

> Hmm! This is good news although I am at a loss as to why there might be
> a difference depending on the filesystem in use. Nevertheless, I am glad
> to hear that things are apparently working. In your opinion, is this
> something that warrants inclusion in the BUGS file that is included in the
> PerfSuite distribution? If so, I'd be happy to include it in the hopes that
> it might save someone else a little detective work.

I think it would be a good idea to mention this problem in the BUGS file. When I changed the running directory from an AFS directory to a local directory, all my problems with the threshold disappeared.

> One thing I have seen before that sounds related is a difference in
> executable/library locations between the computer system on which the
> simulation is run versus the system on which the post-processing
> (psprocess) occurs. What happened in that case is that the "<module>"
> element in the profiling XML output identified the location of the library
> through the "file" attribute in the context of runtime. For example,
> instead of listing:
>
> <module file="/usr/lib/libsomething.so.0" offset="a5e000">
>
> it had:
>
> <module file="/mnt/nfs/usr/lib/libsomething.so.0" offset="a5e000">
>
> The result was that psprocess could not find the library as listed in the
> XML document and would not report any samples for that module. The only
> workaround in this case was to correct the entries (using a text editor)
> before running psprocess.

I usually do the profiling job and the post-processing on the same machine with the same set of environment variables, just one after another, so it should work okay.

But I have another question about libraries. If our program is linked against some libraries, and these libraries are linked against others, will psrun and psprocess report all of these libraries? I ask because I still see question marks even when using less complicated examples than a full simulation.

Regards,
Ryszard.
From: Rick K. <rk...@nc...> - 2006-05-19 12:48:17
Ryszard,

> Thank you for your concern and reply. Sorry it took a little while to get
> back to your questions...

No problem, I appreciate your taking the time to update me on the latest.

> I have tried both of your suggestions. I have tested them with PerfSuite
> 0.6.2.a3 and 0.6.2.a4. Profiling with the "profile.xml" config file works
> with both versions! Unfortunately, the problem with a "real" config file
> still remains. But there is also some good news! Because my simulation
> produces some output files, and for long executions the file size is quite
> big (~300MB), I started running the simulation from the local hard drive
> instead of my home AFS directory, which I had used before to run
> simulations. After this move, my problem with the threshold disappeared!
> :) So, from my point of view, the profiling works okay. Thank you again for
> spending time on my questions!

Hmm! This is good news, although I am at a loss as to why there might be a difference depending on the filesystem in use. Nevertheless, I am glad to hear that things are apparently working. In your opinion, is this something that warrants inclusion in the BUGS file that is included in the PerfSuite distribution? If so, I'd be happy to include it in the hopes that it might save someone else a little detective work.

One thing I have seen before that sounds related is a difference in executable/library locations between the computer system on which the simulation is run versus the system on which the post-processing (psprocess) occurs. What happened in that case is that the "<module>" element in the profiling XML output identified the location of the library through the "file" attribute in the context of runtime. For example, instead of listing:

    <module file="/usr/lib/libsomething.so.0" offset="a5e000">

it had:

    <module file="/mnt/nfs/usr/lib/libsomething.so.0" offset="a5e000">

The result was that psprocess could not find the library as listed in the XML document and would not report any samples for that module. The only workaround in this case was to correct the entries (using a text editor) before running psprocess.

Thanks again for the update and report,
Rick
From: Ryszard J. <rys...@ce...> - 2006-05-19 11:09:34
Hello Rick,

Thank you for your concern and reply. Sorry it took a little while to get back to your questions...

I have tried both of your suggestions. I have tested them with PerfSuite 0.6.2.a3 and 0.6.2.a4. Profiling with the "profile.xml" config file works with both versions! Unfortunately, the problem with a "real" config file still remains. But there is also some good news! Because my simulation produces some output files, and for long executions the file size is quite big (~300MB), I started running the simulation from the local hard drive instead of my home AFS directory, which I had used before to run simulations. After this move, my problem with the threshold disappeared! :) So, from my point of view, the profiling works okay. Thank you again for spending time on my questions!

Have a nice weekend.

Best regards,
Ryszard.
From: Rick K. <rk...@nc...> - 2006-05-04 02:14:22
Henry,

Thanks very much for testing this out and reporting the problem. You located the problem spot perfectly and yes, unfortunately it is a bug in psprocess. I'm going to copy the list on this in case others run across the same issue, and I will package an a4 release to fix this problem as soon as possible. The combine operation is very common for people to use, so this shouldn't stay unaddressed for long.

Fortunately, there is a quick fix that can be applied to combine.tcl either before or after installation. At the beginning of the source code in combine.tcl, lines 56-60 are:

    proc readAndValidateMultiple {fileList ignoreErrors} {
        global timers
        set res {}
        set doctype ""

The following line should be added after line 60 ("set doctype"):

        set timers(total,domparse) 0

Within the distribution, combine.tcl can be found in the directory:

    src/tools/psprocess/tcl

After installation, it will be located in:

    $PREFIX/share/perfsuite/tclbin/psprocess

This is a bug that was introduced with a new capability of this release (measuring the time necessary to do XML document processing). Sorry I missed this before the release! But thanks again for the report and also the very good news that things appear to work otherwise on Columbia.

Rick

On Wed, 3 May 2006, Haoqiang H. Jin wrote:

> Hi Rick,
>
> Glad to see you had the new version released. I installed
> this version on our Columbia system. Except for a minor
> problem, it worked pretty much out of the box. The minor
> glitch occurred when I tried psprocess in the combined mode.
>
> % psprocess -c a.out.0.m.xml a.out.1.m.xml
> can't read "timers(total,domparse)": no such variable
>
> The error seems to be generated from 'combine.tcl'
> in line 78:
> incr timers(total,domparse) [expr {$stoptime - $starttime}]
>
> If I comment out this line or replace 'incr' with 'set',
> psprocess ran successfully. Is this a bug or something
> else that I should specify in the command line?
>
> Thanks for your great work to make the package available!
>
> -Henry
From: Rick K. <rk...@nc...> - 2006-04-30 16:20:40
Updates on some outstanding issues/questions:

Regarding the following message that was sent back in January, I thought I would say that the new example program mentioned in the last paragraph has been included in the new 0.6.2a3 distribution. It's called "sampler" and it can be found in the directory "examples/sampler", which will be installed under $PREFIX/share/perfsuite.

Also, the environment variable PSPROCESS_MAPPER, which was introduced to address psprocess issues that were raised by Tirath Ramdas around the same time, is now available in the 0.6.2a3 release. The thread originally describing this can be found here:

http://sourceforge.net/mailarchive/forum.php?thread_id=9490386&forum_id=39162

Rick

On Mon, 30 Jan 2006, Rick Kufrin wrote:

> Tirath,
>
>> Thanks for clarifying that for me; for some reason I had the idea
>> that the interrupt is triggered every \tau seconds, at which point it
>> samples the counter, then resets it, then goes back to sleep for
>> another \tau seconds, repeat... in which case the total count in the
>> end should have been the aggregate count.
>
> Well, that's not a bad idea but it's not the way it's implemented in
> PerfSuite. I should say this is a limitation of PerfSuite, not the
> hardware or driver support - there's nothing to prevent one from
> configuring on, say, the Itanium2, one performance counter to generate
> an interrupt after some number of events (whether cycles or whatever),
> and use that as a trigger to read event counts from one of the other 3
> counters on that CPU. This could provide some type of real-time
> monitoring I think.
>
> There will be a new example program available that does something similar
> but using interval timers (setitimer) to generate periodic interrupts,
> allowing the interrupt handler to read the current value of the counters.
> I'm thinking this is similar to what you're describing.
>
> Rick
From: Rick K. <rk...@nc...> - 2006-04-29 23:37:24
Hello again Ryszard,

I have finally had a chance to look at the output you sent more closely and thought I would follow up. Unfortunately, the output you provided shows nothing obvious except that the run proceeds through the initialization phase that PerfSuite (psrun) goes through. My instinct is that the problem may be related to substantial memory requirements for the profiling buffers that are allocated by PerfSuite, one for each shared library in addition to your application itself. As you reported, there are quite a few shared libraries involved; the debug output indicated that 84 were found in the load map.

I have two suggestions that you might want to try to test this theory. The first is to try using an alternate configuration file that omits the shared libraries. There is one called "profil.xml" that should be installed from the PerfSuite distribution in the directory $PREFIX/share/perfsuite/xml/pshwpc. This configuration only allocates memory for the main program text of your application and should cut down the memory required for profiling buffers considerably. Of course, you'll not get any samples attributed to the shared libraries (which may be what you're interested in), but if things work consistently with profil.xml, it may point to memory issues.

The second suggestion is to consider using the latest 0.6.2 version of PerfSuite. There were some changes made in memory management for profiling buffers that may have a positive effect, although there are no guarantees.

Unfortunately, there is no way to be selective about which shared libraries are profiled in PerfSuite in order to cut down on memory usage as well as avoid modules that are not of interest. This would be a good option to add, but nothing currently exists.

I hope this helps. If you do try one or both of these suggestions, it would be interesting to learn the result.

Rick
From: Rick K. <rk...@nc...> - 2006-04-29 00:00:13
PerfSuite 0.6.2a3 is now available. This is a significant update to PerfSuite and the first version developed against 2.6-based Linux kernels (it may also be used with 2.4-based kernels). There are also numerous enhancements and bug fixes in this release; please see the CHANGES file for a complete listing.

This release does not yet offer direct support for ia64/Linux libpfm-3. This functionality is targeted for release 0.6.2 beta 1.

This announcement is being cross-posted to both perfsuite-announce and perfsuite-users in order to reach as many as possible who have been (patiently) waiting for updated kernel support. Subsequent announcements for 0.6.2 will be posted to perfsuite-announce and SourceForge news items only.

URL:
http://perfsuite.sourceforge.net/
http://perfsuite.ncsa.uiuc.edu/download/

Rick
From: Rick K. <rk...@nc...> - 2006-04-18 23:51:09
Ryszard,

Sorry to hear you are experiencing problems with profiling through psrun. I'm afraid it may be a little difficult to diagnose what could be going on without being able to reproduce the problem. Your application sounds quite complex, so I'm not hopeful that asking you to provide it would be reasonable, but if there is any way of demonstrating the behavior with a small test case, it would be very helpful.

At a minimum, though, would you provide the output of "psinv" and "psinv -x"? These provide some information about your system and supporting software, as well as PerfSuite version numbers.

Also, just to clarify: when you say psrun stops after the fork, I assume you mean the fork within psrun itself (versus your application)? In normal operation, the psrun process itself should wait for the child process (your application) to exit before psrun does. So I would guess that for some reason your process is exiting prematurely, but the question is where and why. There is an option to psrun (-X), added in version 0.6.2, that will cause psrun to return the exit status of the command being measured rather than that of psrun itself (there is an equivalent environment variable, PSRUN_EXITSTATUS, that can be set to cause the same effect).

Typically, what I do to learn what is going on within the internals of PerfSuite is to use a "debugging build". This requires a total rebuild of PerfSuite after reconfiguring as normal but including the flag "--enable-debug". This activates debugging code within PerfSuite that tracks various things within the software and can help show at what point a failure happens. The level of debugging output can be controlled through the environment variable PS_DEBUG, which can be set to an integer between 0 and 3 (the higher the value, the more verbose the debugging information). If you have the time and inclination to try that and would like to send me the output from a run that fails, perhaps we can get you going again. One would not want to run a debug-compiled version of PerfSuite in normal production because it does add some overhead to the run.

Rick
From: Ryszard <rys...@ce...> - 2006-04-18 15:27:36
Hi everybody,

I am playing with psrun and profiling.

Here is a little background about my application. I have a simulation whose executable file is dynamically linked with about 80 libraries. The application spends roughly 2 seconds opening all the shared libraries before it really starts running (before the main function). The simulation works well, but there is a problem with psrun and profiling: the psrun tool stops and does not run the simulation. psrun exits after the fork() call.

The running environment and the config file are exactly the same as when profiling another simulation, which works well. I changed the threshold parameter in the config file from 50000 to 200000, and it started working. The problem is that this trick is not a stable solution, because from time to time the profiling job still does not work.

Do you have any ideas what could be wrong here? Please contact me if you need more information. Thank you in advance for your help.

Best regards,
Ryszard.
From: Ryszard J. <rys...@ce...> - 2006-03-17 17:54:54
Thank you for your quick reply.

> One thing that may help in this situation is to arrange for the libraries
> to be loaded at startup, which should be possible by including the .so of
> interest in the LD_PRELOAD environment variable before you do your run.
> That should cause the library to be included earlier on in the process and
> then picked up when the profiling buffers are allocated. PerfSuite should
> preserve existing LD_PRELOAD settings that are in effect when psrun starts
> the measurement.

This really works with my simple application. I set the LD_PRELOAD variable and then ran psrun with my application (which uses dynamic loading of libraries). Unfortunately, it does not work with the huge simulations from Atlas, but I think that the idea of using LD_PRELOAD with dynamic loading could help very much.

Have a nice weekend,
Ryszard.
From: Rick K. <rk...@nc...> - 2006-03-17 13:53:54
|
Ryszard, Thank you for the very complete description of the issue - it's very helpful. I'll skip ahead to this portion of your message and comment, perhaps this will help to learn what might be going on: > Maybe this information could help. During profiling other small programs, I saw the different behaviour between the application which is dynamically linked with share libraries, and between one which uses the dynamic loading of shared libraries. It is worth to mention, that both applications run the same time. During running the application with the dymamic loading the only 11 samples are collected, compared to 381998 for the dynamically linked application. Below there are two small parts of reports from profiling two applications which take advantage of the same library: When you mention dynamic loading, I'm assuming you mean loading the library at runtime through some mechanism like dlopen(). If I understand properly what you're saying, I think your observation/guess is correct. PerfSuite's profiling mechanism works like this: the region(s) of program addresses to which samples are attributed are determined at application startup. The currently-loaded modules are scanned and profiling buffers are allocated for use in sample collection. Any module that is not present in the load map at that time is not included in the profile, as the addresses associated with it do not have a corresponding slot in the buffers that were created earlier. One thing that may help in this situation is to arrange for the libraries to be loaded at startup, which should be possible by including the .so of interest in the LD_PRELOAD environment variable before you do your run. That should cause the library to be included earlier on in the process and then picked up when the profiling buffers are allocated. PerfSuite should preserve existing LD_PRELOAD settings that are in effect when psrun starts the measurement. 
There is a section within the PerfSuite BUGS file (included with the distribution) that also attempts to describe this issue and workarounds; however, it has changed (been expanded) a bit since the 0.6.1 release, so the information in your copy may not cover it as thoroughly as more recent versions.

I hope this helps; please advise if this is not addressing the issue.

Rick |
From: Ryszard J. <rys...@ce...> - 2006-03-17 11:30:24
|
Hi,

I am using PerfSuite 0.6.1 with perfctr-2.6.19 and papi-3.2.1. I am trying to profile the Atlas simulation at CERN. It seems to work, but I got stuck at one point. I run the profiling job with the commands:

export LD_PRELOAD="some libraries"
psrun -o output_file -e -p -c papi_cycles.xml python simulation.py joboptions.py
export LD_PRELOAD=""

In the output file from psprocess I can find most of the libraries apart from one (Geant4 G4) which I expected to find. This library is not included in the LD_PRELOAD variable. Functions from this library are used very frequently and intensively. I guess that question marks appear in the report instead of the missing library. I have attached a small part of my report.

Profile Information
=======================================
Class       : PAPI
Event       : PAPI_TOT_CYC (Total cycles)
Period      : 50000
Samples     : 18835979
Domain      : user
Run Time    : 3759.06 (seconds)
Min Self %  : (all)

Module Summary
--------------------------------------------------------------------------------
 Samples   Self %  Total %  Module
 4338437   23.03%   23.03%  /usr/lib/libstdc++.so.5.0.3
 4118852   21.87%   44.90%  /lib/tls/libc-2.3.2.so
 3633690   19.29%   64.19%  /lib/tls/libm-2.3.2.so
 1575474    8.36%   72.56%  /afs/cern.ch/sw/lcg/external/clhep/1.9.2.1/slc3_ia32_gcc323/lib/libCLHEP-Random-1.9.2.1.so
 .....

File Summary
--------------------------------------------------------------------------------
 Samples   Self %  Total %  File
 9508883   50.48%   50.48%  ?
 4893321   25.98%   76.46%  ??
 1212717    6.44%   82.90%  ../../../CLHEP/Random/src/JamesRandom.cc
 .....

Function Summary
--------------------------------------------------------------------------------
 Samples   Self %  Total %  Function
 4893321   25.98%   25.98%  ??
 1212677    6.44%   32.42%  CLHEP::HepJamesRandom::flat()
 ....

Function:File:Line Summary
--------------------------------------------------------------------------------
 Samples   Self %  Total %  Function:File:Line
 4893321   25.98%   25.98%  ??:??:0
  651638    3.46%   29.44%  _int_malloc:?:0
  592882    3.15%   32.59%  atan2:?:0
  462413    2.45%   35.04%  __ieee754_log:?:0

As you can see, about 65% is reported as libstdc++, libc and libm in the module summary and in the function summary. In the file summary about 75% is unknown, and in the function summary ~26% is unknown.

Do you have any ideas where the problem could be?

Maybe this information could help. While profiling other small programs, I saw different behaviour between an application that is dynamically linked with shared libraries and one that uses dynamic loading of shared libraries. It is worth mentioning that both applications run for the same time. When running the application with dynamic loading, only 11 samples are collected, compared to 381998 for the dynamically linked application. Below are two small parts of reports from profiling two applications which use the same library:

1. Dynamically linked application

Profile Information
=======================================
Class       : PAPI
Event       : PAPI_TOT_CYC (Total cycles)
Period      : 50000
Samples     : 381998
Domain      : user
Run Time    : 22.02 (seconds)
Min Self %  : (all)

Module Summary
--------------------------------------------------------------------------------
 Samples   Self %  Total %  Module
  272392   71.31%   71.31%  /afs/cern.ch/user/o/oplaatl3/testdll/libhello2.so.1
  109600   28.69%  100.00%  /afs/cern.ch/user/o/oplaatl3/testdll/libhello1.so.1
       2    0.00%  100.00%  /lib/tls/libc-2.3.2.so
       2    0.00%  100.00%  /lib/ld-2.3.2.so
       1    0.00%  100.00%  /usr/local/papi/lib/libpapi.so
       1    0.00%  100.00%  /usr/local/perfsuite/lib/libpsrun.so.0.0.0

File Summary
--------------------------------------------------------------------------------
 Samples   Self %  Total %  File
  360739   94.43%   94.43%  ?
   21259    5.57%  100.00%  ??

Function Summary
--------------------------------------------------------------------------------
 Samples   Self %  Total %  Function
  194343   50.88%   50.88%  sum
   97454   25.51%   76.39%  hello
   68937   18.05%   94.43%  count
   21259    5.57%  100.00%  ??
       1    0.00%  100.00%  _IO_default_xsputn
       1    0.00%  100.00%  strlen
       1    0.00%  100.00%  _fini
       1    0.00%  100.00%  strcmp
       1    0.00%  100.00%  _dl_lookup_versioned_symbol

2. Application with dynamic loading

Profile Information
=======================================
Class       : PAPI
Event       : PAPI_TOT_CYC (Total cycles)
Period      : 50000
Samples     : 11
Domain      : user
Run Time    : 21.93 (seconds)
Min Self %  : (all)

Module Summary
--------------------------------------------------------------------------------
 Samples   Self %  Total %  Module
       6   54.55%   54.55%  /lib/ld-2.3.2.so
       4   36.36%   90.91%  /lib/tls/libc-2.3.2.so
       1    9.09%  100.00%  /lib/libdl-2.3.2.so

File Summary
--------------------------------------------------------------------------------
 Samples   Self %  Total %  File
      10   90.91%   90.91%  ?
       1    9.09%  100.00%  ??

Function Summary
--------------------------------------------------------------------------------
 Samples   Self %  Total %  Function
       2   18.18%   18.18%  strcmp
       1    9.09%   27.27%  strlen
       1    9.09%   36.36%  _dl_close
       1    9.09%   45.45%  do_lookup
       1    9.09%   54.55%  fixup
       1    9.09%   63.64%  rtld_lock_default_lock_recursive
       1    9.09%   72.73%  _setjmp
       1    9.09%   81.82%  ??
       1    9.09%   90.91%  dl_open_worker
       1    9.09%  100.00%  __write_nocancel

Thank you in advance for any help and comments.

Best regards,
Ryszard. |
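The Self % and Total % columns in these reports appear to be each entry's share of the run's total sample count and the running (cumulative) share, respectively; the module summaries of both small test programs are consistent with that reading. A small Python sketch that recomputes the columns from the sample counts (the function name `summarize` is hypothetical, not part of psprocess):

```python
def summarize(rows):
    """rows: list of (samples, name); returns (samples, self%, total%, name)."""
    total = sum(samples for samples, _ in rows)
    running = 0
    out = []
    for samples, name in rows:
        running += samples
        out.append((samples,
                    round(100.0 * samples / total, 2),   # Self %
                    round(100.0 * running / total, 2),   # Total % (cumulative)
                    name))
    return out


# Module summary of the dynamic-loading run (11 samples in total).
for row in summarize([(6, "/lib/ld-2.3.2.so"),
                      (4, "/lib/tls/libc-2.3.2.so"),
                      (1, "/lib/libdl-2.3.2.so")]):
    print("%8d %7.2f%% %7.2f%%  %s" % row)
```

Seen this way, the 11-sample run is not losing percentages: almost none of its time was attributed to any module at all, which matches the explanation in the reply above about modules loaded after startup.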
From: Rick K. <rk...@nc...> - 2006-02-06 15:29:52
|
Phuong,

> I was trying to use psrun / psprocess to look at a code segment and got
> the following bad statistics from psprocess
>
> Level 2 cache line reuse (data)........................................ -0.999
>
> Level 2 cache hit rate (data).......................................... -1494.576
>
> All the other stat seem reasonable. Any idea why ?

This is a pretty common thing to see in psprocess output when using an IA-64 platform like the Altix. The statistics that are calculated by psprocess are pretty "generic", intended for use with PAPI events. They are taken from an XML database that should be installed in:

PREFIX/share/perfsuite/xml/pshwpc/PAPI_metrics.xml

The definition of the hit rate above uses PAPI_L1_DCM and PAPI_L2_DCM. On the Itanium 2, floating-point loads bypass the L1 cache, with the result that you can have more L2 cache misses than L1 cache misses (you can check whether this is the case by looking either at the raw XML output or at the display earlier on in the psprocess summary). This is a bit counter-intuitive and is what I think you may be seeing here (if not, please forward along one of the .xml files that were generated). I think there are ways to qualify things further with the Itanium 2 PMU to correct for this, but this is not done in PerfSuite, so one has to look at some of these statistics in the context of what's going on underneath.

Rick |
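To see how this counter relationship produces a negative figure, here is a Python sketch that assumes a hit-rate definition of the form 1 - PAPI_L2_DCM / PAPI_L1_DCM. That formula is an illustrative assumption: the exact expressions (and any scaling) in PAPI_metrics.xml may differ, and the counter values below are made up.

```python
def l2_hit_rate(l1_dcm, l2_dcm):
    """Fraction of L1 data misses that hit in L2, under the (assumed)
    premise that every L2 data access is an L1 data miss.
    Returns None when there were no L1 misses."""
    if l1_dcm == 0:
        return None
    return 1.0 - l2_dcm / l1_dcm


# Integer loads: L2 only sees L1 misses, so the ratio stays in [0, 1].
print(l2_hit_rate(l1_dcm=1_000_000, l2_dcm=200_000))   # 0.8

# Itanium 2 floating-point loads bypass L1 and go straight to L2, so
# L2 misses can exceed L1 misses and the "hit rate" goes negative.
print(l2_hit_rate(l1_dcm=10_000, l2_dcm=150_000))      # -14.0
```

The premise in the docstring is exactly what floating-point loads on the Itanium 2 violate, which is why such a metric is only meaningful when read alongside the raw PAPI_L1_DCM and PAPI_L2_DCM counts.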
From: Vu, P. A (MP Technology) <Phu...@bp...> - 2006-02-06 08:07:07
|
I was trying to use psrun / psprocess to look at a code segment and got the following bad statistics from psprocess:

Level 2 cache line reuse (data)........................................ -0.999
Level 2 cache hit rate (data).......................................... -1494.576

All the other stats seem reasonable. Any idea why?

Thanks,
Phuong |
From: Rick K. <rk...@nc...> - 2006-02-05 16:52:09
|
Laksono,

> Just a simple question: How to detect false sharing in an OpenMP or
> multithreading program with PerfSuite?

In case this term is unfamiliar to people on the list, I think what you are referring to is a situation where performance degrades because multiple threads modify the same (shared) cache line: even though the specific data items within the line are modified independently by each thread, the line as a whole becomes the source of contention between them. (I'm sure there are plenty of architecture texts that say it a lot better than that :)

In general, what can help is to look for cache invalidations/interventions and to see if cache misses increase as more processors are used. PAPI defines the events PAPI_CA_INV and PAPI_CA_ITV in addition to the _DCM events to help track these. If they're not defined for a particular processor, then you have to go to the native events (vendor materials). You'll want to turn on threading support in PerfSuite (psrun -p or link with libpshwpc_r) to track these things on a per-thread basis. If you think that's what's going on, then you might profile on those events to try to localize where in the application the problem might be occurring. There are also other, system-dependent tools that might help; they can be found with a little research through a web search (for example, pmshub on the SGI Altix).

Hope this helps,
Rick |
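As a concrete picture of the false-sharing situation described above, the following Python sketch computes which cache line each thread's slot in a shared array falls on, and how much padding would give every thread its own line. The 8-byte element size and the 64- and 128-byte line sizes are illustrative assumptions; check the actual line size of the relevant cache level on your processor. The function names are hypothetical.

```python
def cache_lines(n_threads, elem_size, line_size, stride=1, base=0):
    """Cache-line index touched by each thread when thread t updates
    element t*stride of an array starting at byte address `base`."""
    return [(base + t * stride * elem_size) // line_size for t in range(n_threads)]


def falsely_shared(lines):
    """True if two or more threads write to the same cache line."""
    return len(set(lines)) < len(lines)


def padded_stride(elem_size, line_size):
    """Smallest stride (in elements) that puts each slot on its own line,
    assuming the array itself starts on a line boundary."""
    return max(1, -(-line_size // elem_size))  # ceiling division


# Four threads each incrementing their own 8-byte counter, 64-byte lines:
packed = cache_lines(4, elem_size=8, line_size=64)
print(packed, falsely_shared(packed))        # [0, 0, 0, 0] True

stride = padded_stride(8, 64)                # 8 elements = 64 bytes apart
padded = cache_lines(4, 8, 64, stride=stride)
print(padded, falsely_shared(padded))        # [0, 1, 2, 3] False
```

If the packed layout is in use, the per-thread PAPI_CA_INV / PAPI_CA_ITV counts mentioned above would be expected to grow with the thread count; after padding they should drop, which is one way to confirm the diagnosis.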
From: Laksono A. <la...@cs...> - 2006-02-05 13:05:47
|
Dear all,

Just a simple question: How to detect false sharing in an OpenMP or multithreading program with PerfSuite?

Thanks
------
Laksono Adhianto |
From: Rick K. <rk...@nc...> - 2006-01-30 22:09:58
|
Tirath,

> Thanks for clarifying that for me; for some reason I had the idea
> that the interrupt is triggered every \tau seconds, at which point it
> samples the counter, then resets it, then goes back to sleep for
> another \tau seconds, repeat... in which case the total count in the
> end should have been the aggregate count.

Well, that's not a bad idea, but it's not the way it's implemented in PerfSuite. I should say this is a limitation of PerfSuite, not of the hardware or driver support: there's nothing to prevent one from configuring one performance counter on, say, the Itanium 2 to generate an interrupt after some number of events (whether cycles or whatever), and using that as a trigger to read event counts from one of the other 3 counters on that CPU. This could provide some type of real-time monitoring, I think.

There will be a new example program available that does something similar but uses interval timers (setitimer) to generate periodic interrupts, allowing the interrupt handler to read the current value of the counters. I'm thinking this is similar to what you're describing.

Rick |
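The interval-timer approach mentioned above can be sketched in Python with `signal.setitimer` (Unix only, main thread only). This is a hypothetical stand-in, not the PerfSuite example program: a plain Python integer incremented by a busy loop plays the role of a hardware counter, and the SIGALRM handler records its value each time the timer fires.

```python
import signal
import time

counter = 0          # stands in for a hardware event counter
readings = []        # (elapsed seconds, counter value) at each tick
t0 = time.monotonic()


def on_tick(signum, frame):
    # Runs on each timer expiry and reads the current "counter" value.
    readings.append((time.monotonic() - t0, counter))


def run(duration=0.5, interval=0.05):
    global counter
    old = signal.signal(signal.SIGALRM, on_tick)
    signal.setitimer(signal.ITIMER_REAL, interval, interval)  # periodic timer
    try:
        end = time.monotonic() + duration
        while time.monotonic() < end:
            counter += 1     # the "work" whose events are being counted
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # disarm the timer
        signal.signal(signal.SIGALRM, old)       # restore previous handler


if __name__ == "__main__":
    run()
    # Differences between successive readings give per-interval counts.
    prev = 0
    for elapsed, value in readings:
        print("%.2fs  +%d" % (elapsed, value - prev))
        prev = value
```

Unlike overflow-driven sampling, each reading here is taken on a fixed time grid, so the last reading approximates the aggregate count that the quoted message expected.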
From: Tirath R. <ti...@tp...> - 2006-01-28 22:52:58
|
Rick,

> For profiling runs, a sample is recorded for only every Nth occurrence of
> an event (where N is user-selectable in PerfSuite, either through the

Thanks for clarifying that for me; for some reason I had the idea that the interrupt is triggered every \tau seconds, at which point it samples the counter, then resets it, then goes back to sleep for another \tau seconds, repeat... in which case the total count in the end should have been the aggregate count.

cheers,
-tirath |
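The profiling model quoted above (one sample per Nth event) can be illustrated with a toy Python simulation of overflow-based sampling. It is a sketch, not PerfSuite's implementation: a countdown re-armed to `period` after each overflow records one sample for whatever code was running at that moment. The region names and the event stream are made up; the reports earlier in this thread show a period of 50000.

```python
from collections import Counter


def overflow_sample(event_stream, period):
    """Record one sample every `period` events, attributed to the region
    in which the overflowing event occurred."""
    samples = Counter()
    remaining = period
    for region in event_stream:
        remaining -= 1
        if remaining == 0:           # counter overflow -> interrupt
            samples[region] += 1
            remaining = period       # re-arm for the next N events
    return samples


# 1000 events in "sum", then 500 in "hello", sampled every 100 events.
stream = ["sum"] * 1000 + ["hello"] * 500
samples = overflow_sample(stream, period=100)
print(samples)                                    # Counter({'sum': 10, 'hello': 5})
print({r: n * 100 for r, n in samples.items()})  # estimated event counts
```

The profile therefore holds sample counts, not the aggregate event count; multiplying by the period gives only an estimate, and up to `period - 1` trailing events are never sampled.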
From: Tirath R. <ti...@tp...> - 2006-01-28 22:45:34
|
Henry,

On 28/01/2006, at 12:10 PM, Haoqiang H. Jin wrote:

> This is not allowed in Fortran. Try something like:
>
>       IMPLICIT DOUBLE PRECISION(A-H,O-Z)
>       INCLUDE 'fperfsuite.h'

Yes, you were spot on. Thank you very much, from a grateful FORTRAN newbie!

-tirath |