From: Rick K. <rk...@nc...> - 2007-08-29 16:22:27
Daniel - thanks for the message. I think you are not a subscriber and SourceForge bounced the email, but I am forwarding it on to the list because I think you raise some good points. Comments follow your message. I have also added some information about the general status and future of PerfSuite in case that is of interest to the list and users.

On Wed, 29 Aug 2007, Daniel Thomas wrote:

> Hi,
>
> I am trying to profile a 128-core MPI application with Psrun on a Linux
> SGI x86_64 machine. The system is not patched with Perfctr nor with
> Perfmon, so I am restricting my usage to the
> /usr/local/share/perfsuite/xml/pshwpc/itimer.xml config file.
> My problem is not related to the number of CPUs but to the additional
> memory Perfsuite uses on each node of the cluster. This application
> uses a lot of memory for data and also uses several xx.so libraries
> that are themselves big. With the Perfsuite itimer the application
> starts swapping. In short, it cannot be used.
> Is there a way to get information equivalent to what the itimer.xml
> profiling provides, using another config file on such an unpatched
> cluster, that would use far less memory?
>
> As I suspect I have little chance of getting a positive answer to this
> last question, I had a look at the Perfsuite source. Looking at the
> get_pc and bin_pc routines in profile.c, it seems to me that a memory
> location is reserved for every possible PC address of the program. With
> my big program text, too much memory is allocated. This is very sad, as
> in the end only a few counters are nonzero (say, some hundreds).
> It seems to me that it would not be an "Everest" of work to manage the
> PC locations dynamically (hashing or B-tree) instead of performing the
> direct samples[map][pcoff]++ increment. Then it would not be too
> difficult to change the report routines to agree with this new format.
> This would introduce a little overhead to manage this indirect
> structure, but experience with other profilers teaches us that this
> overhead is acceptable, especially with HPC applications (my domain),
> where the probability of being interrupted inside the same inner loop
> is very high (and so is the ability to retrieve the corresponding
> counter immediately).
> I may think about implementing it if I can get support from the
> Perfsuite developers.
>
> Thanks,
> Daniel

You are absolutely correct that memory consumption when profiling is not as efficient as it could be. This is not only for the reason you point out (total mapping of program text and shared libraries), but because of additional details as well. One solution, as you point out, could be a different arrangement of data structures internally. The primary reason it is as simple as it is in PerfSuite is clarity and ease of understanding (also to avoid overhead, as you mention, but that's a secondary reason).

Another approach that I have considered is to allow the user to selectively include or exclude particular shared libraries from the profile. This would save the memory that is currently always reserved for every library in the load map at program startup. I've had requests from both sides of the fence ("I want more/less profiling of shared libraries"), each for legitimate reasons. I'd be interested to learn of people's opinions on possible approaches in the future.

Regarding the more general state of PerfSuite's "future": to date, general development has been largely unfunded; however, we've had a need for these sorts of capabilities at NCSA (an HPC center at Illinois), and so there has been informal internal support for the work. Recently, however, it has begun to look likely that outside funding through a grant will come to help support continued development/improvement of PerfSuite. This is welcome news, of course, and would assure enhancements/maintenance for some time to come. I will update the list as things progress.

In the meantime, to answer your direct question, please feel free to try your approach out if you have the time - and also feel free to assume that any questions or requests for support you might have will be welcomed... happy to help where possible (and I appreciate your willingness to dig in to help improve PerfSuite).

It's been a while since there has been a new release of PerfSuite, and work is currently underway on an update, with the most notable enhancement being support for Intel Core/Core 2 platforms.

Hope this helps,
Rick
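
For readers who want to experiment, the sparse-counter idea from the quoted message (a hash table keyed by PC in place of the direct samples[map][pcoff]++ increment) could be sketched roughly as below. This is an illustration only and is not PerfSuite code: the names pc_slot, pc_table, pc_table_init, pc_table_bin, pc_table_count and pc_table_free are all hypothetical, and the choice of open addressing with linear probing is just one assumption among several reasonable ones (a B-tree would also fit the description).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sparse PC histogram (not PerfSuite's data structure):
 * memory grows with the number of distinct PCs sampled, not with the
 * size of the program text. */
typedef struct {
    uintptr_t pc;     /* sampled program counter; 0 marks an empty slot */
    uint64_t  count;  /* samples observed at this PC */
} pc_slot;

typedef struct {
    pc_slot *slots;
    size_t   capacity; /* always a power of two */
    size_t   used;     /* number of occupied slots */
} pc_table;

/* Multiplicative hash, masked down to the table size. */
static size_t pc_hash(uintptr_t pc, size_t capacity) {
    uint64_t h = (uint64_t)pc * 0x9E3779B97F4A7C15ULL;
    return (size_t)(h >> 32) & (capacity - 1);
}

/* Linear probing: returns the slot holding pc, or the empty slot where
 * it would go. Terminates because the table is never allowed to fill. */
static pc_slot *pc_probe(pc_slot *slots, size_t capacity, uintptr_t pc) {
    size_t i = pc_hash(pc, capacity);
    while (slots[i].pc != 0 && slots[i].pc != pc)
        i = (i + 1) & (capacity - 1);
    return &slots[i];
}

int pc_table_init(pc_table *t, size_t capacity) {
    assert(capacity != 0 && (capacity & (capacity - 1)) == 0);
    t->slots = calloc(capacity, sizeof *t->slots);
    t->capacity = capacity;
    t->used = 0;
    return t->slots ? 0 : -1;
}

/* Double the table and reinsert every occupied slot. */
static int pc_table_grow(pc_table *t) {
    size_t new_cap = t->capacity * 2;
    pc_slot *fresh = calloc(new_cap, sizeof *fresh);
    if (!fresh)
        return -1;
    for (size_t i = 0; i < t->capacity; i++)
        if (t->slots[i].pc != 0)
            *pc_probe(fresh, new_cap, t->slots[i].pc) = t->slots[i];
    free(t->slots);
    t->slots = fresh;
    t->capacity = new_cap;
    return 0;
}

/* Record one sample at pc. Returns 0 on success, -1 on failure.
 * Caveat: calloc/free are not async-signal-safe, so a real profiler
 * calling this from a timer signal handler would need to preallocate
 * or defer growth. */
int pc_table_bin(pc_table *t, uintptr_t pc) {
    if (pc == 0)
        return -1;                 /* 0 is reserved as the empty marker */
    pc_slot *s = pc_probe(t->slots, t->capacity, pc);
    if (s->pc == 0) {              /* first sample at this PC */
        if ((t->used + 1) * 2 > t->capacity) {   /* keep load <= 1/2 */
            if (pc_table_grow(t) != 0)
                return -1;
            s = pc_probe(t->slots, t->capacity, pc);
        }
        s->pc = pc;
        t->used++;
    }
    s->count++;
    return 0;
}

/* Number of samples recorded at pc (0 if never seen). */
uint64_t pc_table_count(const pc_table *t, uintptr_t pc) {
    if (pc == 0)
        return 0;
    const pc_slot *s = pc_probe(t->slots, t->capacity, pc);
    return s->pc == pc ? s->count : 0;
}

void pc_table_free(pc_table *t) {
    free(t->slots);
    t->slots = NULL;
    t->capacity = t->used = 0;
}
```

With a few hundred hot PCs, as in the case described, such a table would stay at a few kilobytes per process regardless of how large the text segment and shared libraries are. The trade-off is a hash and probe on every sample instead of a single indexed increment, plus a report step that walks the occupied slots rather than a dense array.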