|
From: Hari N. <ha...@al...> - 2007-02-20 17:44:52
|
I'm trying to profile a mixed Fortran90/Fortran77 code with callgrind and kcachegrind. The code was compiled with g95 with the following flags: -g -fbounds-check -pg -ftrace=full -freal=nan -fpointer=invalid When I run kcachegrind, the call graph is displayed correctly, and most of the routines display their code. However some do not, but rather messages like this: There is no source available for the following function: 'l_rad__' This is because no debug information is present. This is with KCachegrind 0.4.5kde and valgrind 3.2.3. Does anyone have any suggestions as to what to try? The program spends most of its time in routines I can't browse! Thanks, Hari |
|
From: Josef W. <Jos...@gm...> - 2007-02-20 22:05:53
|
On Tuesday 20 February 2007, Hari Nair wrote: > I'm trying to profile a mixed Fortran90/Fortran77 code with callgrind > and kcachegrind. > > The code was compiled with g95 with the following flags: > -g -fbounds-check -pg -ftrace=full -freal=nan -fpointer=invalid > > When I run kcachegrind, the call graph is displayed correctly, and most > of the routines display their code. However some do not, but rather > messages like this: > > There is no source available for the following function: > 'l_rad__' > This is because no debug information is present. > > This is with KCachegrind 0.4.5kde and valgrind 3.2.3. > > Does anyone have any suggestions as to what to try? The program spends > most of its time in routines I can't browse! Obviously most of the time is spent in library functions you can not change/optimize anyway. Of what help would be the annotation of source (you probably do not have) there? But these functions (like that l_rad__) are somehow called by your code. Either check for a possibility to reduce the number of calls to these functions (cache the results?/are they really needed?), or try to get rid of them, e.g. by doing your own, better implementation. If this is not possible, I don't think you can optimize further. However, you also should check if these functions also take most of time in reality (e.g. by using oprofile, or directly inserting time measurement). Josef > > Thanks, > Hari |
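The two suggestions above (insert time measurement directly, and cache results to reduce the number of calls) can be sketched as follows. This is only an illustration in Python, not the Fortran code under discussion: `l_rad` here is a hypothetical stand-in for the expensive routine, and `timed` is a helper name invented for this sketch.

```python
import time
from functools import lru_cache


def timed(func):
    """Wrap func so wall-clock time and call count accumulate on the wrapper."""
    def wrapper(*args):
        start = time.perf_counter()
        try:
            return func(*args)
        finally:
            wrapper.seconds += time.perf_counter() - start
            wrapper.calls += 1
    wrapper.seconds = 0.0
    wrapper.calls = 0
    return wrapper


@timed
def l_rad(x):
    # Hypothetical stand-in for the expensive routine; the real l_rad is Fortran.
    return sum(i * x for i in range(1000))


# Cache results so repeated arguments do not reach the expensive routine again.
cached_l_rad = lru_cache(maxsize=None)(l_rad)

for x in [1, 2, 1, 2, 1]:
    cached_l_rad(x)

print(f"l_rad: {l_rad.calls} real calls, {l_rad.seconds:.6f} s")
```

Caching only helps if the routine is called repeatedly with the same arguments and has no side effects; whether that holds for the real code is something only the measurement can tell.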
|
From: Hari N. <ha...@al...> - 2007-02-20 23:08:52
|
Josef Weidendorfer wrote: > Obviously most of the time is spent in library functions you can not > change/optimize anyway. Of what help would be the annotation of source > (you probably do not have) there? No, most of the time does appear to be spent in functions in our code. Anyway, I should be able to browse the code for l_rad and other routines, and I can't, so it seems that there may be a problem with callgrind, right? > However, you also should check if these functions also take most of > time in reality (e.g. by using oprofile, or directly inserting time > measurement). I hadn't heard of oprofile before. I will see what that can tell me. Thanks for the tip! Hari |
|
From: Josef W. <Jos...@gm...> - 2007-02-20 23:40:06
|
On Wednesday 21 February 2007, Hari Nair wrote: > Josef Weidendorfer wrote: > > Obviously most of the time is spent in library functions you can not > > change/optimize anyway. Of what help would be the annotation of source > > (you probably do not have) there? > > No, most of the time does appear to be spent in functions in our code. Ah, sorry. So l_rad is your code? Obviously KCachegrind thought that this function has no debug information. Perhaps your Fortran compiler generates a debug format that Valgrind does not understand, or the debug reader for the given format has a bug. Perhaps you can change the format. VG should be able to understand DWARF and STABS, at least the line number info which is needed here. Josef |
|
From: Nicholas N. <nj...@cs...> - 2007-02-20 23:48:29
|
On Wed, 21 Feb 2007, Josef Weidendorfer wrote: > On Wednesday 21 February 2007, Hari Nair wrote: >> Josef Weidendorfer wrote: >> > Obviously most of the time is spent in library functions you can not >> > change/optimize anyway. Of what help would be the annotation of source >> > (you probably do not have) there? >> >> No, most of the time does appear to be spent in functions in our code. > > Ah, sorry. So l_rad is your code? > Obviously KCachegrind thought that this function has no debug information. > Perhaps your Fortran compiler generates a debug format that Valgrind > does not understand, or the debug reader for the given format has a bug. > > Perhaps you can change the format. VG should be able to understand DWARF > and STABS, at least the line number info which is needed here. Changing the topic slightly... I currently see two main problems faced by Valgrind users: (1) unimplemented opcodes on AMD64; (2) debug info problems. (1) is slowly improving, although the current strategy is usually "wait for a complaint, then uncomment the code that already handles it". It would be nice to be more proactive for the cases where we already have code to handle it. (2) is more difficult. The main problem is that we're relying on the debug info produced by other programs. We've done a great job in general of insulating Valgrind from the rest of the world (eg. removing glibc and pthreads dependencies) which I think has helped a lot. This one seems harder to avoid. Julian's change to print the debug info in exactly the same format as readelf is a great start. I guess that way we can say to someone that if Valgrind is doing exactly the same thing as readelf, then we can blame their compiler for generating bad debug info. Still frustrating, though. Nick |
|
From: Josef W. <Jos...@gm...> - 2007-02-21 00:48:42
|
On Wednesday 21 February 2007, Nicholas Nethercote wrote: > On Wed, 21 Feb 2007, Josef Weidendorfer wrote: > > (2) is more difficult. The main problem is that we're relying on the debug > info produced by other programs. We've done a great job in general of > insulating Valgrind from the rest of the world (eg. removing glibc and > pthreads dependencies) which I think has helped a lot. This one seems > harder to avoid. Julian's change to print the debug info in exactly the > same format as readelf is a great start. Yes, that's really a very good, important thing. It allows us to do comparisons of the debug info reader with readelf on various platforms and binaries. But perhaps more important, it allows users to pinpoint the differences when they have problems with a binary. At least, I hope so. > I guess that way we can say to > someone that if Valgrind is doing exactly the same thing as readelf, then we > can blame their compiler for generating bad debug info. Still frustrating, > though. IMHO the best would be to directly use readelf ourself; to me it seems to be some kind of de-facto reference implementation. This should be done with an out-of-process helper, perhaps communicating via shared memory. Is this an option at all? Josef |
|
From: Ashley P. <as...@qu...> - 2007-02-21 10:20:04
|
On Wed, 2007-02-21 at 01:48 +0100, Josef Weidendorfer wrote: > On Wednesday 21 February 2007, Nicholas Nethercote wrote: > > On Wed, 21 Feb 2007, Josef Weidendorfer wrote: > > I guess that way we can say to > > someone that if Valgrind is doing exactly the same thing as readelf, then we > > can blame their compiler for generating bad debug info. Still frustrating, > > though. > > IMHO the best would be to directly use readelf ourself; to me > it seems to be some kind of de-facto reference implementation. This should be > done with an out-of-process helper, perhaps communicating via shared memory. > Is this an option at all? I've done some experiments with this although using addr2line rather than readelf. I've got a script which does post-processing on the xml files produced by memcheck and it's fairly easy to resolve addresses offline. I don't think it's the "Right Thing"(tm) but has helped me on a number of occasions when icc has thrown valgrind off the scent. I plan on submitting this script shortly anyway and could include this functionality if people think it could be useful. Ashley, |
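The offline approach described above might look roughly like the sketch below. It is not Ashley's script: it assumes memcheck XML in which each stack `<frame>` carries `<ip>` and `<obj>` elements and gains `<file>`/`<line>` only when debug info was found, and the sample document and function names are made up for illustration. It only builds the `addr2line` command line (`-e` names the object file, `-f` also prints the function name) and does not run it.

```python
import xml.etree.ElementTree as ET


def unresolved_frames(xml_text):
    """Return (obj, ip) for each stack frame that has no <file> element."""
    root = ET.fromstring(xml_text)
    pending = []
    for frame in root.iter("frame"):
        if frame.find("file") is None:
            obj = frame.findtext("obj", default="")
            ip = frame.findtext("ip", default="")
            pending.append((obj, ip))
    return pending


def addr2line_command(obj, ip):
    """Build an addr2line invocation for one address in one object file."""
    return ["addr2line", "-f", "-e", obj, ip]


# Made-up sample: one frame without source info, one with.
SAMPLE = """
<valgrindoutput>
  <error>
    <stack>
      <frame><ip>0x4005F4</ip><obj>/tmp/a.out</obj><fn>l_rad__</fn></frame>
      <frame><ip>0x400700</ip><obj>/tmp/a.out</obj><fn>main</fn>
             <dir>/tmp</dir><file>main.c</file><line>12</line></frame>
    </stack>
  </error>
</valgrindoutput>
"""

for obj, ip in unresolved_frames(SAMPLE):
    print(" ".join(addr2line_command(obj, ip)))
```

Each command list can be handed to `subprocess.run`. For frames inside shared objects the address would first have to be converted from a run-time address to an offset within the object, which this sketch does not attempt.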
|
From: Julian S. <js...@ac...> - 2007-02-21 01:08:38
|
> I currently see two main problems faced by Valgrind users: > > (1) unimplemented opcodes on AMD64; > (2) debug info problems. > > (1) is slowly improving, although the current strategy is usually "wait for > a complaint, then uncomment the code that already handles it". It would be > nice to be more proactive for the cases where we already have code to > handle it. Yes. I think 3.2.3 is much better than previous versions in that respect. AFAICS we handle all insns produced by gcc and icc; the recent failures (http://bugs.kde.org/show_bug.cgi?id=141790) were generated by a compiler of hardware descriptions. My implement-on-demand strategy is motivated by not wanting to implement opcodes without at least some sort of test case; tracking down incorrectly implemented, rarely used opcodes after the fact is really no fun. (I had to do that for x86 'fsincos', for example). > (2) is more difficult. The main problem is that we're relying on the debug > info produced by other programs. We've done a great job in general of > insulating Valgrind from the rest of the world (eg. removing glibc and > pthreads dependencies) which I think has helped a lot. This one seems > harder to avoid. Julian's change to print the debug info in exactly the > same format as readelf is a great start. I guess that way we can say to > someone that if Valgrind is doing exactly the same thing as readelf, then > we can blame their compiler for generating bad debug info. Still > frustrating, though. Handling debug info really well all the time is difficult. For the most part I think we do ok, although there's certainly room for improvement. From my recent experiments I see our DWARF2 line number reader corresponds almost exactly with GNU readelf, and the frame reader is fairly good too, except it doesn't handle DW_CFA_val_expression et al, which are very rare. I would be happy to at least have a look at this if Hari can get me a way to reproduce the problem. 
It may even be that there is no debug info for this function, in which case there's nothing we can do, or it could be something simple and easy to fix. J |
|
From: Hari N. <ha...@al...> - 2007-02-21 01:20:38
|
Julian Seward wrote: > I would be happy to at least have a look at this if Hari can get me > a way to reproduce the problem. It may even be that there is no debug info > for this function, in which case there's nothing we can do, or it could > be something simple and easy to fix. I should have mentioned that this is on an "AMD Opteron(tm) Processor 252", if that is a useful piece of information. I compiled l_rad.f with the -g flag, so there should be debugging info. I'll try running a simpler test case using a driver for l_rad only to see if I can resolve l_rad's source lines. If you have any suggestions for any other information that will be useful I will try to supply it. Thanks to all of you! Valgrind is a terrific product. Hari |
|
From: Nicholas N. <nj...@cs...> - 2007-02-21 04:36:18
|
On Wed, 21 Feb 2007, Julian Seward wrote: >> I currently see two main problems faced by Valgrind users: >> >> (1) unimplemented opcodes on AMD64; >> (2) debug info problems. >> >> (1) is slowly improving, although the current strategy is usually "wait for >> a complaint, then uncomment the code that already handles it". It would be >> nice to be more proactive for the cases where we already have code to >> handle it. > > Yes. I think 3.2.3 is much better than previous versions in that > respect. AFAICS we handle all insns produced by gcc and icc; the > recent failures (http://bugs.kde.org/show_bug.cgi?id=141790) were > generated by a compiler of hardware descriptions. My implement-on- > demand strategy is motivated by not wanting to implement opcodes > without at least some sort of test case; tracking down incorrectly > implemented, rarely used opcodes after the fact is really no fun. > (I had to do that for x86 'fsincos', for example). Yep, by "proactive" I meant "writing our own test cases", eg. by augmenting Tom's insn_* tests, rather than waiting for a report. But it's easy for me to say this, and I know I won't be doing it myself, so I'll leave it at that :) Nick |
|
From: Hari N. <ha...@al...> - 2007-02-21 01:56:21
|
I have access to the Absoft f90 compiler as well as NAG f95. I can browse the source with the Absoft compiled binary but not with the NAG. I think that g95 and f95 translate Fortran code to C before compilation and that f90 does not. Might that be important? Anyway, since I can use f90, my immediate problem seems to be solved. If I can be of any assistance to anyone for debugging this behavior, I will be happy to help. Thanks, Hari |
|
From: Julian S. <js...@ac...> - 2007-02-23 10:29:15
|
On Wednesday 21 February 2007 01:56, Hari Nair wrote: > I have access to the Absoft f90 compiler as well as NAG f95. I can > browse the source with the Absoft compiled binary but not with the NAG. > > I think that g95 and f95 translate Fortran code to C before compilation > and that f90 does not. Might that be important? Not really. Without having the problem executable directly available, there is no way for us to establish what the problem is. J |
|
From: Hari N. <ha...@al...> - 2007-02-23 15:26:55
|
On Feb 23, 2007, at 2:27 AM, Julian Seward wrote: > On Wednesday 21 February 2007 01:56, Hari Nair wrote: >> I have access to the Absoft f90 compiler as well as NAG f95. I can >> browse the source with the Absoft compiled binary but not with the >> NAG. >> >> I think that g95 and f95 translate Fortran code to C before >> compilation >> and that f90 does not. Might that be important? > > Not really. Without having the problem executable directly available, > there is no way for us to establish what the problem is. If you don't need to actually run it, I can send you a binary. Running it requires a large number of ancillary files. I've found that Absoft f90 doesn't resolve all of the source code lines either. It does a much better job than the other two compilers, but I get the "no source available" message for some routines. In Fortran 90, a module can contain a number of different routines. I find that callgrind can see the source inside some of the routines, but not others, in the same module. Hari |
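To report exactly which routines are affected, the profile data file written by callgrind can be scanned directly. The sketch below is a simplified reader, not a full parser: it assumes that `fl=` and `fn=` lines name the current file and function, that names may be compressed as `(id) name` on first use and `(id)` afterwards, and that functions without debug info are recorded under the file name `???`. It ignores `fi=`/`fe=` and the callee lines (`cfn=` and friends). The sample data and routine names other than `l_rad__` are invented.

```python
import re

# Matches "fl=..." and "fn=..." lines, with an optional "(id)" compression prefix.
SPEC = re.compile(r"^(fl|fn)=(?:\((\d+)\))?\s*(.*)$")


def functions_by_source(lines):
    """Map each function name to the source file recorded when it first appears."""
    names = {"fl": {}, "fn": {}}  # separate id tables for files and functions
    current_file = "???"
    result = {}
    for line in lines:
        m = SPEC.match(line.strip())
        if not m:
            continue  # cost lines, header lines, callee lines
        kind, ident, name = m.groups()
        if ident is not None:
            if name:
                names[kind][ident] = name  # first use defines the id
            else:
                name = names[kind].get(ident, "???")  # later uses refer to it
        if kind == "fl":
            current_file = name
        else:
            result.setdefault(name, current_file)
    return result


def without_source(lines):
    """List functions that were recorded with an unknown source file."""
    table = functions_by_source(lines)
    return sorted(fn for fn, src in table.items() if src == "???")


# Invented sample in the assumed format.
SAMPLE = """\
fl=(1) rad_module.f90
fn=(1) rad_setup_
10 200
fl=(2) ???
fn=(2) l_rad__
0 5000
fl=(1)
fn=(3) rad_finish_
20 100
""".splitlines()

print(without_source(SAMPLE))
```

Run over a real data file (for example `without_source(open(path))`, with `path` pointing at the callgrind output), the resulting list could be compared against the module's routine names to see which ones lost their line information.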
|
From: Julian S. <js...@ac...> - 2007-02-23 16:08:32
|
> If you don't need to actually run it, I can send you a binary. > Running it requires a large number of ancillary files. If it's a binary then I will need to run it to get V to read its symbols. A better solution would be if you could compile the relevant code into a shared object (.so file) since then I can just dlopen it and V will do the right thing - I don't actually need to run the code in the .so. J |