|
From: Evan <res...@gm...> - 2009-06-15 20:06:54
|
Two questions:

First - Any suggestions as to where to look when callgrind only finds
hexadecimal strings as function names? For instance, should I expect this
to be a problem in the way I'm building the program, running it, passing
options to callgrind...?

Second - I'm writing out decompressed callgrind data files and then parsing
them with a python script to grab total cost. This seems to disagree with
kcachegrind when there are multiple function definitions. That is, two
points in the file where

fn='functionname'
0 cost#
.
.
etc

Should I ignore these duplicates?

Suggestions much appreciated.

Evan
|
|
From: Evan <res...@gm...> - 2009-06-15 21:30:30
|
> It means that the debug reader in Valgrind was not able to detect
> a symbol name for a given instruction address, ie. debug info seems
> to be missing. This has nothing to do with missing command line
> options. Build your program with debug info.
>

I should have said that I am building with debug info, and some names
come back hex while others do not. For instance:
-----------------------
fn=orte_gpr_replica_init
0 20

fn=0x0000000000005c60
0 6
-----------------------

are listed alongside one another.

> Are you saying there are multiple such entries with the same function name?
> KCachegrind's behavior then would be to sum up the costs (according to
> the format definition).
> However, I can not think of a reason why Callgrind should print costs for
> the same function multiple times in one dump. Do you have an example
> when this happens?

Here's an example:

This entry occurs:

-----------------------
fn=PMPI_Recv
0 61074
-----------------------

followed by this one:
-----------------------
fn=PMPI_Recv
0 9477
cob=/usr/lib/openmpi/lib/openmpi/mca_pml_ob1.so
cfi=???
cfn=mca_pml_ob1_recv
calls=1053 0
0 44144882
0 17901
-----------------------

I should also possibly have mentioned that I'm working with MPI,
though it's unclear to me how that would cause this.

Best,
Evan
|
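For anyone parsing these files by hand, here is a minimal Python sketch of the summing behaviour described above: costs from repeated fn= entries with the same name are added together, and the cost line that directly follows a calls= line is skipped because it holds the inclusive cost of the call, not self cost. The function name `self_cost_by_function` is made up for this illustration, only the first event column is summed, and this is an assumption-laden reading of the format, not KCachegrind's actual code.

```python
import re
from collections import defaultdict

# Matches a compressed name such as "(12) foo" (definition) or "(12)" (reference).
_COMPRESSED = re.compile(r"^\((\d+)\)(?:\s+(.*))?$")


def self_cost_by_function(text):
    """Sum the first event's self cost per function name (hypothetical helper)."""
    n_positions = 1      # assumption: one position column unless "positions:" says otherwise
    names = {}           # compressed-name table shared by fn= and cfn=
    totals = defaultdict(int)
    current = None
    skip_next = False    # True right after a "calls=" line

    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("positions:"):
            n_positions = len(line.split(":", 1)[1].split())
        elif line.startswith(("fn=", "cfn=")):
            key, value = line.split("=", 1)
            match = _COMPRESSED.match(value)
            if match:
                if match.group(2):
                    names[match.group(1)] = match.group(2)
                value = names.get(match.group(1), value)
            if key == "fn":
                current = value
        elif line.startswith("calls="):
            skip_next = True
        elif line[0].isdigit() or line[0] in "+-*":
            if skip_next:
                # Inclusive cost of the call just announced; not self cost.
                skip_next = False
                continue
            fields = line.split()
            if current is not None and len(fields) > n_positions:
                totals[current] += int(fields[n_positions])
    return dict(totals)
```

Run on the two PMPI_Recv entries quoted above, this adds 61074 + 9477 + 17901 = 88452 for PMPI_Recv and leaves out 44144882, which belongs to the call into mca_pml_ob1_recv. Whether the two entries should really be merged is a separate question.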
|
From: Ashley P. <as...@pi...> - 2009-06-15 21:51:56
|
On Mon, 2009-06-15 at 16:30 -0500, Evan wrote:
>
> This entry occurs:
>
> -----------------------
> fn=PMPI_Recv
> 0 61074
> -----------------------
>
> followed by this one:
> -----------------------
> fn=PMPI_Recv
> 0 9477
> cob=/usr/lib/openmpi/lib/openmpi/mca_pml_ob1.so
> cfi=???
> cfn=mca_pml_ob1_recv
> calls=1053 0
> 0 44144882
> 0 17901
> -----------------------
>
> I should also possibly have mentioned that I'm working with MPI,
> though it's unclear to me how that would cause this.

For those that don't know, the PMPI_* functions are part of the MPI
profiling interface. Basically each MPI function is really a PMPI_
function with a weak MPI_ alias; it's described here among other places:

http://www.netlib.org/utk/papers/mpi-book/node190.html#SECTION00942100000000000000

I've no real idea if it's relevant to this discussion but it's rather
unusual so I thought I'd mention it in case you aren't aware.

Ashley,

--
Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk
|
|
From: Josef W. <Jos...@gm...> - 2009-06-15 22:03:30
|
On Monday 15 June 2009, Evan wrote:
> > It means that the debug reader in Valgrind was not able to detect
> > a symbol name for a given instruction address, ie. debug info seems
> > to be missing. This has nothing to do with missing command line
> > options. Build your program with debug info.
> >
>
> I should have said that I am building with debug info, and some names
> come back hex while others do not. For instance:
> -----------------------
> fn=orte_gpr_replica_init
> 0 20
>
> fn=0x0000000000005c60
> 0 6
> -----------------------
>
> are listed alongside one another.

Any chance this is from a shared library without debug info? Then
only function names can be detected which are exported symbols (ie. which
can be called from the outside), internal ones would get you the hex number.

> > Are you saying there are multiple such entries with the same function name?
> > KCachegrind's behavior then would be to sum up the costs (according to
> > the format definition).
> > However, I can not think of a reason why Callgrind should print costs for
> > the same function multiple times in one dump. Do you have an example
> > when this happens?
>
>
> Here's an example:
>
> This entry occurs:
>
> -----------------------
> fn=PMPI_Recv
> 0 61074
> -----------------------
>
> followed by this one:
> -----------------------
> fn=PMPI_Recv
> 0 9477

Not sure about this one.
Theoretically, the same symbol could appear in multiple shared libs
(and linker does not complain because of a weak symbol?).
Running with "--dump-instr=yes" should print the instruction offsets
for above output. Can you check this out?

Josef
|
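One way to check the "same symbol in multiple shared libs" theory from the dump itself is to record which ob= (object file) line is in effect each time a function name appears. The sketch below does only that. It assumes uncompressed names, as in the decompressed files Evan describes, and the helper name `objects_by_function` is made up for this illustration.

```python
from collections import defaultdict


def objects_by_function(text):
    """Map each function name seen under more than one ob= line to those objects."""
    current_ob = None            # None until the first ob= line is seen
    seen = defaultdict(set)
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("ob="):
            current_ob = line[3:]
        elif line.startswith("fn="):
            # Assumption: names are not compressed, so the text after "fn=" is the name.
            seen[line[3:]].add(current_ob)
    return {fn: obs for fn, obs in seen.items() if len(obs) > 1}
```

An empty result means every duplicated name stays within one object, which would point away from the multiple-library explanation and towards something inside a single library.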
|
From: Evan <res...@gm...> - 2009-07-01 14:49:07
|
It took me some time to get back to this. I ran with the --dump-instr=yes
option, and got the following output off PMPI_Recv:

fn=PMPI_Recv
0x5f870 0 48
0x5f875 0 48
0x5f87a 0 48
0x5f87d 0 48
0x5f882 0 48
0x5f887 0 48
..etc

fn=PMPI_Recv
0x1d1a6 1047 48
0x1d1a7 1047 48
0x1d1aa 1047 48
...etc

I presume this is what you were referring to?

Any further suggestions?

Best,
Evan
|
|
From: Josef W. <Jos...@gm...> - 2009-07-01 15:07:45
|
On Wednesday 01 July 2009, Evan wrote:
> It took me some time to get back to this. I ran with the --dump-instr=yes
> option, and got the following output off PMPI_Recv:
>
>
> fn=PMPI_Recv
> 0x5f870 0 48
> 0x5f875 0 48
> 0x5f87a 0 48
> 0x5f87d 0 48
> 0x5f882 0 48
> 0x5f887 0 48
> ..etc
>
> fn=PMPI_Recv
> 0x1d1a6 1047 48
> 0x1d1a7 1047 48
> 0x1d1aa 1047 48
> ...etc
>
>
> I presume this is what you were referring to?
Just to get the context right: the problem was that the same symbol
occurs multiple times in Callgrind's output for your program. And you
wanted to know how/whether this can happen.
In the above, the hex numbers are instruction offsets in the shared
library/binary the code lies in. As these are clearly distinct, it is
obvious that you call different code pieces where the debug information
claims that both belong to the same symbol name.
How such a result can be produced by a compiler is another question.
It could be a compiler bug, or it could be about weak symbol handling
by the static/runtime linker, or it could be because different functions
with compilation unit scope ("static" in C) have the same name, or it
could be a result of dlopen/dlsym usage, and so on...
All these possible cases lead to results which can be misleading when
the visualization tool thinks that all this code belongs to the same
function.
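
To see the separate code pieces at a glance, a small Python sketch can report the lowest and highest instruction offset for each fn= block of a --dump-instr=yes dump. It assumes absolute 0x... offsets at the start of each cost line and uncompressed names, as in the output quoted above. The helper name `instr_ranges` is made up for this illustration.

```python
def instr_ranges(text):
    """Return one (name, lowest_offset, highest_offset) tuple per fn= block."""
    blocks = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("fn="):
            blocks.append([line[3:], None, None])
        elif line.startswith("0x") and blocks:
            # First field of a cost line is the instruction offset (assumed absolute).
            addr = int(line.split()[0], 16)
            block = blocks[-1]
            block[1] = addr if block[1] is None else min(block[1], addr)
            block[2] = addr if block[2] is None else max(block[2], addr)
    return [tuple(block) for block in blocks]
```

For the lines shown in the quoted output this gives one PMPI_Recv block spanning 0x5f870 to 0x5f887 and another spanning 0x1d1a6 to 0x1d1aa. The "etc" lines are ignored, so the real ranges are wider than what the excerpt shows.
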
Did I miss your original question?
Josef
> Any further suggestions?
>
> Best,
> Evan
|