|
From: Nicholas N. <nj...@ca...> - 2004-07-02 12:55:59
|
Hi,

People complain when Memcheck gives them useless leak error messages because the leaks occur in a .so which has been unmapped. Cachegrind has a similar problem -- all stats for an unmapped .so file get dumped into a single "discarded" pile, and cannot be used for annotation.

The underlying problem is that we are using an instruction's address as the key for identifying it. This only works while the code is in memory; once it gets unloaded we can't rely on it any more.

A better way of identifying instructions is with an (obj_file, obj_offset) pair. That will AFAICT uniquely identify any static instruction. It's not good for dynamically generated instructions -- I don't see how these can be uniquely identified without bringing time into the equation, and that would be pretty awful -- but that probably doesn't matter, because they don't have any debug info anyway, and so can be lumped together in an "other" category.

If ExeContexts were changed to use (obj_file, obj_offset) pairs instead of just instruction addresses, Memcheck (or the core) could, when printing the leaks, reload the debug info of any obj_file that had been unmapped. A similar thing could be done with Cachegrind to avoid the "discarded" pile, and it might actually make Cachegrind simpler overall.

Anyway, I'm just thinking aloud.

N |
|
From: Jeremy F. <je...@go...> - 2004-07-02 16:06:47
|
On Fri, 2004-07-02 at 13:55 +0100, Nicholas Nethercote wrote:
> A better way of identifying instructions is with a (obj_file, obj_offset)
> pair. That will AFAICT uniquely identify any static instruction.

Yeah, that's not a bad idea. There are a few complications though.

Do you actually mean byte offset into the object file? That's a bit coarse, since you'd then have to work hard to map that back into a segment and get symtab information. It also makes it hard to work out the mapped address if you just have a file offset + base address. If you recorded (file, segment, offset), you could address these.

And obviously you also need the virtual address of the instruction for the common case where the .so has not been unloaded. If you store the file as (path, base_address), you know where the object was mapped, and you can work everything out from that.

J |
|
From: Nicholas N. <nj...@ca...> - 2004-07-02 16:22:21
|
On Fri, 2 Jul 2004, Jeremy Fitzhardinge wrote:
> Yeah, that's not a bad idea. There's a few complications though.
>
> Do you actually mean byte offset into the object file? That's a bit
> coarse, since you'd then have to work hard to map that back into a
> segment and get symtab information. It also makes it hard to work out
> the mapped address if you just have a file offset+baseaddr. If you
> recorded (file, segment, offset), you could address these.

Er, I'm not sure what I mean; I've forgotten the difference between object files and segments.

> And obviously you also need the virtual address of the instruction for
> the common case of the .so has not been unloaded. If you store file as
> (path, base_address), you know where the object was mapped, and you can
> work everything out from that.

Ok, then let's pretend that's what I meant.

Actually, even this still isn't really what we want. The whole point of this is to get back to file/function/line info. If we recorded that straight away in error messages instead of instruction addresses, the debug/symbol info would be nice and fresh, there wouldn't be any problems with unloading, and there would be no need to fiddle around with object/segment offsets and maybe reload symbols later.

Of course, the reason we currently don't do this is that it would vastly increase the size of each ExeContext -- unless we worked out a clever scheme whereby it didn't. I'm thinking of something where every unique code location recorded in an ExeContext is stored in a way that it's never duplicated, and the same for all file/function names.

I'm currently looking at rejigging Cachegrind's data structures. I think I can solve the missing-info-for-unloaded-code problem and also significantly simplify its code. The key is a tri-level data structure that looks like:

   table(filename, table(fn_name, table(line_num, CC)))

where CC is the cost centre stored for each instruction. Instruction addresses are translated immediately to file/fn/line info, so there's no staleness. Also, all the relevant filenames and fn_names are stored only once, which is nice. (This structure is also necessary due to the way Cachegrind dumps its info at the end.)

N |
|
From: Jeremy F. <je...@go...> - 2004-07-02 17:06:56
|
On Fri, 2004-07-02 at 17:21 +0100, Nicholas Nethercote wrote:
> Er, I'm not sure what I mean, I've forgotten the difference between object
> files and segments.

Well, an ELF file looks (very approximately) like this:

   0: ELF Header
      PHeader -> A
      PHeader -> C
      PHeader -> B
   A: Segment
   B: Segment
   C: Segment

Each Segment is a chunk mmaped out of the file, from a particular file offset. The PHeaders contain the mapping between virtual address and file offset. There are also a lot of other things in the ELF file, so the net result is that the file offset doesn't have much to do with the mapped virtual address.

> I'm currently looking at rejigging Cachegrind's data structures. I think
> I can solve the missing-info-for-unloaded-code and also significantly
> simplify its code. The key is a tri-level data structure that looks like:
>
> table(filename, table(fn_name, table(line_num, CC)))
>
> where CC is the cost-centre stored for each instruction. Instruction
> addresses are translated immediately to file/fn/line info, so there's no
> staleness.

The trouble with using file/line info is that, obviously, a lot of code doesn't have that info, and also that there can be more than one CC per code line.

J |
|
From: Nicholas N. <nj...@ca...> - 2004-07-02 17:24:04
|
On Fri, 2 Jul 2004, Jeremy Fitzhardinge wrote:
> The trouble with using file/line info is that, obviously, a lot of code
> doesn't have that info, and also that there can be more than one CC per
> code line.

But think about how the per-instr CCs are actually used -- they just get mapped onto file/func/line info at the end anyway. So nothing changes, except that there are fewer CCs, which saves memory and reduces the size of the output file.

N |
|
From: Nicholas N. <nj...@ca...> - 2004-07-04 21:37:22
|
On Fri, 2 Jul 2004, Nicholas Nethercote wrote:
> I'm currently looking at rejigging Cachegrind's data structures. I think I
> can solve the missing-info-for-unloaded-code and also significantly simplify
> its code.
I've made these changes. The improvements are pretty immense. I've
included the change log below; check out the general improvements. They
arose because I split one messy data structure, which was (I now realise)
serving two distinct purposes, into two much cleaner data structures.
I've attached the diff, plus the new cg_main.c -- enough code changed that
the diff is hard to read.
I'm pretty keen to commit these changes now, in time for 2.1.2. It would
be good to see them in Calltree too, but I'm not sure how easy that will
be.
N
Completely overhauled Cachegrind's data structures. With the new
scheme, there are two main structures:
1. The CC table holds a cost centre (CC) for every distinct source code
line, as found using debug/symbol info. It's arranged by files, then
functions, then lines.
2. The instr-info table holds certain important pieces of info about
each instruction -- instr_addr, instr_size, data_size, and its line-CC.
A pointer to the instr's info is passed to the simulation functions,
which is shorter and quicker than passing the pieces individually.
This is nice and simple. Previously, there was a single data structure
(the BBCC table) which mingled the two purposes (maintaining CCs and
caching instruction info). The CC stuff was done at the level of
instructions, and there were different CC types for different kinds of
instructions, and it was pretty yucky. It's now much cleaner.
As a result, we have the following general improvements:
- Previously, when code was unloaded all its hit/miss counts were stuck
in a single "discard" CC, and so that code would not be annotated. Now
this code is profiled and annotatable just like all other code.
- Source code size is 27% smaller. cg_main.c is now 1494 lines, down
from 2174. Some (1/3?) of this is from removing the special handling
of JIFZ and general compaction, but most is from the data structure
changes. Happily, a lot of the removed code was nasty.
- Object code size (vgskin_cachegrind.so) is 15% smaller.
- cachegrind.out.pid size is about 90+% smaller(!). Annotation time is
accordingly much faster. Doing cost-centres at the level of source
code lines rather than instructions makes a big difference, since
there are typically 2--3 instructions per source line. Even better,
when debug info is not present, entire functions (and even files) get
collapsed into a single "???" CC. (This behaviour is no different
to what happened before, it's just the collapsing used to occur in the
annotation script, rather than within Cachegrind.) This is a big win
for stripped libraries.
- Memory consumption is about 10--20% less, due to fewer CCs.
- Speed is not much changed -- the changes were not in the intensive
parts, so the only likely change is a cache improvement due to using
less memory. SPEC experiments go -3% to +10% faster, with the "average"
being unchanged or perhaps a tiny bit faster.
I've tested it moderately thoroughly; it seems to give the same results as
the old version where it should.
Some particularly nice changes that happened:
- No longer need an instrumentation prepass; this is because CCs are not
stored grouped by BB, and they're all the same size now (which makes
various code much simpler than before).
- The actions to take when a BB translation is discarded (due to the
translation table getting full) are much easier -- just chuck all the
instr-info nodes for the BB, without touching the CCs.
- Dumping the cachegrind.out.pid file at the end is much simpler, just
because the CC data structure is much neater.
Some other, specific changes:
- Removed the JIFZ special handling, which never did what it was
intended to do and just complicated things. This changes the results
for REP-prefixed instructions very slightly, but it's not important.
- Abbreviated the FP/MMX/SSE crap by being slightly laxer with size
checking -- any problems will be caught later anyway.
- Removed "fi" and "fe" handling from cg_annotate, no longer needed due
to neatening of the CC-table.
- Factorised out some code a bit, so fewer monolithic slabs.
- Just improved formatting and compacted code in general in a few
places.
- Removed the long-commented-out sanity checking code at the bottom.
Phew.
|
|
From: Nicholas N. <nj...@ca...> - 2004-07-06 08:48:11
|
[Sorry for the dup, my original too-big message must have been approved by the list moderator] |