From: Josef W. <Jos...@gm...> - 2004-07-05 19:08:32
Hi Nick,

On Monday 05 July 2004 11:00, Nicholas Nethercote wrote:
> I've made these changes. The improvements are pretty big. I've included
> the change log below, check out the general improvements. They arose
> because I split one messy data structure, which was (I now realise)
> serving two distinct purposes, into two much cleaner data structures.

I always thought that this mix-up was there to get better cache behaviour
(more spatial locality) in Cachegrind ... which obviously is not true?

> I'm pretty keen to commit these changes soon, in time for 2.1.2. It would
> be good to see them in Calltree too, but I'm not sure how easy that will
> be.

I will see what I can do. One problem is that Calltree has an option to
dump out events by their instruction address (actually object file offset),
and KCachegrind uses this for its annotated disassembler view. I don't want
to lose this feature, especially as I think it is important for the user to
be able to look at this level of detail if needed.

But your separation is a very good thing. I'm sure it will give similar
simplifications. Instead of a CC per distinct source line, can't we use a
CC per (obj_file/instruction offset), which is mapped to a distinct source
line? I know that this makes things a little more complex, but the CCs
still don't have to be discarded when code segments are unmapped.

Is a CC per distinct source line enough? E.g. all initialisation functions
of shared libraries, where you don't have source code, will be mapped into
one CC, as the functions all have the same name (_init).

> - Previously, when code was unloaded all its hit/miss counts were stuck
>   in a single "discard" CC, and so that code would not be annotated. Now
>   this code is profiled and annotatable just like all other code.

I actually do almost nothing at all when a code segment is discarded, just
to be able to dump data for discarded code segments, and I know this is
buggy and leaks memory. Your solution (the structure separation) is very
good here.

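To make the (obj_file/instruction offset) idea above a bit more concrete, here is a minimal sketch in Python. It is not Calltree or Cachegrind code: the `Profile` class, the `LINE_OF` table and the addresses are all hypothetical, and the symbol lookup is reduced to a plain dictionary. It only illustrates two points: cost centres keyed by (obj_file, offset) survive an unmap, and two `_init` functions without source stay distinct even though they resolve to the same line-level key.

```python
from collections import defaultdict


class Profile:
    """Cost centres keyed by (obj_file, offset) instead of by address."""

    def __init__(self):
        self.segments = {}             # obj_file -> (base address, size)
        self.costs = defaultdict(int)  # (obj_file, offset) -> event count

    def map_segment(self, obj_file, base, size):
        self.segments[obj_file] = (base, size)

    def unmap_segment(self, obj_file):
        # Only the address mapping goes away; the cost centres stay,
        # because their keys do not depend on where the code was loaded.
        del self.segments[obj_file]

    def record(self, addr, events=1):
        # Translate the address into an (obj_file, offset) key.
        for obj_file, (base, size) in self.segments.items():
            if base <= addr < base + size:
                self.costs[(obj_file, addr - base)] += events
                return
        raise LookupError(f"address {addr:#x} is not in a mapped segment")


# Hypothetical symbol info: both libraries have an _init without source,
# so both offsets resolve to the same (file, function, line) key.
LINE_OF = {
    ("liba.so", 0x10): ("???", "_init", 0),
    ("libb.so", 0x10): ("???", "_init", 0),
}

p = Profile()
p.map_segment("liba.so", 0x1000, 0x100)
p.map_segment("libb.so", 0x2000, 0x100)
p.record(0x1010, 3)
p.record(0x2010, 5)
p.unmap_segment("liba.so")  # the counts for liba.so are kept

for key, count in sorted(p.costs.items()):
    print(key, LINE_OF[key], count)
```

The price of this layout is one extra lookup from the offset key to the source line at dump or annotation time, which is the added complexity mentioned above.
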
> - Source code size is 27% smaller. cg_main.c is now 1494 lines, down
>   from 2174. Some (1/3?) of this is from removing the special handling
>   of JIFZ and general compaction, but most is from the data structure
>   changes. Happily, a lot of the removed code was nasty.

Then I will get rid of the JIFZ code, too.

> - cachegrind.out.pid size is about 90+% smaller(!) Annotation time is
>   accordingly much faster. Doing cost-centres at the level of source
>   code lines rather than instructions makes a big difference, since
>   there's typically 2--3 instructions per source line. Even better,
>   when debug info is not present, entire functions (and even files) get
>   collapsed into a single "???" CC. (This behaviour is no different
>   to what happened before, it's just the collapsing used to occur in the
>   annotation script, rather than within Cachegrind.) This is a big win
>   for stripped libraries.

I actually do some compression at dump time, but it is much uglier...

> - Speed is not much changed -- the changes were not in the intensive
>   parts, so the only likely change is a cache improvement due to using
>   less memory. SPEC experiments go -3 -- 10% faster, with the "average"
>   being unchanged or perhaps a tiny bit faster.

That's good to see.

> - Removed "fi" and "fe" handling from cg_annotate, no longer needed due
>   to neatening of the CC-table.

What was the actual purpose of these? Why wasn't "fn=" enough in all
cases?

Josef
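
As a rough illustration of why line-level cost centres shrink the output so much, the following Python sketch sums per-instruction counts into per-line CCs. It is an assumption-laden toy, not the Cachegrind implementation: `collapse_to_lines`, the `UNKNOWN` placeholder key and all counts are made up, and real debug info lookup is replaced by a dictionary. Instructions without debug info all fall into one "???" entry.

```python
from collections import defaultdict

# Assumed placeholder key for code without any debug info.
UNKNOWN = ("???", "???", 0)


def collapse_to_lines(per_instr, debug_info):
    """Sum per-instruction event counts into per-source-line cost centres."""
    per_line = defaultdict(int)
    for instr_key, count in per_instr.items():
        # Instructions missing from debug_info share the single "???" CC.
        per_line[debug_info.get(instr_key, UNKNOWN)] += count
    return dict(per_line)


# Hypothetical counts: three instructions for foo.c:12, two for foo.c:13,
# and four instructions in a stripped library.
per_instr = {
    ("app", 0x00): 10, ("app", 0x04): 10, ("app", 0x08): 10,
    ("app", 0x0C): 7, ("app", 0x10): 7,
    ("libstripped.so", 0x00): 1, ("libstripped.so", 0x04): 2,
    ("libstripped.so", 0x08): 3, ("libstripped.so", 0x0C): 4,
}
debug_info = {
    ("app", 0x00): ("foo.c", "foo", 12),
    ("app", 0x04): ("foo.c", "foo", 12),
    ("app", 0x08): ("foo.c", "foo", 12),
    ("app", 0x0C): ("foo.c", "foo", 13),
    ("app", 0x10): ("foo.c", "foo", 13),
}

per_line = collapse_to_lines(per_instr, debug_info)
print(len(per_instr), "records ->", len(per_line))
```

In this toy input nine per-instruction records become three per-line records, and the total event count is unchanged; the instruction-level detail is what gets lost.
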