From: Nicholas N. <nj...@ca...> - 2004-07-05 09:00:23
On Fri, 2 Jul 2004, Nicholas Nethercote wrote:

> I'm currently looking at rejigging Cachegrind's data structures.  I think I
> can solve the missing-info-for-unloaded-code and also significantly simplify
> its code.

I've made these changes.  The improvements are pretty big.  I've included the
change log below; check out the general improvements.  They arose because I
split one messy data structure, which was (I now realise) serving two distinct
purposes, into two much cleaner data structures.

The diff is at:

  www.cl.cam.ac.uk/~njn25/cg.diff

Enough code has changed that the diff is hard to read, so the new cg_main.c is
here:

  www.cl.cam.ac.uk/~njn25/cg_main.c

[I tried attaching them but the mailing list software didn't like them; they
were too big.]

I'm pretty keen to commit these changes soon, in time for 2.1.2.  It would be
good to see them in Calltree too, but I'm not sure how easy that will be.

N


Completely overhauled Cachegrind's data structures.  With the new scheme,
there are two main structures:

1. The CC table holds a cost centre (CC) for every distinct source code line,
   as found using debug/symbol info.  It's arranged by files, then functions,
   then lines.

2. The instr-info-table holds certain important pieces of info about each
   instruction -- instr_addr, instr_size, data_size, its line-CC.  A pointer
   to the instr's info is passed to the simulation functions, which is shorter
   and quicker than passing the pieces individually.

This is nice and simple.  Previously, there was a single data structure (the
BBCC table) which mingled the two purposes (maintaining CCs and caching
instruction info).  The CC stuff was done at the level of instructions, and
there were different CC types for different kinds of instructions, and it was
pretty yucky.  It's now much cleaner.

As a result, we have the following general improvements:

- Previously, when code was unloaded all its hit/miss counts were stuck in a
  single "discard" CC, and so that code would not be annotated.
  Now this code is profiled and annotatable just like all other code.

- Source code size is 27% smaller.  cg_main.c is now 1494 lines, down from
  2174.  Some (1/3?) of this is from removing the special handling of JIFZ
  and general compaction, but most is from the data structure changes.
  Happily, a lot of the removed code was nasty.

- Object code size (vgskin_cachegrind.so) is 15% smaller.

- cachegrind.out.pid size is 90+% smaller(!)  Annotation time is accordingly
  much faster.  Doing cost-centres at the level of source code lines rather
  than instructions makes a big difference, since there are typically 2--3
  instructions per source line.  Even better, when debug info is not present,
  entire functions (and even files) get collapsed into a single "???" CC.
  (This behaviour is no different to what happened before; it's just that the
  collapsing used to occur in the annotation script, rather than within
  Cachegrind.)  This is a big win for stripped libraries.

- Memory consumption is about 10--20% less, due to fewer CCs.

- Speed is not much changed -- the changes were not in the intensive parts,
  so the only likely change is a cache improvement due to using less memory.
  SPEC experiments range from 3% slower to 10% faster, with the "average"
  being unchanged or perhaps a tiny bit faster.

I've tested it moderately thoroughly; it seems to give the same results as the
old version where it should.

Some particularly nice changes that happened:

- No longer need an instrumentation prepass; this is because CCs are not
  stored grouped by BB, and they're all the same size now (which makes
  various code much simpler than before).

- The actions to take when a BB translation is discarded (due to the
  translation table getting full) are much easier -- just chuck all the
  instr-info nodes for the BB, without touching the CCs.

- Dumping the cachegrind.out.pid file at the end is much simpler, just
  because the CC data structure is much neater.
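
As a rough illustration of the two-structure split and of why discarding a BB
translation becomes trivial, here is a minimal C sketch.  It is not the code
in cg_main.c: every type and function name below (Counts, LineCC, InstrInfo,
get_line_cc, log_instr, make_bb, discard_bb) is made up for the example, the
CC table is flattened to a linked list rather than arranged by file, function
and line, and the cache simulation is replaced by a caller-supplied miss flag.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* All names here are hypothetical; this is not the code in cg_main.c. */

/* Hit/miss counters, simplified to a single cache level. */
typedef struct {
    unsigned long long accesses;
    unsigned long long misses;
} Counts;

/* 1. CC table entry: one cost centre per distinct source line.
 *    A flat linked list keeps the sketch short; the table described
 *    in the change log is arranged by file, then function, then line. */
typedef struct LineCC {
    const char    *file;   /* not copied: caller keeps the string alive */
    const char    *fn;
    int            line;
    Counts         instr;  /* instruction-fetch counts for this line */
    struct LineCC *next;
} LineCC;

static LineCC *cc_table = NULL;

/* Find the CC for (file, fn, line), creating it on first use. */
LineCC *get_line_cc(const char *file, const char *fn, int line)
{
    LineCC *cc;
    for (cc = cc_table; cc != NULL; cc = cc->next)
        if (cc->line == line && strcmp(cc->file, file) == 0
                             && strcmp(cc->fn, fn) == 0)
            return cc;
    cc = calloc(1, sizeof *cc);
    assert(cc != NULL);
    cc->file = file;
    cc->fn   = fn;
    cc->line = line;
    cc->next = cc_table;
    cc_table = cc;
    return cc;
}

/* 2. Instr-info node: per-instruction facts plus a pointer to its line CC. */
typedef struct {
    unsigned long instr_addr;
    unsigned char instr_size;
    unsigned char data_size;
    LineCC       *parent;
} InstrInfo;

/* Simulation entry point: takes one pointer instead of separate
 * address/size arguments.  The cache model is stubbed out; the caller
 * says whether the access missed. */
void log_instr(const InstrInfo *n, int is_miss)
{
    n->parent->instr.accesses++;
    if (is_miss)
        n->parent->instr.misses++;
}

/* Allocate instr-info nodes for one basic block whose n_instrs
 * instructions all map to the same source line (a simplification). */
InstrInfo *make_bb(unsigned long addr, int n_instrs, LineCC *cc)
{
    InstrInfo *bb = calloc((size_t)n_instrs, sizeof *bb);
    int i;
    assert(bb != NULL);
    for (i = 0; i < n_instrs; i++) {
        bb[i].instr_addr = addr + 4UL * (unsigned long)i;
        bb[i].instr_size = 4;  /* made-up fixed instruction size */
        bb[i].parent     = cc;
    }
    return bb;
}

/* Discarding a translation frees only the instr-info nodes; the line
 * CCs, and the counts accumulated in them, are left untouched. */
void discard_bb(InstrInfo *bb)
{
    free(bb);
}
```

In this sketch, logging three instructions that share one line CC and then
calling discard_bb leaves that CC still in the table with its three accesses
recorded, which is the property that lets unloaded code be annotated.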

Some other, specific changes:

- Removed the JIFZ special handling, which never did what it was intended to
  do and just complicated things.  This changes the results for REP-prefixed
  instructions very slightly, but it's not important.

- Abbreviated the FP/MMX/SSE crap by being slightly laxer with size
  checking -- any problems will be caught later anyway.

- Removed "fi" and "fe" handling from cg_annotate, no longer needed due to
  neatening of the CC-table.

- Factorised out some code a bit, so fewer monolithic slabs.

- Just improved formatting and compacted code in general in a few places.

- Removed the long-commented-out sanity checking code at the bottom.

Phew.