From: Nicholas N. <nj...@cs...> - 2005-09-13 21:47:49
On Sun, 4 Sep 2005, John Reiser wrote:

> In order for valgrind to tell other tools (debugger, profiler, ...)
> about the modules it loads (stage2, vgtool_memcheck, ld-linux, ...),
> I have had success using the attached patches for valgrind-3.0.1
> (co-resident.patch, 4370 bytes.)
>
> Synopsis (refer to <elf.h> and <link.h>):
> Have a PT_DYNAMIC that contains a DT_DEBUG with a datum that points
> to a struct r_debug. For each module loaded: add a struct link_map
> onto the front of the list that is maintained through .l_next, .l_prev.
> Use .r_brk as a function pointer, and call it as appropriate:
> with .r_state=RT_ADD just before loading some modules,
> and with .r_state=RT_CONSISTENT just after loading some modules.
> Set .r_brk to the address of a no-op subroutine if 0==.r_brk.
> A debugger or co-resident tool may intercept this subroutine,
> so make it easy to breakpoint or overwrite (allow at least 16 bytes.)
> Using names _r_debug and _dl_debug_state() may help older [or lazy]
> debuggers that don't understand the PT_DYNAMIC + DT_DEBUG protocol.
>
> I have used this with a subroutine profiler that is in development.
> The current state is sufficient to measure memcheck on simple programs.
> Example output is http://BitWagon.com/valgrind/valgrind.tsprof.bz2
> (76636 bytes; expands to 497089 bytes, 6084 lines.)

The profile output looks really interesting. There's an explanation of
the format at http://bitwagon.com/tsprof/tsprof.html. Here's an
interesting snippet:

Black box 4.766 seconds ('+' excludes Recursive entries)
 count      ticks  millisec @millisec   %  [id]       module:name
[...]
  2599 9055199077 4527.5995    1.7421  95  [4]        2:LibVEX_Translate
  2599   37727305   18.8637    0.0073   0  [4]        2:LibVEX_Translate (self)
  2599 2641400997 1320.7005    0.5082  29  [4]->[5]   2:doRegisterAllocation
 10396 1482524941  741.2625    0.0713  16  [4]->[6]   2:sanityCheckIRBB
  2599  905329930  452.6650    0.1742  10  [4]->[11]  2:iselBB_AMD64
  2599  856546420  428.2732    0.1648   9  [4]->[13]  2:do_iropt_BB
  2599  689222695  344.6113    0.1326   8  [4]->[19]  5:vgMemCheck_instrument
  2599  688523386  344.2617    0.1325   8  [4]->[20]  2:do_treebuild_BB
  2599  490806029  245.4030    0.0944   5  [4]->[17]  2:cprop_BB
185354  445410431  222.7052    0.0012   5  [4]->[26]  2:emit_AMD64Instr
  2599  373706079  186.8530    0.0719   4  [4]->[33]  2:bb_to_IR
  7797  335553017  167.7765    0.0215   4  [4]->[27]  2:do_deadcode_BB
  2599   93003070   46.5015    0.0179   1  [4]->[87]  2:vg_SP_update_pass
  2599   14737922    7.3690    0.0028   0  [4]->[211] 2:getAllocableRegs_AMD64
  5198     706855    0.3534    0.0001   0  [4]->[506] 2:vexClearTEMP

This shows that 95% of the time accounted for was spent in
LibVEX_Translate, that 29% of that 95% was in doRegisterAllocation,
and that 16% of that 95% was in sanityCheckIRBB.

This is a small program; it would be interesting to see what the
numbers are like for a bigger program. Presumably the translation
costs would drop as a proportion of total time.

John, will generated code be covered by this profile? I see this entry:

     1  994533995  497.2670  497.2670  10  [9]        2:0x70057c53
     1          0    0.0000    0.0000   0  [9]        2:0x70057c53 (self)

Is this a generated basic block?

Also, Valgrind seg faulted during the run -- is this due to a bad
interaction with tsprof?

Nick
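
For readers unfamiliar with the DT_DEBUG protocol described in the quoted
synopsis, here is a minimal C sketch of the bookkeeping, under stated
assumptions. The demo_* names are hypothetical and are not taken from
co-resident.patch. The structs only mirror the fields of struct link_map
and struct r_debug from <link.h>, so the example compiles without that
header; real code would include <link.h> and use its definitions. The call
counter exists only so the behaviour can be checked; a real hook would be
a true no-op that a debugger can breakpoint.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirrors of the <link.h> types (assumption: same field
   roles as struct link_map and struct r_debug). */
enum demo_state { DEMO_RT_CONSISTENT, DEMO_RT_ADD, DEMO_RT_DELETE };

struct demo_link_map {
    uintptr_t l_addr;                  /* load address of the module */
    const char *l_name;                /* module file name */
    void *l_ld;                        /* module's dynamic section */
    struct demo_link_map *l_next, *l_prev;
};

struct demo_r_debug {
    int r_version;
    struct demo_link_map *r_map;       /* head of the module list */
    uintptr_t r_brk;                   /* address of the hook routine */
    enum demo_state r_state;
    uintptr_t r_ldbase;
};

/* Counts hook invocations; for demonstration only. */
int demo_brk_calls;

/* Stand-in for the no-op routine a debugger would intercept. */
void demo_debug_state(void) { demo_brk_calls++; }

/* Set r_state, then call through r_brk, installing the default hook
   first if r_brk is still 0. */
void demo_notify(struct demo_r_debug *dbg, enum demo_state state)
{
    if (dbg->r_brk == 0)
        dbg->r_brk = (uintptr_t)&demo_debug_state;
    dbg->r_state = state;
    ((void (*)(void))dbg->r_brk)();
}

/* Push one module onto the front of the list, bracketed by RT_ADD
   before the change and RT_CONSISTENT after it. */
void demo_add_module(struct demo_r_debug *dbg, struct demo_link_map *m)
{
    assert(dbg != NULL && m != NULL);
    demo_notify(dbg, DEMO_RT_ADD);
    m->l_prev = NULL;
    m->l_next = dbg->r_map;
    if (dbg->r_map != NULL)
        dbg->r_map->l_prev = m;
    dbg->r_map = m;
    demo_notify(dbg, DEMO_RT_CONSISTENT);
}
```

A caller would zero-initialise one struct demo_r_debug, then call
demo_add_module() once per loaded module (stage2, the tool, and so on).
An observing tool only needs to walk r_map while r_state reads
consistent.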