|
From: Konstantin S. <kon...@gm...> - 2008-01-11 16:04:44
|
Dear valgrind developers,
How do you usually profile valgrind tools?
I tried to profile a particularly long run of helgrind with oprofile.
The test runs ~1 second on a real CPU and fails after a few hours under
helgrind.
The flat profile (after running for ~20 minutes and then pressing ^C) is:
8012128 56.8375 vgHelgrind_nextIterFM
3130689 22.2089 shadow_mem_make_NoAccess
1777779 12.6115 is_sane_Lock_BASE
241600 1.7139 cacheline_wback
194274 1.3782 avl_find_node
172924 1.2267 cacheline_fetch
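
A minimal sketch for reading a flat profile like the one above, assuming each line has the form "samples percent symbol". The helper names (parse_flat_profile, cumulative_share) are made up for this illustration and are not part of oprofile. Summing the listed percentages shows that the top three symbols account for roughly 91.7% of the samples.

```python
# Flat profile lines copied from the oprofile output above:
# <samples> <percent> <symbol>
FLAT_PROFILE = """\
8012128 56.8375 vgHelgrind_nextIterFM
3130689 22.2089 shadow_mem_make_NoAccess
1777779 12.6115 is_sane_Lock_BASE
241600 1.7139 cacheline_wback
194274 1.3782 avl_find_node
172924 1.2267 cacheline_fetch
"""


def parse_flat_profile(text):
    """Return a list of (samples, percent, symbol) tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        samples, percent, symbol = fields
        rows.append((int(samples), float(percent), symbol))
    return rows


def cumulative_share(rows):
    """Yield (symbol, running percent total) in the order listed."""
    total = 0.0
    for _samples, percent, symbol in rows:
        total += percent
        yield symbol, total


rows = parse_flat_profile(FLAT_PROFILE)
for symbol, running in cumulative_share(rows):
    print(f"{running:7.2f}%  {symbol}")
```

A flat profile only says where samples land, not who called the hot function, which is why the call graph is the next thing to ask for.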
I wanted to get the callgraph profile with oprofile, but it did not work :(
Would it be possible to run helgrind under callgrind? Did you try gprof? Any
other suggestions?
The output of -v (after 10 minutes run) follows,
Thanks,
--kcc
WordSet "univ_tsets":
addTo 22480424 (103026 uncached)
delFrom 0 (0 uncached)
union 6
intersect 0 (0 uncached) [nb. incl isSubsetOf]
minus 0 (0 uncached)
elem 0
doubleton 690570
isEmpty 0
isSingleton 0
anyElementOf 0
isSubsetOf 0
WordSet "univ_lsets":
addTo 23193877 (1663 uncached)
delFrom 264477 (1777 uncached)
union 0
intersect 22480375 (2546 uncached) [nb. incl isSubsetOf]
minus 1103497 (3271 uncached)
elem 38718
doubleton 0
isEmpty 5564937
isSingleton 0
anyElementOf 111388
isSubsetOf 1
WordSet "univ_laog":
addTo 15126 (1059 uncached)
delFrom 4 (3 uncached)
union 0
intersect 0 (0 uncached) [nb. incl isSubsetOf]
minus 0 (0 uncached)
elem 0
doubleton 110
isEmpty 0
isSingleton 0
anyElementOf 0
isSubsetOf 0
hbefore: 2,947,164 queries
hbefore: 2,579,162 cache 0 hits
hbefore: 366,749 cache > 0 hits
hbefore: 1,253 graph searches
hbefore: 1,253 of which slow
hbefore: 0 stack high water mark
hbefore: 1 cache invals
hbefore: 3,986,562 probes
segments: 115 Segment objects allocated
locksets: 376 unique lock sets
threadsets: 1,824 unique thread sets
univ_laog: 276 unique lock sets
L(ast)L(ock) map: 111,388 inserts (111382 map size)
LockN-to-P map: 0 queries (0 map size)
string table map: 3 queries (2 map size)
LAOG: 110 map size
LAOG exposition: 220 map size
locks: 20,541 acquires, 20,539 releases
sanity checks: 1
msm: 940,233,639 134,911,383 rd/wr_Excl_nochange
msm: 2,247,835 8,765 rd/wr_Excl_transfer
msm: 689,209 1,355 rd/wr_Excl_to_ShR/ShM
msm: 21,895,558 285 rd/wr_ShR_to_ShR/ShM
msm: 568,124 16,407 rd/wr_ShM_to_ShM
msm: 14,346,151 73,185,872 rd/wr_New_to_Excl
msm: 15,436,322 9,849,548 rd/wr_NoAccess
secmaps: 152,486 allocd (1,249,165,312 g-a-range)
linesZ: 39,036,416 allocd ( 936,873,984 bytes occupied)
linesF: 108,317 allocd ( 14,297,844 bytes occupied)
secmaps: 9,910,787 iterator steppings
cache: 1,713,992,037 totrefs (127,941,812 misses)
cache: 127,301,323 Z-fetch, 640,489 F-fetch
cache: 127,174,345 Z-wback, 701,931 F-wback
cache: 5 invals, 4 flushes
cline: 127,941,812 normalises
cline: reads 8/4/2/1: 641,284,727 123,701,933 9,885,381 223,364,501
cline: writes 8/4/2/1: 178,916,609 35,642,116 321,318 3,570,498
cline: sets 8/4/2/1: 464,389,946 52,889 51,272 58,262
cline: get1s 5,137, copy1s 5,136
cline: splits: 8to4 893,992 4to2 1,052,314 2to1 1,350,324
cline: pulldowns: 8to4 37,966,008 4to2 9,893,933 2to1 27,935,348
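
A few ratios can be derived from the counters above. This is a sketch only: the numbers are copied from the "cache:" and "hbefore:" lines, and what each counter means is inferred from its label, not from the helgrind source. The variable names are invented for the example.

```python
# Counters copied from the "cache:" and "hbefore:" lines of the -v output.
cache_totrefs = 1_713_992_037
cache_misses = 127_941_812
z_fetch, f_fetch = 127_301_323, 640_489

hbefore_queries = 2_947_164
hbefore_cache0_hits = 2_579_162
hbefore_cacheN_hits = 366_749
hbefore_graph_searches = 1_253


def share(part, whole):
    """Fraction part/whole, or 0.0 when the denominator is zero."""
    return part / whole if whole else 0.0


# Fraction of cache references that missed (meaning assumed from the label).
miss_rate = share(cache_misses, cache_totrefs)

# Fraction of hbefore queries that fell through to a graph search.
graph_search_rate = share(hbefore_graph_searches, hbefore_queries)

# Consistency checks on the printed numbers themselves.
fetches_match_misses = (z_fetch + f_fetch) == cache_misses
hbefore_accounted = (
    hbefore_cache0_hits + hbefore_cacheN_hits + hbefore_graph_searches
) == hbefore_queries

print(f"cache miss rate:        {miss_rate:.2%}")
print(f"hbefore graph searches: {graph_search_rate:.4%}")
print(f"Z-fetch + F-fetch == misses: {fetches_match_misses}")
print(f"hbefore hits + searches == queries: {hbefore_accounted}")
```

Both consistency checks hold for these numbers, which suggests the counters are at least internally coherent. About 7.5% of cache references miss, and only a tiny fraction of hbefore queries reach a graph search.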
|
|
From: Josef W. <Jos...@gm...> - 2008-01-12 23:24:34
|
On Friday 11 January 2008, Konstantin Serebryany wrote:
> Dear valgrind developers,
>
> How do you usually profile valgrind tools?
>
> I tried to profile a particularly long run of helgrind with oprofile.
> The test runs ~1 second on a real CPU and fails after a few hours under
> helgrind.
> The flat profile (after running for ~20 minutes and then pressing ^C) is:
>
> 8012128 56.8375 vgHelgrind_nextIterFM
> 3130689 22.2089 shadow_mem_make_NoAccess
> 1777779 12.6115 is_sane_Lock_BASE
> 241600 1.7139 cacheline_wback
> 194274 1.3782 avl_find_node
> 172924 1.2267 cacheline_fetch
>
> I wanted to get the callgraph profile with oprofile, but it did not work :(
>
> Would it be possible to run helgrind under callgrind? Did you try gprof? Any
> other suggestions?

Yes.
Check out "Self hosting" in README_DEVELOPERS.

To run callgrind on a Valgrind tool, you need to specify
"--pop-on-jump=yes" for the outer callgrind. Otherwise, the call graph
will grow linearly with run time, producing an "out-of-memory" condition
after some time.

Note that self hosting is _really_ slow, and it could take a while until you
reach the problematic phase. However, you can check out intermediate dumps
triggered by "callgrind_control".

Josef
|
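
A sketch of how the self-hosted command line could be assembled, under stated assumptions. The "--pop-on-jump=yes" option comes from the advice above. The "--sim-hints=enable-outer" and "--trace-children=yes" options are recalled from the "Self hosting" recipe in README_DEVELOPERS and should be checked against the copy in your tree, which also describes configuring the inner build with "--enable-inner". The function name, the two install paths and "./mytest" are hypothetical.

```python
def outer_callgrind_argv(outer_valgrind, inner_valgrind, program_argv,
                         inner_tool="helgrind"):
    """Build the argv for running an inner Valgrind tool under an outer callgrind.

    outer_valgrind and inner_valgrind are paths to two separate Valgrind
    builds; the values used below are placeholders.
    """
    outer = [
        outer_valgrind,
        "--sim-hints=enable-outer",  # assumed from README_DEVELOPERS; verify locally
        "--trace-children=yes",      # follow the exec of the inner valgrind
        "--tool=callgrind",
        "--pop-on-jump=yes",         # per the advice above: keeps the call graph bounded
    ]
    inner = [inner_valgrind, f"--tool={inner_tool}"]
    return outer + inner + list(program_argv)


# Hypothetical paths and test binary, for illustration only.
argv = outer_callgrind_argv("outer/bin/valgrind", "inner/bin/valgrind", ["./mytest"])
print(" ".join(argv))
```

The resulting list can be passed to subprocess.run(argv) once the paths point at real builds. Expect the run to be very slow, as noted above.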
|
From: Nicholas N. <nj...@cs...> - 2008-01-12 23:27:03
|
On Sun, 13 Jan 2008, Josef Weidendorfer wrote:
>> Would it be possible to run helgrind under callgrind? Did you try gprof? Any
>> other suggestions?
>
> Yes.
> Check out "Self hosting" in README_DEVELOPERS.
>
> To run callgrind on a Valgrind tool, you need to specify
> "--pop-on-jump=yes" for the outer callgrind. Otherwise, the call graph
> will grow linearly with run time, producing an "out-of-memory" condition
> after some time.
>
> Note that self hosting is _really_ slow, and it could take a while until you
> reach the problematic phase. However, you can check out intermediate dumps
> triggered by "callgrind_control".

I've used Cachegrind and OProfile. OProfile is generally better because it
is so much faster, and you know the numbers are real times, rather than
instruction counts which don't necessarily map exactly to real times.
But Cachegrind does give more detailed information.

Nick
|
|
From: Konstantin S. <kon...@gm...> - 2008-01-15 09:04:49
|
Thanks for the answers!
oprofile is indeed very useful, but unfortunately I can't make the
call-graph (something wrong with my oprofile setup).
Anyway, I found the guilty place in helgrind by manually inserting counters
in the code. Details follow in a separate thread.

--kcc

On Jan 13, 2008 2:26 AM, Nicholas Nethercote <nj...@cs...> wrote:
> On Sun, 13 Jan 2008, Josef Weidendorfer wrote:
> >> Would it be possible to run helgrind under callgrind? Did you try gprof? Any
> >> other suggestions?
> >
> > Yes.
> > Check out "Self hosting" in README_DEVELOPERS.
> >
> > To run callgrind on a Valgrind tool, you need to specify
> > "--pop-on-jump=yes" for the outer callgrind. Otherwise, the call graph
> > will grow linearly with run time, producing an "out-of-memory" condition
> > after some time.
> >
> > Note that self hosting is _really_ slow, and it could take a while until you
> > reach the problematic phase. However, you can check out intermediate dumps
> > triggered by "callgrind_control".
>
> I've used Cachegrind and OProfile. OProfile is generally better because it
> is so much faster, and you know the numbers are real times, rather than
> instruction counts which don't necessarily map exactly to real times.
> But Cachegrind does give more detailed information.
>
> Nick
|
|
From: Julian S. <js...@ac...> - 2008-01-15 09:21:26
|
On Tuesday 15 January 2008 10:04, Konstantin Serebryany wrote:
> Thanks for the answers!
> oprofile is indeed very useful, but unfortunately I can't make the
> call-graph (something wrong with my oprofile setup).

I think oprofile cannot do call-graph on 64-bit x86, only on
32-bit x86. I have successfully used it w/ call-graph on 32-bit
x86, on Helgrind.

J
|
|
From: Konstantin S. <kon...@gm...> - 2008-01-15 09:31:57
|
You mean it requires 32-bit hardware? or 32-bit-only os? or 32-bit
executable?
I don't have any 32-bit-only hardware at the moment, so I can't check :)
Anyway, I've sent the question to oprofile folks already.

--kcc

On Jan 15, 2008 12:19 PM, Julian Seward <js...@ac...> wrote:
> On Tuesday 15 January 2008 10:04, Konstantin Serebryany wrote:
> > Thanks for the answers!
> > oprofile is indeed very useful, but unfortunately I can't make the
> > call-graph (something wrong with my oprofile setup).
>
> I think oprofile cannot do call-graph on 64-bit x86, only on
> 32-bit x86. I have successfully used it w/ call-graph on 32-bit
> x86, on Helgrind.
>
> J
|
|
From: Julian S. <js...@ac...> - 2008-01-15 09:39:56
|
On Tuesday 15 January 2008 10:32, Konstantin Serebryany wrote:
> You mean it requires 32-bit hardware? or 32-bit-only os? or 32-bit
> executable?

I _think_ it requires a 32-bit executable.

> I don't have any 32-bit-only hardware at the moment, so I can't check :)
> Anyway, I've sent the question to oprofile folks already.

Not their fault. The 64-bit x86 stack is difficult to unwind (completely
different from the 32-bit case) and OProfile may not have the relevant
DWARF CFA data available.

J
|
|
From: Konstantin S. <kon...@gm...> - 2008-01-21 10:20:16
|
On Jan 15, 2008 12:38 PM, Julian Seward <js...@ac...> wrote:
> On Tuesday 15 January 2008 10:32, Konstantin Serebryany wrote:
> > You mean it requires 32-bit hardware? or 32-bit-only os? or 32-bit
> > executable?
>
> I _think_ it requires a 32-bit executable.
>

It looks like oprofile's callgraph requires a 32-bit OS.
At least it started working for me only when I rebooted my box in 32-bit
mode.

A bit of fun: I tried to profile helgrind with vtssrun (a tool from Intel
Performance Tuning Utilities).
vtssrun EXPERIMENT_DIR -- valgrind ...
When running usual programs, vtssrun performs stack sampling of the program
and then produces nice callgraphs, flat profiles, etc.
But not for valgrind -- instead of profiling valgrind with vtssrun, my
command line debugged the vtssrun utility with valgrind!!
Both tools try to inject themselves into the same address space... LOL

--kcc
|
|
From: Konstantin S. <kon...@gm...> - 2008-03-12 09:51:22
|
Hi,
I just discovered that oprofile's callgraph works fine on x86_64.
One just needs to compile the program in question with -fno-omit-frame-pointer.
So, in order to get valgrind's callgraph on x86_64 you'll need this change:
--- Makefile.flags.am (revision 7635)
+++ Makefile.flags.am (working copy)
@@ -20,7 +20,7 @@
AM_FLAG_M3264_AMD64_LINUX = @FLAG_M64@
AM_CPPFLAGS_AMD64_LINUX = $(add_includes_amd64_linux)
-AM_CFLAGS_AMD64_LINUX = $(WERROR) @FLAG_M64@ -fomit-frame-pointer \
+AM_CFLAGS_AMD64_LINUX = $(WERROR) @FLAG_M64@ -fno-omit-frame-pointer \
@PREFERRED_STACK_BOUNDARY@ $(AM_CFLAGS_BASE)
AM_CCASFLAGS_AMD64_LINUX = $(add_includes_amd64_linux) @FLAG_M64@ -g
--kcc
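
For trees where the patch above does not apply cleanly, the same one-flag change can be scripted. This is a sketch that assumes the variable is still named AM_CFLAGS_AMD64_LINUX and that the flag appears on that assignment or on its backslash-continued lines. The function name enable_frame_pointers and the SAMPLE text are made up for the example.

```python
import re


def enable_frame_pointers(makefile_text, variable="AM_CFLAGS_AMD64_LINUX"):
    """Replace -fomit-frame-pointer with -fno-omit-frame-pointer, but only
    inside the assignment to `variable` (including continuation lines)."""
    assignment = re.compile(rf"\s*{re.escape(variable)}\s*=")
    flag = re.compile(r"(?<!\S)-fomit-frame-pointer(?!\S)")
    out = []
    in_assignment = False
    for line in makefile_text.splitlines(keepends=True):
        if assignment.match(line):
            in_assignment = True
        if in_assignment:
            line = flag.sub("-fno-omit-frame-pointer", line)
            # The assignment ends on the first line without a trailing backslash.
            if not line.rstrip().endswith("\\"):
                in_assignment = False
        out.append(line)
    return "".join(out)


# Sample input modelled on the lines shown in the diff above.
SAMPLE = (
    "AM_CPPFLAGS_AMD64_LINUX = $(add_includes_amd64_linux)\n"
    "AM_CFLAGS_AMD64_LINUX = $(WERROR) @FLAG_M64@ -fomit-frame-pointer \\\n"
    "\t@PREFERRED_STACK_BOUNDARY@ $(AM_CFLAGS_BASE)\n"
)

print(enable_frame_pointers(SAMPLE), end="")
```

The rewrite is idempotent, so running it twice leaves the file unchanged. Valgrind has to be reconfigured and rebuilt afterwards for the flag to take effect.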
On Mon, Jan 21, 2008 at 1:20 PM, Konstantin Serebryany
<kon...@gm...> wrote:
>
>
> On Jan 15, 2008 12:38 PM, Julian Seward <js...@ac...> wrote:
> >
> > On Tuesday 15 January 2008 10:32, Konstantin Serebryany wrote:
> > > You mean it requires 32-bit hardware? or 32-bit-only os? or 32-bit
> > > executable?
> >
> > I _think_ it requires a 32-bit executable.
> >
> >
> >
>
> It looks like oprofile's callgraph requires 32-bit OS.
> At least it started working for me only when I rebooted my box in 32-bit
> mode.
>
>
> A bit of fun: I tried to profile helgrind with vtssrun (a tool from Intel
> Performance Tuning Utilities).
> vtssrun EXPERIMENT_DIR -- valgrind ...
> When running usual programs, vtssrun performs stack sampling of the program
> and then produces nice callgraphs, flat profiles, etc.
> But not for valgrind -- instead of profiling valgrind with vtssrun my
> command line debugged the vtssrun utility with valgrind!!
> Both tools try to inject themselves into the same address space... LOL
>
> --kcc
>
>
|