From: Geoff A. <gd...@us...> - 2012-11-18 05:40:28

I've recently been looking at Callgrind and Cachegrind output. I've found that both tools generate an Ir count. From what I've found in the Valgrind documentation and on the Web, the Ir count is the number of instructions executed (instructions read). Thus, I would expect Callgrind and Cachegrind to generate the same Ir count; however, I've found that they give different values. Can someone explain exactly what the Callgrind and Cachegrind Ir counts are and why their values differ? I'm using the Valgrind 3.6.1 that comes with openSUSE 12.1, if that makes any difference.

Thanks,
Geoff Alexander, Ph.D.
Software Engineer, Corporate Tools Development
IBM Corporation
RTP, NC

From: Josef W. <Jos...@gm...> - 2012-11-19 10:48:47

On 18.11.2012 06:40, Geoff Alexander wrote:
> I've recently been looking at Callgrind and Cachegrind output. I've
> found that both tools generate an Ir count. From what I've found in the
> Valgrind documentation and on the Web, the Ir count is the number of
> instructions executed (instructions read). Thus, I would expect
> Callgrind and Cachegrind to generate the same Ir count;

There can be slight differences from run to run, e.g. because of retried system calls (EAGAIN), signal handlers running at different times, and so on. As Callgrind runs slower, this could lead to different polling/retry behavior. But these should only cause small differences, perhaps a few hundred instructions.

> however, I've
> found that they give different values. Can someone explain exactly what
> the Callgrind and Cachegrind Ir counts are and why their values differ?

If they differ by a larger amount, that seems to be a bug.

Hmm. I think I just found a bug here :( Running Callgrind without cache simulation (the default) vs. with it (--cache-sim=yes) gives different Ir counts. Analysing this in more detail with "--dump-instr=yes" to see the machine code annotation, the Ir counts for calls to shared libraries differ by 1. And this goes away when switching off the smart PLT-ignore behavior (--skip-plt=no). Can you confirm that this is also the case for the difference observed in your test runs?

Some background: Callgrind has a mechanism to "ignore" given functions by propagating their costs and deeper calls to the call sites of that function. Ignoring dispatcher code often makes the resulting call graph much more useful. As code in PLT sections (jump tables to shared library functions, resolved by the runtime linker) consists of dispatchers, Callgrind ignores it by default. This can be changed with "--skip-plt".

Josef

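Josef's diagnosis can be reproduced by comparing the Ir totals of several runs. The Python sketch below is one way to do it, assuming "valgrind" is on the PATH and the program to profile (for example "./PerfTestSet" from this thread) is given on the command line; the script name "compare_ir.py" and its helper functions are hypothetical. It runs Cachegrind and Callgrind with and without "--cache-sim=yes" and "--skip-plt=no", and reads each total from the "I refs:" summary line that both tools print to stderr.

```python
import re
import subprocess
import sys
import tempfile
from pathlib import Path

# Matches the summary line both tools print to stderr, e.g.
# "==16039== I refs: 8,921,481,138". The "==PID==" prefix avoids matching program output.
I_REFS = re.compile(r"^==\d+==\s+I\s+refs:\s+([\d,]+)", re.MULTILINE)


def parse_ir(valgrind_stderr: str) -> int:
    """Return the Ir total from the last 'I refs:' summary line."""
    matches = I_REFS.findall(valgrind_stderr)
    if not matches:
        raise ValueError("no 'I refs:' line found in Valgrind output")
    return int(matches[-1].replace(",", ""))


def run_ir(tool: str, extra_opts: list[str], program: list[str]) -> int:
    """Run one Valgrind tool on the program and return its Ir total."""
    with tempfile.TemporaryDirectory() as tmp:
        # Write the profile data to a throwaway file (--cachegrind-out-file / --callgrind-out-file).
        out_file = Path(tmp) / f"{tool}.out"
        cmd = ["valgrind", f"--tool={tool}", f"--{tool}-out-file={out_file}", *extra_opts, *program]
        result = subprocess.run(cmd, capture_output=True, text=True)
    return parse_ir(result.stderr)


# Configurations mirroring the experiment described in the mail above.
CONFIGS = [
    ("cachegrind", []),
    ("callgrind", []),                       # default: no cache simulation, PLT skipped
    ("callgrind", ["--cache-sim=yes"]),
    ("callgrind", ["--skip-plt=no"]),
    ("callgrind", ["--cache-sim=yes", "--skip-plt=no"]),
]

if __name__ == "__main__" and len(sys.argv) > 1:
    program = sys.argv[1:]
    baseline = None
    for tool, opts in CONFIGS:
        ir = run_ir(tool, opts, program)
        if baseline is None:
            baseline = ir  # the Cachegrind total is the reference
        label = " ".join(opts) or "(defaults)"
        print(f"{tool:10} {label:32} Ir={ir:>16,} diff={ir - baseline:+,}")
```

Usage: python3 compare_ir.py ./PerfTestSet. On a Valgrind build affected by the bug, the Callgrind runs with the default PLT skipping would be expected to disagree with each other, while the "--skip-plt=no" runs should agree up to the small run-to-run noise described above.
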
From: Josef W. <Jos...@gm...> - 2012-11-19 22:07:22

On 19.11.2012 11:38, Josef Weidendorfer wrote:
> Hmm. I think I just found a bug here :(

I just fixed that bug in SVN trunk. So if that was the issue you observed, can you check current SVN?

Josef

From: Geoff A. <gd...@us...> - 2012-11-27 07:48:52

Josef Weidendorfer <Jos...@gm...> wrote on 11/19/2012 05:07:13 PM:
> From: Josef Weidendorfer <Jos...@gm...>
> To: Geoff Alexander/Raleigh/IBM@IBMUS,
> Cc: val...@li...
> Date: 11/19/2012 05:07 PM
> Subject: Re: [Valgrind-users] Why does Ir count in Callgrind and
> Cachegrind differ?
>
> On 19.11.2012 11:38, Josef Weidendorfer wrote:
> > Hmm. I think I just found a bug here :(
>
> I just fixed that bug in SVN trunk.
> So if that was the issue you observed, can you
> check current SVN?
>
> Josef

Josef,

I tested with both Valgrind 3.8.1 and Valgrind SVN-13143, and also retested with the Valgrind 3.6.1 that comes with openSUSE 12.1. The SVN version appears to fix the problem: the Cachegrind and Callgrind instruction counts now differ by only 3 out of over 8.9 billion. Here are the results of my test runs:

gdlxn@alexander-linux:~/workspace/eccl/perftest/debug> valgrind --tool=cachegrind PerfTestSet
==16039== Cachegrind, a cache and branch-prediction profiler
==16039== Copyright (C) 2002-2012, and GNU GPL'd, by Nicholas Nethercote et al.
==16039== Using Valgrind-3.9.0.SVN and LibVEX; rerun with -h for copyright info
==16039== Command: PerfTestSet
==16039==
==16039==
==16039== I refs: 8,921,481,138
==16039== I1 misses: 1,327
==16039== LLi misses: 1,317
==16039== I1 miss rate: 0.00%
==16039== LLi miss rate: 0.00%
==16039==
==16039== D refs: 5,275,956,419 (3,342,655,668 rd + 1,933,300,751 wr)
==16039== D1 misses: 37,330,601 ( 33,179,400 rd + 4,151,201 wr)
==16039== LLd misses: 11,259,050 ( 7,507,219 rd + 3,751,831 wr)
==16039== D1 miss rate: 0.7% ( 0.9% + 0.2% )
==16039== LLd miss rate: 0.2% ( 0.2% + 0.1% )
==16039==
==16039== LL refs: 37,331,928 ( 33,180,727 rd + 4,151,201 wr)
==16039== LL misses: 11,260,367 ( 7,508,536 rd + 3,751,831 wr)
==16039== LL miss rate: 0.0% ( 0.0% + 0.1% )
gdlxn@alexander-linux:~/workspace/eccl/perftest/debug> /usr/local/valgrind-3.8.1/bin/valgrind --tool=cachegrind PerfTestSet
==16044== Cachegrind, a cache and branch-prediction profiler
==16044== Copyright (C) 2002-2012, and GNU GPL'd, by Nicholas Nethercote et al.
==16044== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==16044== Command: PerfTestSet
==16044==
==16044==
==16044== I refs: 8,921,481,169
==16044== I1 misses: 1,327
==16044== LLi misses: 1,317
==16044== I1 miss rate: 0.00%
==16044== LLi miss rate: 0.00%
==16044==
==16044== D refs: 5,275,956,429 (3,342,655,670 rd + 1,933,300,759 wr)
==16044== D1 misses: 37,330,606 ( 33,179,405 rd + 4,151,201 wr)
==16044== LLd misses: 11,259,052 ( 7,507,220 rd + 3,751,832 wr)
==16044== D1 miss rate: 0.7% ( 0.9% + 0.2% )
==16044== LLd miss rate: 0.2% ( 0.2% + 0.1% )
==16044==
==16044== LL refs: 37,331,933 ( 33,180,732 rd + 4,151,201 wr)
==16044== LL misses: 11,260,369 ( 7,508,537 rd + 3,751,832 wr)
==16044== LL miss rate: 0.0% ( 0.0% + 0.1% )
gdlxn@alexander-linux:~/workspace/eccl/perftest/debug> /usr/bin/valgrind --tool=cachegrind PerfTestSet
==16052== Cachegrind, a cache and branch-prediction profiler
==16052== Copyright (C) 2002-2010, and GNU GPL'd, by Nicholas Nethercote et al.
==16052== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info
==16052== Command: PerfTestSet
==16052==
==16052==
==16052== I refs: 8,921,480,599
==16052== I1 misses: 1,327
==16052== LLi misses: 1,317
==16052== I1 miss rate: 0.00%
==16052== LLi miss rate: 0.00%
==16052==
==16052== D refs: 5,275,956,329 (3,342,655,583 rd + 1,933,300,746 wr)
==16052== D1 misses: 37,242,762 ( 33,096,112 rd + 4,146,650 wr)
==16052== LLd misses: 11,259,039 ( 7,507,218 rd + 3,751,821 wr)
==16052== D1 miss rate: 0.7% ( 0.9% + 0.2% )
==16052== LLd miss rate: 0.2% ( 0.2% + 0.1% )
==16052==
==16052== LL refs: 37,244,089 ( 33,097,439 rd + 4,146,650 wr)
==16052== LL misses: 11,260,356 ( 7,508,535 rd + 3,751,821 wr)
==16052== LL miss rate: 0.0% ( 0.0% + 0.1% )
gdlxn@alexander-linux:~/workspace/eccl/perftest/debug> valgrind --tool=callgrind PerfTestSet
==16058== Callgrind, a call-graph generating cache profiler
==16058== Copyright (C) 2002-2012, and GNU GPL'd, by Josef Weidendorfer et al.
==16058== Using Valgrind-3.9.0.SVN and LibVEX; rerun with -h for copyright info
==16058== Command: PerfTestSet
==16058==
==16058== For interactive control, run 'callgrind_control -h'.
==16058==
==16058== Events : Ir
==16058== Collected : 8921481135
==16058==
==16058== I refs: 8,921,481,135
gdlxn@alexander-linux:~/workspace/eccl/perftest/debug> /usr/local/valgrind-3.8.1/bin/valgrind --tool=callgrind PerfTestSet
==16379== Callgrind, a call-graph generating cache profiler
==16379== Copyright (C) 2002-2012, and GNU GPL'd, by Josef Weidendorfer et al.
==16379== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==16379== Command: PerfTestSet
==16379==
==16379== For interactive control, run 'callgrind_control -h'.
==16379==
==16379== Events : Ir
==16379== Collected : 8901479966
==16379==
==16379== I refs: 8,901,479,966
gdlxn@alexander-linux:~/workspace/eccl/perftest/debug> /usr/bin/valgrind --tool=callgrind PerfTestSet
==16386== Callgrind, a call-graph generating cache profiler
==16386== Copyright (C) 2002-2010, and GNU GPL'd, by Josef Weidendorfer et al.
==16386== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info
==16386== Command: PerfTestSet
==16386==
==16386== For interactive control, run 'callgrind_control -h'.
==16386==
==16386== Events : Ir
==16386== Collected : 8901479396
==16386==
==16386== I refs: 8,901,479,396
gdlxn@alexander-linux:~/workspace/eccl/perftest/debug>

Thanks for fixing the problem.

Geoff Alexander, Ph.D.
Software Engineer, Corporate Tools Development
IBM Corporation
RTP, NC

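To put the numbers above in perspective, the short Python sketch below copies the "I refs" totals from these six runs and computes the Cachegrind minus Callgrind gap for each Valgrind version; nothing beyond the reported figures is assumed, and the helper name "ir_gap" is illustrative only.

```python
# Ir totals ("I refs") as reported in the test runs above, keyed by Valgrind version.
REPORTED = {
    "3.9.0.SVN": {"cachegrind": 8_921_481_138, "callgrind": 8_921_481_135},
    "3.8.1": {"cachegrind": 8_921_481_169, "callgrind": 8_901_479_966},
    "3.6.1": {"cachegrind": 8_921_480_599, "callgrind": 8_901_479_396},
}


def ir_gap(counts: dict[str, int]) -> tuple[int, float]:
    """Return (Cachegrind Ir minus Callgrind Ir, gap as a fraction of the Cachegrind Ir)."""
    gap = counts["cachegrind"] - counts["callgrind"]
    return gap, gap / counts["cachegrind"]


for version, counts in REPORTED.items():
    gap, rel = ir_gap(counts)
    print(f"{version:10} gap = {gap:>12,} instructions ({rel:.2e} of total)")

# Spread of the three Cachegrind totals, which come from three different Valgrind versions.
cachegrind_totals = [c["cachegrind"] for c in REPORTED.values()]
print(f"Cachegrind spread: {max(cachegrind_totals) - min(cachegrind_totals):,} instructions")
```

The gap is 3 for the SVN build and 20,001,203 (about 0.22% of the total) for both 3.8.1 and 3.6.1. If, as the earlier analysis suggests, each call through a PLT entry lost one instruction, that gap would correspond to roughly 20 million such calls; this is an inference, not something measured in the thread. The three Cachegrind totals differ by at most 570, in line with the "few hundred" instructions of nondeterministic noise mentioned earlier.
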
From: Josef W. <Jos...@gm...> - 2012-11-27 13:39:54

On 27.11.2012 08:48, Geoff Alexander wrote:
> Josef Weidendorfer <Jos...@gm...> wrote on 11/19/2012
> > I just fixed that bug in SVN trunk.
>
> I tested with both Valgrind 3.8.1 and Valgrind SVN-13143 as well as
> retesting with the Valgrind 3.6.1 that comes with openSUSE 12.1. The
> SVN version appears to fix the problem as the Cachegrind and Callgrind
> instruction counts now only differ by 3 out of over 8.9 billion.

Very good.
These small differences are probably due to nondeterminism in system calls, as mentioned in my earlier mail.

> Thanks for fixing the problem.

And vice versa. While PLT code is probably not important for performance analysis in most cases (which is why the bug has not come up before), this difference worried me.

So, thanks for confirming that it actually was the bug I fixed!

Josef

From: Siddharth N. <si...@gm...> - 2012-12-02 23:29:21

Curious question. How much is the slowdown of Callgrind over Cachegrind? How does the slowdown change with larger programs?

From: Josef W. <Jos...@gm...> - 2012-12-03 18:26:08

Hi,

On 03.12.2012 00:28, Siddharth Nilakantan wrote:
> Curious question. How much is the slowdown of Callgrind over
> Cachegrind?

Callgrind unfortunately is quite a bit slower than Cachegrind. Reasons for this:

(1) For dynamic call-graph collection, every basic block (BB) first calls into a Callgrind helper function to trace the sequence of executed basic blocks (i.e. VEX's ability to merge multiple BBs is switched off).

(2) There is some additional work compared to Cachegrind, as Callgrind collects separate counters for the same code if it runs within different threads or call contexts. This often requires a lookup in a dynamically growing hash table for every executed BB, to find the counter array to update.

The actual slowdown depends on the application, but I would expect a factor of 2 or 3. Try it out yourself (note that cache simulation is switched off by default in Callgrind).

> How does the slowdown change with larger programs?

In general, all Valgrind tools get relatively faster over a long run, as the share of time spent on instrumentation goes down. But the slowdown of Callgrind relative to Cachegrind (given the same cache simulation parameters) should be constant.

Josef

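Following the "try it out yourself" suggestion, here is a minimal Python sketch for measuring the ratio, assuming "valgrind" is on the PATH and the program to profile is passed on the command line; the script name "slowdown.py" and its helpers are hypothetical. It passes "--cache-sim=yes" to both tools so that both run a cache simulation regardless of their defaults, and keeps the best of a few wall-clock runs to reduce noise.

```python
import subprocess
import sys
import tempfile
import time
from pathlib import Path


def time_tool(tool: str, program: list[str], extra_opts: list[str], repeats: int = 3) -> float:
    """Run `valgrind --tool=<tool>` on the program; return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        with tempfile.TemporaryDirectory() as tmp:
            # Send the profile data to a throwaway file instead of the working directory.
            out_file = Path(tmp) / f"{tool}.out"
            cmd = ["valgrind", f"--tool={tool}", f"--{tool}-out-file={out_file}", *extra_opts, *program]
            start = time.perf_counter()
            subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
            best = min(best, time.perf_counter() - start)
    return best


def slowdown(baseline_s: float, other_s: float) -> float:
    """Ratio of two run times, e.g. Callgrind seconds / Cachegrind seconds."""
    if baseline_s <= 0:
        raise ValueError("baseline time must be positive")
    return other_s / baseline_s


if __name__ == "__main__" and len(sys.argv) > 1:
    program = sys.argv[1:]
    # Request cache simulation explicitly in both tools (Callgrind has it off by default),
    # so the ratio mainly reflects Callgrind's extra call-graph work.
    sim = ["--cache-sim=yes"]
    cachegrind_s = time_tool("cachegrind", program, sim)
    callgrind_s = time_tool("callgrind", program, sim)
    print(f"Cachegrind: {cachegrind_s:.2f} s, Callgrind: {callgrind_s:.2f} s")
    print(f"Callgrind/Cachegrind slowdown: {slowdown(cachegrind_s, callgrind_s):.2f}x")
```

For example: python3 slowdown.py ./PerfTestSet. Running it on both a small and a large program would show whether the ratio stays roughly constant, as expected above.
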