|
From: Ferad Z. <fer...@bs...> - 2007-05-17 11:20:22
|
Hi,

I read the Valgrind manual but I couldn't figure out how to collect the event Cycles (CPU cycles). I want to see how many cycles it costs to call and execute a function.

--
Ferad Zyulkyarov
|
From: Nicholas N. <nj...@cs...> - 2007-05-17 11:27:56
|
On Thu, 17 May 2007, Ferad Zyulkyarov wrote:

> Hi, I read the valgrind manual but I couldn't figure out how to
> collect the event Cycles (CPU cycles). I want to see how many
> cycles it costs to call and execute a function.

Why do you think Valgrind can do this? Gprof might be more suited to your needs.

Nick
|
From: Nicholas N. <nj...@cs...> - 2007-05-17 22:08:19
|
On Thu, 17 May 2007, Ferad Zyulkyarov wrote:

> Hi, gprof is working in higher level time granularity.
>
> Here
> (http://valgrind.org/docs/manual/cl-format.html#cl-format.overview.example1)
> is shown an example with events "cycles" collecting the cycles a
> passed until the execution of that line in the file.
>
> I want to collect that information with callgrind but I don't know
> what options (flags) to use to run the tool.

I think that is just an example of what could go in the file, but Callgrind doesn't actually collect those numbers. Josef, is that right?

Nick
|
From: Josef W. <Jos...@gm...> - 2007-05-18 00:04:49
|
On Friday 18 May 2007, Nicholas Nethercote wrote:

> On Thu, 17 May 2007, Ferad Zyulkyarov wrote:
>
> > Hi, gprof is working in higher level time granularity.
> >
> > Here
> > (http://valgrind.org/docs/manual/cl-format.html#cl-format.overview.example1)
> > is shown an example with events "cycles" collecting the cycles a
> > passed until the execution of that line in the file.
> >
> > I want to collect that information with callgrind but I don't know
> > what options (flags) to use to run the tool.
>
> I think that is just an example of what could go in the file, but Callgrind
> doesn't actually collect those numbers. Josef, is that right?

Ah, yes. The fact that an example in the callgrind format description uses an event "cycle" (which was arbitrarily chosen; any name without white space is fine here) does not imply any feature of Callgrind.

If there ever is a profiling tool in Valgrind that reports an event "cycle", this has to be the cycle count of a machine model which is simulated by the tool and fed with the instruction stream of the client program. So even in this case, this event probably has no relation to any real processor.

Given a machine model which executes exactly one instruction per cycle, you can interpret the event "Ir" (instructions fetched) of cachegrind as such a "cycle" event. A better model is to take a cache hierarchy into account. Then you come up with the cycle estimation formula which can be found as a derived event in KCachegrind:

    CycleEstimation = Ir + 10 * L1Misses + 100 * L2Misses

(insert the coefficients according to the average latencies of the memory subsystem in your own machine)

However, that still is a _very_ rough estimation. For exact, real cycle values, you need some instrumentation which reads the cycle counter (on x86, rdtsc) at enter/leave time of a function. GProf does not do this. It uses instrumentation to collect call relationships and call counters; for time cost, it does sampling.

Why do you need this fine granular information? Sampling usually is enough (better use OProfile than GProf).

Josef
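The cycle estimation formula quoted in this message can be tried out with a few lines of Python. This is only a sketch: the function name `estimate_cycles`, its parameter names, and the event counts are made up for illustration, and the default penalties of 10 and 100 are simply the coefficients from the formula, which should be replaced with the latencies of your own machine.

```python
def estimate_cycles(ir, l1_misses, l2_misses, l1_penalty=10, l2_penalty=100):
    """Rough cycle estimate: one cycle per instruction plus a fixed
    penalty for every L1 and L2 cache miss."""
    return ir + l1_penalty * l1_misses + l2_penalty * l2_misses


# Hypothetical event counts, invented for illustration (not measured data).
counts = {"ir": 1_000_000, "l1_misses": 20_000, "l2_misses": 1_500}

estimate = estimate_cycles(**counts)
print(f"CycleEstimation = {estimate}")

# With zero penalties the model collapses to "one instruction per cycle",
# i.e. the plain Ir interpretation mentioned in the message.
print(estimate_cycles(**counts, l1_penalty=0, l2_penalty=0))
```

Keeping the penalties as parameters makes it easy to see how sensitive the estimate is to the assumed memory latencies, which is one reason the result should be read as a rough model rather than a measurement.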
|
From: Ferad Z. <fe...@gm...> - 2007-05-18 08:21:41
|
Hi,

I examined the callgrind source and realized that CPU cycles are not really supported :) However, it can be done with instrumentation and reading the relevant CPU counters (in my case this is Intel x86).

> interpret the event "Ir" (instructions fetched) of cachegrind as such a "cycle"
> event.

This is not quite valid, especially for the "call" instruction, which for an empty function on my CPU takes about 150 cycles :) (if I didn't calculate it wrong).

> A better model is to take a cache hierarchy into account. Then you come up
> with the cycle estimation formula which can be found as derived event in KCachegrind:
>
> CycleEstimation = Ir + 10 * L1Misses + 100 * L2Misses
>
> (insert the coefficients according to the average latencies of the memory
> subsystem in your own machine)
>
> However, that still is a _very_ rough estimation.

I saw this. I think it gives good flexibility for getting insight into the underlying CPU capability.

> Why do you need this fine granular information? Sampling usually is enough
> (better use OProfile than GProf).

I want to analyze some hardware extension before implementing it in a simulator, to see if it is worth working on. At some points I need CPU-cycle-level granularity.

Thanks,
Ferad Zyulkyarov
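The enter/leave instrumentation discussed in this exchange can be sketched as follows. Python cannot read the x86 cycle counter directly, so this sketch substitutes `time.perf_counter_ns()` for rdtsc and therefore reports nanoseconds, not cycles. The decorator name `instrumented`, the `totals` table, and the `empty` test function are all hypothetical names chosen for illustration.

```python
import functools
import time

# Accumulated cost per function name: [call count, total counter ticks].
totals = {}


def instrumented(func):
    """Read a counter when func is entered and again when it is left,
    and accumulate the difference."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter_ns()  # stand-in for reading rdtsc on entry
        try:
            return func(*args, **kwargs)
        finally:
            # Stand-in for reading rdtsc on exit; runs even if func raises.
            elapsed = time.perf_counter_ns() - start
            entry = totals.setdefault(func.__name__, [0, 0])
            entry[0] += 1
            entry[1] += elapsed
    return wrapper


@instrumented
def empty():
    """Hypothetical empty function, used to expose pure call overhead."""
    pass


for _ in range(1000):
    empty()

calls, ticks = totals["empty"]
print(f"empty: {calls} calls, {ticks / calls:.1f} ns per call on average")
```

Note that the two counter reads are themselves part of what gets measured, so for very short functions the reported cost is dominated by the instrumentation. The same caveat applies to a real rdtsc-based version.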
|
From: Nicholas N. <nj...@cs...> - 2007-05-18 03:29:41
|
On Fri, 18 May 2007, Josef Weidendorfer wrote:

>> I think that is just an example of what could go in the file, but Callgrind
>> doesn't actually collect those numbers. Josef, is that right?
>
> Ah, yes.

Would you mind changing the docs to use the events Callgrind does collect, to avoid this potential confusion in the future? Thanks!

Nick
|
From: Josef W. <Jos...@gm...> - 2007-05-18 09:53:20
|
On Friday 18 May 2007, Nicholas Nethercote wrote:

> On Fri, 18 May 2007, Josef Weidendorfer wrote:
>
> >> I think that is just an example of what could go in the file, but Callgrind
> >> doesn't actually collect those numbers. Josef, is that right?
> >
> > Ah, yes.
>
> Would you mind changing the docs to use the events Callgrind does collect,
> to avoid this potential confusion in the future? Thanks!

I just looked at the section; there is a description which also refers to these event names. I would be happier to add the following instead of changing the example; in KCachegrind, there is a converter for OProfile data to the callgrind format, and I want to keep the format description in sync with the one I have as part of the KCachegrind docs (and on the KCachegrind web site).

Josef

--- cl-format.xml (Revision 6742)
+++ cl-format.xml (Arbeitskopie)
@@ -49,6 +49,12 @@
 <sect2 id="cl-format.overview.example1" xreflabel="Simple Example">
 <title>Simple Example</title>
+<para>The event names in the following example are quite arbitrary, and are not
+related to event names used by Callgrind. Especially, cycle counts matching
+real processors probably will never be generated by any Valgrind tools, as these
+are bound to simulations of simple machine models for acceptable slowdown.
+However, any profiling tool can use the format described in this chapter.</para>
+
 <para>
 <screen>events: Cycles Instructions Flops
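To illustrate the point that any profiling tool can emit this format with its own event names, here is a minimal sketch of a writer for a small profile. It assumes only the basic layout (an `events:` header, `fl=` and `fn=` lines, then one line per source line with its costs). The function name `format_profile`, the file and function names, and all cost numbers are hypothetical.

```python
def format_profile(events, source_file, costs_by_function):
    """Return a minimal callgrind-format profile as a string.

    events            -- event names; any name without white space is allowed
    source_file       -- name written to the fl= line
    costs_by_function -- {function name: [(line number, [cost per event]), ...]}
    """
    for name in events:
        if not name or any(ch.isspace() for ch in name):
            raise ValueError(f"event name must not contain white space: {name!r}")

    lines = ["events: " + " ".join(events), f"fl={source_file}"]
    for function, rows in costs_by_function.items():
        lines.append(f"fn={function}")
        for line_no, costs in rows:
            if len(costs) != len(events):
                raise ValueError("need exactly one cost value per event")
            lines.append(" ".join(str(v) for v in (line_no, *costs)))
    return "\n".join(lines) + "\n"


# Hypothetical data: the event names are arbitrary, as the patch text says.
profile = format_profile(
    events=["Cycles", "Instructions", "Flops"],
    source_file="file.f",
    costs_by_function={"main": [(15, [90, 14, 2]), (16, [20, 12, 0])]},
)
print(profile, end="")
```

The event names here are chosen by whatever tool produces the numbers; nothing in the file says how "Cycles" was obtained, which is exactly why the example in the manual should not be read as a list of Callgrind features.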
|
From: Nicholas N. <nj...@cs...> - 2007-05-19 01:20:25
|
On Fri, 18 May 2007, Josef Weidendorfer wrote:

> I just look at the section; there is a description which also refers to these
> event names. I would be happier to add the following instead of changing
> the example; in KCachegrind, there is a converter for OProfile data to the
> callgrind format, and I want to keep the format description in sync with
> the one I have as part of the KCachegrind docu (and on the KCachegrind web site).

That's fine, thanks for doing it :)

Nick