From: Paul B. <pau...@gm...> - 2008-11-24 17:07:01
Hi,

I'm currently using cachegrind as a kind of simulator to benchmark my code. Using the 'I refs' field of cachegrind's output, I see that revision A of my code runs in X cycles, and revision B runs in Y cycles. Hence, the speedup between the two revisions is X/Y. I find these results to be consistent on this machine, if I ignore the last 4 digits or so.

I'd like to know how reliable I can consider this number to be:
- Could it be considered an accurate reflection of how long my program will take to run?
- Can I consider the number to be consistent between different versions of cachegrind?
- Will it be forward-compatible with future versions?
- Can I expect the results to be the same on different OS/architecture combinations?

Finally, would you consider it to be 'safe' to include these numbers in a submission to a peer-reviewed publication? The idea would be that instead of using performance results that are fickle - depending on the load on my machine, my configuration, CPU etc. - I could quote a number which would be reproducible on another machine.

Thanks in advance,
Paul Biggar

--
Paul Biggar
pau...@gm...
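[As a sketch of the workflow Paul describes: run each revision under cachegrind, pull the 'I refs' count out of the summary it prints, and take the ratio. The helper and the sample summary strings below are made up for illustration; only the "I refs:" line format matches what cachegrind actually prints.]

```python
# Sketch: compute Paul's X/Y "speedup" from two cachegrind runs.
# The two summary strings are hypothetical; real ones come from e.g.
#   valgrind --tool=cachegrind ./myprog

import re

def i_refs(cachegrind_summary):
    """Extract the 'I refs' count from cachegrind's summary output."""
    m = re.search(r"I\s+refs:\s+([\d,]+)", cachegrind_summary)
    return int(m.group(1).replace(",", ""))

# Hypothetical summary lines from two revisions of the same program:
rev_a = "==1234== I   refs:      1,462,971,104"
rev_b = "==1235== I   refs:      1,040,550,212"

x, y = i_refs(rev_a), i_refs(rev_b)
print("instruction-count 'speedup': %.2f" % (x / y))
```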
From: Paul B. <pau...@gm...> - 2008-11-25 13:55:00
Hi Josef,

Thanks for getting back to me so quickly.

On Mon, Nov 24, 2008 at 6:55 PM, Josef Weidendorfer <Jos...@in...> wrote:
> Hmm. Perhaps speedup in terms of executed instructions. However,
> the term "speedup" is usually used for runtime improvement...

Well, yeah. I was kinda deliberately fudging my terms, but I hadn't made that clear. Sorry.

>> I find these results to be consistent on this machine, if I ignore
>> the last 4 digits or so.
>
> What do you mean by 'consistent' here? Real measurement of instructions
> executed via a performance counter?

I meant that if I ran the same program twice, I got almost the same results (vs. using clock time, which varies quite wildly between runs of the same program).

> But if you talk about the relation of your instruction speedup to actual
> time speedup on your machine, then this is only true by chance, for your
> specific code.

Yes, I wasn't looking to correlate these. That would have been some fluke.

> In another code, one instruction doing a memory access could need as
> much time as 100 instructions doing integer operations...
>
>> I'd like to know how reliable I can consider this number to be:
>> - Could it be considered an accurate reflection of how long my
>> program will take to run?
>
> No.

Right, that should have been obvious.

>> - Can I consider the number to be consistent between different
>> versions of cachegrind?
>
> As far as I know, the semantic behind Ir did not change in the past.
> </snip>

>> - Will it be forward-compatible with future versions?
>
> Probably, yes.

OK, that's good to know.

>> - Can I expect the results to be the same on different
>> OS/architecture combinations?
>
> No. Depending on the compiler & architecture, for the same problem
> there of course can be code generated with different numbers of
> instructions.

Right, this should have been obvious too. Apologies for the silly questions.
>> Finally, would you consider it to be 'safe' to include these numbers
>> in a submission to a peer-reviewed publication? The idea would be that
>> instead of using performance results that are fickle - depending on
>> the load on my machine, my configuration, CPU etc - I could quote a
>> number which would be reproducible on another machine.
>
> As said above, cachegrind's Ir numbers say nothing about performance on
> a real machine. They never will be a replacement for real time measurements.

Yes, that's clear to me now.

> However, if you assume a simple machine model (doing 1 instruction per time
> step), the "Ir" numbers would be a good estimation of performance in this model.

As you say, this isn't a very good model.

> PS: A better (but still rough) time estimation is to take the cache misses into
> account, which cachegrind can provide.

I suppose the best thing for me is to take the caches and branch mispredictions into account alongside instruction references. Using calibrator I can probably come up with a single number with which to compare revisions. I'll have to use wall clock time for publication though.

Thanks very much for your answers. They were very helpful.

Paul

--
Paul Biggar
pau...@gm...
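[The "single number" Paul mentions could be folded together roughly like this. The penalty weights below are illustrative guesses, not measured values; in practice they would have to be calibrated for a specific machine, which is what a tool like calibrator is for.]

```python
# Sketch of a simple cost model: one cycle per instruction, plus fixed
# penalties for each cache miss and branch mispredict.  The weights
# (10, 100, 15) are placeholders, not calibrated figures.

def estimated_cycles(ir, l1_misses, ll_misses, branch_mispredicts,
                     l1_penalty=10, ll_penalty=100, branch_penalty=15):
    return (ir
            + l1_penalty * l1_misses
            + ll_penalty * ll_misses
            + branch_penalty * branch_mispredicts)

# Hypothetical event counts for two revisions:
rev_a = estimated_cycles(ir=1_500_000, l1_misses=40_000,
                         ll_misses=2_000, branch_mispredicts=25_000)
rev_b = estimated_cycles(ir=1_100_000, l1_misses=38_000,
                         ll_misses=1_900, branch_mispredicts=24_000)
print("estimated speedup: %.2f" % (rev_a / rev_b))
```

Note that the model's output is only as good as the weights: with uncalibrated penalties, the "speedup" it reports can differ noticeably from the plain Ir ratio, which is exactly why it can't replace wall-clock measurements.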
From: Nicholas N. <nj...@cs...> - 2008-11-26 01:01:45
On Tue, 25 Nov 2008, Paul Biggar wrote:
> I suppose the best thing for me is to take the caches and branch
> mispredictions into account alongside instruction references. Using
> calibrator I can probably come up with a single number with which to
> compare revisions. I'll have to use wall clock time for publication
> though.

You might find this paper interesting:

  http://www.cs.mu.oz.au/~njn/pubs/cache-large-lazy2002.ps

In it, I was able to find a simple model that just included instructions, cache misses and branch mispredictions, and which did a pretty good job of predicting the run-time of various Haskell programs. But this doesn't work in general, and indicated that the Haskell programs were quite unusual. And the machine was from about 2001, so the hardware for newer machines will be substantially different.

Furthermore, I got the numbers from hardware counters. Cachegrind's numbers are less accurate -- see section 3.3.7 of

  http://www.cs.mu.oz.au/~njn/pubs/phd2004.pdf

for all the reasons why.

For publications you should use wall clock time as the definitive number; you'll get mauled by reviewers if you try anything else. Once you've got the wall clock times, you could possibly use Cachegrind numbers to explain differences, so long as you explain that Cachegrind has inaccuracies and its numbers are a useful guide rather than definitive.

Nick