From: Hynek S. <hs+...@ox...> - 2006-08-14 12:39:48

Hello Josef,

Josef Weidendorfer <Jos...@gm...> writes:

>> Also, Callgrind says
>>
>> ==14861== I   refs:      1,206,122,472
>> ==14861== I1  misses:            3,955
>> ==14861== L2i misses:            2,648
>> ==14861== I1  miss rate:           0.0%
>> ==14861== L2i miss rate:           0.0%

> Hmm... instruction fetches most often hit in the cache. What's about
> the data accesses?

I measured with more real data and got:

==22798== Events    : Ir Dr Dw I1mr D1mr D1mw I2mr D2mr D2mw
==22798== Collected : 754730 253655 132391 3789 4150 911 2589 2094 802
==22798==
==22798== I   refs:      754,730
==22798== I1  misses:      3,789
==22798== L2i misses:      2,589
==22798== I1  miss rate:    0.50%
==22798== L2i miss rate:    0.34%
==22798==
==22798== D   refs:      386,046  (253,655 rd + 132,391 wr)
==22798== D1  misses:      5,061  (  4,150 rd +     911 wr)
==22798== L2d misses:      2,896  (  2,094 rd +     802 wr)
==22798== D1  miss rate:     1.3% (    1.6%   +     0.6%  )
==22798== L2d miss rate:     0.7% (    0.8%   +     0.6%  )
==22798==
==22798== L2 refs:         8,850  (  7,939 rd +     911 wr)
==22798== L2 misses:       5,485  (  4,683 rd +     802 wr)
==22798== L2 miss rate:      0.4% (    0.4%   +     0.6%  )

Still okay, is it?

>> Looks like it. But as said, it's also problematic, because of moving
>> additional code.

> You can subtract the overhead of the inserted rdtsc instruction, as
> you know this overhead (you can measure it before). BTW, you also
> should be able to read performance counters (... I am not really sure
> if rdmsr is allowed in user space ...).

Hm, I guess I'll hack some general .a together, to make that easy. I
used to use just a preprocessor macro but it turned out pretty copious.

>> So I guess a combination of rdtsc (exact times), oprofile
>> (aprox. runtime distribution) and callgrind (caches + callgraphs) is the
>> way to go. That's also what I expected.

> Yes. That's also a big TODO item for KCachegrind: to combine
> measurement results of different tools to come up with something
> better. VTune is supposed to support this (callgraph from
> instrumentation mode, time from sampling).

Sounds like a really huge TODO.

>> Hm, what would speak against rdtsc-instrumentation?

> Why? If you are doing it yourself, you can avoid unneeded
> instrumentation, you can control the overhead, and even subtract it
> from the result.

Some kind of automation would be nice I'd say.

-hs
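
P.S. For the archives, a minimal sketch of the "measure the rdtsc overhead first, then subtract it" idea discussed above. It is only an illustration under stated assumptions, not the library mentioned in this mail: it assumes GCC or Clang, uses the `__rdtsc()` intrinsic from `<x86intrin.h>` on x86, and falls back to `clock_gettime()` elsewhere. The names `read_ticks`, `measure_overhead`, `subtract_overhead` and `work` are hypothetical. Note that rdtsc is not a serializing instruction, so very short regions can still be blurred by out-of-order execution.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
/* Read the x86 time stamp counter. */
static inline uint64_t read_ticks(void) { return __rdtsc(); }
#else
/* Assumed fallback for non-x86 builds: nanoseconds from a monotonic clock. */
static inline uint64_t read_ticks(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
#endif

/* Estimate the cost of one back-to-back pair of counter reads.
 * The minimum over several tries is kept, because interrupts and
 * cache misses can only make a single sample larger. */
static uint64_t measure_overhead(int tries) {
    assert(tries > 0);
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < tries; i++) {
        uint64_t t0 = read_ticks();
        uint64_t t1 = read_ticks();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}

/* Remove the known instrumentation overhead from a raw measurement,
 * clamping at zero so a noisy sample never underflows. */
static uint64_t subtract_overhead(uint64_t raw, uint64_t overhead) {
    return raw > overhead ? raw - overhead : 0;
}

/* Hypothetical workload standing in for the code under test:
 * sums the integers 0 .. n-1. */
static uint64_t work(uint64_t n) {
    volatile uint64_t sum = 0;
    for (uint64_t i = 0; i < n; i++)
        sum += i;
    return sum;
}
```

To time a region, call `read_ticks()` before and after it (for example around `work(100000)`) and pass the difference together with the result of `measure_overhead(1000)` to `subtract_overhead()`. Keeping the helpers as small static functions rather than a macro is one way to avoid the copious preprocessor code mentioned above.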