RE: [Valgrind-users] cpu cycles


> On the contrary, it's extremely relevant from a practical point of view.

Yes, and it might be sensible to add it either to Cachegrind
or to a variant of it, since cache hit/miss counts will be needed
for this to be accurate.
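
To show why the hit/miss counts matter, here is a rough sketch of a
worst-case bound on memory-read cycles. It uses the three latencies
quoted in this thread (1, 10 and 206 cycles, for one particular
machine). The function name and the sample counts are made up for
illustration, and this is not something Cachegrind computes itself.

```python
# Worst-case read latencies quoted in this thread for one machine.
L1_HIT_CYCLES = 1       # read hits the L1 cache
L2_HIT_CYCLES = 10      # read misses L1 but hits L2 (worst case)
MISS_BOTH_CYCLES = 206  # read misses both caches (worst case)


def worst_case_memory_cycles(reads, l1_misses, l2_misses):
    """Upper bound on cycles spent in memory reads (hypothetical helper).

    reads     -- total number of memory reads
    l1_misses -- reads that missed the L1 cache
    l2_misses -- reads that missed both L1 and L2
    """
    if not (0 <= l2_misses <= l1_misses <= reads):
        raise ValueError("expected 0 <= l2_misses <= l1_misses <= reads")
    l1_hits = reads - l1_misses
    l2_hits = l1_misses - l2_misses
    return (l1_hits * L1_HIT_CYCLES
            + l2_hits * L2_HIT_CYCLES
            + l2_misses * MISS_BOTH_CYCLES)


if __name__ == "__main__":
    # Invented counts, only to show the call.
    print(worst_case_memory_cycles(1_000_000, 20_000, 1_000))
```

This is only an upper bound: it ignores out-of-order execution and
any overlap between misses, so real figures should come out lower.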

> It's almost impossible to know how long any one instruction will take.
> For example, on my machine, a memory read takes 1 cycle if it hits the
> L1 cache, 10 cycles in the worst case if it only hits the L2 cache, and
> 206 cycles in the worst case if it misses both caches.  But out-of-order

as you mention here.

> execution and all the other fancy modern CPU mumbo-jumbo means that
> these delays are rarely as bad as the worst case.  Then throw in branch
> mispredictions, and other pipeline stalls... it's a mess.

Branch prediction algorithms are known (well, they were for the
Pentium, when I last did asm stuff). Stalls have well-defined
conditions (AGIs, etc., are predictable). So it would be doable.
After all, VTune does (or at least did) this.
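
As a sketch of what "known algorithm" means in practice, here is a
textbook 2-bit saturating-counter predictor for a single branch. This
is an assumption chosen for illustration; I am not claiming it matches
the exact scheme of the Pentium or of any other CPU, and the function
name is made up.

```python
def count_mispredictions(outcomes, initial_state=2):
    """Count mispredictions of one branch under a 2-bit saturating counter.

    outcomes      -- iterable of booleans, True if the branch was taken
    initial_state -- counter value 0..3; 0 and 1 predict "not taken",
                     2 and 3 predict "taken"
    """
    state = initial_state
    mispredictions = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            mispredictions += 1
        # Move the counter towards the actual outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return mispredictions


if __name__ == "__main__":
    # A loop branch taken nine times, then falling through once.
    loop_branch = [True] * 9 + [False]
    print(count_mispredictions(loop_branch))
```

Given a trace of branch outcomes, a tool could feed each branch
through a model like this and charge a fixed penalty per miss.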

-- 
Vincent Penquerc'h