From: Vincent Penquerc'h <Vin...@ar...> - 2003-07-28 09:46:20
> On the contrary, it's extremely relevant from a practical
> point of view.

Yes, and it might be sensible to add it either to cachegrind or to a
variant of it, since cache misses/hits will be needed for this to be
accurate.

> It's almost impossible to know how long any one instruction will take.
> For example, on my machine, a memory read takes 1 cycle if it hits the
> L1 cache, 10 cycles in the worst case if it only hits the L2 cache, and
> 206 cycles in the worst case if it misses both caches.

As you mention here.

> But out-of-order execution and all the other fancy modern CPU
> mumbo-jumbo means that these delays are rarely as bad as the worst
> case. Then throw in branch mispredictions, and other pipeline
> stalls... it's a mess.

Branch prediction algorithms are known (well, they were for the
Pentium, when I last did asm stuff). Stalls have well defined
conditions (AGIs, etc, are predictable). So it would be doable.
After all, Vtune does (did, at least) this.

--
Vincent Penquerc'h
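
For a rough sense of the worst-case arithmetic quoted above, here is a
minimal sketch that combines per-level read counts (of the kind a cache
simulator such as cachegrind can report) with the latencies from the
quoted message: 1, 10 and 206 cycles. The function name and the sample
counts are made up for illustration and are not part of any tool. The
result is only an upper bound on memory-read cost: out-of-order
execution hides part of this latency, and branch mispredictions and
other stalls are not modelled at all.

```python
# Worst-case read latencies (in cycles) quoted in the message for one
# particular machine; other machines will differ.
L1_HIT_CYCLES = 1
L2_HIT_CYCLES = 10
MISS_BOTH_CYCLES = 206


def worst_case_read_cycles(l1_hits, l2_hits, misses):
    """Upper bound on cycles spent in memory reads (hypothetical helper).

    l1_hits: reads served by the L1 cache
    l2_hits: reads that miss L1 but hit L2
    misses:  reads that miss both caches
    """
    return (l1_hits * L1_HIT_CYCLES
            + l2_hits * L2_HIT_CYCLES
            + misses * MISS_BOTH_CYCLES)


# Made-up counts, not measurements.
total = worst_case_read_cycles(l1_hits=1000, l2_hits=50, misses=5)
print(total)
```

Even with these made-up numbers, the five reads that miss both caches
cost about as much as the thousand L1 hits, which is why the hit/miss
counts are needed before any cycle estimate can be meaningful.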