From: Nicholas N. <nj...@ca...> - 2003-07-28 10:23:34
On Mon, 28 Jul 2003, Vincent Penquerc'h wrote:

> Yes, and it might be sensible to add it either to cachegrind
> or to a variant of it, since cache misses/hits will be needed
> for this to be accurate.

Don't give too much credit to Cachegrind; what it says is only an
approximation. For example, it doesn't even try to take into account
virtual->physical address mappings. See
developer.kde.org/~sewardj/docs-1.9.5/cg_main.html#cg-top, section 4.12,
for a full list of its known shortcomings (Nb: the one about custom
malloc() has not been a problem since v1.9.6).

> Branch prediction algorithms are known (well, they were for the
> Pentium, when I last did asm stuff). Stalls have well defined
> conditions (AGIs, etc, are predictable). So it would be doable.
> After all, Vtune does (did, at least) this.

Hmm. Doable, maybe; but very, very difficult. Much harder than you
might think at first. To count cycles you need to simulate pretty much
everything: the whole pipeline, caches, TLBs, all that stuff. The
SimpleScalar project (www.simplescalar.com) does that.

N
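
To illustrate the virtual->physical point above, here is a toy sketch. It is
not Cachegrind's code. The cache geometry (direct-mapped, 16 KiB, 64-byte
lines), the 4 KiB page size, the function name count_misses, and the two page
tables are all assumptions made up for the example. It only shows that the
same virtual access trace can give very different miss counts depending on
which physical frames the pages happen to land in, which a simulator working
on virtual addresses alone cannot see.

```python
LINE = 64     # bytes per cache line (assumed)
SETS = 256    # direct-mapped, one line per set: 16 KiB total (assumed)
PAGE = 4096   # bytes per page (assumed)


def count_misses(addresses, page_table=None):
    """Count misses in a toy direct-mapped cache.

    addresses:  iterable of virtual byte addresses.
    page_table: hypothetical dict mapping virtual page number to physical
                frame number; None means index the cache with the virtual
                address directly, ignoring any mapping.
    """
    tags = [None] * SETS
    misses = 0
    for vaddr in addresses:
        if page_table is None:
            addr = vaddr
        else:
            # Translate: swap the page number for the frame number,
            # keep the offset within the page.
            addr = page_table[vaddr // PAGE] * PAGE + vaddr % PAGE
        line = addr // LINE
        index = line % SETS
        if tags[index] != line:
            tags[index] = line
            misses += 1
    return misses


# Alternate between the first byte of virtual pages 0 and 1, 100 times.
trace = [0 * PAGE, 1 * PAGE] * 100

virtual_only = count_misses(trace)             # no translation at all
friendly = count_misses(trace, {0: 8, 1: 9})   # frames map to different sets
hostile = count_misses(trace, {0: 8, 1: 12})   # frames map to the same set

print(virtual_only, friendly, hostile)
```

With these made-up numbers, the virtual-only count matches the "friendly"
mapping but is far off for the "hostile" one, where the two frames evict each
other on every access. A real machine adds associativity, multiple cache
levels and TLBs on top, so this understates the gap between a simple model and
actual cycle counts.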