From: Jeremy F. <je...@go...> - 2003-07-28 16:25:19
On Mon, 2003-07-28 at 02:53, Joerg Beyer wrote:
> Vincent Penquerc'h wrote:
> >> OK, so an implementation to do cycle counting needs to have
> >> a table that lists, for every instruction, how many cycles it
> >> needs in the different cache-hit/miss situations.
> >>
> >> Is this information available (for all the processors)?
> >
> > Yes, but there is an awful lot of possible combinations. That
>
> What are the conditions? Is it this or more?
> * L1 hit, L2 hit, cache miss
> * branch prediction was true or false

I'm afraid it is much, much more complicated than this. There's a lot
more to it than "branch prediction", since modern CPUs will
speculatively execute way in advance. There's the TLB, and whether
you're getting TLB misses. There's the breakdown of an instruction
into uops, how those uops are handled by the various functional units,
and what conflicts they have.

Basically, people have completely given up on the idea of counting how
many "cycles" a particular instruction takes, and have mostly given up
on the idea of cycle counts for groups of instructions. The innards of
the CPU are not well enough documented to really simulate, and even if
you could, each CPU is different enough that it would take a lot of
work for each one.

If you're interested in running time, the only meaningful measurement
you can make is to run the code and see how long it takes. You can use
the performance counters to glean information about why a particular
sequence ran slower than expected, by looking for "bad" events (cache
misses, interlocks, stalls, etc.) and trying to work out what code
caused them.

The other difficulty with measurement is that the instructions which
read the timer/counter registers are not necessarily synchronized with
the instruction stream, or, if they are, are so expensive that they
upset the thing you're trying to measure. The best way to profile is
to take a small kernel and run it many times, to amortize the cost of
the measurement.
On the plus side, cache misses are so expensive that they tend to
dominate execution time. If you use cachegrind to reduce your miss
rate, that alone may be enough to make things go faster.

J
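P.S. For anyone new to it, a typical cachegrind run looks like this
(./myprog is a placeholder for your binary; the --tool= spelling is
for current Valgrind releases):

```shell
# simulate the I1/D1/L2 caches; hit/miss totals are printed at exit
valgrind --tool=cachegrind ./myprog

# annotate the sources with per-line miss counts; the output
# file name includes the pid of the run
cg_annotate cachegrind.out.<pid>
```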