I presented a paper today at the IISWC (IEEE International Symposium
on Workload Characterization) Conference that might be of interest to
those on this list.
Sorry in advance about the alarmist nature of the title; we really only
looked at the retired instruction counters on x86.
Can Hardware Performance Counters be Trusted?
by Vincent M. Weaver and Sally A. McKee
A quick summary:
We investigated the same-machine and cross-machine sources of variation
in the retired instruction count for a wide variety of x86 machines. In
theory the retired instruction count should be the same across all
machines, but it isn't; the counts sometimes differ by billions of
instructions.
We looked at 9 different x86 implementations, from a Pentium Pro up
through a Core 2 system. We ran the full SPEC 2000 and 2006 benchmark
We found the following sources of variation:
+ On Pentium 4, the fldcw instruction counts as two instructions when
using the instr_retired counter; on all other implementations it
counts as one. With a new enough machine you can avoid this by using
the instr_completed count instead.
+ The layout of virtual memory can cause non-deterministic counts.
This is because some benchmarks do things like use pointers as
hash-table keys, among other things.
To work around this:
* Disable heap/stack randomization on recent 2.6 kernels
* Enforce 3GB compatibility layout when running 32-bit apps on
64-bit machines (otherwise the stack is moved higher to give
* Make sure the environment variables, command line args, and
executable name are the same size on all machines being
investigated (these affect stack offset)
The first two of the above can be enforced using the
"linux32 -3 -R" helper command, at least on Debian systems.
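
As a rough sketch of the workarounds above, the Python helper below builds
a "linux32 -3 -R" command line and estimates how many bytes the command
line and environment contribute to the initial stack, so the value can be
compared across machines before a run. The function names and the size
estimate (string bytes plus NUL terminators plus one pointer per entry) are
illustrative assumptions, not something taken from the paper; the real
stack layout also includes the auxiliary vector and alignment padding.

```python
import os
import subprocess


def stack_payload_bytes(argv, env, pointer_size=4):
    """Rough size of what argv and the environment put on the initial stack.

    Counts each string plus its NUL terminator and one pointer per entry,
    plus the two NULL pointers that end the argv and envp arrays. This is
    an approximation: the auxiliary vector and padding are ignored.
    """
    strings = list(argv) + [f"{k}={v}" for k, v in env.items()]
    string_bytes = sum(len(os.fsencode(s)) + 1 for s in strings)
    pointer_bytes = (len(strings) + 2) * pointer_size
    return string_bytes + pointer_bytes


def deterministic_command(argv):
    """Prefix a command with linux32 -3 -R (3GB layout, no randomization)."""
    return ["linux32", "-3", "-R"] + list(argv)


def run_deterministic(argv, env):
    """Run argv under linux32 -3 -R with exactly the given environment."""
    return subprocess.run(deterministic_command(argv), env=env, check=True)
```

Running stack_payload_bytes with the same argv and env on every machine
and comparing the results is a quick way to catch a stray environment
variable or a longer executable path before it shifts the stack offset.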
+ System hardware interrupts cause extra retired instruction counts
even if you are only measuring userspace code. This is often
visible as being equivalent to the number of timer interrupts
(approximately equal to the program runtime times the HZ value).
This does not affect the instr_completed count found on newer P4s
but does affect all other processors we investigated.
+ Page faults can also increase the count (we did not fully investigate).
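
The fldcw and timer-interrupt effects listed above lend themselves to a
back-of-the-envelope correction. The sketch below is an illustration under
stated assumptions, not the method from the paper: the function names are
made up, the number of executed fldcw instructions is assumed to come from
some separate source (for example binary instrumentation), and each timer
interrupt is assumed to add exactly one count.

```python
def expected_timer_interrupts(runtime_seconds, hz):
    """Approximate number of timer interrupts: program runtime times HZ."""
    return round(runtime_seconds * hz)


def adjusted_retired_count(raw_count, runtime_seconds, hz,
                           fldcw_count=0, fldcw_counts_double=False):
    """Remove the known extra counts from a raw retired-instruction count.

    raw_count           -- value read from the retired instruction counter
    runtime_seconds     -- wall-clock runtime of the measured program
    hz                  -- kernel timer frequency (the HZ value)
    fldcw_count         -- executed fldcw instructions (assumed known)
    fldcw_counts_double -- True for Pentium 4 instr_retired, where each
                           fldcw is counted as two instructions
    """
    adjusted = raw_count - expected_timer_interrupts(runtime_seconds, hz)
    if fldcw_counts_double:
        # Each fldcw was counted twice, so one count per fldcw is extra.
        adjusted -= fldcw_count
    return adjusted
```

The HZ value has to match the kernel configuration of the machine being
measured, and page faults are not accounted for here at all.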