Kenneth –

Didn’t mean to ignore this; just didn’t get a chance to take a close look before the holidays got in the way. I still haven’t looked closely enough to offer an informed opinion. It’s on my TODO list…

- d


From: [] On Behalf Of Philip Mucci
Sent: Thursday, December 20, 2007 5:52 PM
To: Kenneth Hoste
Cc: papi list;
Subject: Re: [perfmon2] [Ptools-perfapi] L1 data cache misses on Pentium 4


Sorry, we didn't mean to ignore you. This is great stuff.  We've needed these definitions for a long time. Are the cache misses data cache or I cache or both? It's worth digging through libpfm to see if this even can be specified symbolically. Dan, you're our PAPI P4 expert, any thoughts?






On Dec 20, 2007, at 3:55 AM, Kenneth Hoste wrote:



Nobody has comments on this? Do the settings seem reasonable? Or am I just dreaming I got this right?




On 14 Dec 2007, at 11:04, Kenneth Hoste wrote:




I think I have it figured out... I ran some tests with perfex, and the numbers I'm getting seem valid to me. I don't have a patch for PAPI or libpfm, but I suspect people who are familiar with their internals will be able to create a patch out of this easily...


I measured L1 cache misses as follows on the Pentium 4 machines available to me:


perfex -e 0x3B000/0x12000204@0x8000000C --p4pe=0x1000001 --p4pmv=0x1


L2 cache miss rates are trivial from this: just change --p4pe to 0x1000002.


Breaking this down:


CCCR: 0x3B000


bits 16-17 ('3'): measure for any active thread

bits 12-15 ('B'): bit 12 enables the counters, bits 13-15 select ESCR 05h


These settings are the same for the instr_completed event, no surprise there.
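
A quick way to double-check the CCCR breakdown above is to pull the fields out with shifts and masks. This is only a sketch: the helper name `bits` and the variable names are illustrative labels, not perfex or perfctr identifiers.

```python
def bits(value, lo, hi):
    """Return the field occupying bits lo..hi (inclusive) of value."""
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)

cccr = 0x3B000

enable = bits(cccr, 12, 12)         # bit 12: counter enable
escr_select = bits(cccr, 13, 15)    # bits 13-15: ESCR select
active_thread = bits(cccr, 16, 17)  # bits 16-17: active thread mode

print(f"enable={enable} escr_select=0x{escr_select:02X} active_thread={active_thread}")
```

With 0x3B000 this gives enable 1, ESCR select 05h and active-thread value 3, matching the description.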


ESCR: 0x12000204


bits 24-31 ('12'): bits 25-30 select 09h, being replay_event

bits 8-11 ('2'): bit 9 set to count NBOGUS tagged µops

bits 0-3 ('4'): bit 2 set, enabling counting for thread 0 user-level
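
The same check for the ESCR value, as a sketch. It assumes the Pentium 4 ESCR layout from the Intel manual (event select in bits 25-30, event mask in bits 9-24, T0_OS at bit 3, T0_USR at bit 2); the variable names are illustrative only.

```python
escr = 0x12000204

event_select = (escr >> 25) & 0x3F   # bits 25-30: event select
event_mask = (escr >> 9) & 0xFFFF    # bits 9-24: event mask
t0_os = (escr >> 3) & 1              # bit 3: count thread 0 at kernel level
t0_usr = (escr >> 2) & 1             # bit 2: count thread 0 at user level

print(f"event_select=0x{event_select:02X} event_mask=0x{event_mask:X} "
      f"t0_os={t0_os} t0_usr={t0_usr}")
```

Event select comes out as 09h (replay_event), and the event mask has only its lowest bit set, which is the NBOGUS bit.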


counter: 0x8000000C


bits 28-31 ('8'): bit 31 set, enables fast rdpmc

bits 0-3 ('C'): 0Ch, which corresponds to MSR_IQ_COUNTER0
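
For completeness, the counter value can be split the same way. A sketch, assuming the top bit (bit 31) is the fast-rdpmc flag and the remaining bits hold the counter number; the names are illustrative.

```python
pmc = 0x8000000C

fast_rdpmc = (pmc >> 31) & 1        # bit 31: fast rdpmc flag
counter_index = pmc & 0x7FFFFFFF    # remaining bits: counter number

print(f"fast_rdpmc={fast_rdpmc} counter_index=0x{counter_index:02X}")
```

The counter number is 0Ch, i.e. MSR_IQ_COUNTER0 as stated above.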


This specifies counting replay_event on an appropriate counter, but only tagged µops will be counted. Tagging is specified by setting the appropriate bits in IA32_PEBS_ENABLE and MSR_PEBS_MATRIX_VERT (see Table A-10 in Intel docs). Using perfex, this is done with --p4pe and --p4pmv respectively.


In IA32_PEBS_ENABLE, bits 0 and 24 need to be set, resulting in 0x1000001. Table A-10 in the Intel docs says to also enable bit 25, but that's only needed when using PEBS (and we are not in this case). MSR_PEBS_MATRIX_VERT only needs bit 0 to be set, according to Table A-10, hence 0x1.
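
The two tagging values, and the resulting perfex command lines for L1 and L2 misses, can be rebuilt from the bit numbers given above. This is a sketch only: the constant and function names are made-up labels, and bit 1 for L2 misses is inferred from the 0x1000002 value mentioned earlier.

```python
# Illustrative labels for the IA32_PEBS_ENABLE bits discussed above.
L1_MISS_TAG = 1 << 0      # bit 0: tag L1 cache misses
L2_MISS_TAG = 1 << 1      # bit 1: tag L2 cache misses (inferred from 0x1000002)
TAGGING_ENABLE = 1 << 24  # bit 24: needed for tagging, even without PEBS

PEBS_MATRIX_VERT = 1 << 0  # bit 0 of MSR_PEBS_MATRIX_VERT

def perfex_command(miss_tag):
    """Build the perfex command line for the given miss-tag bit."""
    pebs_enable = TAGGING_ENABLE | miss_tag
    return ("perfex -e 0x3B000/0x12000204@0x8000000C "
            f"--p4pe=0x{pebs_enable:X} --p4pmv=0x{PEBS_MATRIX_VERT:X}")

print(perfex_command(L1_MISS_TAG))
print(perfex_command(L2_MISS_TAG))
```

The first line printed is the L1 command used above; the second differs only in --p4pe.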


If something isn't clear in the details above, please let me know, and I'll try and explain.


Now, to validate this, I used two SPEC CPU2000 benchmarks, art and mcf, which are notorious for having a large number of cache misses. I've also measured cache miss rates for these on an Opteron 244 and a Core 2 Duo (the same statically linked binaries were used on all machines, compiled/linked with gcc 4.1.2 -O2 -static). The graphs are uploaded at If you want these for future reference, make sure to make a local copy, because I can't guarantee they will be up there forever. To me, these numbers make perfect sense.


Two notes I should make: the L2 misses for the Core 2 Duo machine are so low that they do not show in the graph; and one thing which might seem strange at first is that the L1 miss rate for art on the model 2 Pentium 4 (8K L1-D) is _lower_ than on the model 3/4 Pentium 4s (16K L1-D). I think this can be explained by the latter models probably having more aggressive instruction prefetching, which causes more L1 data entries to be pushed out, and hence more L1-D cache misses.


Any comments on this are highly appreciated.





Computer Science is no more about computers than astronomy is about telescopes. (E. W. Dijkstra)

Kenneth Hoste
ELIS - Ghent University







