Kenneth –

Didn’t mean to ignore this; just didn’t get a chance to take a close look before the holidays got in the way. I still haven’t looked closely enough to offer an informed opinion. It’s on my TODO list…

- d

 


From: perfmon2-devel-bounces@lists.sourceforge.net [mailto:perfmon2-devel-bounces@lists.sourceforge.net] On Behalf Of Philip Mucci
Sent: Thursday, December 20, 2007 5:52 PM
To: Kenneth Hoste
Cc: papi list; perfmon2-devel@lists.sourceforge.net
Subject: Re: [perfmon2] [Ptools-perfapi] L1 data cache misses on Pentium 4

 

Sorry, we didn't mean to ignore you. This is great stuff.  We've needed these definitions for a long time. Are the cache misses data cache or I cache or both? It's worth digging through libpfm to see if this even can be specified symbolically. Dan, you're our PAPI P4 expert, any thoughts?

 

Phil

 

 

 

On Dec 20, 2007, at 3:55 AM, Kenneth Hoste wrote:

 

 

Nobody has comments on this? Do the settings seem reasonable? Or am I just dreaming I got this right?

 

K.

 

On 14 Dec 2007, at 11:04, Kenneth Hoste wrote:

 

Hi,

 

I think I have it figured out... I ran some tests with perfex, and the numbers I'm getting seem valid to me. I don't have a patch for PAPI or libpfm, but I suspect people who are familiar with their internals will be able to create a patch out of this easily...

 

I measured L1 cache misses as follows on the Pentium 4 machines available to me:

 

perfex -e 0x3B000/0x12000204@0x8000000C --p4pe=0x1000001 --p4pmv=0x1

 

L2 cache miss rates are trivial from this: just change --p4pe to 0x1000002.

 

Breaking this down:

 

CCCR: 0x3B000

 

bits 16-17 ('3'): measure for any active thread

bits 12-15 ('B'): bit 12 enables the counters, bits 13-15 select ESCR 05h

 

These settings are the same for the instr_completed event, no surprise there.
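A minimal sketch of how the CCCR value above can be split into the fields just described. The helper name `bits` is made up for illustration; the bit ranges are the ones listed in the breakdown.

```python
def bits(value, lo, hi):
    """Return the inclusive bit field value[hi:lo] as an integer."""
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)

CCCR = 0x3B000  # value quoted in the message

enable = bits(CCCR, 12, 12)         # bit 12: counter enable
escr_select = bits(CCCR, 13, 15)    # bits 13-15: ESCR select
active_thread = bits(CCCR, 16, 17)  # bits 16-17: active thread mode

print(f"enable={enable} escr_select={escr_select:02X}h active_thread={active_thread}")
```

The ESCR select field comes out as 05h and the active thread field as 3, matching the description above.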

 

ESCR: 0x12000204

 

bits 24-31 ('12'): bits 25-30 select event 09h, i.e. replay_event

bits 8-11 ('2'): bit 9 set to count NBOGUS tagged µops

bits 0-3 ('4'): bit 2 set, enabling counting for thread 0 at user level
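The ESCR value can be checked the same way. This is a sketch under stated assumptions: the field positions (event select in bits 25-30, event mask in bits 9-24, thread 0 user-level flag in bit 2) are read off the hex digits of 0x12000204 and the usual Pentium 4 ESCR layout, and should be verified against the Intel documentation. The helper `bits` is hypothetical.

```python
def bits(value, lo, hi):
    """Return the inclusive bit field value[hi:lo] as an integer."""
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)

ESCR = 0x12000204  # value quoted in the message

event_select = bits(ESCR, 25, 30)  # assumed field position; 09h = replay_event
event_mask = bits(ESCR, 9, 24)     # assumed field position; mask bit 0 = NBOGUS
t0_usr = bits(ESCR, 2, 2)          # bit 2: count for thread 0 at user level

print(f"event_select={event_select:02X}h event_mask={event_mask:#x} t0_usr={t0_usr}")
```

Note that the two leading hex digits '12' occupy bits 24-31, and the '2' in the third digit from the right occupies bits 8-11.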

 

counter: 0x8000000C

 

bits 28-31 ('8'): bit 31 enables fast rdpmc

bits 0-3 ('C'): 0Ch, which corresponds to MSR_IQ_COUNTER0
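As a rough illustration, the counter word and the full perfex event specification can be put together from the three values. The variable names are made up; the `CCCR/ESCR@counter` layout is simply the one visible in the perfex command quoted earlier. The leading hex digit '8' sits in bits 28-31, so the flag it sets is bit 31.

```python
CCCR = 0x3B000
ESCR = 0x12000204
COUNTER = 0x8000000C

fast_rdpmc = (COUNTER >> 31) & 0x1  # top bit: fast rdpmc flag
counter_index = COUNTER & 0xF       # low nibble: 0Ch, MSR_IQ_COUNTER0 per the message

# Same layout as the -e argument in the perfex command: CCCR/ESCR@counter
event_spec = f"0x{CCCR:X}/0x{ESCR:08X}@0x{COUNTER:08X}"

print(fast_rdpmc, f"{counter_index:02X}h", event_spec)
```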

 

This specifies counting replay_event on an appropriate counter, but only tagged µops will be counted. Tagging is specified by setting the appropriate bits in IA32_PEBS_ENABLE and MSR_PEBS_MATRIX_VERT (see Table A-10 in Intel docs). Using perfex, this is done with --p4pe and --p4pmv respectively.

 

In IA32_PEBS_ENABLE, bits 0 and 24 need to be set, resulting in 0x1000001. Table A-10 in the Intel docs says to also enable bit 25, but that's only needed when using PEBS (and we are not in this case). MSR_PEBS_MATRIX_VERT only needs bit 0 to be set, according to Table A-10, hence 0x1.
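A small sketch of how the --p4pe and --p4pmv values follow from the bit positions. The helper `bitmask` and the dictionary names are hypothetical. The L2 case is inferred only from the value 0x1000002 given earlier (bit 1 instead of bit 0), not from Table A-10 directly.

```python
def bitmask(*positions):
    """OR together 1 << p for every bit position p."""
    value = 0
    for p in positions:
        value |= 1 << p
    return value

EVENT = "0x3B000/0x12000204@0x8000000C"  # event spec quoted from the message

# IA32_PEBS_ENABLE: bit 24 plus bit 0 (L1) or bit 1 (L2, inferred from 0x1000002)
p4pe = {"L1": bitmask(0, 24), "L2": bitmask(1, 24)}
# MSR_PEBS_MATRIX_VERT: bit 0 only
p4pmv = bitmask(0)

commands = {
    level: f"perfex -e {EVENT} --p4pe={value:#x} --p4pmv={p4pmv:#x}"
    for level, value in p4pe.items()
}
for level, cmd in commands.items():
    print(level, cmd)
```

The script only formats the option strings; it does not run perfex.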

 

If something isn't clear in the details above, please let me know, and I'll try and explain.

 

Now, to validate this, I used two SPEC CPU2000 benchmarks, art and mcf, which are notorious for having a large number of cache misses. I've also measured cache miss rates for these on an Opteron 244 and a Core 2 Duo (same statically linked binaries used on all machines, compiled/linked with gcc 4.1.2 -O2 -static). The graphs are uploaded at http://www.elis.ugent.be/~kehoste/PAPI_cache_misses. If you want these for future reference, make a local copy, because I can't guarantee they will be up there forever. To me, these numbers make perfect sense.

 

Two notes I should make: the L2 misses for the Core 2 Duo machine are so low that they are not showing in the graph; and one thing which might seem strange at first is that the L1 miss rate for art on the model 2 Pentium 4 (8K L1-D) is _lower_ than on the model 3/4 Pentium 4s (16K L1-D). I think this can be explained because the latter models probably have more aggressive instruction prefetching, which causes more L1 data entries to be pushed out, and hence more L1-D cache misses.

 

Any comments on this are highly appreciated.

 

K.

 

--


Computer Science is no more about computers than astronomy is about telescopes. (E. W. Dijkstra)

Kenneth Hoste
ELIS - Ghent University
email: kenneth.hoste@elis.ugent.be
blog: http://www.elis.ugent.be/~kehoste/blog
website: http://www.elis.ugent.be/~kehoste

 

perfmon2-devel mailing list
perfmon2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/perfmon2-devel

 


 

_______________________________________________

Ptools-perfapi mailing list