Re: [perfmon2] [Ptools-perfapi] L1 data cache misses on Pentium 4
From: Philip M. <mu...@cs...> - 2007-12-20 22:51:55
Sorry, we didn't mean to ignore you. This is great stuff. We've needed these definitions for a long time. Are the cache misses data cache or I cache or both? It's worth digging through libpfm to see if this can even be specified symbolically. Dan, you're our PAPI P4 expert, any thoughts?

Phil

On Dec 20, 2007, at 3:55 AM, Kenneth Hoste wrote:

>
> Nobody has comments on this? Do the settings seem reasonable? Or am
> I just dreaming that I got this right?
>
> K.
>
> On 14 Dec 2007, at 11:04, Kenneth Hoste wrote:
>
>> Hi,
>>
>> I think I have it figured out... I ran some tests with perfex, and
>> the numbers I'm getting seem valid to me. I don't have a patch
>> for PAPI or libpfm, but I suspect people who are familiar with their
>> internals will be able to create a patch out of this easily...
>>
>> I measured L1 cache misses as follows on the Pentium 4 machines
>> available to me:
>>
>> perfex -e 0x3B000/0x12000204@0x8000000C --p4pe=0x1000001 --p4pmv=0x1
>>
>> L2 cache miss rates follow trivially from this: just change --p4pe
>> to 0x1000002.
>>
>> Breaking this down:
>>
>> CCCR: 0x3B000
>>
>> bits 16-17 ('3'): count whenever any thread is active
>> bits 12-15 ('B'): bit 12 enables the counter, bits 13-15 select
>> ESCR 05h
>>
>> These settings are the same as for the instr_completed event, no
>> surprise there.
>>
>> ESCR: 0x12000204
>>
>> bits 24-31 ('12'): bits 25-30 select event 09h, replay_event
>> bits 8-11 ('2'): bit 9 set to count NBOGUS tagged µops
>> bits 0-3 ('4'): bit 2 set, enables counting for thread 0 user-level
>>
>> counter: 0x8000000C
>>
>> bits 28-31 ('8'): bit 31 set enables fast rdpmc
>> bits 0-3 ('C'): 0Ch, which corresponds to MSR_IQ_COUNTER0
>>
>> This specifies counting replay_event on an appropriate counter,
>> but only tagged µops will be counted. Tagging is specified by
>> setting the appropriate bits in IA32_PEBS_ENABLE and
>> MSR_PEBS_MATRIX_VERT (see Table A-10 in the Intel docs).
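The raw CCCR, ESCR and counter values in the breakdown above can be rebuilt from their individual fields as a sanity check. Below is a minimal Python sketch, assuming the Pentium 4 field layout from Intel's SDM (CCCR: enable bit 12, ESCR select bits 13-15, active-thread bits 16-17; ESCR: event select bits 25-30, event mask bits 9-24, T0_OS/T0_USR/T1_OS/T1_USR in bits 3..0). All function and constant names are hypothetical and are not part of perfex, PAPI or libpfm.

```python
# Hypothetical helpers (not part of perfex, PAPI or libpfm) that assemble the
# raw Pentium 4 values passed as perfex -e CCCR/ESCR@COUNTER.
# Field positions assume the Pentium 4 CCCR/ESCR layout in the Intel SDM.

def field(value: int, lo: int, hi: int) -> int:
    """Extract bits lo..hi (inclusive) of value."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)

def make_cccr(escr_select: int, active_thread: int = 0b11, enable: bool = True) -> int:
    """CCCR: bit 12 = enable, bits 13-15 = ESCR select, bits 16-17 = active thread."""
    return (int(enable) << 12) | (escr_select << 13) | (active_thread << 16)

def make_escr(event_select: int, event_mask: int, t0_usr: bool = False,
              t0_os: bool = False, t1_usr: bool = False, t1_os: bool = False) -> int:
    """ESCR: bits 25-30 = event select, bits 9-24 = event mask, bits 0-3 = thread/ring filters."""
    return ((event_select << 25) | (event_mask << 9)
            | (int(t0_os) << 3) | (int(t0_usr) << 2)
            | (int(t1_os) << 1) | int(t1_usr))

REPLAY_EVENT = 0x09     # event select code for replay_event
NBOGUS = 0x1            # replay_event mask bit 0: count non-bogus uops
FAST_RDPMC = 1 << 31    # counter specifier bit 31: fast (32-bit) rdpmc
MSR_IQ_COUNTER0 = 0x0C  # counter index 12

cccr = make_cccr(escr_select=0x05)
escr = make_escr(REPLAY_EVENT, NBOGUS, t0_usr=True)
counter = FAST_RDPMC | MSR_IQ_COUNTER0

# prints: perfex -e 0x3B000/0x12000204@0x8000000C
print(f"perfex -e 0x{cccr:X}/0x{escr:08X}@0x{counter:08X}")

# Decoding goes the other way, e.g. to check which event an ESCR value selects.
print(f"event select 0x{field(escr, 25, 30):02X}, event mask 0x{field(escr, 9, 24):X}")
```

Using field() this way is a quick check on which hex digit a given field lands in before handing raw values to perfex.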
>> Using perfex, tagging is configured with --p4pe and --p4pmv
>> respectively.
>>
>> In IA32_PEBS_ENABLE, bits 0 and 24 need to be set, resulting in
>> 0x1000001. Table A-10 in the Intel docs says to also enable bit 25,
>> but that's only needed when using PEBS (which we are not doing in
>> this case). MSR_PEBS_MATRIX_VERT only needs bit 0 to be set, according
>> to Table A-10, hence 0x1.
>>
>> If something isn't clear in the details above, please let me know,
>> and I'll try to explain.
>>
>> Now, to validate this, I used two SPEC CPU2000 benchmarks, art and
>> mcf, which are notorious for having a large number of cache misses.
>> I've also measured cache miss rates for these on an Opteron 244 and a
>> Core 2 Duo (the same statically linked binaries were used on all
>> machines, compiled/linked with gcc 4.1.2 -O2 -static). The graphs are
>> uploaded at http://www.elis.ugent.be/~kehoste/PAPI_cache_misses. If you
>> want these for future reference, make sure to make a local copy,
>> because I can't guarantee they will be up there forever. To me, these
>> numbers make perfect sense.
>>
>> Two notes I should make: the L2 misses for the Core 2 Duo machine
>> are so low that they don't show up in the graph; and one thing
>> which might seem strange at first is that the L1 miss rate for art
>> on the model 2 Pentium 4 (8K L1-D) is _lower_ than on the model 3/4
>> Pentium 4s (16K L1-D). I think this can be explained because the
>> latter models probably have more aggressive instruction
>> prefetching, which causes more L1 data entries to be pushed out,
>> and hence more L1-D cache misses.
>>
>> Any comments on this are highly appreciated.
>>
>> K.
>>
>> -- 
>>
>> Computer Science is no more about computers than astronomy is
>> about telescopes. (E. W. Dijkstra)
>>
>> Kenneth Hoste
>> ELIS - Ghent University
>> email: ken...@el...
>> blog: http://www.elis.ugent.be/~kehoste/blog
>> website: http://www.elis.ugent.be/~kehoste
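The --p4pe and --p4pmv values from Kenneth's first message can be assembled the same way. This is a minimal Python sketch assuming the bit assignments he quotes from Table A-10: IA32_PEBS_ENABLE bit 0 tags first-level cache load misses, bit 1 tags second-level misses (hence 0x1000002 for L2), bit 24 must be set as well, bit 25 (PEBS) is left clear, and MSR_PEBS_MATRIX_VERT needs only bit 0. The function and constant names are hypothetical, not a perfex, PAPI or libpfm API.

```python
# Hypothetical helper (not part of perfex, PAPI or libpfm) that builds the
# perfex tagging options for replay-tagged cache-miss counting, using the
# IA32_PEBS_ENABLE / MSR_PEBS_MATRIX_VERT bits quoted from Table A-10.

PEBS_L1_LOAD_MISS = 1 << 0  # IA32_PEBS_ENABLE bit 0: tag L1 data cache load misses
PEBS_L2_LOAD_MISS = 1 << 1  # IA32_PEBS_ENABLE bit 1: tag L2 cache load misses
PEBS_BIT_24 = 1 << 24       # IA32_PEBS_ENABLE bit 24: required alongside bit 0/1
# Bit 25 (PEBS itself) is deliberately left clear: plain counting does not need it.
MATRIX_VERT_BIT0 = 1 << 0   # MSR_PEBS_MATRIX_VERT bit 0

def perfex_tag_options(level: int) -> str:
    """Return the --p4pe/--p4pmv options for L1 (level=1) or L2 (level=2) misses."""
    miss_bits = {1: PEBS_L1_LOAD_MISS, 2: PEBS_L2_LOAD_MISS}
    if level not in miss_bits:
        raise ValueError("this sketch only handles L1 and L2 load misses")
    pebs_enable = miss_bits[level] | PEBS_BIT_24
    return f"--p4pe=0x{pebs_enable:X} --p4pmv=0x{MATRIX_VERT_BIT0:X}"

for level in (1, 2):
    print(f"L{level}:", perfex_tag_options(level))
```

Combined with the -e value 0x3B000/0x12000204@0x8000000C, level 1 reproduces the L1 command line from the message and level 2 gives the L2 variant.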