Re: [perfmon2] [Ptools-perfapi] L1 data cache misses on Pentium 4

Sorry, we didn't mean to ignore you. This is great stuff. We've
needed these definitions for a long time. Are the cache misses data
cache or I cache or both? It's worth digging through libpfm to see if
this even can be specified symbolically. Dan, you're our PAPI P4
expert, any thoughts?

Phil

On Dec 20, 2007, at 3:55 AM, Kenneth Hoste wrote:

>
> Nobody has comments on this? Do the settings seem reasonable? Or am
> I just dreaming I got this right?
>
> K.
>
> On 14 Dec 2007, at 11:04, Kenneth Hoste wrote:
>
>> Hi,
>>
>> I think I have it figured out... I ran some tests with perfex, and
>> the numbers I'm getting seem valid to me. I don't have any patch
>> for PAPI or libpfm, but I suspect people who are familiar with the
>> insides of it will be able to create a patch out of this easily...
>>
>> I measured L1 cache misses as follows on the Pentium 4 machines
>> available to me:
>>
>> perfex -e 0x3B000/0x12000204@0x8000000C --p4pe=0x1000001 --p4pmv=0x1
>>
>> Measuring L2 cache misses is trivial from this: just change --p4pe to
>> 0x1000002.
>>
>> Breaking this down:
>>
>> CCCR: 0x3B000
>>
>> bits 16-17 ('3'): measure for any active thread
>> bits 12-15 ('B'): bit 12 enables the counters, bits 13-15 select
>> ESCR 05h
>>
>> These settings are the same for the instr_completed event, no
>> surprise there.
>>
>> ESCR: 0x12000204
>>
>> bits 24-31 ('12'): bits 25-28 hold 09h, selecting replay_event
>> bits 8-11 ('2'): bit 9 set to count NBOGUS tagged µops
>> bits 0-3 ('4'): bit 2 set, enabling counting for thread 0 user-level
>>
>> counter: 0x8000000C
>>
>> bits 28-31 ('8'): bit 31 set, enables fast rdpmc
>> bits 0-3 ('C'): 0Ch, which corresponds to MSR_IQ_COUNTER0
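
A minimal sketch for double-checking the breakdown above: it extracts bit fields directly from the three hex values passed to perfex. The field boundaries are assumptions read off the values themselves and the usual Pentium 4 ESCR/CCCR layout (event select in ESCR bits 25-30, event mask in bits 9-24); the helper name `bits` and the dictionary keys are made up for illustration and are not part of perfex, PAPI or libpfm.

```python
def bits(value, lo, hi):
    """Return the bit field value[hi:lo], inclusive on both ends."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)

# The three values from the perfex command line in the message.
CCCR = 0x3B000
ESCR = 0x12000204
COUNTER = 0x8000000C

# Field positions are assumptions; verify against the Intel manual.
cccr_fields = {
    "enable": bits(CCCR, 12, 12),
    "escr_select": bits(CCCR, 13, 15),
    "active_thread": bits(CCCR, 16, 17),
}
escr_fields = {
    "t0_usr": bits(ESCR, 2, 2),
    "event_mask": bits(ESCR, 9, 24),
    "event_select": bits(ESCR, 25, 30),
}
counter_fields = {
    "counter_index": bits(COUNTER, 0, 3),
    "fast_rdpmc": bits(COUNTER, 31, 31),
}

for name, fields in (("CCCR", cccr_fields),
                     ("ESCR", escr_fields),
                     ("counter", counter_fields)):
    print(name, {k: hex(v) for k, v in fields.items()})
```

Decoding from the raw values rather than from hand-counted hex digits makes off-by-a-nibble mistakes in the bit positions easy to spot.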
>>
>> This specifies counting replay_event at an appropriate counter,
>> but only tagged µops will be counted. Tagging is specified by
>> setting the appropriate bits in IA32_PEBS_ENABLE and
>> MSR_PEBS_MATRIX_VERT (see Table A-10 in Intel docs). Using perfex,
>> this is done with --p4pe and --p4pmv respectively.
>>
>> In IA32_PEBS_ENABLE, bits 0 and 24 need to be set, resulting in
>> 0x1000001. Table A-10 in the Intel docs says to also enable bit 25,
>> but that's only needed when using PEBS (and we are not in this
>> case). MSR_PEBS_MATRIX_VERT only needs bit 0 to be set, according
>> to Table A-10, hence 0x1.
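
As a sketch of the arithmetic in the paragraph above, the tagging values can be built from the bit positions and dropped into the perfex command line. The L2 variant assumes, from the value 0x1000002 given in the message, that bit 1 replaces bit 0. The helpers `mask` and `perfex_args` are hypothetical names for illustration; only the flags -e, --p4pe and --p4pmv come from the message.

```python
def mask(*bit_positions):
    """OR together single-bit masks for the given bit positions."""
    value = 0
    for b in bit_positions:
        value |= 1 << b
    return value

# IA32_PEBS_ENABLE: bit 24 plus bit 0 (L1 misses) or bit 1 (L2 misses).
# Bit 25 is left clear because PEBS itself is not being used.
PEBS_ENABLE_L1 = mask(0, 24)
PEBS_ENABLE_L2 = mask(1, 24)
# MSR_PEBS_MATRIX_VERT: only bit 0.
PEBS_MATRIX_VERT = mask(0)

def perfex_args(p4pe, p4pmv):
    """Assemble the perfex argument list used in the message."""
    return [
        "perfex",
        "-e", "0x3B000/0x12000204@0x8000000C",
        f"--p4pe=0x{p4pe:X}",
        f"--p4pmv=0x{p4pmv:X}",
    ]

print(" ".join(perfex_args(PEBS_ENABLE_L1, PEBS_MATRIX_VERT)))
print(" ".join(perfex_args(PEBS_ENABLE_L2, PEBS_MATRIX_VERT)))
```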
>>
>> If something isn't clear in the details above, please let me know,
>> and I'll try and explain.
>>
>> Now, for the validation of this, I used two SPEC CPU2000
>> benchmarks, art and mcf, which are notorious for having a large
>> number of cache misses. I've also measured cache miss rates for
>> these on an Opteron 244 and a Core 2 Duo (same statically linked
>> binaries used on all machines, compiled/linked with gcc 4.1.2 -O2
>> -static). The graphs are uploaded at
>> http://www.elis.ugent.be/~kehoste/PAPI_cache_misses. If you want
>> these for future reference, make sure to make a local copy, because
>> I can't guarantee they will be up there forever. To me, these
>> numbers make perfect sense.
>>
>> Two notes I should make: the L2 misses for the Core 2 Duo machine
>> are so low that they are not showing in the graph; and one thing
>> which might seem strange at first is that the L1 miss rate for art
>> on the model 2 Pentium 4 (8K L1-D) is _lower_ than on the model 3/4
>> Pentium 4s (16K L1-D). I think this can be explained because the
>> latter models probably have more aggressive instruction
>> prefetching, which causes more L1 data entries to be pushed out,
>> and hence more L1-D cache misses.
>>
>> Any comments on this are highly appreciated.
>>
>> K.
>>
>> --
>>
>> Computer Science is no more about computers than astronomy is
>> about telescopes. (E. W. Dijkstra)
>>
>> Kenneth Hoste
>> ELIS - Ghent University
>> email: ken...@el...
>> blog: http://www.elis.ugent.be/~kehoste/blog
>> website: http://www.elis.ugent.be/~kehoste
>>
>