From: Konstantinos K. <K.K...@sm...> - 2007-02-17 14:40:20
Hi,

I run cachegrind on an AMD Athlon 4200+ 64x2 system and get these results:

Cachegrind, an I1/D1/L2 cache profiler.
Copyright (C) 2002-2006, and GNU GPL'd, by Nicholas Nethercote et al.
Using LibVEX rev 1658, a library for dynamic binary translation.
Copyright (C) 2004-2006, and GNU GPL'd, by OpenWorks LLP.
Using valgrind-3.2.1, a dynamic binary instrumentation framework.
Copyright (C) 2000-2006, and GNU GPL'd, by Julian Seward et al.
For more details, rerun with: -v

I   refs:      3,641,862,524
I1  misses:            1,050
L2i misses:            1,041
I1  miss rate:          0.00%
L2i miss rate:          0.00%

D   refs:      1,355,109,542  (839,732,151 rd + 515,377,391 wr)
D1  misses:        2,299,014  (  1,426,783 rd +     872,231 wr)
L2d misses:        1,603,411  (    812,790 rd +     790,621 wr)
D1  miss rate:           0.1% (        0.1%   +         0.1%  )
L2d miss rate:           0.1% (        0.0%   +         0.1%  )

L2 refs:           2,300,064  (  1,427,833 rd +     872,231 wr)
L2 misses:         1,604,452  (    813,831 rd +     790,621 wr)
L2 miss rate:            0.0% (        0.0%   +         0.1%  )

- Are these results valid? The miss rates should be much higher, if they
  are calculated as references/misses.
- Does cachegrind simulate hardware prefetching mechanisms? If yes, does
  it adjust its functionality to the characteristics of the specific
  processor?

Thanks.
From: Nicholas N. <nj...@cs...> - 2007-02-17 23:18:50
On Sat, 17 Feb 2007, Konstantinos Krikellas wrote:

> I run cachegrind on an AMD Athlon 4200+ 64x2 system and get these
> results:
>
> I   refs:      3,641,862,524
> I1  misses:            1,050
> L2i misses:            1,041
> I1  miss rate:          0.00%
> L2i miss rate:          0.00%
>
> D   refs:      1,355,109,542  (839,732,151 rd + 515,377,391 wr)
> D1  misses:        2,299,014  (  1,426,783 rd +     872,231 wr)
> L2d misses:        1,603,411  (    812,790 rd +     790,621 wr)
> D1  miss rate:           0.1% (        0.1%   +         0.1%  )
> L2d miss rate:           0.1% (        0.0%   +         0.1%  )
>
> L2 refs:           2,300,064  (  1,427,833 rd +     872,231 wr)
> L2 misses:         1,604,452  (    813,831 rd +     790,621 wr)
> L2 miss rate:            0.0% (        0.0%   +         0.1%  )
>
> - Are these results valid? The miss rates should be much higher, if they
>   are calculated as references/misses.

If you are talking about the L2 miss rates, it's calculated as
(D refs) / (L2 misses).

> - Does cachegrind simulate hardware prefetching mechanisms? If yes, does
>   it adjust its functionality to the characteristics of the specific
>   processor?

No. This is basically impossible to simulate without knowing exactly how
the processor prefetches, which I believe is not publicly known
information.

Nick
From: Josef W. <Jos...@gm...> - 2007-02-18 02:42:07
On Sunday 18 February 2007, Nicholas Nethercote wrote:
> On Sat, 17 Feb 2007, Konstantinos Krikellas wrote:
> > D   refs:      1,355,109,542  (839,732,151 rd + 515,377,391 wr)
> > D1  misses:        2,299,014  (  1,426,783 rd +     872,231 wr)
> > L2d misses:        1,603,411  (    812,790 rd +     790,621 wr)
> > D1  miss rate:           0.1% (        0.1%   +         0.1%  )
> > L2d miss rate:           0.1% (        0.0%   +         0.1%  )
> >
> > L2 refs:           2,300,064  (  1,427,833 rd +     872,231 wr)
> > L2 misses:         1,604,452  (    813,831 rd +     790,621 wr)
> > L2 miss rate:            0.0% (        0.0%   +         0.1%  )
> >
> > - Are these results valid? The miss rates should be much higher, if they
> >   are calculated as references/misses.
>
> If you are talking about the L2 miss rates, it's calculated as
> (D refs) / (L2 misses).

Shouldn't this be (L2 misses) / (D refs) ?

> > - Does cachegrind simulate hardware prefetching mechanisms? If yes, does
> >   it adjust its functionality to the characteristics of the specific
> >   processor?
>
> No. This is basically impossible to simulate without knowing exactly how
> the processor prefetches, which I believe is not publicly known
> information.

In Callgrind, you can switch on simulation of a stream prefetcher
(AFAIK similar to the L2 hardware prefetcher in some Intel processors).

However, Cachegrind/Callgrind does not know anything about timings. So
with the prefetcher switched on, it is assumed that every reference the
prefetcher generates will already be in the cache when the data is
really needed. I.e. you get the best case possible.

This is probably nowhere near reality, but by comparing the results with
the ones obtained without the prefetcher, you can see which parts of
your code are "prefetcher friendly".

In fact, you could argue that cachegrind simulates a worst-case
prefetcher (i.e. one which has no effect at all).

Josef
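The "best case" point above can be illustrated with a toy model. The sketch below is not Callgrind's algorithm: it is a small fully associative LRU cache in which, when `prefetch=True`, every miss also installs the next line and that line is assumed to be available immediately, since there is no timing model. The names `count_misses`, `num_lines`, `line_size` and `prefetch` are hypothetical and exist only for this example.

```python
from collections import OrderedDict


def count_misses(addresses, num_lines=64, line_size=64, prefetch=False):
    """Count misses in a toy fully associative LRU cache.

    With prefetch=True, each miss also installs the following line, and
    that line is treated as present straight away (no timing model).
    """
    cache = OrderedDict()  # keys are line numbers, order tracks recency

    def install(line):
        cache[line] = True
        cache.move_to_end(line)        # mark as most recently used
        if len(cache) > num_lines:
            cache.popitem(last=False)  # evict the least recently used line

    misses = 0
    for addr in addresses:
        line = addr // line_size
        if line in cache:
            cache.move_to_end(line)    # hit: refresh recency only
        else:
            misses += 1
            install(line)
            if prefetch:
                install(line + 1)      # idealised next-line prefetch
    return misses


# Sequential walk over 1000 lines, 8 accesses per line.
sequential = range(0, 64 * 1000, 8)
# Strided walk that touches every second line, so next-line prefetch never helps.
strided = range(0, 128 * 1000, 128)

for name, trace in (("sequential", sequential), ("strided", strided)):
    without = count_misses(trace)
    with_pf = count_misses(trace, prefetch=True)
    print(f"{name}: {without} misses without prefetch, {with_pf} with")
```

Comparing the two runs per access pattern is the useful part: the sequential walk benefits from the idealised prefetcher and the strided one does not, which is the kind of "prefetcher friendly" signal described above. The absolute numbers say nothing about real hardware.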
From: Nicholas N. <nj...@cs...> - 2007-02-18 05:23:41
On Sun, 18 Feb 2007, Josef Weidendorfer wrote:

>> If you are talking about the L2 miss rates, it's calculated as
>> (D refs) / (L2 misses)).
>
> Shouldn't this be (L2 misses) / (D refs) ?

Yes, my mistake.

Nick
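With the corrected formula (misses divided by references), the rates can be re-derived from the counts posted at the top of the thread. This is a minimal sketch: the constants are copied from that output and `miss_rate_percent` is a hypothetical helper. The last line divides by I refs + D refs instead of D refs alone. That is an assumption about how the overall "L2 miss rate" line is computed, made because it matches the 0.0% shown there.

```python
# Counts copied from the cachegrind summary in the first message.
I_REFS = 3_641_862_524
D_REFS = 1_355_109_542
D1_MISSES = 2_299_014
L2D_MISSES = 1_603_411
L2_MISSES = 1_604_452


def miss_rate_percent(misses: int, refs: int) -> float:
    """Return misses / refs expressed as a percentage."""
    return 100.0 * misses / refs


d1_rate = miss_rate_percent(D1_MISSES, D_REFS)
l2d_rate = miss_rate_percent(L2D_MISSES, D_REFS)
# Assumption: the overall L2 rate is relative to all references (I + D).
l2_rate_all = miss_rate_percent(L2_MISSES, I_REFS + D_REFS)

print(f"D1  miss rate: {d1_rate:.3f}%")
print(f"L2d miss rate: {l2d_rate:.3f}%")
print(f"L2  miss rate (all refs): {l2_rate_all:.3f}%")
```

All three values come out well below 1%, so the small percentages in the report are consistent with the raw counts. The inverse ratio, references per miss, would be a large number and not a percentage.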