|
From: David C. <dcc...@ac...> - 2012-05-12 06:02:01
|
On 5/11/2012 8:04 PM, Michael Andronov wrote:
> I'm looking for some help/hints to explain the results I'm observing with the following piece of code:
> "
> …
> struct CmpIndexValue {
>     static SListOp compare(const IndexValue& left, IndexValue& right, ulong ity) {
>         __builtin_prefetch(&right);
>         __builtin_prefetch(*((void **)&right)); // ?!! Why are D1mr misses observed here?
>         int cmp = left.key.cmp(right.key, (TREE_KT)ity);
>         return cmp < 0 ? SLO_LT : cmp > 0 ? SLO_GT : SLO_NOOP;
>     }
> };
> …
> "
> where `right` is a reference to a structure like:
> (gdb) print right
> $1 = (IndexValue&) @0x7ffff5ad13e0: {key = {ptr = {p = 0x7ffff5ad13d8, l = 5}, ... }
> (gdb) x /4x (void **)&right
> 0x7ffff5ad13e0: 0xf5ad13d8 0x00007fff 0x00000005 0x00000000
> (gdb) x /4x *(void **)&right
> 0x7ffff5ad13d8: 0x048056a1 0x01000016 0xf5ad13d8 0x00007fff
> (gdb)
>
> The expected behaviour:
> - the line __builtin_prefetch(*((void **)&right)); should trigger a prefetch (64 bytes, one cache line on my machine) from the address {key = {ptr = {p = 0x7ffff5ad13d8… .
> - by the time the actual access to right.key.ptr.p occurs - within left.key.cmp() - the values should already be in the L1 data cache…
>
> The observed behaviour is different, however:
> - kcachegrind reports a significant 11.5% of D1mr misses on the __builtin_prefetch(*((void **)&right)); line, which looks strange and unexpected to me…
>
> Actually, if that line is commented out, then approximately the same number of D1mr misses shows up later, within the left.key.cmp() function, when right.key.ptr.p is accessed.
> So, to a certain degree, the __builtin_prefetch() call is doing its job and performing the prefetch…
> But the idea was to eliminate the D1mr misses, not to 'move' them around the code. ;)
>
> I would be very grateful for any hint or pointer that helps me understand where my expectations are wrong and/or what I'm doing wrong.
>
>
The goal of prefetching is to get the data you are going to need while
your code is doing something else. You have told the compiler that a
specific value is going to be used very soon, so it should ask the CPU
to ensure that the value is in the cache. It sounds like 11.5% of the
time, the value is not in the cache and must be retrieved from main
memory. Only a cache-friendly reorganization of your data or access
patterns can reduce this, and if your working set size exceeds the cache
size, you're more or less stuck with that miss rate. You could move the
misses around in the code, but you can't get rid of them by prefetching.
Alternatively, you could rewrite the code to do more work between the
prefetch request and the actual use of the data. Right now you are
dereferencing the variable named /right/ immediately after the prefetch,
so the CPU will stall 11.5% of the time. If there were more work to do,
the CPU would not be idle while the cache line was being loaded.
--
David Chapman dcc...@ac...
Chapman Consulting -- San Jose, CA
Software Development Done Right.
www.chapman-consulting-sj.com
|