Thread: Re: [Algorithms] P3 Prefetching.

At 04:55 PM 1/29/2001 +0000, you wrote:

>what I've got is an algorithm that reads short strips of memory (about 20 
>bytes each) from 6 separate locations - performs a calculation using the 
>data and writes a row of result values to a 7th address.
>
>BUT - I know the address of the strips I'll be wanting after the current 
>one. So I figured I ought to be able to prefetch the 6 addresses whilst 
>calculating on the current lines...

    Have you considered that a prefetch reads one 32-byte-long, 
32-byte-aligned cache line? If your 20-byte strip crosses a 32-byte 
boundary, then you'll need two prefetches to get all of it. Prefetching the 
first and last byte of each strip should fix that.
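
    A minimal sketch of that idea in C, in case it helps. The names 
lines_spanned and prefetch_strip are made up for illustration, the 32-byte 
line size is the figure mentioned above, and __builtin_prefetch is the 
GCC/Clang builtin (with the Intel compiler you would presumably use the 
_mm_prefetch intrinsic instead). Treat it as a starting point under those 
assumptions, not as tuned code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size, taken from the 32-byte figure above. */
#define CACHE_LINE 32u

/* Hypothetical helper: number of cache lines touched by the byte
 * range [p, p + len). Pure address arithmetic, nothing is read. */
static unsigned lines_spanned(const void *p, size_t len)
{
    assert(len > 0);
    uintptr_t first = (uintptr_t)p / CACHE_LINE;
    uintptr_t last  = ((uintptr_t)p + len - 1) / CACHE_LINE;
    return (unsigned)(last - first + 1);
}

/* Hypothetical helper: prefetch the first and last byte of a strip.
 * For strips no longer than one cache line this covers every line
 * the strip can touch, whether it straddles a boundary or not. */
static void prefetch_strip(const void *p, size_t len)
{
    const char *c = (const char *)p;
    assert(len > 0);
    __builtin_prefetch(c);            /* line holding the first byte */
    __builtin_prefetch(c + len - 1);  /* line holding the last byte  */
}
```

    When both bytes fall in the same line the second prefetch is 
redundant, so you could call lines_spanned first and skip it, but the 
branch may well cost more than the extra prefetch.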

>I've put the prefetch calls into the loop but they only give a slight 
>speed improvement, and when I look in VTune I see that I'm still stalling 
>waiting for the data as I was before... and I have no explanation as to 
>why the prefetches are not taking place...
>the loop takes about 200 cycles to use a strip of data so there ought to 
>be plenty of time for the prefetch to have completed...

    Do you touch other memory during those 200 cycles? Maybe the data you 
prefetch gets evicted before it's used. Also, take into account that the 
L1 cache is only 4-way set-associative (IIRC), so at most four cache lines 
that sit at the same offset within different 4K pages can be held at once. 
With six input strips plus the output row, that's easy to exceed if they 
happen to be spaced at multiples of 4K.
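
    If you want to check whether your strips collide like that, something 
along these lines would tell you. It is only a sketch: cache_set and 
max_set_pressure are hypothetical names, and the 32-byte line and 4K 
per-way span are assumptions based on the figures in this thread, so check 
them against the documentation for your CPU:

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed cache geometry, not measured: 32-byte lines, and each way
 * of the cache covering 4K, which gives 4096 / 32 = 128 sets. */
#define LINE_SIZE 32u
#define WAY_SIZE  4096u
#define NUM_SETS  (WAY_SIZE / LINE_SIZE)

/* Hypothetical helper: the cache set an address maps to. */
static unsigned cache_set(uintptr_t addr)
{
    return (unsigned)((addr / LINE_SIZE) % NUM_SETS);
}

/* Hypothetical helper: the largest number of addresses in
 * addrs[0..n) that map to one and the same set. If this exceeds the
 * associativity of the cache, the lines cannot all stay resident. */
static unsigned max_set_pressure(const uintptr_t *addrs, size_t n)
{
    unsigned worst = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned same = 0;
        for (size_t j = 0; j < n; j++) {
            if (cache_set(addrs[j]) == cache_set(addrs[i]))
                same++;
        }
        if (same > worst)
            worst = same;
    }
    return worst;
}
```

    Feed it the start addresses of the six input strips and the output 
row for one iteration. If the result is larger than the number of ways, 
offsetting some of the buffers by a cache line or two should spread them 
over different sets.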

>I'm most puzzled.

    Yes, cache optimizations have a way of surprising you like nothing else. :)


    Salutaciones,
                               JCAB

---------------------------------------------------------------------
Juan Carlos "JCAB" Arevalo Baeza    | http://www.roningames.com
Senior Technology programmer        | mailto:jc...@ro...
Ronin Entertainment                 | ICQ: 10913692
                        (my opinions are only mine)
JCAB's Rumblings: http://www.metro.net/jcab/Rumblings/html/index.html
