From: Nick K. <nic...@ma...> - 2003-01-09 18:31:01
Hello, Jonathan!

On Thu, 09 Jan 2003 07:34:39 +0000 you wrote:

> I found the same mistake in both sse_memcpy and mmx2_memcpy. They both
> presume that prefetchnta prefetches 64 bytes. In actual fact, the p3
> prefetches 32 bytes and the p4 prefetches 128 bytes. The patch optimizes
> it correctly for the p3. If you want to optimize for the p4 you should
> really use movdqa/movdqu.
>
> Please apply to the tree.
>
> CC any replies to me as I am not on the list.
>

Thanks for the patch! I've improved it (see mplayerxp's CVS log) to perform run-time cache-line size detection.

As for movdqa/movdqu: these instructions perform serialized memory stores (that is, they cannot perform unordered memory writes the way the movntps, movntpd and movnti instructions can), so they are not the best solution for the fastest memory writes.

WBR! Nick
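
To illustrate the idea of run-time cache-line size detection, here is a minimal sketch. It is not the code committed to mplayerxp's CVS. The function names detect_line_size and prefetch_memcpy are hypothetical, the 64-byte fallback is an assumption, and the sketch relies on the glibc sysconf extension and the GCC/Clang __builtin_prefetch builtin (which with locality 0 typically compiles to prefetchnta on x86) rather than on hand-written SSE/MMX assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: ask the OS for the L1 data cache line size.
 * _SC_LEVEL1_DCACHE_LINESIZE is a glibc extension; if it is missing or
 * reports nothing useful, fall back to an assumed default of 64 bytes. */
static size_t detect_line_size(void)
{
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
    long n = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
    if (n > 0)
        return (size_t)n;
#endif
    return 64; /* assumed default, not a detected value */
}

/* Hypothetical copy loop: moves one cache line per iteration and issues a
 * non-temporal prefetch hint for the following line, so the prefetch stride
 * follows the detected line size instead of a hard-coded 64. */
static void *prefetch_memcpy(void *dst, const void *src, size_t len, size_t line)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t off = 0;

    assert(line > 0);

    while (len - off >= line) {
        /* Prefetch only when a full next line lies inside the source buffer. */
        if (len - off >= 2 * line)
            __builtin_prefetch(s + off + line, 0, 0);
        memcpy(d + off, s + off, line);
        off += line;
    }

    /* Copy the tail that is shorter than one cache line. */
    memcpy(d + off, s + off, len - off);
    return dst;
}
```

A caller would detect the line size once at start-up and pass it in, e.g. prefetch_memcpy(dst, src, len, detect_line_size()). A real implementation would also replace the inner memcpy with non-temporal stores such as movntps, which this sketch leaves out.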