Re: [xine-devel] memcpy.c speedup patch

Hi Jonathan,

On Thu, 9 Jan 2003, Jonathan Brown wrote:

> I found the same mistake in both sse_memcpy and mmx2_memcpy. They both
> presume that prefetchnta prefetches 64 bytes. In actual fact, the p3
> prefetches 32 bytes and the p4 prefetches 128 bytes. The patch optimizes
> it correctly for the p3. If you want to optimize for the p4 you should
> really use movdqa/movdqu.
>
> Please apply to the tree.

[...]

thanks for looking into this and sending a patch - i just tried it, but
couldn't really reproduce the improvements you measured. my cpuinfo says

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 8
model name      : Pentium III (Coppermine)
stepping        : 10
cpu MHz         : 1005.050
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips        : 2005.40

...and like i said - for me, the jitter between successive measurements of
the same code is bigger than the differences i see between the unpatched
and the patched memcpy routines.
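
to make "bigger than the jitter" a bit more concrete, here is a rough sketch
of how one could time a copy routine several times and only call two
routines different when their timing ranges don't overlap. this is not the
benchmark from memcpy.c - the names bench_copy, bench_result and
differs_beyond_jitter are made up for illustration, and it uses plain
clock() (cpu time) as an assumption about what is good enough here:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Same shape as memcpy(), so any candidate routine can be plugged in. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Hypothetical result record: timing spread over several samples. */
struct bench_result {
    double min_s, median_s, max_s; /* seconds per sample */
    int copied_ok;                 /* 1 if dst matched src afterwards */
};

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Take `runs` samples, each timing `iters` copies of `bytes` bytes.
 * Returns 0 on success, -1 if memory could not be allocated. */
static int bench_copy(copy_fn fn, size_t bytes, int iters, int runs,
                      struct bench_result *out)
{
    assert(fn && out && bytes > 0 && iters > 0 && runs > 0);

    unsigned char *src = malloc(bytes);
    unsigned char *dst = malloc(bytes);
    double *t = malloc(sizeof *t * (size_t)runs);
    if (!src || !dst || !t) {
        free(src); free(dst); free(t);
        return -1;
    }
    memset(src, 0xA5, bytes);
    memset(dst, 0, bytes);

    for (int r = 0; r < runs; r++) {
        clock_t start = clock();
        for (int i = 0; i < iters; i++)
            fn(dst, src, bytes);
        t[r] = (double)(clock() - start) / CLOCKS_PER_SEC;
    }

    /* Sort the samples so min, median and max can be read off directly. */
    qsort(t, (size_t)runs, sizeof *t, cmp_double);
    out->min_s = t[0];
    out->median_s = t[runs / 2];
    out->max_s = t[runs - 1];
    /* Reading dst back also keeps the copies from being optimized away. */
    out->copied_ok = memcmp(dst, src, bytes) == 0;

    free(src); free(dst); free(t);
    return 0;
}

/* Call two routines different only if their timing ranges do not overlap. */
static int differs_beyond_jitter(const struct bench_result *a,
                                 const struct bench_result *b)
{
    return a->max_s < b->min_s || b->max_s < a->min_s;
}
```

one would call bench_copy() once for the unpatched and once for the patched
routine with the same buffer size, and then pass both results to
differs_beyond_jitter(). non-overlapping ranges are a deliberately strict
criterion; comparing the medians would be a softer alternative.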

maybe you or someone else can shed some more (experimental or fact-based)
light on this matter :)

cheers, thanks again,

Heiko