Re: [linuxsh-dev] Fixup for the fast SH4 memcpy

Hi David

On Thu, 27 Feb 2003 17:41:44 +1000 da...@sn... wrote:

> I found a bug today in the fast SH4 memcpy that Stuart Menefy graciously
> wrote for us :-).

All I can say is nuts, and sorry!

I had a suspicion that there was still a lurking problem, as using
this memcpy instead of glibc's memcpy caused bash to generate some
strange errors after a while. However I was never able to pin down the
circumstances. Before I could investigate further I got diverted onto
some other work.

I thought I'd run through all possible tests before releasing this,
but you found one I'd missed. As soon as I added your size and
alignments to my test code it spotted it as well. I've now given up on
testing only selected alignments, and am doing exhaustive tests!
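
A minimal sketch of what such an exhaustive test might look like, for
illustration only. It is not the actual test code from this thread.
The names (check_case, run_exhaustive, buggy_copy) and the limits
(8 offsets, lengths 0 to 64, 16 guard bytes) are assumptions. The
buggy_copy routine contains an invented fault, unrelated to the real
SH4 bug, purely to show that the harness notices a bad tail copy.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ALIGN 8   /* assumed: offsets 0..7 for source and destination */
#define MAX_LEN   64  /* assumed: lengths 0..64 */
#define GUARD     16  /* guard bytes either side, to catch stray writes */
#define BUF_SIZE  (GUARD + MAX_ALIGN + MAX_LEN + GUARD)

typedef void *(*memcpy_fn)(void *dst, const void *src, size_t n);

/* Union forces a known base alignment so the offsets are meaningful. */
typedef union { unsigned char b[BUF_SIZE]; long long align; } buf_t;

/* Check one (dst offset, src offset, length) combination.
   Returns 0 on success, -1 on any mismatch. */
static int check_case(memcpy_fn fn, size_t doff, size_t soff, size_t len)
{
    buf_t src, dst;
    size_t start = GUARD + doff;
    size_t i;
    void *ret;

    for (i = 0; i < BUF_SIZE; i++) {
        src.b[i] = (unsigned char)(i + 1); /* distinct values, never 0xAA */
        dst.b[i] = 0xAA;                   /* fill pattern */
    }

    ret = fn(dst.b + start, src.b + GUARD + soff, len);
    if (ret != (void *)(dst.b + start))
        return -1;

    /* Every byte of dst must be either the copied value or the fill. */
    for (i = 0; i < BUF_SIZE; i++) {
        unsigned char expect = 0xAA;
        if (i >= start && i < start + len)
            expect = src.b[GUARD + soff + (i - start)];
        if (dst.b[i] != expect)
            return -1;
    }
    return 0;
}

/* Try every combination of offsets and lengths; return the failure count. */
static unsigned long run_exhaustive(memcpy_fn fn)
{
    unsigned long failures = 0;
    size_t doff, soff, len;

    for (doff = 0; doff < MAX_ALIGN; doff++)
        for (soff = 0; soff < MAX_ALIGN; soff++)
            for (len = 0; len <= MAX_LEN; len++)
                if (check_case(fn, doff, soff, len) != 0)
                    failures++;
    return failures;
}

/* Hypothetical faulty copy: drops the last byte when 4..7 bytes are
   requested. Invented for this example only. */
static void *buggy_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
    size_t count = (n >= 4 && n <= 7) ? n - 1 : n;
    size_t i;

    for (i = 0; i < count; i++)
        d[i] = s[i];
    return dst;
}
```

Calling run_exhaustive(memcpy) should report no failures with a correct
library memcpy, while run_exhaustive(buggy_copy) reports a non-zero
count. The routine under test would be passed in the same way.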

> Attached is a patch,  and the changed original just in case.  I would
> appreciate any feedback on the correctness/speed implications.  I ran
> it through an extensive alignment/boundary case test without any problems
> and it fixed my original crash :-)

Once I had a test case to throw at it, I fixed it independently, and
came up with an almost identical fix, a copy of which is
attached. Either will fix it, and I can't see any correctness
problems with either.

Performance-wise, they are identical: both add one cycle. The only way
I can see of avoiding the extra test is to use the byte-at-a-time copy
for remaining lengths 4 to 7. In the best case (4 bytes left) this
would increase the time from 12 cycles to 19, so I don't think it's
worth it to save one cycle in the 8 to 31 case, which will take a
minimum of 15 cycles anyway (timings from the 1: label to the ret
instruction).

So thank you for all the hard work you must have put in tracking this
down, and sorry again.

Stuart