From: Stuart M. <stu...@st...> - 2003-02-28 01:28:52
|
Hi David On Thu, 27 Feb 2003 17:41:44 +1000 da...@sn... wrote: > I found a bug today in the fast SH4 memcpy that Stuart Menefy graciously > wrote for us :-). All I can say is nuts, and sorry! I had a suspicion that there was still a lurking problem, as using this memcpy instead of glibc's memcpy caused bash to generate some strange errors after a while. However I was never able to pin down the circumstances. Before I could investigate further I got diverted onto some other work. I thought I'd run through all possible tests before releasing this, but you found one I'd missed. As soon as I added your size and alignments to my test code it spotted it as well. I've now given up on testing only selected alignments, and am doing exhaustive tests! > Attached is a patch, and the changed original just in case. I would > appreciate any feedback on the correctness/speed implications. I ran > it through an extensive alignment/boundary case test without any problems > and it fixed my original crash :-) Once I had a test case to throw at it, I fixed it independently, and came up with an almost identical fix, a copy of which is attached. Either will fix it, and I can't see any correctness problems with either. Performance wise, they are identical: both add one cycle. The only way I can see of avoiding the extra test is to use the byte at a time copy for remaining lengths 4 to 7. Best case (4 bytes left) this would increase the time from 12 cycles to 19, so I don't think its worth it to save one cycle in the 8 to 31 case, which will take a minimum of 15 cycles anyway (timings from the 1: label to ret instr). So thank you for all the hard work you must have put in tracking this down, and sorry again. Stuart |