From: David M. <da...@sn...> - 2003-02-28 03:11:12
|
Jivin Stuart Menefy lays it down ... > Hi David > > On Thu, 27 Feb 2003 17:41:44 +1000 da...@sn... wrote: > > > I found a bug today in the fast SH4 memcpy that Stuart Menefy graciously > > wrote for us :-). > > All I can say is nuts, and sorry! No problem, even with the bug the performance improvements are well worth the effort and I would rather debug the existing version than write my own from scratch. It must have taken a bit of work to get it going ;-) > I had a suspicion that there was still a lurking problem, as using > this memcpy instead of glibc's memcpy caused bash to generate some > strange errors after a while. However I was never able to pin down the > circumstances. Before I could investigate further I got diverted onto > some other work. > > I thought I'd run through all possible tests before releasing this, > but you found one I'd missed. As soon as I added your size and > alignments to my test code it spotted it as well. I've now given up on > testing only selected alignments, and am doing exhaustive tests! Yeah, I wrote a quick exhaustive test to check my changes before I posted them. Took about 15 minutes to run but was worth it ;-) I can post a copy if you like. > > Attached is a patch, and the changed original just in case. I would > > appreciate any feedback on the correctness/speed implications. I ran > > it through an extensive alignment/boundary case test without any problems > > and it fixed my original crash :-) > > Once I had a test case to throw at it, I fixed it independently, and > came up with an almost identical fix, a copy of which is > attached. Either will fix it, and I can't see any correctness > problems with either. > > Performance wise, they are identical: both add one cycle. The only way > I can see of avoiding the extra test is to use the byte at a time copy > for remaining lengths 4 to 7. Best case (4 bytes left) this would > increase the time from 12 cycles to 19, so I don't think its worth it > to save one cycle in the 8 to 31 case, which will take a minimum of > 15 cycles anyway (timings from the 1: label to ret instr). Cool, I'll use you version so I am in sync with what you have. Also I think the cmp is easier to understand when reading the code ;-) > So thank you for all the hard work you must have put in tracking this > down, and sorry again. No need to be sorry :-) Thanks, Davidm -- David McCullough: Ph: +61 7 3435 2815 http://www.SnapGear.com da...@sn... Fx: +61 7 3891 3630 Custom Embedded Solutions + Security |