From: Kieran O'M. <kie...@gm...> - 2007-09-13 10:41:05
Hi all,

In light of the disappointing performance of the SSE2 code for small vectors and matrices, I made a few small changes to the code. Specifically, I changed the calculation of return values and the handling of leftover elements to use SSE intrinsics instead of normal operations.

The new timings (attached) show the results: SSE2 is now faster, or only marginally slower, in all cases. I have therefore altered the config so that SSE2 support is enabled by default if the hardware supports it.

Cheers,
Kieran
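
For illustration only, here is a minimal sketch of the kind of change described above: keeping the leftover element and the final return value in SSE registers rather than dropping back to normal operations. It is not the code from this change. The function name `dot_sse2`, the choice of a dot product, double precision, and unaligned loads are all assumptions made for the example.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical helper (not from the original code): dot product of two
 * double arrays of length n, using SSE2 for the main loop, the leftover
 * element, and the final reduction. */
static double dot_sse2(const double *a, const double *b, size_t n)
{
    __m128d acc = _mm_setzero_pd();
    size_t i = 0;

    /* Main loop: two doubles per iteration. Unaligned loads are used so
     * the sketch makes no assumption about the caller's alignment. */
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);
        __m128d vb = _mm_loadu_pd(b + i);
        acc = _mm_add_pd(acc, _mm_mul_pd(va, vb));
    }

    /* Leftover element (odd n): load it into the low lane with a scalar
     * SSE load and accumulate with scalar SSE ops, leaving the high lane
     * of the accumulator untouched. */
    if (i < n) {
        __m128d va = _mm_load_sd(a + i);
        __m128d vb = _mm_load_sd(b + i);
        acc = _mm_add_sd(acc, _mm_mul_sd(va, vb));
    }

    /* Return value: add the high lane to the low lane inside the
     * register, then extract the low lane as a double. */
    __m128d hi = _mm_unpackhi_pd(acc, acc);
    return _mm_cvtsd_f64(_mm_add_sd(acc, hi));
}
```

For small inputs the loop body runs only once or twice, so the tail and the reduction account for a large share of the work. That is presumably why they matter most for small vectors and matrices. Whether this helps in practice depends on the compiler and hardware, so it should be checked against timings.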