The code should be unrolled 2x and thoroughly optimized
for the PowerPC dual FPU, with proper instruction
grouping and some pipeline analysis.
It seems that even gcc can generate code that is faster
than the current assembly code on the PowerPC FPU.
This HAS to be fixed.
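
For illustration only, here is roughly what "2x unrolled" means, as a minimal C sketch. The post does not name the kernel, so the daxpy-style loop and the name daxpy_unroll2 are assumptions, not the routine in question. The point is only that each iteration carries two independent multiply-adds, which gives the compiler (or a hand scheduler) a pair of operations to group.

```c
#include <stddef.h>

/* Hypothetical kernel: y[i] += alpha * x[i].
   The name and the choice of kernel are assumptions for illustration;
   they are not taken from the code under discussion. */
void daxpy_unroll2(size_t n, double alpha, const double *x, double *y)
{
    size_t i = 0;

    /* Main loop, unrolled 2x: the two updates do not depend on each
       other, so they can be scheduled side by side. */
    for (; i + 1 < n; i += 2) {
        double y0 = y[i]     + alpha * x[i];
        double y1 = y[i + 1] + alpha * x[i + 1];
        y[i]     = y0;
        y[i + 1] = y1;
    }

    /* Tail: one leftover element when n is odd. */
    if (i < n)
        y[i] += alpha * x[i];
}
```

Whether this beats or merely matches what gcc already emits would have to be measured on the target; the sketch only shows the loop shape.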