From: Gwenole B. <gb...@di...> - 2004-02-20 18:03:05
Hi,

I have just committed some tentative SSE & MMX optimizations for AltiVec emulation. You need a recent enough gcc to benefit from them, e.g. gcc 3.2.2, which provides the respective intrinsics in <mmintrin.h> & <xmmintrin.h>.

This is generic code that can be improved further. Still, the 1.8 GHz Opteron here performs at around 783 MegaFlops on AltiVec Fractal Carbon, i.e. exactly half the performance of my PBG4/400. I really would like to reach 1+ GigaFlops in emulation. ;-) However, I doubt this will be reached with the current "JIT1" dynamic translation engine.

Obviously, we can't translate all AltiVec instructions to SSE code, simply because that's not practical. This is especially true for the saturating variants, since we have to update the SAT bit and there is no extra flag in x86 land to tell us that a value saturated. So for now we only have approx. 30 native code templates, covering the key VMX instructions used e.g. in AltiVec Fractal Carbon.

I hope I haven't broken anything.

One possible next step is to use the run-time assembler I wrote last year for the new JIT infrastructure. That way, we can remove some useless instructions in the process. I don't know yet what you can expect next, but at least the CPU emulation is now in an interesting shape. ;-)

Bye,
Gwenole.
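
To illustrate the SAT bit problem mentioned above, here is a minimal sketch, not the code that was committed. It assumes SSE2 (<emmintrin.h>) rather than the MMX/SSE headers named in the message, and the helper name vaddubs_emul is hypothetical. It emulates a saturating unsigned byte add (AltiVec vaddubs) and recovers the saturation information by also doing a wrapping add and comparing the two results, which is the kind of extra work a native template needs when x86 offers no saturation flag. A scalar fallback is included so the sketch also builds on non-x86 hosts.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: dst = a + b on 16 unsigned bytes, saturating at 255.
 * Returns 1 if at least one lane saturated, 0 otherwise, so the caller can
 * OR the result into its emulated SAT bit. */
static int vaddubs_emul(uint8_t dst[16], const uint8_t a[16], const uint8_t b[16])
{
    assert(dst && a && b);
#if defined(__SSE2__)
    __m128i va   = _mm_loadu_si128((const __m128i *)a);
    __m128i vb   = _mm_loadu_si128((const __m128i *)b);
    __m128i sat  = _mm_adds_epu8(va, vb);  /* saturating add */
    __m128i wrap = _mm_add_epi8(va, vb);   /* wrapping (modulo 256) add */
    _mm_storeu_si128((__m128i *)dst, sat);
    /* A lane saturated exactly when the two results differ. cmpeq yields
     * 0xFF for equal lanes, so an all-equal vector gives mask 0xFFFF. */
    return _mm_movemask_epi8(_mm_cmpeq_epi8(sat, wrap)) != 0xFFFF;
#else
    /* Portable fallback with the same semantics. */
    int saturated = 0;
    for (int i = 0; i < 16; i++) {
        unsigned sum = (unsigned)a[i] + (unsigned)b[i];
        if (sum > 255u) {
            sum = 255u;
            saturated = 1;
        }
        dst[i] = (uint8_t)sum;
    }
    return saturated;
#endif
}
```

In this sketch a caller would do something like `sat_bit |= vaddubs_emul(vD, vA, vB);`, where sat_bit stands for wherever the emulator keeps the SAT state. The extra add, compare and movemask per instruction show why saturating variants are less attractive candidates for native templates than plain arithmetic.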