From: Nicolas J. <nic...@fr...> - 2002-11-13 20:26:25
On Wednesday 13 November 2002 18:58, Steve Harris wrote:
> > I heard the P4 heavily relies on optimal SSE2 optimizations in order to
> > deliver maximum performance and it seems that both gcc and icc do not
> > work optimally in this regard.
>
> SSE, not SSE2 IIRC. SSE2 is still only 128bits wide, and uses 64bit floats
> so it can only go two-way.

Gcc and even icc are not very good at automatic vectorisation. IMHO it is a
better idea to parallelise the code by hand using SSE instructions; you will
get better performance that way.

I can try to look at the code and see if there is room for optimisation.
But I'm very new to this project, and I think there are more experienced
programmers than me on this list.

--
Nicolas Justin - <nic...@fr...>
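To illustrate what "parallelising the code by hand using SSE instructions" can look like, here is a minimal sketch. The post does not say which routine is the hot spot, so this assumes a simple gain-and-mix loop (`out[i] += gain * in[i]`); the function name `mix_gain` is hypothetical. It uses single-precision SSE intrinsics from `<xmmintrin.h>`, which process four floats per 128-bit register, versus the two doubles per register that SSE2 offers, as noted in the quoted reply.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_*_ps */
#endif

/* Hypothetical example kernel (not from the project): out[i] += gain * in[i].
 * With SSE enabled, four single-precision samples are handled per iteration. */
void mix_gain(float *out, const float *in, float gain, size_t n)
{
    size_t i = 0;
#if defined(__SSE__)
    /* Broadcast gain into all four lanes of a 128-bit register. */
    const __m128 g = _mm_set1_ps(gain);
    for (; i + 4 <= n; i += 4) {
        /* Unaligned loads/stores, so buffers need not be 16-byte aligned. */
        __m128 x = _mm_loadu_ps(in + i);
        __m128 y = _mm_loadu_ps(out + i);
        y = _mm_add_ps(y, _mm_mul_ps(g, x));
        _mm_storeu_ps(out + i, y);
    }
#endif
    /* Scalar loop: handles the remaining n % 4 samples,
     * or every sample when built without SSE support. */
    for (; i < n; ++i)
        out[i] += gain * in[i];
}
```

On x86-64, SSE is part of the baseline, so gcc defines `__SSE__` without extra flags; on 32-bit x86 you would need to pass `-msse` for the vector path to be compiled. The scalar fallback keeps the function correct on other targets. Whether this actually beats the compiler's own output depends on the real code and should be measured.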