Go to www.codeplay.com and try their compiler, VectorC.
It optimizes Win32 binaries to run as fast as possible
by compiling your source to use MMX, 3DNow!, and other
processor-specific instructions. Once I have a successful
compile with Visual C++, I'm going to try it with VectorC.