Could someone explain how a "CPU dispatcher" works? Is there a way to tell the hardware which set of SSE instructions you would like to use?
I have heard that MS's D3DX uses the same method to determine, at run time, which SSE version is supported.
In any case, I don't understand how dispatched code can be scheduled as well as code written purely with intrinsics for a single target SSE version, where the small 'inline' vectorized functions really do get inlined.
(Ref about the SSEPlus CPU dispatcher: last post at the bottom of this page: http://aceshardware.freeforums.org/intel-avx-kills-amd-sse5-t538.html)
I guess to kind of answer my own question, I found:
http://www.ncsa.uiuc.edu/UserInfo/Resources/Software/Intel/Compilers/8.1/c_ug/lin1138.htm
So you generate multiple versions of each function at compile time, and choose the appropriate one(s) at run time using CPUID. I guess this is not as awesome as my naive vision of certain SSE(N) instructions somehow being replaced by fallback SSE(N-1) instructions in the instruction stream.
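For anyone else reading, here is a rough sketch of what I understand that to mean. It is not how SSEPlus or the Intel compiler actually implement it. It assumes GCC or Clang on x86 (the `target` attribute and `__builtin_cpu_supports` are compiler extensions), and the names `hsum_sse2`, `hsum_sse3`, `select_hsum` and `hsum4` are made up for illustration. Both versions are built into the same binary, and a function pointer is resolved once from the CPUID feature bits:

```c
#include <assert.h>
#include <stddef.h>
#include <immintrin.h>   /* GCC/Clang umbrella header for the SSE intrinsics */

typedef float (*hsum_fn)(const float *);

/* Baseline version: horizontal sum of 4 floats using shuffles only. */
__attribute__((target("sse2")))
static float hsum_sse2(const float *p)
{
    __m128 v    = _mm_loadu_ps(p);                                /* a0 a1 a2 a3 */
    __m128 shuf = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1));  /* a1 a0 a3 a2 */
    __m128 sums = _mm_add_ps(v, shuf);          /* a0+a1 (twice), a2+a3 (twice) */
    shuf = _mm_movehl_ps(shuf, sums);           /* move a2+a3 into the low lane */
    sums = _mm_add_ss(sums, shuf);              /* low lane = a0+a1+a2+a3 */
    return _mm_cvtss_f32(sums);
}

/* SSE3 version: same result using the horizontal-add instruction. */
__attribute__((target("sse3")))
static float hsum_sse3(const float *p)
{
    __m128 v = _mm_loadu_ps(p);
    v = _mm_hadd_ps(v, v);                      /* a0+a1, a2+a3, a0+a1, a2+a3 */
    v = _mm_hadd_ps(v, v);                      /* every lane = full sum */
    return _mm_cvtss_f32(v);
}

/* The "dispatcher": query the CPUID-derived feature bits once. */
static hsum_fn select_hsum(void)
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse3") ? hsum_sse3 : hsum_sse2;
}

/* Public entry point. The lazy init is not thread-safe as written. */
float hsum4(const float *p)
{
    static hsum_fn impl = NULL;
    assert(p != NULL);
    if (impl == NULL)
        impl = select_hsum();
    return impl(p);                             /* indirect call, never inlined */
}
```

Every call to `hsum4` goes through the function pointer, so the compiler cannot inline either version into the caller.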
My biggest beef with the CPU dispatch method, then, is that you cannot write a small function that you really want inlined into larger functions if there are multiple versions that must be chosen from at run time. Sigh, I guess I will just have to stick with SSE2 intrinsics, replace the low-level inlinable functions with SSE3 (and higher) intrinsics where possible, and compile a separate binary for the processors that support them.
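Roughly what I have in mind for the separate-binary route, as a sketch only. It assumes GCC or Clang, which define `__SSE3__` when you build with `-msse3` (other compilers may need a different check), and `hsum_ps` and `dot_ps` are just names I made up. The helper is picked at compile time, so it can still be inlined into the loop that uses it:

```c
#include <emmintrin.h>   /* SSE/SSE2 intrinsics */
#ifdef __SSE3__
#include <pmmintrin.h>   /* SSE3 intrinsics, only when built with -msse3 */
#endif

/* Small helper that should be inlined: horizontal sum of one vector. */
static inline float hsum_ps(__m128 v)
{
#ifdef __SSE3__
    v = _mm_hadd_ps(v, v);                      /* SSE3 build */
    v = _mm_hadd_ps(v, v);
    return _mm_cvtss_f32(v);
#else
    __m128 shuf = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1));  /* SSE2 build */
    __m128 sums = _mm_add_ps(v, shuf);
    shuf = _mm_movehl_ps(shuf, sums);
    sums = _mm_add_ss(sums, shuf);
    return _mm_cvtss_f32(sums);
#endif
}

/* Larger function using the helper; n is assumed to be a multiple of 4. */
float dot_ps(const float *a, const float *b, int n)
{
    __m128 acc = _mm_setzero_ps();
    for (int i = 0; i < n; i += 4)
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(a + i),
                                         _mm_loadu_ps(b + i)));
    return hsum_ps(acc);                        /* no run-time dispatch here */
}
```

Building the same source once without and once with `-msse3` gives the two binaries. The price is that the choice of binary has to be made outside the program, for example by an installer or launcher.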