#59 Config option to force SSE math for P4


Pentium 4 and related chips (such as Pentium M) are
very slow at calculations involving NaN values when
using 387 math. For matrix multiplication with large
numbers of NaNs, this can result in a slowdown by a
factor of 100.

However, SSE floating point does not suffer this
penalty. Adding compiler flags that force use of SSE,
like -mfpmath=sse, can in some cases lead to ATLAS
libraries that have no NaN penalty, at least for
matrix multiplication. However, this is fragile, as
other ATLAS kernel configurations are possible which do
not obey these compiler options, and do incur the
penalty.

Because the slowdown for NaNs is so severe, and the P4
chips are so common, it would be good to have a
configure option which could prevent any kernel from
using non-SSE math - or skip over kernels where this
was not possible.


  • R. Clint Whaley

    R. Clint Whaley - 2006-05-27


    OK, this has been knocking around in the back of my brain
    for a while. I initially said no way, due to the
    architectural defaults problem, which would put an ongoing
    burden on me for every architecture and release. On the
    other hand, I usually don't mind so much doing some one-time
    work in order to support a small minority of users that
    really need a particular feature.

    This line of reasoning made me realize there might be middle
    ground, where I do some work to enable you to do what you
    need much less painfully, while still not saddling me with a
    maintenance nightmare.

    The way I figure it, you have several problems, but the main
    one is that even if you opt-out of my architectural defaults
    (which may require the x87 unit), you have no easy way to
    build ATLAS w/o grabbing some x87 code. What I *might* be
    able to do is provide a mechanism for building ATLAS w/o
    architectural defaults that obeys the no-x87 dictum, and
    then getting good defaults would be essentially your
    responsibility (there are some ideas for making this easy,
    and perhaps fairly automated, that we can discuss later),
    but you could build the lib using the search . . .

    You can avoid x87 code in generated code via changing
    compiler flags, which leaves the multiple implementation
    code as the possible spanner in the spokes. To avoid this,
    I could add a "uses x87" to every index file. With this in
    place, either I or you could write a script that removes all
    "uses x87" kernels from the index scripts before
    installation, and you should be golden . . .
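
A minimal sketch of the kind of filtering script described above. The index file layout and the `USESX87=1` marker are hypothetical, since the "uses x87" field is only a proposal here; the real ATLAS index format may differ, and if it records an entry count, that count would also need adjusting, which this sketch ignores.

```python
# Hypothetical marker for kernels that need the x87 unit.
# The actual field name and index format are assumptions.
X87_MARKER = "USESX87=1"


def strip_x87_kernels(index_text, marker=X87_MARKER):
    """Return index_text without kernel lines that carry the marker.

    Blank lines and comment lines (starting with '#') are kept as-is.
    """
    kept = []
    for line in index_text.splitlines():
        stripped = line.strip()
        is_kernel = bool(stripped) and not stripped.startswith("#")
        if is_kernel and marker in stripped.split():
            continue  # drop kernels flagged as needing x87
        kept.append(line)
    return "\n".join(kept) + "\n"


# Made-up index content, for illustration only.
SAMPLE_INDEX = (
    "# hypothetical index file\n"
    "1 kernel_sse.c author USESX87=0\n"
    "2 kernel_x87.c author USESX87=1\n"
)

if __name__ == "__main__":
    print(strip_x87_kernels(SAMPLE_INDEX), end="")
```

Run before installation, a script along these lines would leave only the kernels not flagged as x87-dependent for the search to consider.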

    Therefore, you would add the -mfpmath flags manually
    yourself, and not use architecture defaults, but then ATLAS
    would provide the tools to ensure that the generated lib has
    no x87 code. This might not be bulletproof if compilers change
    w/o our knowledge, but I don't think this'll be a big issue
    in practice . . .
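
Since compilers may not always honor the flags, one way to double-check a built library is to scan its disassembly for x87 instructions. The sketch below is not an ATLAS tool. It assumes tab-separated text in the style of `objdump -d` output (address, bytes, instruction) with AT&T syntax, and its mnemonic list is a non-exhaustive assumption chosen for illustration.

```python
# Non-exhaustive set of x87 mnemonics; an assumption for illustration.
X87_BASE = {
    "fld", "fst", "fstp", "fild", "fist", "fistp",
    "fadd", "faddp", "fsub", "fsubp", "fsubr", "fsubrp",
    "fmul", "fmulp", "fdiv", "fdivp", "fdivr", "fdivrp",
    "fxch", "fchs", "fabs", "fsqrt", "fldz", "fld1",
}


def is_x87(mnemonic):
    """True for a listed x87 mnemonic, with or without an AT&T size suffix."""
    m = mnemonic.lower()
    return m in X87_BASE or (m[-1:] in ("s", "l", "t") and m[:-1] in X87_BASE)


def find_x87(disassembly):
    """Return the disassembly lines whose instruction looks like x87 code."""
    hits = []
    for line in disassembly.splitlines():
        fields = line.split("\t")
        # Expect: address, raw bytes, instruction text.
        if len(fields) < 3 or not fields[2].strip():
            continue
        mnemonic = fields[2].split()[0]
        if is_x87(mnemonic):
            hits.append(line.strip())
    return hits


# Hand-written sample in objdump -d style, for illustration only.
SAMPLE_DISASM = (
    "  401000:\tf2 0f 59 c1          \tmulsd  %xmm1,%xmm0\n"
    "  401004:\tdd 45 f8             \tfldl   -0x8(%rbp)\n"
    "  401007:\tde c9                \tfmulp  %st,%st(1)\n"
)

if __name__ == "__main__":
    for hit in find_x87(SAMPLE_DISASM):
        print(hit)
```

An empty result would suggest, but not prove, that the scanned code avoids the x87 unit, because the mnemonic list is incomplete.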

    Now, this is not a 15-minute change, so I'm not definitely
    saying I'll do it, but would this interest you?


  • R. Clint Whaley

    R. Clint Whaley - 2006-05-27
    • milestone: --> Would_be_nice
    • assigned_to: nobody --> rwhaley
