From: Matthew B. <mat...@gm...> - 2006-02-03 18:35:39
Hi,

This is just to flag up a problem I ran into with Matlab, which is that Pentium 3s and 4s have very slow standard math performance with NaN values. For example, adding to a NaN value on my machine is about 22 times slower than adding to a non-NaN value. This can become a very big problem with matrix multiplication if there are a significant number of NaNs. I explained the problem here, for Matlab and the software I have been working with:

http://www.mrc-cbu.cam.ac.uk/Imaging/Common/spm_intel_tune.shtml

To illustrate, I've attached a timing script, run on current svn numpy linked with a standard P4-optimized ATLAS library. It multiplies (with dot) a 200x200 array of ones by a) another 200x200 array of ones and b) a 200x200 array of NaNs:

    ones * ones: 0.017460
    ones * NaNs: 2.323742
    proportion: 133.090452

Happily, for the Pentium 4, you can solve the problem by forcing the chip to do floating point math with the SSE instructions, which do not have this NaN penalty. So the solution was simply to recompile the ATLAS libraries with extra gcc flags forcing the use of SSE math (see the page above), or to use the Intel Math Kernel libraries, which appear to use this trick already. Here's the output from numpy linked to the recompiled ATLAS libraries:

    ones * ones: 0.026638
    ones * NaNs: 0.023987
    proportion: 0.900473

I wonder if it would be worth considering distributing the recompiled libraries by default in any binary releases? Or including a test like this one in the benchmarks to warn users about the problem?

Best,

Matthew
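The attached timing script is not reproduced in this archive. The following is only a minimal sketch of the kind of test described above, not the original attachment. It assumes a recent numpy and Python 3 (`np.full` and `time.perf_counter` were not available in 2006), and the function names `time_dot` and `nan_penalty` are made up for illustration. It reports the best of several runs, which may differ from how the original script measured time.

```python
import time

import numpy as np


def time_dot(a, b, repeats=10):
    """Return the best wall-clock time in seconds for one np.dot(a, b)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        np.dot(a, b)
        best = min(best, time.perf_counter() - start)
    return best


def nan_penalty(n=200, repeats=10):
    """Time ones-by-ones against ones-by-NaNs for n x n arrays.

    Returns (t_ones, t_nans, t_nans / t_ones).
    """
    ones = np.ones((n, n))
    nans = np.full((n, n), np.nan)
    t_ones = time_dot(ones, ones, repeats)
    t_nans = time_dot(ones, nans, repeats)
    return t_ones, t_nans, t_nans / t_ones


if __name__ == "__main__":
    t_ones, t_nans, ratio = nan_penalty()
    print("ones * ones: %f" % t_ones)
    print("ones * NaNs: %f" % t_nans)
    print("proportion: %f" % ratio)
```

A proportion close to 1 suggests the linked BLAS does not suffer from the NaN penalty on that machine; a proportion far above 1 suggests it does.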