From: Paul Khuong <pvk@pv...>  20090620 11:28:50

Good evening. Before committing the SSE complex patch, I'd like your opinion on other changes I made in that area: 1. We currently compute complexcomplex division with: (let* ((rx (realpart x)) (ix (imagpart x)) (ry (realpart y)) (iy (imagpart y))) (if (> (abs ry) (abs iy)) (let* ((r (/ iy ry)) (dn (* ry (+ 1 (* r r))))) (complex (/ (+ rx (* ix r)) dn) (/ ( ix (* rx r)) dn))) (let* ((r (/ ry iy)) (dn (* iy (+ 1 (* r r))))) (complex (/ (+ (* rx r) ix) dn) (/ ( (* ix r) rx) dn))))) Knuth suggests to compute dn as (+ ry (* r iy)) or (+ iy (* r ry)) in his tome on seminumerical algorithms. Indeed, squaring r only to cancel some of it away is fishy. Unless someone can think of a justification for the current implementation, I believe that change won't be a problem. 2. In any case, the implementation above is far from complete. Compared to, e.g., the behaviour mandated by the C99 specification, we fail to handle pretty much all the edge cases (except for gratuitous precision loss compared to a more naive implementation). On the other hand, in a performanceoriented application, the user probably wants something closer to the above than what GCC has to emit to conform to C99. I don't think C99 is noted for the sanity of its complexes either... 3. On a somewhat related note, the spec says that we should compute complexfloat/realfloat operations as though the real was first upgraded to a complex. That's not quite what happens, since we have some transforms that optimise such cases. Thus, (* 1.0 #c(1.0 #.sbext:singlefloatpositiveinfinity)) => #c(1.0 #.SBEXT:SINGLEFLOATPOSITIVEINFINITY), but (* #c(1.0 0.0) #c(1.0 #.sbext:singlefloatpositiveinfinity)) => #C(#<SINGLEFLOAT quiet NaN> #.SBEXT:SINGLEFLOATPOSITIVE INFINITY) ; or an FP condition. Basically, respecting the spec to the letter here would probably imply generating slower code that sometimes returns NaNs or less precise results, or adding branches to check for zero imaginary parts (and probably other edge cases as noted above) in complex/complex operations (but that's lossy since 0.0 does not necessarily mean 0, etc.). Is it time for a fastmathstyle declaration? Other thoughts? Paul Khuong 