Re: [atlas-devel] -mfpmath=387 (snrm2 miscompiled on x86_64)
From: Ian M. <ia...@ph...> - 2006-12-13 23:17:46
On Wed, 13 Dec 2006, Clint Whaley wrote:

> Dmitri,
>
> OK, here's my bootifool C implementation (calling f77 blas):
> I have run this with ref blas and ACML, and that produces:
>
> max(|C-C'|) = 0.000000e+00
>
> While ATLAS gives:
> max(|C-C'|) = 7.105427e-15
>
> So, similar to what you have (though not exact). ATLAS gets 0 error for
> some small problems, but nothing else. I'll have to build an all-SSE
> lib and see if the problem goes away (an all-x87 lib could still have
> probs, as any store rounds). As I say, it's not immediately obvious to
> me why this is happening . . .

At least for the ref BLAS, this is fairly clear: for A*A', the matrix
elements of C(i,j) are inner_prod(row(i), row(j)), and with no blocking
the inner products ought to be commutative, even with FMA, so
C(i,j) = C(j,i) exactly. With blocking it could be exactly symmetric
too, as long as the partial results are accumulated in the right order.
If the pure SSE version has the same problem, then this might be the
source.

Cheers,
Ian
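
The symmetry argument above can be sketched in a few lines. This is only an illustration in Python, not the C test program from the thread and not ATLAS code: every function name here is hypothetical, and the "mismatched" kernel is just an assumed way a blocked implementation could accumulate partial sums in a different order for C(i,j) and C(j,i).

```python
import random

def dot(x, y):
    """Left-to-right inner product: one rounding per multiply and per add."""
    s = 0.0
    for a, b in zip(x, y):
        s += a * b          # a*b == b*a exactly in IEEE arithmetic
    return s

def dot_blocked(x, y, block, reverse_blocks=False):
    """Inner product built from per-block partial sums.

    reverse_blocks flips the order in which the partial sums are added,
    which can change the rounding of the final result.
    """
    partial = [dot(x[k:k + block], y[k:k + block])
               for k in range(0, len(x), block)]
    if reverse_blocks:
        partial.reverse()
    s = 0.0
    for p in partial:
        s += p
    return s

def aat_naive(A):
    """C = A*A' with no blocking: C[i][j] = dot(row i, row j)."""
    n = len(A)
    return [[dot(A[i], A[j]) for j in range(n)] for i in range(n)]

def aat_blocked(A, block, mismatched=False):
    """Blocked C = A*A'.

    With mismatched=True (an assumed failure mode, for illustration),
    the lower triangle accumulates its partial sums in reverse order.
    """
    n = len(A)
    return [[dot_blocked(A[i], A[j], block,
                         reverse_blocks=(mismatched and i > j))
             for j in range(n)] for i in range(n)]

def max_asymmetry(C):
    """max(|C - C'|) over all entries."""
    n = len(C)
    return max(abs(C[i][j] - C[j][i]) for i in range(n) for j in range(n))

random.seed(0)
A = [[random.uniform(-1.0, 1.0) for _ in range(64)] for _ in range(20)]

print("naive:              ", max_asymmetry(aat_naive(A)))
print("blocked, same order:", max_asymmetry(aat_blocked(A, 8)))
print("blocked, mismatched:", max_asymmetry(aat_blocked(A, 8, mismatched=True)))
```

The first two cases are exactly symmetric because C(i,j) and C(j,i) go through the same sequence of operations on commuted operands. The third case is where a small nonzero max(|C-C'|) can appear, since the same partial sums are added in a different order.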