From: Andrew S. <str...@as...> - 2004-12-06 02:10:48
On Dec 5, 2004, at 9:04 AM, Jochen Voss wrote:

> Hello everybody,
>
> On Wed, Dec 01, 2004 at 06:38:47PM -0800, Andrew Straw wrote:
>> It sounds like you may have hit a nasty glibc bug that caused me much
>> head scratching over the months. Check this thread:
>>
>> http://aspn.activestate.com/ASPN/Mail/Message/numpy-discussion/2207861
>>
>> Bottom line:
>>
>> numarray does The Right Thing and attempts to set up floating point
>> exception handling, but older versions of glibc (such as that in
>> Debian sarge) have a bug whereby the floating point error bits in
>> the SSE are not properly cleared, leading to a SIGFPE terminating
>> the program the next time the SSE unit is used.
>
> Yes, this was the problem. I applied the patch from
>
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=279294
>
> and the problem disappeared. Does this imply that on current
> Debian/unstable systems matplotlib can only be used with
> python-numeric and not with python-numarray?

You can remove atlas3-sse2 (and perhaps atlas3-sse), and it should run
OK. Sorry I didn't remember this earlier.

I have to admit that I'm a little disappointed in the speed of this
bugfix going into the Debian sources. I'd think that since I submitted
a 2 line (1 of which was comments) patch over a month ago, which was
copied directly from upstream, this would be about 2 minutes of work
for someone... Maybe I should have put it at a higher priority than
"Normal".

> By the way: I am still interested in my original question. How would
> I use a debugger etc. to find the problem myself in such a situation?

If you find out, let me know! Seriously, having a process killed by
the kernel because of a signal was difficult for me to debug. (Python
is supposed to insulate you from this kind of low-level stuff and
generally does a fantastic job.) I had no idea where the FPE signal was
coming from or why.

I first came across this in the context of a numarray/Intel IPP program.
Because IPP is closed source, I didn't go very far initially, and just
converted my program to Numeric. Then, I encountered the same thing
purely within numarray and knew it was within my grasp. By making a
minimal program that exhibited the bug, I determined that the floating
point error checking and setting code executed on import of
numarray.ieeespecial would cause a floating point error (SIGFPE) in
later matrix calls (e.g.
numarray.linear_algebra.singular_value_decomposition). I began to
suspect the SSE unit because this code ran fine when I compiled
numarray with its built-in lapack_lite.

Strangely, running python from within gdb did not terminate or indicate
that a SIGFPE was raised. This I'd like to understand a bit better...

Cheers!
Andrew