Re: [matplotlib-devel] debugging python code


On Dec 5, 2004, at 9:04 AM, Jochen Voss wrote:

> Hello everybody,
>
> On Wed, Dec 01, 2004 at 06:38:47PM -0800, Andrew Straw wrote:
>> It sounds like you may have hit a nasty glibc bug that caused me much
>> head scratching over the months.  Check this thread:
>>
>> http://aspn.activestate.com/ASPN/Mail/Message/numpy-discussion/2207861
>>
>> Bottom line:
>>
>> numarray does The Right Thing and attempts to set up floating point
>> exception handling, but older versions of glibc (such as that in Debian
>> sarge) have a bug whereby the floating point error bits in the SSE are
>> not properly cleared, leading to a SIGFPE terminating the program the
>> next time the SSE unit is used.
> Yes, this was the problem.  I applied the patch from
>
>     http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=279294
>
> and the problem disappeared.  Does this imply that on current
> Debian/unstable systems matplotlib can only be used with
> python-numeric and not with python-numarray?

You can remove atlas3-sse2 (and perhaps atlas3-sse), and it should run 
OK.  Sorry I didn't remember this earlier.

I have to admit that I'm a little disappointed in how slowly this 
bugfix is going into the Debian sources.  I submitted a two-line patch 
(one line of which is a comment), copied directly from upstream, over a 
month ago, so I'd think this would be about 2 minutes of work for 
someone...  Maybe I should have given it a higher priority than 
"Normal".

> By the way: I am still interested in my original question.  How would
> I use a debugger etc. to find the problem myself in such a situation?

If you find out, let me know!  Seriously, having a process killed by 
the kernel because of a signal was difficult for me to debug. (Python 
is supposed to insulate you from this kind of low-level stuff and 
generally does a fantastic job.) I had no idea where the FPE signal was 
coming from or why. I first came across this in the context of a 
numarray/Intel IPP program.  Because IPP is closed source, I didn't go 
very far initially, and just converted my program to Numeric. Then, I 
encountered the same thing purely within numarray and knew it was 
within my grasp.  By making a minimal program that exhibited the bug, I 
determined that the floating point error checking and setting code 
executed on import of numarray.ieeespecial would cause a floating point 
error (SIGFPE) in later matrix calls (e.g. 
numarray.linear_algebra.singular_value_decomposition).  I began to 
suspect the SSE unit because this code ran fine when I compiled 
numarray with its built-in lapack_lite.

Strangely, running python from within gdb did not terminate or indicate 
that a SIGFPE was raised.  This I'd like to understand a bit better...
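
For what it's worth, here is a rough sketch of one way to at least see 
which Python call was active when the signal arrived.  It assumes a 
Python that ships the standard-library faulthandler module (3.3 or 
later, so much newer than anything discussed in this thread).  The 
names CHILD, suspect_call and run_child are made up for illustration, 
and the os.kill() call only simulates the kernel delivering SIGFPE; it 
does not reproduce the glibc/SSE bug itself.

```python
import signal
import subprocess
import sys

# Child script: enable faulthandler, then simulate the kernel sending SIGFPE.
# In real use, suspect_call() would be the numerical routine that dies.
CHILD = """
import faulthandler, os, signal
faulthandler.enable()   # dump the Python traceback on SIGFPE, SIGSEGV, ...

def suspect_call():
    # Hypothetical stand-in for the failing call (e.g. an SVD routine).
    os.kill(os.getpid(), signal.SIGFPE)

suspect_call()
"""

def run_child():
    """Run the child in a fresh interpreter; return (exit status, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr

if __name__ == "__main__":
    status, err = run_child()
    # On POSIX, a negative status means the child was killed by that signal.
    print("exit status:", status)
    print(err)
```

The traceback written to stderr names the Python function that was 
running when the signal hit, which narrows down where to look before 
reaching for gdb.  It says nothing about why the C code underneath 
raised the signal.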

Cheers!
Andrew