From: Jochen V. <vo...@se...> - 2004-12-01 18:11:05
|
Hello,

I tried to use matplotlib with numarray instead of numeric for the first time, but it seems that it fails for me even for the simplest examples. For example the script

    from matplotlib.matlab import *
    from math import sin
    t = arange(0, 10, 0.1)
    x = map(lambda tt: sin(tt), t)
    plot(t, x)
    savefig("out.ps")

fails during the savefig call with a floating point exception. Unfortunately there is no backtrace printed, so I have no idea where exactly the problem lies.

Questions: Is there any easy way to get more information about where the failure happens? Is this specific problem known?

All the best,
Jochen
--
http://seehuhn.de/ |
From: Jochen V. <vo...@se...> - 2004-12-01 18:40:04
|
Hello Perry,

On Wed, Dec 01, 2004 at 01:28:57PM -0500, Perry Greenfield wrote:
> why not "x = sin(t)"? That should work. No need to use map or math.sin

True, this works as such, but the crash is still there.

> Does it plot interactively? Do you get any error message or does it
> just crash out of python?

No, it does not plot interactively either. If I run the script

    from matplotlib.matlab import *
    t = arange(0, 10, 0.1)
    x = sin(t)
    plot(t, x)
    print "fisch"
    show()

I get the following output:

    voss@plonk [~/src/mpl/test] ./test.py --numarray
    fisch
    Floating point exception

Thank you very much,
Jochen
--
http://seehuhn.de/ |
From: John H. <jdh...@ac...> - 2004-12-01 19:18:36
|
>>>>> "Jochen" == Jochen Voss <vo...@se...> writes: Jochen> I get the following output: Jochen> voss@plonk [~/src/mpl/test] ./test.py --numarray fisch Jochen> Floating point exception I suggest you rm -rf site-packages/matplotlib and site-packages/numarray and your matplotlib build directory. Then do a clean install of both packages, matplotlib cvs and numarray 1.1.1 or cvs. Then try again, being mindful of which backend you are running under with -dSomeBackend and perhaps increasing the verbose level. JDH |
From: Andrew S. <str...@as...> - 2004-12-02 02:38:57
|
Jochen Voss wrote:
> I get the following output:
>
>     voss@plonk [~/src/mpl/test] ./test.py --numarray
>     fisch
>     Floating point exception
>
> Thank you very much,
> Jochen

It sounds like you may have hit a nasty glibc bug that caused me much head scratching over the months. Check this thread:

http://aspn.activestate.com/ASPN/Mail/Message/numpy-discussion/2207861

Bottom line:

numarray does The Right Thing and attempts to set up floating point exception handling, but older versions of glibc (such as that in Debian sarge) have a bug whereby the floating point error bits in the SSE are not properly cleared, leading to a SIGFPE terminating the program the next time the SSE unit is used.

One solution: Rebuild glibc with the appropriate patch. |
From: Jochen V. <vo...@se...> - 2004-12-05 17:13:47
|
Hello everybody,

On Wed, Dec 01, 2004 at 06:38:47PM -0800, Andrew Straw wrote:
> It sounds like you may have hit a nasty glibc bug that caused me much
> head scratching over the months. Check this thread:
>
> http://aspn.activestate.com/ASPN/Mail/Message/numpy-discussion/2207861
>
> Bottom line:
>
> numarray does The Right Thing and attempts to set up floating point
> exception handling, but older versions of glibc (such as that in Debian
> sarge) have a bug whereby the floating point error bits in the SSE are
> not properly cleared, leading to a SIGFPE terminating the program the
> next time the SSE unit is used.

Yes, this was the problem. I applied the patch from

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=279294

and the problem disappeared. Does this imply that on current Debian/unstable systems matplotlib can only be used with python-numeric and not with python-numarray?

By the way: I am still interested in my original question. How would I use a debugger etc. to find the problem myself in such a situation?

Many thanks,
Jochen
--
http://seehuhn.de/ |
From: Andrew S. <str...@as...> - 2004-12-06 02:10:48
|
On Dec 5, 2004, at 9:04 AM, Jochen Voss wrote:
> Hello everybody,
>
> On Wed, Dec 01, 2004 at 06:38:47PM -0800, Andrew Straw wrote:
>> It sounds like you may have hit a nasty glibc bug that caused me much
>> head scratching over the months. Check this thread:
>>
>> http://aspn.activestate.com/ASPN/Mail/Message/numpy-discussion/2207861
>>
>> Bottom line:
>>
>> numarray does The Right Thing and attempts to set up floating point
>> exception handling, but older versions of glibc (such as that in
>> Debian sarge) have a bug whereby the floating point error bits in the
>> SSE are not properly cleared, leading to a SIGFPE terminating the
>> program the next time the SSE unit is used.
> Yes, this was the problem. I applied the patch from
>
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=279294
>
> and the problem disappeared. Does this imply that on current
> Debian/unstable systems matplotlib can only be used with
> python-numeric and not with python-numarray?

You can remove atlas3-sse2 (and perhaps atlas3-sse), and it should run OK. Sorry I didn't remember this earlier.

I have to admit that I'm a little disappointed in the speed of this bugfix going into the Debian sources. I'd think that since I submitted a 2 line (1 of which was comments) patch over a month ago, which was copied directly from upstream, this would be about 2 minutes of work for someone... Maybe I should have put it at a higher priority than "Normal".

> By the way: I am still interested in my original question. How would
> I use a debugger etc. to find the problem myself in such a situation?

If you find out, let me know! Seriously, having a process killed by the kernel because of a signal was difficult for me to debug. (Python is supposed to insulate you from this kind of low-level stuff and generally does a fantastic job.) I had no idea where the FPE signal was coming from or why. I first came across this in the context of a numarray/Intel IPP program. Because IPP is closed source, I didn't go very far initially, and just converted my program to Numeric. Then, I encountered the same thing purely within numarray and knew it was within my grasp. By making a minimal program that exhibited the bug, I determined that the floating point error checking and setting code executed on import of numarray.ieeespecial would cause a floating point error (SIGFPE) in later matrix calls (e.g. numarray.linear_algebra.singular_value_decomposition). I began to suspect the SSE unit because this code ran fine when I compiled numarray with its built-in lapack_lite.

Strangely, running python from within gdb did not terminate or indicate that a SIGFPE was raised. This I'd like to understand a bit better...

Cheers!
Andrew |
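For readers hitting a similar "killed by a signal, no backtrace" situation on a current Python: the standard library's faulthandler module (Python 3.3 and later, so not available when this thread was written) can dump the Python traceback when a fatal signal such as SIGFPE arrives. The sketch below is only an illustration under stated assumptions. It does not reproduce the glibc/SSE bug; it simulates the signal with os.kill, and the names CHILD_SOURCE, plot_step and run_child_with_sigfpe are made up for the example. It assumes a POSIX system.

```python
import signal
import subprocess
import sys

# Child script: enable faulthandler, then simulate the fatal signal.
# os.kill() stands in for the real SSE fault, which this sketch does not reproduce.
CHILD_SOURCE = """
import faulthandler, os, signal
faulthandler.enable()  # dump the Python traceback on SIGSEGV, SIGFPE, SIGABRT, ...

def plot_step():
    # Hypothetical stand-in for the call that triggers the hardware exception.
    os.kill(os.getpid(), signal.SIGFPE)

plot_step()
"""


def run_child_with_sigfpe():
    """Run the child script; return (returncode, captured stderr). Hypothetical helper."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    code, err = run_child_with_sigfpe()
    # On POSIX, a negative return code means "killed by that signal number".
    if code < 0:
        print("child killed by", signal.Signals(-code).name)
    # The traceback written by faulthandler names the Python frame that was running.
    print(err)
```

The same handler can be switched on without editing the script by running it as `python -X faulthandler script.py`. This only tells you which Python call was active; locating the faulting C code still needs a native debugger or the bisection approach described elsewhere in this thread.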
From: Jochen V. <vo...@se...> - 2004-12-06 18:43:28
|
Hello Andrew,

On Sun, Dec 05, 2004 at 06:10:47PM -0800, Andrew Straw wrote:
> I have to admit that I'm a little disappointed in the speed of this
> bugfix going into the Debian sources. I'd think that since I submitted
> a 2 line (1 of which was comments) patch over a month ago, which was
> copied directly from upstream, this would be about 2 minutes of work
> for someone...

Yes, this can be a pain. I think the first thing to do is to add more information to the bug report log. I guess the Debian libc maintainers are short of time and cannot easily see whether the bug is actually a bug and the fix is actually a fix.

I will try to add more information to the bug report log. Maybe this will help get the patch applied.

All the best,
Jochen
--
http://seehuhn.de/ |
From: Jochen V. <vo...@se...> - 2004-12-06 18:53:36
|
Hello Andrew,

On Mon, Dec 06, 2004 at 06:42:03PM +0000, Jochen Voss wrote:
> I will try to add more information to the bug report log.
> Maybe this helps the patch being applied.

It turns out that I do not understand enough of this to produce an illustrative example. Do you have a minimal C program which terminates with SIGFPE because of this bug where it shouldn't?

All the best,
Jochen
--
http://seehuhn.de/ |
From: Perry G. <pe...@st...> - 2004-12-06 20:33:57
|
On Dec 5, 2004, at 9:10 PM, Andrew Straw wrote:
>
> On Dec 5, 2004, at 9:04 AM, Jochen Voss wrote:
>
>> Hello everybody,
>>
>> On Wed, Dec 01, 2004 at 06:38:47PM -0800, Andrew Straw wrote:
>>> It sounds like you may have hit a nasty glibc bug that caused me much
>>> head scratching over the months. Check this thread:
>>>
>>> http://aspn.activestate.com/ASPN/Mail/Message/numpy-discussion/2207861
>>>
>>> Bottom line:
>>>
>>> numarray does The Right Thing and attempts to set up floating point
>>> exception handling, but older versions of glibc (such as that in
>>> Debian sarge) have a bug whereby the floating point error bits in the
>>> SSE are not properly cleared, leading to a SIGFPE terminating the
>>> program the next time the SSE unit is used.
>> Yes, this was the problem. I applied the patch from
>>
>
I really appreciate Andrew's diagnosing the original problem and particularly in recognizing it as possibility here. This is a nasty kind of bug to figure out.

>> By the way: I am still interested in my original question. How would
>> I use a debugger etc. to find the problem myself in such a situation?
>
> If you find out, let me know! Seriously, having a process killed by
> the kernel

Us too!

Perry |
From: Steve C. <ste...@ya...> - 2004-12-07 02:51:23
|
On Mon, 2004-12-06 at 15:34 -0500, Perry Greenfield wrote:
> I really appreciate Andrew's diagnosing the original problem and
> particularly in recognizing it as possibility here. This is a nasty
> kind of bug to figure out.

The original bug was reported to numarray developers via the SourceForge bug tracking system back in February, the glibc patch was also applied in February. From Numarray 1.0 onwards a 'Special Note' has been included in the file numarray/Doc/Install.txt referencing the problem.

I believe the SourceForge bug report was the one

    870660 Numarray: CFLAGS build problem

yet for some reason I can't locate it anymore. Perhaps that's one of the reasons that the problem keeps getting rediscovered.

This is the glibc bug report
http://sources.redhat.com/bugzilla/show_bug.cgi?id=10

Steve |
From: Andrew S. <str...@as...> - 2004-12-07 07:07:29
|
On Dec 6, 2004, at 6:52 PM, Steve Chaplin wrote:
> On Mon, 2004-12-06 at 15:34 -0500, Perry Greenfield wrote:
>> I really appreciate Andrew's diagnosing the original problem and
>> particularly in recognizing it as possibility here. This is a nasty
>> kind of bug to figure out.
> The original bug was reported to numarray developers

Probably by the too-modest Steve Chaplin, I suspect. I forgot in my previous email that a significant component of my late-phase debugging consisted of emailing the numarray list, and getting an email from Steven Chaplin, who had independently diagnosed the problem. He had already gone much further than I -- he's the one who submitted the bug report and patch to the glibc itself:

> This is the glibc bug report
> http://sources.redhat.com/bugzilla/show_bug.cgi?id=10

Jochen, that bug report contains a C program which replicates the bug. Perhaps you could send that test program to the Debian bug tracking system to spur patching? (There is an additional comment on the glibc bugzilla page saying "The test program isn't really testing what it is supposed to (the SSE status is never touched) but the SSE control change is indeed wrong." You may want to address this first if you're up for this kind of low-level fun.)

To summarize, we owe a big thanks to Steve Chaplin. A heartfelt thanks, Steve!

Cheers!
Andrew |
From: Perry G. <pe...@st...> - 2004-12-07 16:27:09
|
On Dec 7, 2004, at 2:07 AM, Andrew Straw wrote:
> On Dec 6, 2004, at 6:52 PM, Steve Chaplin wrote:
>
>> On Mon, 2004-12-06 at 15:34 -0500, Perry Greenfield wrote:
>>> I really appreciate Andrew's diagnosing the original problem and
>>> particularly in recognizing it as possibility here. This is a nasty
>>> kind of bug to figure out.
>> The original bug was reported to numarray developers
>
> Probably by the too-modest Steve Chaplin, I suspect. I forgot in my
> previous email that a significant component of my late-phase debugging
> consisted of emailing the numarray list, and getting an email from
> Steven Chaplin, who had independently diagnosed the problem. He had
> already gone much further than I -- he's the one who submitted the bug
> report and patch to the glibc itself:

Sorry about that. I should have also thanked Steve for doing the hard part. |
From: Todd M. <jm...@st...> - 2004-12-07 11:28:23
|
On Tue, 2004-12-07 at 10:52 +0800, Steve Chaplin wrote:
> On Mon, 2004-12-06 at 15:34 -0500, Perry Greenfield wrote:
> > I really appreciate Andrew's diagnosing the original problem and
> > particularly in recognizing it as possibility here. This is a nasty
> > kind of bug to figure out.
> The original bug was reported to numarray developers via the SourceForge
> bug tracking system back in February, the glibc patch was also applied
> in February. From Numarray 1.0 onwards a 'Special Note' has been
> included in the file numarray/Doc/Install.txt referencing the problem.
>
> I believe the SourceForge bug report was the one
> 870660 Numarray: CFLAGS build problem
> yet for some reason I can't locate it anymore.

Here's the "numarray bugs tracker" link for this report:

http://sourceforge.net/tracker/index.php?func=detail&aid=870660&group_id=1369&atid=450446

My guess is that you were looking in the "numpy bugs tracker" where the bug was originally filed but which is supposed to be for Numeric:

http://sourceforge.net/tracker/?group_id=1369&atid=101369

Numarray bugs which are filed in the numpy bugs tracker are moved to the numarray bugs tracker. They're both on the same SF project, but the numarray tracker is more hidden. I'm sorry this is confusing.

Regards,
Todd |
From: Steve C. <ste...@ya...> - 2004-12-06 09:34:43
|
> By the way: I am still interested in my original question. How would
> I use a debugger etc. to find the problem myself in such a situation?

I should know the answer because I created the glibc patch that fixes the problem, but it was back in February and I can't remember all the details.

It started when I thought, why should I run my AthlonXP in '386 emulator mode' when I can use 'gcc -march=athlon-xp' and actually benefit from the extra instructions my processor supports. This worked fine until I compiled numarray and it failed its own tests with a floating-point exception. But if I used the default gcc settings it worked OK. I filed a numarray bug report (which I can no longer locate, perhaps they get deleted after a certain date), they looked at it and said it was probably a gcc bug. I filed a gcc bug report, and they closed it saying it was not a gcc bug. Then I thought it might be a bug with the way the kernel handles FP exceptions and started looking through the kernel sources, but did not make much progress. So I went back to the numarray source code and tried to narrow down where the problem was occurring.

Now to answer your question: Consider you are on a TV game show where you have to guess a number x in the range 1 to y and are told 'higher', 'lower' or 'correct' after each turn. You can use a binary search and always guess the midpoint of the range - you are either correct or eliminate half of the possibilities each turn, so in about ceil(log(y, 2)) turns or fewer you locate the correct number.

You can use a similar kind of binary search to locate bugs in software. You know the bug occurs on some line x of the source code with y lines. Use gdb and insert breakpoints in the code (I think I just inserted printf() statements instead of using gdb) and see if the error occurs before or after the breakpoint, move the breakpoint and try again.

The problem is that source code is rarely a linear list of statements in one file that are executed in order, but a set of procedures/functions in many files where the execution order can vary. You can start at the main() function, split it in half and insert a breakpoint (or printf()), run it and see in which half the error occurs, and repeat the process, working your way down into other functions until you pinpoint the error.

Hope that makes sense. You could now reinstall the old glibc, forget that you know that glibc is the problem and start again to locate the bug, it will be useful practise for the next bug that comes along!

Steve |
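The bisection idea in this message can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the program is treated as a linear sequence of numbered steps, and the probe function (a hypothetical stand-in for "put a breakpoint or printf() after step k and re-run") reports whether the failure has already happened by that step. Neither the function name nor the probe comes from the thread.

```python
import math


def first_failing_step(n_steps, fails_at_or_after):
    """Return (step, probes): the smallest step in [1, n_steps] where the probe
    reports failure, and how many probes were needed.

    `fails_at_or_after(k)` is a hypothetical probe answering "has the bug
    already happened once step k has run?". The search assumes the answer is
    monotonic (False up to some step, True from then on) and that the bug
    occurs somewhere within the range.
    """
    lo, hi = 1, n_steps
    probes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if fails_at_or_after(mid):
            hi = mid        # the bug is at step mid or earlier
        else:
            lo = mid + 1    # the bug is strictly after step mid
    return lo, probes


if __name__ == "__main__":
    n = 1000
    culprit = 137  # made-up step number, for illustration only
    step, probes = first_failing_step(n, lambda k: k >= culprit)
    print("first failing step:", step)
    print("probes used:", probes, "bound:", math.ceil(math.log2(n)))
```

Each probe corresponds to one edit-and-rerun cycle, so even a program with a thousand candidate locations needs only about ten runs. As the message notes, real code is not linear, so in practice the same halving is applied to one function at a time, descending into whichever call contains the failure.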