From: Hazen B. <hba...@ma...> - 2014-04-06 21:35:50
|
Hello,

In the process of converting some of the PLplot demos to Lisp as part of the cl-plplot project, I found that example 29 triggers a floating point exception. This is not something that a C compiler will normally report, but you can make it (or at least gcc) stricter about this. I believe that the problem occurs in the plP_wcpcx() function in src/plcvt.c, where we attempt to convert -3.917103e+15 to an integer, resulting in an overflow error.

The easiest way to see this is to pull my fe_demo branch from here:
https://github.com/HazenBabcock/PLplot

Or you can look at the two files that I modified to demonstrate this issue:
https://github.com/HazenBabcock/PLplot/blob/fe_demo/examples/c/x29c.c
https://github.com/HazenBabcock/PLplot/blob/fe_demo/src/plcvt.c

-Hazen |
From: Alan W. I. <ir...@be...> - 2014-04-07 02:17:00
|
On 2014-04-06 17:35-0400 Hazen Babcock wrote:

> Hello,
>
> In the process of converting some of the PLplot demos to Lisp as part of
> the cl-plplot project I found that example 29 triggers a floating point
> exception. This is not something that a C compiler will normally report,
> but you can make it (or at least gcc) more strict about this. I believe
> that the problem occurs in the plP_wcpcx() function in src/plcvt.c,
> where we attempt to convert -3.917103e+15 to an integer resulting in an
> overflow error.
>
> [...]

Hi Hazen:

Obviously, the attempted conversion should generate an integer
overflow, but I frankly don't understand why it generates a
_floating-point_ exception, since -3.917103e+15 is clearly a valid
floating-point number. Do you have some mental model for why a
floating-point exception was generated in this case?

That question is just to satisfy my curiosity about how you found the
issue. The much more important question is why in the world numbers
like -3.917103e+15 are being generated by example 29.

One possibility is that some variable is uninitialized, so -3.917103e+15
is just random numerical garbage, but I checked that possibility with
valgrind and got the following absolutely clean result:

==19527== Memcheck, a memory error detector
==19527== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==19527== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==19527== Command: examples/c/x29c -dev psc -o test.psc
==19527==
==19527==
==19527== HEAP SUMMARY:
==19527==     in use at exit: 0 bytes in 0 blocks
==19527==   total heap usage: 2,481 allocs, 2,481 frees, 351,494 bytes allocated
==19527==
==19527== All heap blocks were freed -- no leaks are possible
==19527==
==19527== For counts of detected and suppressed errors, rerun with: -v
==19527== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 4 from 4)

plcvt.c contains code for converting between the various plot coordinate
systems. Please indicate which device you are using, since that should
affect some of those transformations.

plP_wcpcx converts world coordinates to physical device coordinates
using the PLINT value of the following transformation

    plsc->wpxoff + plsc->wpxscl * x

where all the values being combined are PLFLT.

It is hard to figure out how an integer physical device coordinate (e.g.,
pixels) could correspond to -3.917103e+15, so despite the good valgrind
result I am still wondering whether you are dealing with numerical
garbage of some kind.

Alan
__________________________
Alan W. Irwin

Astronomical research affiliation with Department of Physics and Astronomy,
University of Victoria (astrowww.phys.uvic.ca).

Programming affiliations with the FreeEOS equation-of-state implementation
for stellar interiors (freeeos.sf.net); the Time Ephemerides project
(timeephem.sf.net); PLplot scientific plotting software package
(plplot.sf.net); the libLASi project (unifont.org/lasi); the Loads of
Linux Links project (loll.sf.net); and the Linux Brochure Project
(lbproject.sf.net).
__________________________

Linux-powered Science
__________________________ |
From: Hazen B. <hba...@ma...> - 2014-04-07 18:30:51
|
On 4/6/2014 10:16 PM, Alan W. Irwin wrote:
> On 2014-04-06 17:35-0400 Hazen Babcock wrote:
>> In the process of converting some of the PLplot demos to Lisp as part of
>> the cl-plplot project I found that example 29 triggers a floating point
>> exception. This is not something that a C compiler will normally report,
>> but you can make it (or at least gcc) more strict about this. I believe
>> that the problem occurs in the plP_wcpcx() function in src/plcvt.c,
>> where we attempt to convert -3.917103e+15 to an integer resulting in an
>> overflow error.
>
> Obviously, the attempted conversion should generate an integer
> overflow, but I frankly don't understand why it generates a
> _floating-point_ exception since clearly -3.917103e+15 is a valid
> floating-point number. Do you have some mental model for why there
> was a generated floating-point exception in this case?

Sorry, I did not explain that very well. The exception that is triggered
is FE_INVALID, which I think in this case is caused by the floating-point
number being too large to convert to an integer.

> That question is just to satisfy my curiosity about how you found the
> issue, and the much more important question is why in the world
> numbers like -3.917103e+15 are being generated by example 29?
>
> [...]
>
> plcvt.c contains code for converting between various plot coordinate
> systems. Please indicate what device you are using since that
> should affect some of those transformations.

I tested the xwin, xcairo and qtwidget drivers. They all give slightly
different numbers, but they are all around -4.0e15.

> plP_wcpcx converts world coordinates to physical device
> coordinate using the PLINT value of the following transformation
>
> plsc->wpxoff + plsc->wpxscl * x

The exception is triggered by the value of plsc->wpxoff, which is roughly
-4.0e15. plsc->wpxscl is also suspiciously large, at ~2.0e12, but it is
multiplied by x, which is 0.0 in the call that causes the exception.

> where all those values being combined together are PLFLT.
>
> It is hard to figure out how an integer physical device coordinate (e.g.,
> pixels) could correspond to -3.917103e+15 so despite the good valgrind
> result I am still wondering whether you are dealing with numerical
> garbage of some kind.

It looks that way, but I'm not having much luck figuring out where this
garbage is coming from. The crash occurs on the call to plbox() when
creating page 5, which is the first page of this example that just shows
a single step.

-Hazen |
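A minimal sketch of the arithmetic being discussed, in C. It assumes PLINT is a 32-bit signed integer (int32_t), and xform_sketch is a hypothetical stand-in for the two stream fields named above, not PLplot's real structure. With x = 0.0 the result is simply wpxoff, and a value near -3.9e15 lies roughly six orders of magnitude outside the PLINT range of about ±2.1e9. Plain ISO C leaves such an out-of-range conversion undefined; Annex F of the C standard (IEC 60559 arithmetic) specifies that it raises the "invalid" floating-point exception, which is the FE_INVALID reported here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the two PLplot stream fields quoted in the thread
   (plsc->wpxoff and plsc->wpxscl); this is NOT PLplot's actual struct. */
typedef struct {
    double wpxoff; /* world -> physical x offset */
    double wpxscl; /* world -> physical x scale  */
} xform_sketch;

/* The affine map quoted from plP_wcpcx, evaluated in double precision. */
static double wc_to_pc_x(const xform_sketch *t, double x)
{
    return t->wpxoff + t->wpxscl * x;
}

/* True if v can be cast to a 32-bit PLINT without overflow (assumes PLINT
   is int32_t). Both bounds are exactly representable as doubles. NaN fails
   both comparisons and is therefore rejected as well. */
static bool fits_plint(double v)
{
    return v >= (double) INT32_MIN && v <= (double) INT32_MAX;
}
```

Checking the value with a helper like fits_plint() before the cast turns a silent out-of-range conversion into a condition the caller can handle.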
From: Alan W. I. <ir...@be...> - 2014-04-07 19:00:08
|
On 2014-04-07 14:30-0400 Hazen Babcock wrote:

> On 4/6/2014 10:16 PM, Alan W. Irwin wrote:
>> On 2014-04-06 17:35-0400 Hazen Babcock wrote:
>>> [...]
>>
>> Obviously, the attempted conversion should generate an integer
>> overflow, but I frankly don't understand why it generates a
>> _floating-point_ exception since clearly -3.917103e+15 is a valid
>> floating-point number. Do you have some mental model for why there
>> was a generated floating-point exception in this case?
>
> Sorry, I did not explain that very well. The exception that is triggered is
> FE_INVALID, which I think in this case is caused by the floating point number
> being too large to convert to an integer.

Interesting. I am surprised that such a conversion would cause that
exception, but I have verified (with your modified x29c.c) that the
exception occurs on my platform as well, and I am glad it is possible (at
least in this case) to detect integer overflows this way. More
importantly, this verification puts me in a position to follow up on this
bug, which I will do since the time transformations used in example 29
are largely my responsibility.

> [...]
>
> It looks that way, but I'm not having much luck figuring out where this
> garbage is coming from. The crash occurs on the call to plbox() when creating
> page 5, which is the first page of this example that just shows a single
> step.

Time transformations such as those used in example 29 are notorious for
significance-loss issues when the epoch is badly chosen. For example, if
the epoch is the zero point of Julian dates, then the number of seconds
since that epoch for present-day times is roughly 2.e11. I am pretty sure
the default epoch chosen for the PLplot time transformations is
considerably better than that bad choice, but I may have missed something
when I set that up, so I will take a look.

Alan |
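To put a number on the significance-loss point: assuming IEEE 754 double precision (53-bit significand), adjacent doubles near 2.e11 seconds are 2^-15 s (about 31 microseconds) apart, while near 1.e9 seconds (roughly 32 years past a nearer epoch) they are 2^-23 s apart. The hypothetical helper below measures that spacing by stepping the bit pattern of a double; it says nothing about which epoch PLplot actually uses.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: distance from t to the next larger double, i.e. the
   finest time step representable near t. Assumes double is IEEE 754
   binary64 stored with the same byte order as uint64_t (true on common
   platforms) and that t is positive, finite and below DBL_MAX. For such t
   the bit pattern, read as an unsigned integer, grows with the value, so
   adding 1 to it gives the next representable double. */
static double spacing_above(double t)
{
    uint64_t bits;
    double next;

    memcpy(&bits, &t, sizeof bits);
    bits += 1;
    memcpy(&next, &bits, sizeof next);
    return next - t; /* exact: the two values are adjacent doubles */
}
```

With a 2.e11 s epoch offset, any time difference smaller than about 31 microseconds simply cannot be represented, and subtracting two such large, nearly equal times discards most of the significant digits.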
From: Alan W. I. <ir...@be...> - 2014-04-07 22:00:17
|
On 2014-04-07 12:00-0700 Alan W. Irwin wrote:

> [...] I will take a look.

The GNU debugger (gdb) made this a really easy issue to solve. It turned
out to be an integer overflow in a calculation (unused in the plbox case)
of the physical coordinates corresponding to the default world
coordinates (0., 0.) of the axis origin. That origin was very far from
the actual range of world coordinates used in the plots on page 5 and
later of example 29.

I eliminated this issue (as of revision 13099) by making sure the
physical coordinates of the axis origin are only calculated when needed,
i.e., when the "a" option is specified and the corresponding axis line
lies somewhere within the viewport. With this tighter logic, the physical
coordinate of the axis line is always a reasonable integer because, by
definition, it lies within the viewport, so it cannot overflow.

Thanks, Hazen, for spotting this long-standing bug, which would have
occurred in the past for any plbox call where the data range was far
enough from zero that the calculated (but unused) axis line was very far
outside the actual viewport. We were "lucky" that example 29 provided
such a plbox call so we could find and fix this bug.

Alan |
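A sketch of the "only compute it when needed" logic described above. The function and parameter names are hypothetical and this is not the code from revision 13099; it assumes a simple affine world-to-physical map (off + scl * w, like the plP_wcpcx formula quoted earlier in the thread) and that the viewport's physical coordinates fit in an int. Because the conversion happens only once the axis position is known to lie inside the world range spanned by the viewport, its physical coordinate is bounded by the viewport's and cannot overflow.

```c
#include <stdbool.h>

/* Hypothetical sketch (not PLplot's code): compute the physical coordinate
   of an axis line only when it is actually needed. */
static bool axis_line_phys(bool opt_a,     /* was the "a" option given?       */
                           double w_axis,  /* world coord of the axis line    */
                           double wmin,    /* world range of the viewport     */
                           double wmax,
                           double off,     /* assumed map: off + scl * w      */
                           double scl,
                           int *phys)      /* output, written only on success */
{
    double lo = wmin < wmax ? wmin : wmax;
    double hi = wmin < wmax ? wmax : wmin;

    /* Not requested, or outside the viewport: skip the conversion entirely,
       so a far-away default origin such as 0.0 is never converted. */
    if (!opt_a || w_axis < lo || w_axis > hi)
        return false;

    /* w_axis is inside the viewport, so off + scl * w_axis lies inside the
       viewport's physical range, which is assumed to fit in an int. */
    *phys = (int) (off + scl * w_axis);
    return true;
}
```

The usage pattern is to draw the axis line only when the function returns true; the example 29 case (world range around 1.e9, origin at 0.) returns false before any conversion is attempted.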
From: Alan W. I. <ir...@be...> - 2014-04-08 03:08:26
|
Hi Hazen:

I have just had a further idea. For comprehensive testing situations it
might be good to have the option to call feenableexcept (the C library
function that you used to help debug x29c.c, which should be available
for C99 according to its Linux man page) from within the PLplot library
(say, as the result of plinit). If you agree (a) that this idea would
work and (b) that it would be useful, would you be willing to implement
it in C for the case when the PLPLOT_ENABLE_FLOAT_EXCEPT C macro is
#defined? If so, I would be willing to do the rest on the CMake side
(create a CMake option for this and propagate it to the corresponding C
macro for the compilation of the source file where you have implemented
the feenableexcept call).

Alan |
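A minimal sketch of the proposed hook, assuming glibc. Note that the Linux fenv(3) man page lists feenableexcept() as a GNU extension rather than part of C99, so the file needs _GNU_SOURCE defined before <fenv.h> is included, and the library must be linked with libm (-lm). The function name is hypothetical; the CMake side would add -DPLPLOT_ENABLE_FLOAT_EXCEPT to the compile flags when the option is on.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* feenableexcept() is a GNU extension, not ISO C99 */
#endif
#include <fenv.h>

/* Hypothetical hook that plinit() could call; the name is illustrative.
   Returns 0 on success (or when trapping is compiled out), -1 on failure. */
int plP_enable_float_except_sketch(void)
{
#ifdef PLPLOT_ENABLE_FLOAT_EXCEPT
    /* Turn the "invalid", divide-by-zero and overflow flags into SIGFPE.
       feenableexcept() returns the previously enabled set, or -1 on error. */
    if (feenableexcept(FE_INVALID | FE_DIVBYZERO | FE_OVERFLOW) == -1)
        return -1;
#endif
    return 0;
}
```

FE_INEXACT is deliberately left out, since nearly every floating-point operation raises it; leaving out FE_UNDERFLOW is a judgment call in this sketch.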
From: Hazen B. <hba...@ma...> - 2014-04-15 11:44:53
|
On 4/7/2014 11:08 PM, Alan W. Irwin wrote:
> I have just had a further idea. For comprehensive testing situations
> it might be good to have the option to call feenableexcept (the C
> library function that you used to help debug x29c.c that should be
> available for c99 according to its Linux man page) from within the
> PLplot library (say as the result of plinit). If you agree (a) that
> idea would work and (b) it would be useful, would you be willing to
> implement it in C for the case when the PLPLOT_ENABLE_FLOAT_EXCEPT C
> macro is #defined? If so, I would be willing to do the rest on the
> CMake side (create a CMake option for this and propagate it to the
> corresponding C macro for the compilation of the source file where you
> have implemented the feenableexcept call).

I have tested the idea and it works. I'm not sure about its utility,
though. Based on my tests, it has already caught the only error in the
examples that it is going to catch. After the git transition I can
create a branch that will catch floating-point exceptions, and you can
merge it (or not) into the master branch based on your feeling about
its utility.

best,
-Hazen |
From: Hazen B. <hba...@ma...> - 2015-05-11 21:23:31
|
On 04/15/2014 07:44 AM, Hazen Babcock wrote:
> On 4/7/2014 11:08 PM, Alan W. Irwin wrote:
>> [...]
>
> I have tested the idea and it works. I'm not sure about the utility
> though. Based on my tests it already caught the only error in the
> examples that it is going to catch. After the git transition I can
> create a branch that will catch floating point exceptions and you can
> merge it (or not) into the master branch [...]

Much time has passed, during which I never implemented this branch as
promised, and it looks like some more floating-point exceptions have
crept in. More specifically, I'm seeing floating-point exceptions in
x25, x30 and x33. The problem seems to be with plgradient() when it uses
the software fallback gradient (I was testing with the xwin driver).

You can pull a branch with floating-point exception trapping enabled for
these examples here:
https://github.com/HazenBabcock/PLplot/tree/fpe_x25_x30_x33

The problem occurs in notcrossed() in plfill.c, at line 2040, when
converting from a PLFLT to a PLINT. It looks like fxintersect and
fyintersect can take on values that are too large to be converted to
integers. Checking for this fixes the problem, but maybe this indicates
some problem upstream?

-Hazen |
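Hazen reports that adding a range check before the PLFLT-to-PLINT conversion in notcrossed() avoids the exception. One way such a check could look, as a sketch only: the helper name is hypothetical, it assumes PLINT is int32_t, and it clamps (saturates) out-of-range values rather than rejecting them. Clamping moves a far-off intersection point, so whether it is appropriate in notcrossed(), or whether the huge fxintersect/fyintersect values point to an upstream problem as Hazen suspects, is not settled in this thread.

```c
#include <stdint.h>

/* Hypothetical helper: convert a PLFLT (double) to a 32-bit PLINT without
   undefined behaviour, clamping out-of-range values to the representable
   range. NaN is mapped to 0. Assumes PLINT is int32_t. */
static int32_t plflt_to_plint_clamped(double v)
{
    if (v != v)                       /* NaN compares unequal to itself */
        return 0;
    if (v <= (double) INT32_MIN)      /* both bounds are exact doubles  */
        return INT32_MIN;
    if (v >= (double) INT32_MAX)
        return INT32_MAX;
    return (int32_t) v;               /* in range: truncates toward zero */
}
```

Unlike a bare cast, this never raises FE_INVALID, so it would also silence the trap; using it therefore hides the symptom rather than explaining where the large values come from.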
From: Alan W. I. <ir...@be...> - 2015-05-11 22:05:26
|
On 2015-05-11 16:23-0400 Hazen Babcock wrote:

> On 04/15/2014 07:44 AM, Hazen Babcock wrote:
>> [...]
>
> Much time passes during which I never implement this branch as promised, and
> it looks like some more floating point exceptions have crept in. More
> specifically I'm seeing floating point exceptions in x25, x30 and x33. The
> problem seems to be with plgradient() when it uses a software fallback
> gradient (I was testing with the xwin driver).
>
> You can pull a branch with floating point exception trapping enabled for
> these examples here:
> https://github.com/HazenBabcock/PLplot/tree/fpe_x25_x30_x33
>
> The problem is occurring in notcrossed() in plfill.c at line 2040 when
> converting from a PLFLT to a PLINT. It looks like fxintersect and fyintersect
> can take on values that are too large to be converted to integers. Checking
> for this fixes the problem, but maybe this is indicating some problem
> upstream?

Hi Hazen:

I am glad to hear you have created a topic branch that catches
floating-point exceptions. However, I am not keen on pulling a PLplot
topic branch from github for the reasons discussed in
README.developers. So could you either share your topic branch using
"git format-patch" or else just push it to our official repo yourself?
The former is preferred if the work is incomplete (i.e., does not
include a cmake option to control when the PLPLOT_ENABLE_FLOAT_EXCEPT
macro is #defined). Furthermore, if the work is incomplete, I can
finish the cmake aspects of it as promised above and amend your commit
accordingly.

Once we have implemented an option to check for floating-point
exceptions, any of us should be able to confirm the ones you have found
and to find others that may appear as our examples continue to evolve.
And, of course, this option should allow us to fix those floating-point
exceptions as time permits.

Alan |
From: Hazen B. <hba...@ma...> - 2015-05-12 13:17:46
Attachments:
CMakeCache.txt
|
On 05/11/2015 06:05 PM, Alan W. Irwin wrote:
> On 2015-05-11 16:23-0400 Hazen Babcock wrote:
>> [...]
>
> Hi Hazen:
>
> I am glad to hear you have created a topic branch that catches
> floating point exceptions. However, I am not keen on pulling a PLplot
> topic branch from github for the reasons discussed in
> README.developers. So could you either share your topic branch using
> "git format-patch" or else just push it to our official repo yourself?
> The former is preferred if the work is incomplete (i.e., does not
> include a cmake option to control when the PLPLOT_ENABLE_FLOAT_EXCEPT
> macro is #defined). Furthermore, if the work is incomplete, I can
> finish the cmake aspects of it as promised above and amend your commit
> accordingly.
>
> [...]
Hi Alan,

Sorry, that branch was not meant for incorporation into PLplot, which is
why I did not follow the accepted protocols. I thought it might make it
easier for others to see the floating-point exception problem in these
3 examples.

I think we can enable floating-point exception trapping more easily
using the -ffpe-trap option provided by gfortran. This won't require us
to mark up all of our C examples, though we might instead need to check
that the Fortran compiler is gfortran?

https://gcc.gnu.org/onlinedocs/gfortran/Debugging-Options.html

-ffpe-trap=invalid,zero,overflow,underflow

Compiling PLplot with this Fortran flag (in CMAKE_Fortran_FLAGS) "worked"
for me, in that it core dumps on the Fortran equivalents of the C
examples I mentioned above. I've attached my CMakeCache.txt file.

-Hazen |