Thread: [Plplot-devel] floating point exception in example 29

[Plplot-devel] floating point exception in example 29

From: Hazen B. <hba...@ma...> - 2014-04-06 21:35:50

Hello,

In the process of converting some of the PLplot demos to Lisp as part of 
the cl-plplot project I found that example 29 triggers a floating point 
exception. This is not something that a C compiler will normally report, 
but you can make it (or at least gcc) more strict about this. I believe 
that the problem occurs in the plP_wcpcx() function in src/plcvt.c, 
where we attempt to convert -3.917103e+15 to an integer resulting in an 
overflow error.

The easiest way to see this is to pull my fe_demo branch from here:
https://github.com/HazenBabcock/PLplot

Or you can look at the two files that I modified to demonstrate this issue:
https://github.com/HazenBabcock/PLplot/blob/fe_demo/examples/c/x29c.c
https://github.com/HazenBabcock/PLplot/blob/fe_demo/src/plcvt.c

-Hazen

Re: [Plplot-devel] floating point exception in example 29

From: Alan W. I. <ir...@be...> - 2014-04-07 02:17:00

On 2014-04-06 17:35-0400 Hazen Babcock wrote:

>
> Hello,
>
> In the process of converting some of the PLplot demos to Lisp as part of
> the cl-plplot project I found that example 29 triggers a floating point
> exception. This is not something that a C compiler will normally report,
> but you can make it (or at least gcc) more strict about this. I believe
> that the problem occurs in the plP_wcpcx() function in src/plcvt.c,
> where we attempt to convert -3.917103e+15 to an integer resulting in an
> overflow error.
>
> The easiest way to see this is to pull my fe_demo branch from here:
> https://github.com/HazenBabcock/PLplot
>
> Or you can look at the two files that I modified to demonstrate this issue:
> https://github.com/HazenBabcock/PLplot/blob/fe_demo/examples/c/x29c.c
> https://github.com/HazenBabcock/PLplot/blob/fe_demo/src/plcvt.c
>

Hi Hazen:

Obviously, the attempted conversion should generate an integer
overflow, but I frankly don't understand why it generates a
_floating-point_ exception since clearly -3.917103e+15 is a valid
floating-point number.  Do you have some mental model for why there
was a generated floating-point exception in this case?

That question is just to satisfy my curiosity about how you found the
issue, and the much more important question is why in the world
numbers like -3.917103e+15 are being generated by example 29?

One possibility is some variable is uninitialized so -3.917103e+15
is just random numerical garbage, but I checked that possibility
with valgrind and got the following absolutely clean result:

==19527== Memcheck, a memory error detector
==19527== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==19527== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==19527== Command: examples/c/x29c -dev psc -o test.psc
==19527== 
==19527== 
==19527== HEAP SUMMARY:
==19527==     in use at exit: 0 bytes in 0 blocks
==19527==   total heap usage: 2,481 allocs, 2,481 frees, 351,494 bytes allocated
==19527== 
==19527== All heap blocks were freed -- no leaks are possible
==19527== 
==19527== For counts of detected and suppressed errors, rerun with: -v
==19527== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 4 from 4)

plcvt.c contains code for converting between various plot coordinate
systems.  Please indicate what device you are using since that
should affect some of those transformations.

plP_wcpcx converts world coordinates to physical device
coordinate using the PLINT value of the following transformation

plsc->wpxoff + plsc->wpxscl * x

where all those values being combined together are PLFLT.

It is hard to figure out how an integer physical device coordinate (e.g.,
pixels) could correspond to -3.917103e+15 so despite the good valgrind
result I am still wondering whether you are dealing with numerical
garbage of some kind.

Alan
__________________________
Alan W. Irwin

Astronomical research affiliation with Department of Physics and Astronomy,
University of Victoria (astrowww.phys.uvic.ca).

Programming affiliations with the FreeEOS equation-of-state
implementation for stellar interiors (freeeos.sf.net); the Time
Ephemerides project (timeephem.sf.net); PLplot scientific plotting
software package (plplot.sf.net); the libLASi project
(unifont.org/lasi); the Loads of Linux Links project (loll.sf.net);
and the Linux Brochure Project (lbproject.sf.net).
__________________________

Linux-powered Science
__________________________

Re: [Plplot-devel] floating point exception in example 29

From: Hazen B. <hba...@ma...> - 2014-04-07 18:30:51

On 4/6/2014 10:16 PM, Alan W. Irwin wrote:
> On 2014-04-06 17:35-0400 Hazen Babcock wrote:
>> In the process of converting some of the PLplot demos to Lisp as part of
>> the cl-plplot project I found that example 29 triggers a floating point
>> exception. This is not something that a C compiler will normally report,
>> but you can make it (or at least gcc) more strict about this. I believe
>> that the problem occurs in the plP_wcpcx() function in src/plcvt.c,
>> where we attempt to convert -3.917103e+15 to an integer resulting in an
>> overflow error.
>
> Obviously, the attempted conversion should generate an integer
> overflow, but I frankly don't understand why it generates a
> _floating-point_ exception since clearly -3.917103e+15 is a valid
> floating-point number.  Do you have some mental model for why there
> was a generated floating-point exception in this case?

Sorry, I did not explain that very well. The exception that is triggered 
is FE_INVALID, which I think in this case is caused by the floating 
point number being too large to convert to an integer.

> That question is just to satisfy my curiosity about how you found the
> issue, and the much more important question is why in the world
> numbers like -3.917103e+15 are being generated by example 29?
>
> One possibility is some variable is uninitialized so -3.917103e+15
> is just random numerical garbage, but I checked that possibility
> with valgrind and got the following absolutely clean result:
>
> ==19527== Memcheck, a memory error detector
> ==19527== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
> ==19527== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
> ==19527== Command: examples/c/x29c -dev psc -o test.psc
> ==19527==
> ==19527==
> ==19527== HEAP SUMMARY:
> ==19527==     in use at exit: 0 bytes in 0 blocks
> ==19527==   total heap usage: 2,481 allocs, 2,481 frees, 351,494 bytes allocated
> ==19527==
> ==19527== All heap blocks were freed -- no leaks are possible
> ==19527==
> ==19527== For counts of detected and suppressed errors, rerun with: -v
> ==19527== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 4 from 4)
>
> plcvt.c contains code for converting between various plot coordinate
> systems.  Please indicate what device you are using since that
> should affect some of those transformations.

I tested the xwin, xcairo and qtwidget drivers. They all give slightly 
different numbers but they are all around -4.0e15.

> plP_wcpcx converts world coordinates to physical device
> coordinate using the PLINT value of the following transformation
>
> plsc->wpxoff + plsc->wpxscl * x

The exception is triggered by the value of plsc->wpxoff, which is 
about -4.0e15. plsc->wpxscl is also suspiciously large at ~2.0e12, but 
it is multiplied by x, which is 0.0 in the call that causes the exception.
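
To illustrate how an offset of this size can arise, here is a small sketch of a linear world-to-physical mapping of the form offset + scale * x. The struct, the helper names and the numbers in the usage note are hypothetical and are not taken from plcvt.c; the only point is that when the world range is narrow and far from zero, the scale becomes large and world coordinate 0.0 maps far outside the range of a 32-bit int.

```c
#include <limits.h>

/* Hypothetical stand-in for the two coefficients discussed above. */
typedef struct {
    double off;   /* plays the role of wpxoff */
    double scl;   /* plays the role of wpxscl */
} linear_map;

/* Build the map that sends world [wmin, wmax] onto physical [pmin, pmax]. */
static linear_map make_map(double wmin, double wmax, double pmin, double pmax)
{
    linear_map m;
    m.scl = (pmax - pmin) / (wmax - wmin);
    m.off = pmin - m.scl * wmin;
    return m;
}

/* Physical coordinate as a double, before any conversion to an integer. */
static double world_to_phys(const linear_map *m, double x)
{
    return m->off + m->scl * x;
}

/* Nonzero if the value can be converted to int without overflow. */
static int fits_in_int(double v)
{
    return v >= (double) INT_MIN && v <= (double) INT_MAX;
}
```

With a made-up world range of [2000.0, 2000.00001] mapped onto physical [0, 32767], the scale is about 3.3e9 and world_to_phys(&m, 0.0) is about -6.6e12, so fits_in_int() rejects the origin even though every point inside the world range maps to a perfectly reasonable integer.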

> where all those values being combined together are PLFLT.
>
> It is hard to figure out how an integer physical device coordinate (e.g.,
> pixels) could correspond to -3.917103e+15 so despite the good valgrind
> result I am still wondering whether you are dealing with numerical
> garbage of some kind.

It looks that way, but I'm not having much luck figuring out where this 
garbage is coming from. The crash occurs on the call to plbox() when 
creating page 5, which is the first page of this example that just shows 
a single step.

-Hazen

Re: [Plplot-devel] floating point exception in example 29

From: Alan W. I. <ir...@be...> - 2014-04-07 19:00:08

On 2014-04-07 14:30-0400 Hazen Babcock wrote:

> On 4/6/2014 10:16 PM, Alan W. Irwin wrote:
>> On 2014-04-06 17:35-0400 Hazen Babcock wrote:
>>> In the process of converting some of the PLplot demos to Lisp as part of
>>> the cl-plplot project I found that example 29 triggers a floating point
>>> exception. This is not something that a C compiler will normally report,
>>> but you can make it (or at least gcc) more strict about this. I believe
>>> that the problem occurs in the plP_wcpcx() function in src/plcvt.c,
>>> where we attempt to convert -3.917103e+15 to an integer resulting in an
>>> overflow error.
>> 
>> Obviously, the attempted conversion should generate an integer
>> overflow, but I frankly don't understand why it generates a
>> _floating-point_ exception since clearly -3.917103e+15 is a valid
>> floating-point number.  Do you have some mental model for why there
>> was a generated floating-point exception in this case?
>
> Sorry, I did not explain that very well. The exception that is triggered is 
> FE_INVALID, which I think in this case is caused by the floating point number 
> being too large to convert to an integer.

Interesting.  I am surprised such a conversion would cause such an
exception, but I have indeed verified (with your modified x29c.c) that
the exception occurs on my platform as well, and I am glad it is
possible (at least in this case) to detect integer overflows this way.
More importantly, this verification puts me in a position to follow up
on this bug, which I will do since the time transformations used in
example 29 are largely my responsibility.

>
>> That question is just to satisfy my curiosity about how you found the
>> issue, and the much more important question is why in the world
>> numbers like -3.917103e+15 are being generated by example 29?
>> 
>> One possibility is some variable is uninitialized so -3.917103e+15
>> is just random numerical garbage, but I checked that possibility
>> with valgrind and got the following absolutely clean result:
>> 
>> ==19527== Memcheck, a memory error detector
>> ==19527== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
>> ==19527== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
>> ==19527== Command: examples/c/x29c -dev psc -o test.psc
>> ==19527==
>> ==19527==
>> ==19527== HEAP SUMMARY:
>> ==19527==     in use at exit: 0 bytes in 0 blocks
>> ==19527==   total heap usage: 2,481 allocs, 2,481 frees, 351,494 bytes allocated
>> ==19527==
>> ==19527== All heap blocks were freed -- no leaks are possible
>> ==19527==
>> ==19527== For counts of detected and suppressed errors, rerun with: -v
>> ==19527== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 4 from 4)
>> 
>> plcvt.c contains code for converting between various plot coordinate
>> systems.  Please indicate what device you are using since that
>> should affect some of those transformations.
>
> I tested the xwin, xcairo and qtwidget drivers. They all give slightly 
> different numbers but they are all around -4.0e15.
>
>> plP_wcpcx converts world coordinates to physical device
>> coordinate using the PLINT value of the following transformation
>> 
>> plsc->wpxoff + plsc->wpxscl * x
>
> The exception is triggered by the value of plsc->wpxoff, which is about
> -4.0e15. plsc->wpxscl is also suspiciously large at ~2.0e12, but it is
> multiplied by x, which is 0.0 in the call that causes the exception.
>
>> where all those values being combined together are PLFLT.
>> 
>> It is hard to figure out how an integer physical device coordinate (e.g.,
>> pixels) could correspond to -3.917103e+15 so despite the good valgrind
>> result I am still wondering whether you are dealing with numerical
>> garbage of some kind.
>
> It looks that way, but I'm not having much luck figuring out where this 
> garbage is coming from. The crash occurs on the call to plbox() when creating 
> page 5, which is the first page of this example that just shows a single 
> step.

Time transformations such as occur for example 29 are notorious for
significance loss issues for badly chosen epochs. For example, if the
epoch chosen is the zero of Julian dates, then the number of seconds
since that epoch is roughly 2.e11.  I am pretty sure the default epoch
chosen for the PLplot time transformations is considerably better than
that bad choice, but I may have missed something when I set that up so
I will take a look.
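
The resolution problem can be checked directly. This is a minimal sketch in plain C, independent of the PLplot time code: at roughly 2.e11 seconds a double can no longer resolve a microsecond, whereas the same increment survives when times are measured from an epoch close to the data. The helper name and the sample values are illustrative only.

```c
/* Nonzero if adding dt to t changes the stored double, i.e. if a time
 * step of dt seconds is still resolvable at time t. */
static int step_is_resolved(double t, double dt)
{
    volatile double sum = t + dt;   /* volatile: force rounding to double */
    return sum != t;
}
```

step_is_resolved(2.0e11, 1.0e-6) is 0, because the spacing between adjacent doubles near 2.e11 is 2^-15 s (about 3.e-5 s), while step_is_resolved(100.0, 1.0e-6) is 1.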

Alan

Re: [Plplot-devel] floating point exception in example 29

From: Alan W. I. <ir...@be...> - 2014-04-07 22:00:17

On 2014-04-07 12:00-0700 Alan W. Irwin wrote:

> [...]I will take a look.

The GNU debugger (gdb) made this a really easy issue to solve. It
turned out the issue was an integer overflow in a calculation whose
result is unused in the plbox case: the physical coordinates
corresponding to the default world coordinates (0., 0.) of the unused
axis origin. For example 29, page 5 and above, that origin is very far
from the actual range of world coordinates used in the plot.

I eliminated this issue (as of revision 13099) by making sure the
physical coordinates of the axis origin were only calculated when
needed (i.e., the "a" option was specified and the corresponding axis
line was somewhere in the viewport).  This tightening of the logic
makes the physical coordinate of the axis line a reasonable integer
number which by definition cannot overflow.
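
The tightened logic can be pictured roughly as follows. This is only a sketch under the assumption that the check amounts to "axis requested and origin inside the world range"; the function and parameter names are invented here and do not reproduce the actual revision 13099 change.

```c
/* Hypothetical sketch: compute the physical position of the axis line
 * (world coordinate 0.0) only when it will actually be drawn.
 * Returns 1 and stores the position in *phys when the axis is requested
 * and 0.0 lies inside the world range; otherwise returns 0 and leaves
 * *phys untouched. */
static int axis_line_phys(int axis_requested,
                          double wmin, double wmax,
                          double off, double scl,
                          int *phys)
{
    double lo = wmin < wmax ? wmin : wmax;
    double hi = wmin < wmax ? wmax : wmin;

    if (!axis_requested || 0.0 < lo || 0.0 > hi)
        return 0;

    /* 0.0 is inside the world range, so the result lies between the
     * physical limits of the viewport and fits in an int. */
    *phys = (int) (off + scl * 0.0);
    return 1;
}
```

The conversion to an integer is reached only on the path where the value is known to be inside the viewport, so the out-of-range case never gets as far as the cast.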

Thanks, Hazen, for spotting this long-standing bug which would have
occurred in the past for any plbox call where the range of data was
relatively far from zero so the calculated (but unused) axis line was
very far outside the actual viewport.  We were "lucky" that example 29
provided such a plbox call so we could find and fix this bug.

Alan

Re: [Plplot-devel] floating point exception in example 29

From: Alan W. I. <ir...@be...> - 2014-04-08 03:08:26

Hi Hazen:

I have just had a further idea. For comprehensive testing situations
it might be good to have the option to call feenableexcept (the C
library function that you used to help debug x29c.c that should be
available for c99 according to its Linux man page) from within the
PLplot library (say as the result of plinit).  If you agree (a) that
idea would work and (b) it would be useful, would you be willing to
implement it in C for the case when the PLPLOT_ENABLE_FLOAT_EXCEPT C
macro is #defined?  If so, I would be willing to do the rest on the
CMake side (create a CMake option for this and propagate it to the
corresponding C macro for the compilation of the source file where you
have implemented the feenableexcept call).
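
A rough sketch of what such a hook might look like. feenableexcept() is a glibc extension that <fenv.h> declares when _GNU_SOURCE is defined (<fenv.h> itself is standard C99, but this particular function is not), so the sketch guards on both the proposed PLPLOT_ENABLE_FLOAT_EXCEPT macro and __GLIBC__. The function name pl_enable_float_except() and the choice of exceptions are assumptions, not existing PLplot code.

```c
#if defined(PLPLOT_ENABLE_FLOAT_EXCEPT) && !defined(_GNU_SOURCE)
#define _GNU_SOURCE   /* must precede the first #include to expose feenableexcept() */
#endif
#include <fenv.h>

/* Hypothetical hook, e.g. called once during library initialization.
 * Returns 1 if trapping was turned on, 0 otherwise. */
static int pl_enable_float_except(void)
{
#if defined(PLPLOT_ENABLE_FLOAT_EXCEPT) && defined(__GLIBC__)
    /* After this call, invalid operations, division by zero and overflow
     * deliver SIGFPE instead of silently setting a status flag. */
    return feenableexcept(FE_INVALID | FE_DIVBYZERO | FE_OVERFLOW) != -1;
#else
    return 0;   /* trapping not requested, or not supported here */
#endif
}
```

When the macro is not defined the function compiles to a no-op, so ordinary builds are unaffected and no glibc-specific symbol is referenced.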

Alan

Re: [Plplot-devel] floating point exception in example 29

From: Hazen B. <hba...@ma...> - 2014-04-15 11:44:53

On 4/7/2014 11:08 PM, Alan W. Irwin wrote:
> I have just had a further idea. For comprehensive testing situations
> it might be good to have the option to call feenableexcept (the C
> library function that you used to help debug x29c.c that should be
> available for c99 according to its Linux man page) from within the
> PLplot library (say as the result of plinit).  If you agree (a) that
> idea would work and (b) it would be useful, would you be willing to
> implement it in C for the case when the PLPLOT_ENABLE_FLOAT_EXCEPT C
> macro is #defined?  If so, I would be willing to do the rest on the
> CMake side (create a CMake option for this and propagate it to the
> corresponding C macro for the compilation of the source file where you
> have implemented the feenableexcept call).

I have tested the idea and it works. I'm not sure about the utility 
though. Based on my tests it already caught the only error in the 
examples that it is going to catch. After the git transition I can 
create a branch that will catch floating point exceptions and you can 
merge it (or not) into the master branch based on your feeling about 
its utility.

best,
-Hazen

Re: [Plplot-devel] floating point exception in example 29

From: Hazen B. <hba...@ma...> - 2015-05-11 21:23:31

On 04/15/2014 07:44 AM, Hazen Babcock wrote:
> On 4/7/2014 11:08 PM, Alan W. Irwin wrote:
>> I have just had a further idea. For comprehensive testing situations
>> it might be good to have the option to call feenableexcept (the C
>> library function that you used to help debug x29c.c that should be
>> available for c99 according to its Linux man page) from within the
>> PLplot library (say as the result of plinit).  If you agree (a) that
>> idea would work and (b) it would be useful, would you be willing to
>> implement it in C for the case when the PLPLOT_ENABLE_FLOAT_EXCEPT C
>> macro is #defined?  If so, I would be willing to do the rest on the
>> CMake side (create a CMake option for this and propagate it to the
>> corresponding C macro for the compilation of the source file where you
>> have implemented the feenableexcept call).
>
> I have tested the idea and it works. I'm not sure about the utility
> though. Based on my tests it already caught the only error in the
> examples that it is going to catch. After the git transition I can
> create a branch that will catch floating point exceptions and you can
> merge it (or not) into the master branch based on your feeling about
> its utility.

Much time passes during which I never implement this branch as promised, 
and it looks like some more floating point exceptions have crept in. 
More specifically I'm seeing floating point exceptions in x25, x30 and 
x33. The problem seems to be with plgradient() when it uses a software 
fallback gradient (I was testing with the xwin driver).

You can pull a branch with floating point exception trapping enabled for 
these examples here:
https://github.com/HazenBabcock/PLplot/tree/fpe_x25_x30_x33

The problem is occurring in notcrossed() in plfill.c at line 2040 when 
converting from a PLFLT to a PLINT. It looks like fxintersect and 
fyintersect can take on values that are too large to be converted to 
integers. Checking for this fixes the problem, but maybe this is 
indicating some problem upstream?
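
One way to do such a check, shown here as a sketch rather than the change that was actually tested: compare against the limits of int before casting. The helper name is hypothetical, and int stands in for PLINT on the assumption that PLINT is a 32-bit signed integer.

```c
#include <limits.h>

/* Hypothetical helper: convert value to int only if it is in range.
 * Returns 1 on success; returns 0 (leaving *out untouched) for values
 * that are too large, too small, or NaN. */
static int flt_to_int_checked(double value, int *out)
{
    /* Written as a negated conjunction so that NaN, for which every
     * comparison is false, is rejected too. */
    if (!(value >= (double) INT_MIN && value <= (double) INT_MAX))
        return 0;
    *out = (int) value;   /* in range, so the cast is well defined */
    return 1;
}
```

A caller can then skip or clamp the offending intersection point instead of triggering FE_INVALID; whether values that large should be reaching the conversion at all is the separate upstream question.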

-Hazen

Re: [Plplot-devel] floating point exception in example 29

From: Alan W. I. <ir...@be...> - 2015-05-11 22:05:26

On 2015-05-11 16:23-0400 Hazen Babcock wrote:

> On 04/15/2014 07:44 AM, Hazen Babcock wrote:
>> On 4/7/2014 11:08 PM, Alan W. Irwin wrote:
>>> I have just had a further idea. For comprehensive testing situations
>>> it might be good to have the option to call feenableexcept (the C
>>> library function that you used to help debug x29c.c that should be
>>> available for c99 according to its Linux man page) from within the
>>> PLplot library (say as the result of plinit).  If you agree (a) that
>>> idea would work and (b) it would be useful, would you be willing to
>>> implement it in C for the case when the PLPLOT_ENABLE_FLOAT_EXCEPT C
>>> macro is #defined?  If so, I would be willing to do the rest on the
>>> CMake side (create a CMake option for this and propagate it to the
>>> corresponding C macro for the compilation of the source file where you
>>> have implemented the feenableexcept call).
>> 
>> I have tested the idea and it works. I'm not sure about the utility
>> though. Based on my tests it already caught the only error in the
>> examples that it is going to catch. After the git transition I can
>> create a branch that will catch floating point exceptions and you can
>> merge it (or not) into the master branch based on your feeling about
>> its utility.
>
> Much time passes during which I never implement this branch as promised, and 
> it looks like some more floating point exceptions have crept in. More 
> specifically I'm seeing floating point exceptions in x25, x30 and x33. The 
> problem seems to be with plgradient() when it uses a software fallback 
> gradient (I was testing with the xwin driver).
>
> You can pull a branch with floating point exception trapping enabled for 
> these examples here:
> https://github.com/HazenBabcock/PLplot/tree/fpe_x25_x30_x33
>
> The problem is occurring in notcrossed() in plfill.c at line 2040 when 
> converting from a PLFLT to a PLINT. It looks like fxintersect and fyintersect 
> can take on values that are too large to be converted to integers. Checking 
> for this fixes the problem, but maybe this is indicating some problem 
> upstream?

Hi Hazen:

I am glad to hear you have created a topic branch that catches
floating point exceptions.  However, I am not keen on pulling a PLplot
topic branch from github for the reasons discussed in
README.developers.  So could you either share your topic branch using
"git format-patch" or else just push it to our official repo yourself?
The former is preferred if the work is incomplete (i.e., does not
include a cmake option to control when the PLPLOT_ENABLE_FLOAT_EXCEPT
macro is #defined). Furthermore, if the work is incomplete, I can
finish the cmake aspects of it as promised above and amend your commit
accordingly.

Once we have implemented an option to check for floating-point
exceptions, then that means any of us should be able to confirm the
ones you have found and find others in the future due to the continued
evolution of our examples.  And, of course, this option should allow
us to fix those floating-point exceptions as time permits.

Alan

Re: [Plplot-devel] floating point exception in example 29

From: Hazen B. <hba...@ma...> - 2015-05-12 13:17:46

Attachments: CMakeCache.txt

On 05/11/2015 06:05 PM, Alan W. Irwin wrote:
> On 2015-05-11 16:23-0400 Hazen Babcock wrote:
>>
>> Much time passes during which I never implement this branch as
>> promised, and it looks like some more floating point exceptions have
>> crept in. More specifically I'm seeing floating point exceptions in
>> x25, x30 and x33. The problem seems to be with plgradient() when it
>> uses a software fallback gradient (I was testing with the xwin driver).
>>
>> You can pull a branch with floating point exception trapping enabled
>> for these examples here:
>> https://github.com/HazenBabcock/PLplot/tree/fpe_x25_x30_x33
>>
>> The problem is occurring in notcrossed() in plfill.c at line 2040 when
>> converting from a PLFLT to a PLINT. It looks like fxintersect and
>> fyintersect can take on values that are too large to be converted to
>> integers. Checking for this fixes the problem, but maybe this is
>> indicating some problem upstream?
>
> Hi Hazen:
>
> I am glad to hear you have created a topic branch that catches
> floating point exceptions.  However, I am not keen on pulling a PLplot
> topic branch from github for the reasons discussed in
> README.developers.  So could you either share your topic branch using
> "git format-patch" or else just push it to our official repo yourself?
> The former is preferred if the work is incomplete (i.e., does not
> include a cmake option to control when the PLPLOT_ENABLE_FLOAT_EXCEPT
> macro is #defined). Furthermore, if the work is incomplete, I can
> finish the cmake aspects of it as promised above and amend your commit
> accordingly.
>
> Once we have implemented an option to check for floating-point
> exceptions, then that means any of us should be able to confirm the
> ones you have found and find others in the future due to the continued
> evolution of our examples.  And, of course, this option should allow
> us to fix those floating-point exceptions as time permits.

Hi Alan,

Sorry, that branch was not meant for incorporation into PLplot which is 
why I did not follow accepted protocols. I thought it might make it 
easier for others to see the floating point exception problem for these 
3 examples.

I think we can enable floating point exception trapping more easily 
using the -ffpe-trap option provided by gfortran. This would not require 
us to mark up all of our C examples, though we might instead need to 
check that the Fortran compiler is gfortran?

https://gcc.gnu.org/onlinedocs/gfortran/Debugging-Options.html

-ffpe-trap=invalid,zero,overflow,underflow

Compiling PLplot with this Fortran flag (set via CMAKE_Fortran_FLAGS) 
"worked" for me in that it core dumps on the Fortran equivalents of the 
C examples I mentioned above. I've attached my CMakeCache.txt file.

-Hazen