#64 OP variant stops with fatal floating-point exception

Done
nobody
None
Medium
Organon variants
Defect
2022-07-13
2022-05-13
No

I wrote to the FVS Helpdesk a month ago regarding a strange crash I observe when running the OP (Organon) variant of FVS. I do not observe this crash while running the same keyword file through the WC or PN variants of FVS.

The OP variant crashes due to a floating-point exception:

Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation

Another thing I noticed is that this crash occurs only if the length of my simulation exceeds 16 cycles. (Odd, yes?)

Anyway, I’m not much of a Fortran programmer, but I just couldn’t let it be without at least making an attempt to narrow down the source of the problem. So I fired up a debugger and found the line that leads to the SIGFPE. It turns out this is line 942 in FVSmodel/trunk/organon/htgrowth.f:

FERTX2=EXP(PF3*(XTIME-YF(1))+PF4*(SI_1/100.0)**PF5)

This particular floating-point exception is the result of underflow. It seems to come about in the call to the EXP function. The argument to EXP here is negative. Trying to raise e to a negative power of this magnitude evidently results in a value so tiny that underflow occurs.

Moreover, since the value of XTIME increases with the cycle number, the negative argument passed to EXP becomes more and more negative as the cycles pass. This probably explains why the crash appears only once the simulation runs past a certain number of cycles.
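
For anyone who wants to see the effect in isolation, here is a minimal standalone sketch (not FVS code). It assumes REAL*4 is IEEE single precision, and the argument values are made up for illustration. It simply prints EXP of a few increasingly negative arguments next to TINY, the smallest normal REAL*4 value.

```fortran
program exp_underflow_demo
  implicit none
  integer :: i
  real(4) :: arg, res
  ! Hypothetical arguments chosen for illustration; not taken from FVS.
  real(4), parameter :: args(4) = [-80.0, -87.0, -88.0, -90.0]

  print *, 'smallest normal REAL*4:', tiny(1.0_4)
  do i = 1, size(args)
    arg = args(i)
    res = exp(arg)
    ! The logical is T when the result has dropped below the normal range.
    print *, arg, res, res < tiny(1.0_4)
  end do
end program exp_underflow_demo
```

Built without underflow trapping, the last two lines should report T, since exp(-88.0) and exp(-90.0) both fall below the smallest normal REAL*4 value. Built with underflow trapping enabled, I would expect the same program to stop with SIGFPE instead.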

Whether or not this is the best approach to take, it seems the crash at least can be made to go away by treating some intermediate results as doubles (REAL*8) instead of the REAL*4 values they are now.

Widening the argument to EXP fixes the problem I first identified, and widening the intermediate variable FERTX2 fixes a problem that is revealed in the subsequent line if the result of the call to EXP is narrowed again too soon. Finally, we can explicitly convert the value of FERTADJ back to REAL*4 to avoid a compiler warning.

I've attached a copy of the file with my proposed changes for your review.

1 Attachments


Discussion

  • Erin Crosland

    Erin Crosland - 2022-05-13

    I don't know if it might make sense to change REAL*4 to REAL*8 everywhere in Organon?

     
  • Donald Robinson

    Donald Robinson - 2022-05-13

    This kind of math bug might mean that the routine has not been exercised over a broad range of inputs and species. I know that the R-tools Fortran compiler is less forgiving than the Intel compiler, but it is the standard compiler for FVS. I think that changing to double precision isn't the first thing to try. If I were in your place I'd pull apart the equation (which you did) and then see if you can choose a lower bound on the input value that will not break when passed to exp(). You could compute it in advance and then bound it if needed so it doesn't underflow. Good luck!

     

    Last edit: Donald Robinson 2022-05-13
    • Nick Crookston

      Nick Crookston - 2022-05-13

      I completely agree with Don. If you can change the logic to trap the
      condition that is causing the underflow, please post the code here and it
      will likely be incorporated into the program.

      --
      Nicholas L. Crookston
      Forestry Research Consultant
      Moscow Idaho USA

       


    • Erin Crosland

      Erin Crosland - 2022-05-13

      OK, that makes sense. I'll work on finding the lower bound and preventing anything smaller from heading into exp().

      FWIW, my thinking around making everything doubles was just this: what if there are other time bombs of this nature strewn around the codebase, waiting for a value slightly too big or too small to come along? But I guess we can identify them if and when they crop up and bound the values in a similar way if needed.

       
  • Erin Crosland

    Erin Crosland - 2022-05-17

    I've found that underflow starts to occur when the value passed to exp is less than -87 (actually about -87.3, but I rounded), so I went ahead and prevented that from happening. I hope I've understood correctly and this is what you had in mind. I've attached a file that incorporates my changes.
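
    For readers without the attachment, the bounding idea can be sketched roughly as follows. This is a standalone illustration, not the attached change: the module, the function name bounded_exp, and the constant name EXP_ARG_MIN are all hypothetical, and only the -87 limit comes from the post above.

    ```fortran
    module safe_exp_mod
      implicit none
      ! Rounded lower bound reported above: underflow starts a little below -87.
      real(4), parameter :: EXP_ARG_MIN = -87.0
    contains
      ! Hypothetical helper (not part of FVS): clamp the argument, then call EXP.
      elemental function bounded_exp(x) result(y)
        real(4), intent(in) :: x
        real(4) :: y
        y = exp(max(x, EXP_ARG_MIN))
      end function bounded_exp
    end module safe_exp_mod

    program bounded_exp_demo
      use safe_exp_mod
      implicit none
      print *, bounded_exp(-50.0)                   ! above the bound, so unchanged
      print *, bounded_exp(-95.0)                   ! clamped: same as exp(-87.0)
      print *, bounded_exp(-95.0) >= tiny(1.0_4)    ! result stays in the normal range
    end program bounded_exp_demo
    ```

    In htgrowth.f itself the same effect could presumably be had by wrapping the existing argument to EXP in MAX(..., -87.0), without introducing a helper.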

    Edit: Well, it's been an interesting Google rabbit hole, but I found that at least one reason not to use subnormal/denormal numbers is that they can require a lot more clock cycles to work with than normal numbers. So maybe just go ahead with the bounding and ignore the rest of what I wrote.


    I had another idea. Let me know what you think.

    By default floating-point underflow isn't a fatal error. In fact, at least at lower compiler optimization levels, the default behavior is "gradual underflow," whereby, using so-called "subnormal numbers," values that are normally too small to represent may still be represented with some loss of precision. (I'm far from an expert, but this is what I discovered in my googling.)

    It seems that the only reason FVS stops on underflow is due to the settings that are passed to the compiler with the -ffpe-trap switch (see line 63 of the makefile). If this switch is modified to omit the underflow and the denormal options, then the program does not stop at the location I identified earlier, even without my adding a lower bound on the argument to exp.

    Instead, all that we see is a note printed to the console when the program terminates, indicating that the underflow flag had been set--meaning underflow occurred somewhere in the program. (This is required by the Fortran standard.)
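
    The underflow flag can also be inspected directly from Fortran. The following is a minimal sketch using the standard intrinsic module ieee_exceptions; it is not FVS code, the argument value is made up, and it assumes a compiler that supports the IEEE modules and a build without underflow trapping.

    ```fortran
    program underflow_flag_demo
      use, intrinsic :: ieee_exceptions
      implicit none
      ! VOLATILE discourages the compiler from evaluating exp(x) at compile time.
      real(4), volatile :: x
      real(4) :: y
      logical :: flagged

      call ieee_set_flag(ieee_underflow, .false.)   ! start with a clear flag
      x = -90.0                                     ! hypothetical argument
      y = exp(x)
      call ieee_get_flag(ieee_underflow, flagged)   ! did the call signal underflow?

      print *, 'exp(x) =', y
      print *, 'underflow flag set:', flagged
    end program underflow_flag_demo
    ```

    Whether the flag is actually raised depends on the math library and the optimization level, so treat the output as something to check on your own toolchain rather than a guaranteed result.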


    Perhaps there's a reason FVS is compiled to terminate on underflow. On the other hand, although I wasn't able to find anything definitive as to best practices, some of the answers I found on StackOverflow indicated that it may not always be the right move. For its part, the gfortran documentation simply indicates that invalid, zero, and overflow exceptions probably ought to be trapped because they indicate serious errors, while it omits underflow from the list, suggesting it isn't a serious error. This makes sense if gradual underflow fills the gap.

    I did a few comparisons, looking at the results I get for the FERTX2 variable at the first cycle where the underflow crash was occurring. I tried three different approaches for avoiding program termination: widen to double-precision floating point; make no changes to the code but revise the compiler switch as described above; or add a lower bound on the value passed to exp.

    Here are the results:
    6.1969737521332938E-040 (changed to double-precision)
    6.19697621E-40 (removed compiler switch, allowing gradual underflow without crashing)
    1.64581145E-38 (applied a lower bound)

    Note how similar the second result is to the first. This suggests that allowing the default "muddling through by gradual underflow" behavior may give us adequate results.
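
    A comparison along these lines can be set up outside FVS with a sketch like the one below. The argument value is an assumption, back-calculated as roughly ln(6.197E-40); inside FVS it comes from the expression PF3*(XTIME-YF(1))+PF4*(SI_1/100.0)**PF5. The program and variable names are hypothetical.

    ```fortran
    program compare_approaches
      implicit none
      ! Assumed argument, back-calculated from the double-precision result above.
      real(4), volatile :: arg
      real(8) :: widened
      real(4) :: gradual, bounded

      arg = -90.2793
      widened = exp(real(arg, 8))        ! approach 1: widen to double precision
      gradual = exp(arg)                 ! approach 2: REAL*4 with gradual underflow
      bounded = exp(max(arg, -87.0))     ! approach 3: lower bound on the argument

      print *, 'double precision :', widened
      print *, 'gradual underflow:', gradual
      print *, 'lower bound      :', bounded
      ! Relative gap between the subnormal REAL*4 result and the double result.
      print *, 'relative gap     :', abs(real(gradual, 8) - widened) / widened
    end program compare_approaches
    ```

    This needs to be built without the underflow and denormal traps, otherwise the second approach should stop the program just as FVS does.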

    I mean, I know we're talking about infinitesimal values in any case, so maybe I'm making too much of this. But it's a thought.

     

    Last edit: Erin Crosland 2022-05-17
  • Mike VanDyck

    Mike VanDyck - 2022-05-17

    Erin,
    First, thanks for looking into this so thoroughly and identifying the exact cause of the issue. Based on the edit to your last post, I think you came to this conclusion anyway, but the simple solution of preventing the mathematical issue (an underflow in this case) is often the best fix. It prevents the issue rather than reducing its probability, and it essentially eliminates unintended consequences elsewhere in the code. I know you understand this well, but we’re trying to approximate natural systems with equations that often don’t behave well at the extremes, so there are lots of instances where logical bounds are necessary. Thanks again for your efforts on this.

     
    • Erin Crosland

      Erin Crosland - 2022-05-19

      Yeah, thanks for the explanation. That makes sense, particularly the part about avoiding unintended consequences elsewhere.

      I'm just grateful for the opportunity to learn a little more about floating-point numbers. :)

       
  • Mike VanDyck

    Mike VanDyck - 2022-05-20
    • status: New --> Done
     
  • Erin Crosland

    Erin Crosland - 2022-07-05

    I just downloaded the latest FVS executables from the Forest Service (the installer is dated 7/1/2022), but OP crashes like it used to before this fix. Is there anything more I can do to help get this into a release?

     
