gnuplot / Bugs / #2787 using: Unexpected evaluation as “undefined” due to $N returning NaN

Ethan Merritt - 2025-04-16

There may be several related issues here, but the heart of the matter is your very pertinent observation that the comma operator for serial evaluation is not acting as a proper sequence point. In particular, error conditions are not cleared between successive expressions in (<exp1>,<exp2>,<exp3>,...).

This is fixable, and the fix is now in version 6.1 via commit 2cf853b028.

With that in place, it is possible to modify the definition of isnan in your sample script as follows:

isnan(x) = (TEMP=(x==x)?0:1 , TEMP)

With this change all the plots in your first sample script work as I think you expected them to, and the second script using stats also runs as expected.
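
As a rough illustration of why the sequence point matters, here is a toy model in C. It is not gnuplot's code and says nothing about how commit 2cf853b028 is actually implemented; undefined_flag, eval_serial, and the two sub-expression functions are hypothetical names. The first sub-expression stands in for TEMP=(x==x)?0:1 raising the error flag, the second for the trailing TEMP.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an evaluator-wide "undefined" error flag. */
static bool undefined_flag = false;

/* A sub-expression returns a value and may raise the flag as a side effect. */
typedef double (*subexpr_fn)(void);

/* Models a sub-expression whose evaluation raises the error flag. */
static double raises_flag(void) { undefined_flag = true; return 1.0; }

/* Models a sub-expression that evaluates cleanly. */
static double clean_value(void) { return 1.0; }

/*
 * Evaluate (e1, e2, ..., en) left to right and return the last value.
 * With clear_between set, the flag is reset before each sub-expression,
 * so only the final sub-expression decides whether the result is undefined.
 * Without it, an error raised anywhere in the list sticks to the result.
 */
static double eval_serial(const subexpr_fn *exprs, size_t n,
                          bool clear_between, bool *is_undefined)
{
    double result = NAN;

    assert(exprs != NULL && is_undefined != NULL && n > 0);
    undefined_flag = false;
    for (size_t i = 0; i < n; i++) {
        if (clear_between)
            undefined_flag = false;
        result = exprs[i]();
    }
    *is_undefined = undefined_flag;
    return result;
}
```

In this model, evaluating the pair {raises_flag, clean_value} with clear_between false reports the whole expression as undefined, while with clear_between true it yields 1.0 with no error.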

It may well be, however, that other unexpected (or poorly documented) cases exist. In general gnuplot 6 tries to figure out whether or not an input data value that is missing/NaN/undefined is required in order to plot that point. If it is required, the point is marked as invalid. If it is not required, the point is considered valid and the result (usually NaN) is stored in the corresponding slot of the data structure for that point. The problem is, what exactly does "required" mean? Both x and y are obviously required in order to plot a point at [x,y]. But if the plot command references a z value but the z value is missing, what then? For example

plot DATA using 1:2:3 zsort with points

The point can be drawn, but the missing or NaN z value affects the sort order. Does that make the data point valid or invalid? As it happens, in this case the points are sorted (incorrectly) using some artefactual value that ends up in z. I think this is neither obvious nor documented.
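
The "required" question can be sketched as a toy classifier in C. This is an illustration only, with hypothetical names (toy_point, classify_point, z_required); it does not reproduce gnuplot's data structures. The z_required argument is exactly the part the text above describes as neither obvious nor documented.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { POINT_VALID, POINT_INVALID } point_status;

/* Hypothetical per-point record: coordinates plus a validity tag. */
typedef struct {
    double x, y, z;
    point_status status;
} toy_point;

/*
 * x and y are always required. z invalidates the point only when the
 * caller declares it required; otherwise the NaN simply stays in the slot.
 */
static void classify_point(toy_point *p, bool z_required)
{
    assert(p != NULL);
    if (isnan(p->x) || isnan(p->y) || (z_required && isnan(p->z)))
        p->status = POINT_INVALID;
    else
        p->status = POINT_VALID;
}
```

With z_required false, a point {1, 2, NaN} stays valid and carries NaN in z, which is the situation where a later sort on z has nothing meaningful to compare.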

If you have further thoughts on the matter, please attach them here or raise the issue for discussion on the gnuplot-beta mailing list.

Last edit: Ethan Merritt 2025-04-16
- Hiroki Motoyoshi - 2025-04-16
  
  Thank you for looking into this so quickly.
  
  "If you have further thoughts on the matter, please attach them here or raise the issue for discussion on the gnuplot-beta mailing list."
  
  Here is how I think missing and invalid data should be handled in the using specification.
  
  Handling of Missing and Invalid Data in Parentheses Evaluation in the using Specification
  
  Missing Data
  
  When missing data is referenced using $N or column(N), the evaluation of the parentheses is determined as missing, even during intermediate steps of sequential evaluation. If we consider missing data as inherently inaccessible, then references to missing data via other column access functions—such as valid(N), strcol(N), or timecolumn(N, "timefmt")—should result in the same outcome.
  
  Invalid Data
  
  When invalid data is referenced using $N or column(N), the evaluation of the parentheses is not immediately determined as invalid at that point. Intermediate results during sequential evaluation do not affect the outcome; only the final result determines the evaluation of the parentheses. Invalid data is represented as NaN, which can be used in comparisons and arithmetic operations with other numeric values.
  
  There may be differences of opinion. In particular, the part regarding 'valid', 'strcol', and 'timecolumn' may have a significant impact because of potential compatibility issues.
  
  Aside from that, I believe the behavior can be implemented by applying the following modifications in addition to commit "3e1f93bd".
  
  diff --git a/src/datafile.c b/src/datafile.c
  index d3a45f2ab..60e7c9606 100644
  --- a/src/datafile.c
  +++ b/src/datafile.c
  @@ -2784,7 +2784,7 @@ f_column(union argument *arg)
               push(Gcomplex(&a, not_a_number(), (double)DF_MISSING));
               df_missing_data_in_expression = TRUE;
           } else if (df_column[column-1].good != DF_GOOD) {
  -            undefined = TRUE;
  +            /* undefined = TRUE; */
               push(Gcomplex(&a, not_a_number(), 0.0));
           } else
               push(Gcomplex(&a, df_column[column - 1].datum, 0.0));
  
  While this change might unintentionally interfere with the intended behavior of the original fix for Bug #1896, the rationale behind the current behaviour of '$N' is still unclear to me.
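
  The intent of the one-line change can be sketched with a toy model in C. This is a simplification under stated assumptions, not the real f_column(): toy_column, eval_flags, and read_column are hypothetical names, and the patched argument stands in for commenting out the undefined = TRUE line.

  ```c
  #include <assert.h>
  #include <math.h>
  #include <stdbool.h>
  #include <stddef.h>

  typedef enum { COL_GOOD, COL_BAD, COL_MISSING } col_status;

  /* Hypothetical parsed input field: a value plus how parsing went. */
  typedef struct {
      double datum;
      col_status status;
  } toy_column;

  /* Hypothetical side-effect flags set while evaluating an expression. */
  typedef struct {
      bool missing_seen;  /* a missing field was referenced */
      bool undefined;     /* the expression is flagged undefined */
  } eval_flags;

  /*
   * Model of reading one column inside a using expression.
   * Missing data always raises missing_seen. Invalid data raises
   * undefined only in the unpatched variant; with patched set, the
   * reference just yields NaN and leaves the decision to later steps.
   */
  static double read_column(const toy_column *col, eval_flags *flags,
                            bool patched)
  {
      assert(col != NULL && flags != NULL);
      if (col->status == COL_MISSING) {
          flags->missing_seen = true;
          return NAN;
      }
      if (col->status == COL_BAD) {
          if (!patched)
              flags->undefined = true;
          return NAN;
      }
      return col->datum;
  }
  ```

  In this model the patched variant leaves both flags untouched when an invalid field is merely referenced, so an expression such as isnan($N) can inspect the NaN without the point being rejected as a side effect.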
  
  This is a script for verifying the differences between missing and invalid data.
  In cases 1 and 3, NaN values in the data file are treated as missing, so we expect bars to be drawn only at x = 1, 3, and 7. In cases 2 and 4, NaN values in the data file are treated as invalid, so we expect bars to be drawn for all x values from 1 to 8. The behavior differs before and after applying the patch. As mentioned above, 'valid(N)' does not trigger the missing flag, so after applying the patch, case 3 will produce the same plot as cases 2 and 4.
  
  $data <<EOD
  1 3 1
  2 NaN 5
  3 4 6
  4 5 NaN
  5 NaN 3
  6 8 NaN
  7 6 6
  8 NaN 9
  EOD

  max(a,b) = (a > b) ? a : b
  invalid(N) = (valid(N)) ? 0 : 1
  select_column(m,n) = invalid(m) ? column(n) : invalid(n) ? column(m) : max(column(m), column(n))
  isnan(x) = (x==x) ? 0 : 1
  select_value(a,b) = isnan(a) ? b : isnan(b) ? a : max(a,b)

  set xrange [0:9]
  set yrange [0:12]
  unset key

  set multiplot layout 2,2

  set title "case 1: select_value : missing" noenhanced
  set datafile missing NaN
  plot $data using 1:(select_value($2,$3)):(1) with boxes

  set title "case 2: select_value : invalid" noenhanced
  set datafile missing
  replot

  set title "case 3: select_column : missng" noenhanced
  set datafile missing NaN
  plot $data using 1:(select_column(2,3)):(1) with boxes

  set title "case 4: select_column : invalid" noenhanced
  set datafile missing
  replot

  unset multiplot
  pause -1
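
  For reference, the bar heights this script is expected to produce can be worked out independently of gnuplot. The C sketch below mirrors the script's max, isnan, and select_value definitions and assumes IEEE 754 NaN semantics (NaN compares unequal to itself). The function expected_bars and its nan_is_missing argument are hypothetical names used only for this illustration. It computes the expectation described above, not what any particular gnuplot version draws.

  ```c
  #include <assert.h>
  #include <math.h>
  #include <stdbool.h>
  #include <stddef.h>

  /* Same idiom as the script: NaN is the only value unequal to itself. */
  static int is_nan(double x) { return (x == x) ? 0 : 1; }

  static double max2(double a, double b) { return (a > b) ? a : b; }

  /* Pick the non-NaN value, or the larger one when both are numbers. */
  static double select_value(double a, double b)
  {
      return is_nan(a) ? b : is_nan(b) ? a : max2(a, b);
  }

  /*
   * Fill heights[] and drawn[] for the 8 rows of $data and return the
   * number of bars drawn. With nan_is_missing set, any row containing
   * NaN is skipped (cases 1 and 3); otherwise every row gets a bar
   * whose height comes from select_value (cases 2 and 4).
   */
  static size_t expected_bars(bool nan_is_missing, double *heights,
                              bool *drawn, size_t capacity)
  {
      const double rows[8][3] = {
          {1, 3, 1},   {2, NAN, 5}, {3, 4, 6},   {4, 5, NAN},
          {5, NAN, 3}, {6, 8, NAN}, {7, 6, 6},   {8, NAN, 9},
      };
      size_t count = 0;

      assert(heights != NULL && drawn != NULL && capacity >= 8);
      for (size_t i = 0; i < 8; i++) {
          double a = rows[i][1], b = rows[i][2];
          bool any_nan = is_nan(a) || is_nan(b);

          drawn[i] = !(nan_is_missing && any_nan);
          heights[i] = drawn[i] ? select_value(a, b) : NAN;
          if (drawn[i])
              count++;
      }
      return count;
  }
  ```

  Treating NaN as missing leaves three bars (x = 1, 3, 7 with heights 3, 6, 6). Treating NaN as invalid gives eight bars with heights 3, 5, 6, 5, 3, 8, 6, 9.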
  
  - Ethan Merritt - 2025-04-17
    
    I am confused.
    I wonder if you are seeing the same result from running that script that I do?
    
    You say "after applying the patch case 3 will produce the same plot as case 2 and 4", but for me cases 2 and 4 yield different plots. This is true for both gnuplot 6.0 and 6.1 (output attached).
    
    Using the current git tip (commit 2cf853b0), I can make the output for case 2 the same as for case 4 by correcting the definition of isnan() to use serial evaluation: isnan(X) = (TEMP=(X==X)?0:1, TEMP). But now cases 2, 3, and 4 all produce the same output even without your additional patch.
    
    Are you seeing something different?
    
    6.0_or_current_6.1.png
    
    - Hiroki Motoyoshi - 2025-04-17
      
      The script compares the values in the second and third columns, and plots the valid and larger one.
      
      Attached are the outputs from gnuplot 6.0.2 and gnuplot 6.1.0 with patch [commit "3e1f93bd" & my patch]. These two figures match the expected changes resulting from the patch.
      
      In "case1" and "case2", all input data is referenced using "column(N)". This is the usage pattern addressed by this ticket. Case1 involves reading missing data and is therefore unaffected by the patch. In contrast, case2 involves reading invalid data and shows changes after the patch.
      
      In "case3" and "case4", "valid(N)" is used to ensure that "column(N)" is only accessed when the value is valid. When a value is invalid, the script avoids referencing "column(N)". As a result, case3 and case4 render correctly even in version 6.0.2, and no changes are observed with or without the patch.
      
      Among the four figures, only case2 should be affected by the patch.
      
      - Hiroki Motoyoshi - 2025-04-17
        
        Figures are attached here.
        
        sample602.png
        
        sample610withpatch.png
        
        
        Ethan Merritt - 2025-04-17
        
        It is the definition isnan(X) = (X==X)?0:1 that is incorrect. This does not work in a using specifier exactly because evaluation of X==X sets an error flag. This is exactly why the separate function valid(x) was introduced.
        
        The recent change to serial evaluation makes it possible to define isnan(X) to work the way you want, which is good, and "fixes" case 2 of the multiplot example.
        
        I am not following the rationale for the proposed patch. I can see how case (2) shows that it might be useful to provide a builtin function isnan(), but the patch would affect other cases than evaluation to NaN. I think it would suppress detection of other "invalid" cases as well. I would have to hunt up or regenerate test scripts for the original problems that led to changes in that code section in 2017 (Bug #1896), 2018 (commit b8304eafc), and 2022 (commit c8d468de9).
        
        
        Hiroki Motoyoshi - 2025-04-17
        
        "It is the definition isnan(X) = (X==X)?0:1 that is incorrect."
        
        I believe this function definition is actually working as intended.
        
        isnan(x) = x==x ? 0 : 1
        f(x) = (x==0)? NaN : x
        print "isnan(1)", isnan(1)         # => 0
        print "isnan(NaN)", isnan(NaN)     # => 1
        print "isnan(f(1))", isnan(f(1))   # => 0
        print "isnan(f(0))", isnan(f(0))   # => 1

        $data <<EOD
        1
        NaN
        3
        NaN
        5
        EOD

        s = ""
        plot $data using (0):(s=s.sprintf("%d %d\n",isnan($1),valid(1)),0)
        print s
        # =>
        # 0 1
        # 1 0
        # 0 1
        # 1 0
        # 0 1
        
        To me, it seems that the reason "isnan($1)" doesn't behave as expected within a "using" clause is because referencing $1 when it points to an invalid value causes the undefined flag to be set. Am I missing something here?
        
        I'd like to clarify that I didn't open this issue in order to be able to define "isnan". My concern is more about the side effect of referencing $N, which appears to be the actual issue.
        
        As you mentioned, it's not entirely clear what parts of the code might be affected by my patch, so I agree that we need to proceed cautiously. However, the fact that merely referencing a value sets the undefined flag seems like a side effect to me.
        
        
        Ethan Merritt - 2025-04-17
        
        I need to think about it some more and go back to look at the earlier modification.
        

Ethan Merritt - 2025-06-04

Status: open --> closed-fixed