From: Ethan A M. <me...@uw...> - 2022-03-02 22:37:50
On Wednesday, 2 March 2022 06:37:26 PST Peter Juhasz wrote:
> On Wed, Mar 2, 2022 at 5:22 AM Ethan A Merritt <me...@uw...> wrote:
>
> > > Observations:
> > > - it's as if the mere presence of a $X in the specification causes the
> > > datum to be marked as invalid, and dropped entirely, if column X
> > > doesn't contain data, no matter what the rest of the specification is.

This is intentional. It may not be ideal but so far as I can see
a better solution would be unreasonably difficult.
Here's what is going on.

    $DATA << EOD
    1 1
    2 ?
    3 3
    EOD

First consider

    set datafile missing "?"
    plot FOO using 1:2 with points

The intent is to cleanly skip any lines where the value is "missing",
and it is easy for the program to see that it should check column 2
for a missing value when data is being read in.

Now let's make it a bit more complicated

    plot FOO using 1:(1./($1+$2))

The intent is again to skip any lines where the value is "missing".
But now the program has to evaluate an expression to generate that value.
Evaluating the expression first and then testing against "missing"
cannot work and in fact the evaluation may blow up on divide-by-zero.
So the input code tries to peek into the expression and see if any
column values are needed in order to evaluate it, and if so whether
those column values are actually present. This is really hard to do
in general, but detecting "$<number>" in the expression is easy and
that is a common case. So in the example here it can still easily
detect that both columns 1 and 2 are needed for evaluation and it
checks whether either of them is missing.

Can't we do the same for column(N)?
Not really. Perhaps for the very simple case "column(2)" but anything
beyond that totally falls apart. Consider:

    plot for [i=1:N] FOO using 1:(column(i))

The evaluation table on input looks like this

    gnuplot> show at (column(i))
            push i
            column

We could detect the "column" function, but then what?
What is i? Is it missing?

And that's not the worst!
An example that comes up from time to time in user queries is how to
do what amounts to

    plot FOO using 1:(F(column($1)))

Now it depends not only on column 1 (which yes we can detect easily)
but also on some function of column whatever-column-1-points-to.
I.e. if column 1 is missing we know to skip the line, but what if
column 1 contains "7" and column 7 is missing? Will F() blow up?
The only thing saving us is that if it does blow up, that probably
ends up with NaN as the y value, and we can check for that separately.
"set datafile missing NaN" tries to do this behind the scenes.

That leaves us with several problems.

(1) It is easy to detect $<column> without actually evaluating an
    expression. So we do that and if it's missing, skip that line.

(2) It is hard to detect indirect references to a column so that
    its status could be checked before trying to evaluate an
    expression it appears in.

(3) The function valid() trips over problem (1).
    If your expression includes "valid($2)" then the pre-check will
    notice that evaluation requires column 2, and if it's missing the
    evaluation (including invocation of valid()) never happens.

Any suggestions for improvement are welcome.

cheers,

        Ethan
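
For readers who want to see the three problems side by side, here is a
minimal sketch of the pre-check as described in this message. It is
written in Python and is not taken from the gnuplot source; the names
referenced_columns and should_skip, and the use of a regular expression,
are assumptions made for illustration only. It models just the easy
case: literal $N references are found without evaluating anything,
while column(...) references are not seen at all.

```python
import re

MISSING = "?"  # stands in for: set datafile missing "?"

# Only literal $N references are recognized. Indirect references such as
# column(i) are deliberately not detected, mirroring problem (2).
DOLLAR_COLUMN = re.compile(r"\$(\d+)")


def referenced_columns(expr):
    """Return the set of column numbers written as $N in expr."""
    return {int(n) for n in DOLLAR_COLUMN.findall(expr)}


def should_skip(fields, expr):
    """True if any $N column needed by expr is absent or marked missing.

    The expression itself is never evaluated here (hypothetical model).
    """
    for col in referenced_columns(expr):
        if col < 1:
            continue  # $0 is a pseudo-column, not a field in the line
        if col > len(fields) or fields[col - 1] == MISSING:
            return True
    return False


# The three data lines from the example in this message.
rows = [["1", "1"], ["2", "?"], ["3", "3"]]

# Problem (1): the line with a missing column 2 is dropped up front.
kept = [r for r in rows if not should_skip(r, "1./($1+$2)")]

# Problem (3): valid($2) also mentions $2, so the line is dropped
# before valid() would ever be called.
skipped_before_valid = should_skip(["2", "?"], "valid($2)")

# Problem (2): column(2) contains no literal $N, so nothing is detected.
seen_through_column = should_skip(["2", "?"], "column(2)")
```

In this model the only way to let valid() run would be to exempt it
from the scan, which is exactly the kind of special case the message
above describes as hard to generalize.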