Re: Weirdness in column specifications, column() vs dollar

On Wednesday, 2 March 2022 06:37:26 PST Peter Juhasz wrote:
> On Wed, Mar 2, 2022 at 5:22 AM Ethan A Merritt <me...@uw...> wrote:
> 
> > > Observations:
> > > - it's as if the mere presence of a $X in the specification causes the
> > > datum to be marked as invalid, and dropped entirely, if column X
> > > doesn't contain data, no matter what the rest of the specification is.

This is intentional.
It may not be ideal, but as far as I can see a better solution
would be unreasonably difficult.

Here's what is going on.

$DATA << EOD
1 1
2 ? 
3 3
EOD

First consider
  set datafile missing "?"
  plot $DATA using 1:2 with points

The intent is to cleanly skip any lines where the value is "missing",
and it is easy for the program to see that it should check column 2
for a missing value when data is being read in.
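
For readers who think better in code, here is a minimal Python model of that simple case. It is only a sketch under stated assumptions, not gnuplot's actual input code (which is C): the names read_points and MISSING are hypothetical, fields are split on whitespace, and the only point it makes is that for "using 1:2" the columns to check are known before anything is evaluated.

```python
# Simplified model (NOT gnuplot's real C code) of "set datafile missing"
# combined with a plain "using 1:2" specification.
MISSING = "?"

DATA = """\
1 1
2 ?
3 3
"""

def read_points(text, xcol, ycol, missing=MISSING):
    """Return (x, y) pairs, skipping any line whose x or y field is missing."""
    points = []
    for line in text.splitlines():
        fields = line.split()
        # Columns are 1-based, as in gnuplot's "using" specification.
        raw_x, raw_y = fields[xcol - 1], fields[ycol - 1]
        if missing in (raw_x, raw_y):
            continue  # the columns to check are named directly in the spec
        points.append((float(raw_x), float(raw_y)))
    return points

print(read_points(DATA, 1, 2))  # [(1.0, 1.0), (3.0, 3.0)]
```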

Now let's make it a bit more complicated
  plot $DATA using 1:(1./($1+$2))

The intent is again to skip any lines where the value is "missing".
But now the program has to evaluate an expression to generate that
value.  Evaluating the expression first and then testing against
"missing" cannot work and in fact the evaluation may blow up on
divide-by-zero. So the input code tries to peek into the expression
and see if any column values are needed in order to evaluate it,
and, if so, whether those column values are actually present.
This is really hard to do in general, but detecting "$<number>" in
the expression is easy and that is a common case.  So in the
example here it can still easily detect that both columns 1 and 2 are
needed for evaluation and it checks whether either of them is missing.
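
A rough Python sketch of that peek, again a simplified model rather than what gnuplot's C source literally does: dollar_columns() (a hypothetical helper) pulls every literal $<number> out of the expression text with a regular expression, and a row is dropped if any of those columns holds the missing marker. The expression is only evaluated for rows that survive the check.

```python
import re

MISSING = "?"
ROWS = [["1", "1"], ["2", "?"], ["3", "3"]]

def dollar_columns(expr):
    """Column numbers written literally as $<number> in a using expression."""
    return {int(n) for n in re.findall(r"\$(\d+)", expr)}

def evaluate_rows(expr, func, rows, missing=MISSING):
    """Skip a row if any $N column is missing; otherwise evaluate func."""
    needed = dollar_columns(expr)
    results = []
    for fields in rows:
        if any(fields[n - 1] == missing for n in needed):
            continue  # dropped before evaluation; func never sees the "?"
        # col(n) returns column n (1-based) as a number, only when asked for.
        def col(n, fields=fields):
            return float(fields[n - 1])
        results.append(func(col))
    return results

expr = "1./($1+$2)"
print(dollar_columns(expr))  # {1, 2}
print(evaluate_rows(expr, lambda col: 1.0 / (col(1) + col(2)), ROWS))
# [0.5, 0.16666666666666666]
```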

Can't we do the same for column(N)?  Not really. Perhaps for
the very simple case "column(2)" but anything beyond that
totally falls apart.
Consider:
  plot for [i=1:N] FOO using 1:(column(i))

The evaluation table on input looks like this:

  gnuplot> show at (column(i))
          push i
          column
We could detect the "column" function, but then what?
What is i?  Is it missing?
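
To see why the same trick does not help here, consider this toy Python model (hypothetical names, not gnuplot's real evaluator). Scanning the text "column(i)" for $<number> finds nothing, and in the two-step table above the column number only exists once "push i" has actually run, so any test for a missing value can only happen during evaluation.

```python
import re

def dollar_columns(expr):
    """Column numbers written literally as $<number> in an expression."""
    return {int(n) for n in re.findall(r"\$(\d+)", expr)}

def run_action_table(actions, variables, fields):
    """Toy stack machine for the two-step table shown by "show at"."""
    stack = []
    for op, *arg in actions:
        if op == "push":
            stack.append(variables[arg[0]])   # push the current value of i
        elif op == "column":
            n = stack.pop()                   # column number known only now
            stack.append(fields[n - 1])
    return stack.pop()

actions = [("push", "i"), ("column",)]
print(dollar_columns("column(i)"))  # set(): nothing for a pre-check to test
for i in (1, 2):
    # The same table reads a good field for i=1 and a missing one for i=2.
    print(i, run_action_table(actions, {"i": i}, ["2", "?"]))
```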
And that's not the worst! An example that comes up from time to
time in user queries is how to do what amounts to

  plot FOO using 1:(F(column($1)))

Now it depends not only on column 1 (which, yes, we can detect easily)
but also on whatever column the value in column 1 points to.
I.e. if column 1 is missing we know to skip the line,
but what if column 1 contains "7" and column 7 is missing? Will F() blow up?
The only thing saving us is that if it does blow up, that probably
ends up with NaN as the y value, and we can check for that separately.
"set datafile missing NaN" tries to do this behind the scenes.

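A hedged Python sketch of the two checks working together on F(column($1)). It assumes, for the sake of the model, that reading a missing field yields NaN and that F is an arbitrary user function (here it just doubles its argument); plot_value and to_number are hypothetical helpers. The $1 pre-check passes in both rows; only the NaN test after evaluation catches the row where column 7 is missing.

```python
import math
import re

MISSING = "?"

def dollar_columns(expr):
    """Column numbers written literally as $<number> in an expression."""
    return {int(n) for n in re.findall(r"\$(\d+)", expr)}

def to_number(field, missing=MISSING):
    # Model assumption: a missing field evaluates to NaN.
    return math.nan if field == missing else float(field)

def F(v):
    # Stand-in for an arbitrary user-defined function.
    return 2 * v

def plot_value(fields, expr, func):
    """Return the y value, or None if the line would be skipped."""
    if any(fields[n - 1] == MISSING for n in dollar_columns(expr)):
        return None                          # pre-check on literal $N columns
    def col(n):
        return to_number(fields[n - 1])
    y = func(col)
    return None if math.isnan(y) else y      # NaN check after evaluation

expr = "F(column($1))"
indirect = lambda col: F(col(int(col(1))))   # column(N) where N comes from $1

row_ok = ["7", "0", "0", "0", "0", "0", "5"]   # column 7 holds 5
row_bad = ["7", "0", "0", "0", "0", "0", "?"]  # column 7 is missing
print(plot_value(row_ok, expr, indirect))   # 10.0
print(plot_value(row_bad, expr, indirect))  # None
```
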
That leaves us with several problems.

(1) It is easy to detect $<column> without actually evaluating
    an expression.  So we do that, and if the column is missing we skip the line.

(2) It is hard to detect indirect references to a column so that its
    status could be checked before trying to evaluate an expression
    it appears in.

(3) The function valid() trips over problem (1):
    If your expression includes "valid($2)" then the pre-check will
    notice that evaluation requires column 2 and if it's missing the
    evaluation (including invocation of valid()) never happens.
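
The same kind of toy model shows problem (3). Here valid() is modeled as "does column 2 hold a number" and records each call; the expression string and helper names are illustrative assumptions, not gnuplot internals. Because the pre-check sees $2 and drops the line first, valid() is never called on the row where column 2 is missing.

```python
import re

MISSING = "?"
calls = []  # every field the model valid() was asked about

def dollar_columns(expr):
    """Column numbers written literally as $<number> in an expression."""
    return {int(n) for n in re.findall(r"\$(\d+)", expr)}

def valid(field):
    """Model of valid(): True if the field holds a number."""
    calls.append(field)
    return field != MISSING

def evaluate(fields, expr):
    """Pre-check $N columns, then evaluate (valid($2) ? $2 : 0)."""
    if any(fields[n - 1] == MISSING for n in dollar_columns(expr)):
        return None  # line dropped here, so valid() is never reached
    return float(fields[1]) if valid(fields[1]) else 0.0

expr = "(valid($2) ? $2 : 0)"
print(evaluate(["1", "1"], expr))  # 1.0
print(evaluate(["2", "?"], expr))  # None
print(calls)                       # ['1']
```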

Any suggestions for improvement are welcome.

   cheers,
           Ethan
