Re: [gnuplot-beta]set missing malfunctions

On Wednesday 09 June 2004 04:41 am, Hans-Bernhard Broeker wrote:
> As you can see, the difference is that in 3.7 the '?' point is actually
> treated as missing, i.e. as if it simply wasn't there, whereas 4.0
> treats it as an undefined datapoint. The culprit modification appears to
> be the datafile.c:1161 ff., stamped by Ethan in October 2002.  It seems
> the rationale given in Ethan's comment is at odds with the documentation
> and with traditional behaviour.

Here is the entirety of the change at issue:
        /*
         * EAM - Oct 2002 Distinguish between DF_MISSING and DF_BAD.
         * Previous versions would never notify caller of either case.
         * Now missing data will be noted. Bad data should arguably be
         * noted also, but that would change existing default behavior.
         */
         else if ((column <= df_no_cols) && (df_column[column - 1].good == DF_MISSING))
              return DF_UNDEFINED;

The comment was correct. Without this change the code was IMHO broken,
and worked as it did only by accident.
In datafile.c, df_tokenise() would internally set either DF_BAD or DF_MISSING,
but the only test made in determining the returned value was whether
the point had been marked DF_GOOD or not.  That is, in practice there was
no distinction between missing data and unparsable data.
This behavior is identical to what you still get in version 4.0 if you specify
the "wrong" missing data character.
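To make the distinction concrete, here is a minimal C sketch of the two
reporting policies described above.  It is not the actual datafile.c code:
the enum names and the functions report_old() and report_new() are
hypothetical, and the real df_tokenise()/df_readline() logic is more
involved.

```c
/* Hypothetical per-field states, loosely modelled on the
 * DF_GOOD / DF_MISSING / DF_BAD distinction in datafile.c. */
enum field_state { FIELD_GOOD, FIELD_MISSING, FIELD_BAD };

/* Hypothetical per-line results handed back to the plotting code. */
enum line_status { LINE_OK, LINE_UNDEFINED, LINE_IGNORED };

/* Old policy: the only question asked is "good or not",
 * so a missing field and an unparsable field look the same. */
static enum line_status report_old(enum field_state s)
{
    return (s == FIELD_GOOD) ? LINE_OK : LINE_IGNORED;
}

/* New policy: a field flagged as missing is reported explicitly;
 * unparsable (bad) data is still silently ignored, as before. */
static enum line_status report_new(enum field_state s)
{
    switch (s) {
    case FIELD_GOOD:    return LINE_OK;
    case FIELD_MISSING: return LINE_UNDEFINED;
    default:            return LINE_IGNORED;
    }
}
```

Under report_old() the caller cannot tell the two cases apart; under
report_new() it can, and each plot style is then free to decide what an
undefined point should mean.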

With the change, missing data is explicitly reported as such by
the code in datafile.c.  It is debatable exactly what the different plot modes
should do in the presence of missing data, and if we are to revisit this
question I suggest that the code to change is not the section above,
but rather the plot-style-specific code in plot2d.c.

As I recall, the reason I noticed the original problem was that with the
version 3.7 code it was not possible to guarantee reasonable behavior
from missing data for box plots or candlesticks.  I don't know that I can
reproduce an example of the failure quickly, but I can try if needed.

The other problem was that you could not tell gnuplot that '0' or 'NaN'
or some other numerically parsable flag indicated missing data.
This was a problem for tabular data that used 0 as a place-holder.
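The ordering that makes such flags usable can be sketched as follows:
compare the raw token against the missing-data string before trying to
parse it as a number, so that "0" or "NaN" is caught even though C99
strtod() would accept either as a valid number.  The function
classify_field() and its signature are hypothetical; this is not
gnuplot's actual tokeniser.

```c
#include <stdlib.h>
#include <string.h>

enum field_state { FIELD_GOOD, FIELD_MISSING, FIELD_BAD };

/* Hypothetical classifier for one whitespace-delimited token.
 * `missing` is the active missing-data string, or NULL if none is set. */
static enum field_state classify_field(const char *tok, const char *missing,
                                       double *value)
{
    char *end;

    /* Check the missing-data flag first, so numerically parsable
     * flags such as "0" or "NaN" can still mean "missing". */
    if (missing != NULL && strcmp(tok, missing) == 0)
        return FIELD_MISSING;

    /* Otherwise the whole token must parse as a number. */
    *value = strtod(tok, &end);
    if (end == tok || *end != '\0')
        return FIELD_BAD;

    return FIELD_GOOD;
}
```

With no missing string set, "?" falls through to strtod() and comes back
as FIELD_BAD, i.e. it is treated exactly like any other unparsable value.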

> So: which do we change: the code or the docs?

I think the docs should be changed.  The current description 
correctly describes the behavior in the absence of an active
missing data character.  What it is really describing is the behavior
in the presence of an unrecognized value in a numeric data field.
The whole point of having a "missing" flag is to get some different
behavior from this, right?

I suggest changing that section of the docs to read:

 Example:
       set datafile missing "NONE"
       set style data lines
       plot '-'
      [...  existing three cases with commentary]

       set datafile missing "?"
       plot '-'
      [... add comment that the explicit missing character flag
           will now cause all three of the above plot commands 
           to behave identically]

I am open to the suggestion that 'plot with lines' in particular
should be modified so that missing data does not produce
breaks in the line.  I don't have any strong opinion about that,
other than to note that if we do that it will be harder to cause 
a break in the line if you *do* want one.
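To illustrate the trade-off, here is a hypothetical C sketch that counts
how many separate polylines a 'with lines' plot would draw under the two
policies: breaking at an undefined point, or skipping it and joining its
neighbours.  The struct point layout and count_polylines() are invented
for illustration and are not taken from plot2d.c.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical data point; `undefined` marks a missing value. */
struct point { double x, y; bool undefined; };

/* Count runs of consecutive drawn points (a run of one point draws
 * no segment but is still counted).  If break_at_undefined is true,
 * an undefined point ends the current run; otherwise it is skipped
 * and its neighbours are joined. */
static size_t count_polylines(const struct point *p, size_t n,
                              bool break_at_undefined)
{
    size_t runs = 0;
    bool in_run = false;

    for (size_t i = 0; i < n; i++) {
        if (p[i].undefined) {
            if (break_at_undefined)
                in_run = false;  /* next defined point starts a new run */
            continue;            /* the undefined point is never drawn */
        }
        if (!in_run) {
            runs++;
            in_run = true;
        }
    }
    return runs;
}
```

Under the join policy this sketch can never produce more than one
polyline, which is the sense in which a deliberate break would become
harder to request.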

-- 
Ethan A Merritt       merritt@u.washington.edu
Biomolecular Structure Center
Mailstop 357742
University of Washington, Seattle, WA 98195
