From: Ethan M. <merritt@u.washington.edu> - 2004-06-09 18:26:05
On Wednesday 09 June 2004 04:41 am, Hans-Bernhard Broeker wrote:
> As you can see, the difference is that in 3.7 the '?' point is actually
> treated as missing, i.e. as if it simply wasn't there, whereas 4.0
> treats it as an undefined datapoint. The culprit modification appears to
> be the datafile.c:1161 ff., stamped by Ethan in October 2002. It seems
> the rationale given in Ethan's comment is at odds with the documentation
> and with traditional behaviour.

Here is the entirety of the change at issue:

    /*
     * EAM - Oct 2002 Distinguish between DF_MISSING and DF_BAD.
     * Previous versions would never notify caller of either case.
     * Now missing data will be noted. Bad data should arguably be
     * noted also, but that would change existing default behavior.
     */
    else if ((column <= df_no_cols)
             && (df_column[column - 1].good == DF_MISSING))
        return DF_UNDEFINED;

The comment was correct. Without this change the code was IMHO broken,
and worked as it did only by accident. datafile.c: df_tokenise() would
internally set either DF_BAD or DF_MISSING, but the only test made in
determining the returned value was for whether the point had been
marked DF_GOOD or not. That is, there was no distinction in practice
between missing data and unparsable data. This behavior is identical
to what you still get in version 4.0 if you specify the "wrong" missing
data character.

With the change, the missing data is explicitly reported as such by the
code in datafile.c. It is debatable exactly what different plot modes
should do in the presence of missing data, and if we are to revisit
this question I suggest that the code that would need to be changed is
not the section above, but rather the plot-style-specific code in
plot2d.c.

As I recall, the reason I noticed the original problem was that with
the version 3.7 code it was not possible to guarantee reasonable
behavior from missing data for box plots or candlesticks.
I don't know that I can reproduce an example of the failure quickly,
but I can try if needed.

The other problem was that you could not tell gnuplot that '0' or 'NaN'
or some other numerically-parsable flag indicated missing data. This
was a problem for tabular data that used 0 as a place-holder.

> So: which do we change: the code or the docs?

I think the docs should be changed. The current description correctly
describes the behavior in the absence of an active missing data
character. What it really is describing is the behavior in the presence
of an unrecognized value in a numeric data field. The whole point of
having a "missing" flag is to get some different behavior from this,
right?

I suggest changing the section in the docs to be

    Example:
        set datafile missing "NONE"
        set style data lines
        plot '-'
        [... existing three cases with commentary]

        set datafile missing "?"
        plot '-'
        [... add comment that the explicit missing character flag
         will now cause all three of the above plot commands to
         behave identically]

I am open to the suggestion that 'plot with lines' in particular should
be modified so that missing data does not produce breaks in the line.
I don't have any strong opinion about that, other than to note that if
we do that it will be harder to cause a break in the line if you *do*
want one.

--
Ethan A Merritt       merritt@u.washington.edu
Biomolecular Structure Center
Mailstop 357742
University of Washington, Seattle, WA 98195
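
For readers following the thread, the distinction discussed above
(a field that matches the missing-data flag versus a field that simply
cannot be parsed) can be sketched in standalone C. This is only an
illustration under stated assumptions, not the actual datafile.c code:
the enum values and the function classify_field() are hypothetical
stand-ins for DF_GOOD / DF_MISSING / DF_BAD and for the relevant part
of df_tokenise(). The sketch compares the field against the flag as a
string before any numeric parsing, which is one way a flag such as '0'
or 'NaN' could mark missing data even though it parses as a number.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for gnuplot's DF_GOOD / DF_MISSING / DF_BAD. */
enum field_status { FIELD_GOOD, FIELD_MISSING, FIELD_BAD };

/*
 * Classify a single data field (illustrative helper, not gnuplot code).
 *
 * The missing-data flag is compared as a string BEFORE numeric parsing,
 * so a numerically parsable flag such as "0" can still mean "missing".
 * missing_flag may be NULL, meaning no missing-data flag is active.
 * On FIELD_GOOD the parsed number is stored in *value.
 */
enum field_status classify_field(const char *field,
                                 const char *missing_flag,
                                 double *value)
{
    char *end;

    assert(field != NULL && value != NULL);

    /* Explicitly flagged as missing: report it as such. */
    if (missing_flag != NULL && strcmp(field, missing_flag) == 0)
        return FIELD_MISSING;

    /* Otherwise the whole field must parse as a number. */
    *value = strtod(field, &end);
    if (end == field || *end != '\0')
        return FIELD_BAD;   /* no digits consumed, or trailing junk */

    return FIELD_GOOD;
}
```

With the flag set to "?", the field "0" is classified as good data;
with the flag set to "0", the same field is classified as missing,
which is the tabular place-holder case mentioned above. A field of "?"
with no active flag (or the "wrong" flag) is classified as bad rather
than missing. What a caller then does with each status is the
plot-style policy question raised in this message, and the sketch
takes no position on it.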