#1196 error with sparse data and column names

Status: closed-fixed
Owner: nobody
Labels: Other (155)
Priority: 5
Updated: 2012-12-14
Created: 2012-12-13
Creator: Henry Eck
Private: No

gnuplot 4.6, patch level 0. Reading a comma-delimited data file using column names, gnuplot fails to find the last column iff that column is sparse.

Discussion

  • Henry Eck
    2012-12-13

     
    Attachments
  • Ethan Merritt
    2012-12-13

    Clarification: The problem is not that the column data is "sparse" (contains many zeros), it is that some lines in the data file do not contain this column at all. Furthermore the previous column field is not terminated by a comma. So in that sense the error arises from an improperly formatted data file. Nevertheless, let me think about how the code might deal with this more gracefully.

    1) You might think that requesting column("q_hi") causes the program to count across the header row, discover that "q_hi" appears in column 14, and then internally translate this into a request for column(14). I suppose the code could indeed have been written like that, but it wasn't. Instead, each input data line is parsed column by column until there is a match to the requested column header string. If there is no match, an error is returned. Switching from the latter approach to the former may be possible, but would require extensive changes to the code.

    2) Because of the way the code is structured (see above) the error is discovered in the parsing code while executing the column() or stringcolumn() function. This means that it is not possible to immediately return a "missing data" flag, which would seem the logical thing to do.
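The difference between the two lookup strategies in points 1) and 2) can be sketched outside gnuplot. The following is a minimal Python illustration of the "resolve the header once, then fetch by index" idea, not gnuplot's actual C implementation. The function names, the error text, and the sample data are all hypothetical, and using NaN as the marker for an absent field is only one possible choice.

```python
import csv
import io
import math


def resolve_header(header_row, name):
    """Return the 0-based index of `name` in the header row.

    Raises KeyError when the header truly does not exist, so a typo in
    the column name is still reported instead of being hidden.
    """
    try:
        return header_row.index(name)
    except ValueError:
        raise KeyError(f"could not find column with header {name!r}")


def read_named_column(text, name):
    """Read one named column from CSV text (hypothetical helper).

    The header is resolved once. Afterwards a data line that is too
    short, or whose field is empty, yields NaN rather than an error.
    """
    rows = csv.reader(io.StringIO(text))
    header = next(rows)
    idx = resolve_header(header, name)
    values = []
    for row in rows:
        if idx >= len(row) or row[idx].strip() == "":
            values.append(math.nan)  # field absent or empty on this line
        else:
            values.append(float(row[idx]))
    return values


# Illustrative data: the second data line lacks the last column entirely.
sample = "t,q_lo,q_hi\n0,1.0,2.0\n1,1.5\n2,1.7,2.4\n"
q_hi = read_named_column(sample, "q_hi")
```

Because the header check happens once, before any data line is read, a nonexistent header and a short data line are reported differently in this sketch. That is exactly the distinction the per-line matching approach makes hard to preserve.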

    It would be easy to have column() return NaN, which is not quite the same thing as "missing", but I see several problems with this. I tested this on your data and it works the way you were probably expecting. However, in other cases the distinction between NaN and "missing" might be important. Furthermore, there is no string equivalent to NaN, so this approach wouldn't solve the case of a missing column being plotted via stringcolumn("header"). Finally, the more common cause of this error message is that indeed there is no such column header. If the code is changed to treat the lack of a match as NaN, then the user will see an unhelpful message like "no valid data found in file" rather than the more informative "could not find a column head FOO".

    So at this point I don't see any easy solutions, other than to document the requirement for a consistent minimum number of data columns in a csv file.

     
  • Henry Eck
    2012-12-14

    Thanks for the very quick response. Let me add some more info:
    The file format is "data,data,data<CR>", i.e. comma-delimited with a CR terminating the line (no comma after the last datum). A data field is sometimes empty (missing data). All columns are terminated by commas, except the last column, which is terminated by the <CR>. I did try putting an extra comma after the last column, i.e. in front of the <CR>, but the error still occurred.

    There are 2 easy work-arounds: One is to ensure that the last column has no missing data. I modified my app so that it optionally adds a "pad" column with a constant value. Alternatively, one can use column numbers instead of column names in the plot command, and the issue doesn't occur then either.
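For data files that cannot be regenerated by the producing application, the "pad" column work-around could also be applied as a preprocessing step. The sketch below is a hypothetical Python helper, not part of gnuplot or of the reporter's app. The column name "pad", the constant "0", and the sample data are assumptions chosen for illustration.

```python
import csv
import io


def add_pad_column(text, pad_name="pad", pad_value="0"):
    """Append a constant-valued last column to CSV text.

    Short rows are first filled with empty fields up to the header
    width, so every line ends with the same non-empty pad field.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    width = len(header)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header + [pad_name])
    for row in reader:
        if not row:
            out.write("\n")  # keep blank lines as they are
            continue
        filled = row + [""] * (width - len(row))
        writer.writerow(filled + [pad_value])
    return out.getvalue()


# Illustrative data: the second data line has no value for q_hi.
sample = "t,q_hi\n0,2.0\n1\n2,2.4\n"
padded = add_pad_column(sample)
```

After this step the last column of every line is the constant pad value, so the sparse column is no longer the final one on any line.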

    Thanks for your help.

     
  • Ethan Merritt
    2012-12-14

    I have found a viable change in the code. It now treats the point as "undefined" in the condition that you are hitting rather than returning an error. This is not quite the same treatment as missing points in a numbered column (they are simply skipped rather than being marked as undefined) but the resulting graph is the same.

    The modified treatment is now in CVS for 4.6 and 4.7.

     
  • Ethan Merritt
    2012-12-14

    • status: open --> closed-fixed