From: Mojca M. <moj...@gm...> - 2011-01-26 22:37:49
On Wed, Jan 26, 2011 at 05:16, sfeam (Ethan Merritt) wrote:
> On Tuesday, January 25, 2011, Mojca Miklavec wrote:
>> I don't know enough about gnuplot's source, so I don't know how
>> difficult it is to change it, but if there is no problem to support
>> comments (in both data files and scripts), I don't see why ignoring
>> the first two bytes would not be doable. I consider it "equally hard".
>
> If you want to experiment with that approach, you can find the
> relevant switch statement at line 201 of scanner.c (scanner):
>
>     switch (expression[current]) {
>     case '#':           /* DFK: add comments to gnuplot */
>         goto endline;   /* ignore the rest of the line */
>     case '^':
>     case '+':
>
> That isn't going to help with data files, however.
> Only with command lines that unexpectedly contain the BOM sequence.

I can catch BOM with the following code:

--- a/src/scanner.c
+++ b/src/scanner.c
@@ -114,8 +114,14 @@ scanner(char **expressionp, size_t *expressionlenp)
             /* leave space for dummy end token */
             extend_token_table();
         }
-        if (isspace((unsigned char) expression[current]))
+        if (isspace((unsigned char) expression[current])) {
             continue;           /* skip the whitespace */
+        } else if (((unsigned char)expression[current] == 0xef) && ((unsigned char)expression[current+1] == 0xbb) && ((unsigned char)expression[current+2] == 0xbf)) {
+            current += 2;
+            // optional warning
+            // int_warn(t_num, "Your file starts with a BOM character; you might want to remove it.");
+            continue;
+        }
         token[t_num].start_index = current;
         token[t_num].length = 1;
         token[t_num].is_token = TRUE;   /* to start with... */

(NOTE 1: to avoid possible segmentation faults or other problems on
files with less than 3 characters one would probably want to test if
expression is long enough first. I didn't test if it really segfaults
or not though, but it is probably polite to check if
expression[current+2] is valid at all ...)

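On the concern in NOTE 1, here is a minimal sketch of the same three-byte test pulled out into a standalone helper. The name utf8_bom_length is hypothetical and not part of gnuplot. It assumes the buffer is NUL-terminated, as the expression scanned in scanner.c appears to be; under that assumption the short-circuiting && chain never reads past the terminator, so no separate length check should be needed.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of gnuplot): return the number of bytes
 * taken up by a UTF-8 BOM (EF BB BF) at the start of the NUL-terminated
 * string s, that is 3 if a BOM is present and 0 otherwise.
 *
 * The && chain short-circuits: u[1] is read only when u[0] == 0xEF
 * (so u[0] is not the terminator), and u[2] is read only when
 * u[1] == 0xBB. No byte beyond the terminating NUL is ever read.
 */
size_t
utf8_bom_length(const char *s)
{
    const unsigned char *u = (const unsigned char *) s;

    assert(s != NULL);
    if (u[0] == 0xEF && u[1] == 0xBB && u[2] == 0xBF)
        return 3;
    return 0;
}
```

A caller holding a pointer to the start of a line could skip the BOM with something like "line += utf8_bom_length(line);". Whether that fits the data-file reader is an assumption; it has not been tried against datafile.c.
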
(NOTE 2: I'm not sure if that is a good idea or not; one might want to
set "utf-8" encoding by default in case that BOM is encountered. But on
the other hand doing that might encourage users to always use BOM to
avoid the need to set encoding.)

This would catch any of the following:
- gnuplot filewithbom.plt
- gnuplot < filewithbom.plt
- load 'filewithbom.plt'

However it wouldn't catch problematic datafiles (as already mentioned),
but it might be enough to patch df_readascii in datafile.c to account
for those as well. I didn't play with that yet, but I would like to
know what you think about the patch mentioned above.

Mojca