On Tuesday, January 25, 2011, Mojca Miklavec wrote:
> On Tue, Jan 25, 2011 at 03:12, Allin Cottrell wrote:
> >
> > I'm not sure I'd call this a "fix". Wikipedia says of the BOM in
> > UTF-8:
> >
> > "While Unicode standard allows BOM in UTF-8, it does not require
> > or recommend it.
That same Wikipedia paragraph goes on to say:
  "The BOM will make a batch file not executable on Windows, so batch
  files must be saved as ANSI, not Unicode[...] On any platform,
  a UTF-8 BOM will interfere with the interpretation of source code
  for compiler and tools that don't recognise it but could otherwise
  handle UTF-8."
> However ... this has to be read as: gnuplot is not required to
> *output* files with BOM (and thus doesn't need to be fixed to create
> BOM marks in output), but it should better support them when *opening*
> external files. Even if the marks are not required by the standard,
> they are still there. Even worse ... from what some people here say
> they are even there by default in some standard Windows tools.
It is worse than you may think. Notepad cannot even read _its own files_
reliably. I'm sure you can find many discussions on Notepad and the BOM
problem via Google; here are pointers to a couple:
http://www.eeggs.com/items/48383.html
http://www.datamystic.com/forums/viewtopic.php?t=586
Best to view it as some Windows-specific craziness that must be
stripped from the file when transferring it to unix/linux, exactly
the same as we must strip the extra ^M at the end of every line.
I realize that may leave you with a problem if you are both creating
and using the files on Windows, but I do not have a good solution for
that. I did come across several recommendations to replace Notepad with
Notepad++, which offers the option to edit and save UTF-8 files without
adding a BOM.
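For the transfer case, the stripping could be done with a small filter
along the following lines. This is only a sketch under stated assumptions:
the function names are made up for illustration, the input stream is
assumed to be opened in binary mode, and only the UTF-8 BOM (bytes
EF BB BF) and CR LF line endings are handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write one byte, dropping the CR of a CR LF pair.
 * A CR that is not followed by LF is kept. Returns 0 on success. */
static int put_filtered(int c, int *pending_cr, FILE *out)
{
    if (*pending_cr) {
        *pending_cr = 0;
        if (c != '\n' && putc('\r', out) == EOF)
            return -1;
    }
    if (c == '\r') {
        *pending_cr = 1;        /* decide once we see the next byte */
        return 0;
    }
    return putc(c, out) == EOF ? -1 : 0;
}

/* Hypothetical filter: copy `in` to `out` without a leading UTF-8 BOM
 * and without the ^M in front of each newline.
 * Returns 0 on success, -1 on a write error. */
int copy_without_bom_and_cr(FILE *in, FILE *out)
{
    static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };
    unsigned char head[3];
    size_t n, i;
    int c, pending_cr = 0;

    assert(in != NULL && out != NULL);

    n = fread(head, 1, sizeof head, in);
    if (!(n == sizeof head && memcmp(head, bom, sizeof bom) == 0)) {
        /* No BOM: the bytes already read are ordinary data. */
        for (i = 0; i < n; i++) {
            if (put_filtered(head[i], &pending_cr, out) != 0)
                return -1;
        }
    }
    while ((c = getc(in)) != EOF) {
        if (put_filtered(c, &pending_cr, out) != 0)
            return -1;
    }
    if (pending_cr && putc('\r', out) == EOF)   /* trailing lone CR */
        return -1;
    return 0;
}
```

This should only be applied to text files. A binary file that happens
to start with the same three bytes would be damaged by it.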
It's not just the script files, by the way. The same problem with the
presence or absence of a BOM applies to data files as well, including,
so far as I know, binary files. So if you are unlucky enough to have
a binary data file that just happens to contain the BOM bit pattern at
the start, many Windows tools will handle it incorrectly.
> (But once again: I don't know the source good enough, so I have no
> idea how difficult it would be to fix that particular behaviour.)
A check for a BOM would have to be made every time a file is opened.
So it might have to be handled in the readline library, and/or by providing
a custom fopen() routine. But even that wouldn't help if you fed the
input file to gnuplot via
    gnuplot < my-file-with-BOM.gp
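To make the custom fopen() idea concrete, here is a rough sketch of what
such a wrapper might look like. It is not taken from the gnuplot source;
the name fopen_skip_bom is hypothetical, and it assumes the file is a
text file and that only the UTF-8 BOM (bytes EF BB BF) matters.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper, not part of gnuplot: open a text file for
 * reading and leave the stream positioned just past a leading UTF-8
 * BOM if there is one, otherwise at the start of the file.
 * Returns NULL on failure, like fopen(). */
FILE *fopen_skip_bom(const char *path)
{
    static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };
    unsigned char head[3];
    size_t n;
    FILE *fp;

    assert(path != NULL);

    fp = fopen(path, "r");
    if (fp == NULL)
        return NULL;

    n = fread(head, 1, sizeof head, fp);
    if (n == sizeof head && memcmp(head, bom, sizeof bom) == 0)
        return fp;              /* BOM found and already consumed */

    /* No BOM (or a file shorter than three bytes): go back to the start. */
    if (fseek(fp, 0L, SEEK_SET) != 0) {
        fclose(fp);
        return NULL;
    }
    return fp;
}
```

Every place that opens a script or text data file would have to call
this instead of fopen(). It does nothing for input arriving on stdin,
since no fopen() call is involved there.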