From: Ethan M. <merritt@u.washington.edu> - 2011-01-26 04:07:04
On Tuesday, January 25, 2011, Mojca Miklavec wrote:
> On Tue, Jan 25, 2011 at 03:12, Allin Cottrell wrote:
> >
> > I'm not sure I'd call this a "fix". Wikipedia says of the BOM in
> > UTF-8:
> >
> > "While Unicode standard allows BOM in UTF-8, it does not require
> > or recommend it.

That same Wikipedia paragraph goes on to say:

    The BOM will make a batch file not executable on Windows, so batch
    files must be saved as ANSI, not Unicode[...]
    On any platform, a UTF-8 BOM will interfere with the interpretation
    of source code for compiler and tools that don't recognise it but
    could otherwise handle UTF-8.

> However ... this has to be read as: gnuplot is not required to
> *output* files with BOM (and thus doesn't need to be fixed to create
> BOM marks in output), but it should better support them when *opening*
> external files. Even if the marks are not required by the standard,
> they are still there. Even worse ... from what some people here say
> they are even there by default in some standard Windows tools.

It is worse than you may think. Notepad cannot even read _its own files_ reliably. I'm sure you can find many discussions of Notepad and the BOM problem via Google; here are pointers to a couple:

http://www.eeggs.com/items/48383.html
http://www.datamystic.com/forums/viewtopic.php?t=586

Best to view it as some Windows-specific craziness that must be stripped from the file when transferring it to unix/linux, exactly as we must strip the extra ^M at the end of every line. I realize that may leave you with a problem if you both create and use the files on Windows, but I do not have a good solution for that. I did come across several recommendations to replace Notepad with Notepad++, which offers the option to edit and save UTF-8 files without adding a BOM.

It's not just the script files, by the way. The same problem with presence or absence of a BOM applies to data files as well, including, as far as I know, binary files.
So if you are unlucky enough to have a binary data file that just happens to begin with the BOM bit pattern, many Windows tools will handle it incorrectly.

> (But once again: I don't know the source good enough, so I have no
> idea how difficult it would be to fix that particular behaviour.)

A check for a BOM would have to be made every time a file is opened, so it might have to be handled in the readline library and/or by providing a custom fopen() routine. But even that wouldn't help if you fed the input file to gnuplot via

    gnuplot < my-file-with-BOM.gp
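A minimal sketch of the "custom fopen() routine" idea, assuming plain C stdio. The names skip_utf8_bom() and fopen_skip_bom() are hypothetical and are not existing gnuplot functions. The check uses ftell()/fseek() to undo a partial match, because ungetc() only guarantees one byte of pushback, so it works only on seekable streams. Binary opens are deliberately left alone because of the binary-data-file problem described above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of gnuplot: if the stream starts with the
 * UTF-8 byte order mark EF BB BF, consume it; otherwise leave the stream
 * where it was.  Returns 1 if a BOM was skipped, 0 if there was none, and
 * -1 if the stream could not be repositioned (e.g. a pipe). */
int skip_utf8_bom(FILE *fp)
{
    static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };
    unsigned char buf[3];
    long start;
    size_t n;

    assert(fp != NULL);

    /* Remember the position so a partial match can be undone;
     * ungetc() only guarantees one byte of pushback. */
    start = ftell(fp);
    if (start < 0)
        return -1;                  /* not seekable: nothing consumed */

    n = fread(buf, 1, sizeof buf, fp);
    if (n == sizeof buf && memcmp(buf, bom, sizeof bom) == 0)
        return 1;                   /* BOM consumed, next read sees data */

    /* No BOM (or a file shorter than 3 bytes): go back to the start.
     * fseek() also clears any end-of-file indicator set by fread(). */
    if (fseek(fp, start, SEEK_SET) != 0)
        return -1;
    return 0;
}

/* Hypothetical fopen() replacement for script files.  Only text-mode
 * reads are checked: a binary data file whose first three bytes happen
 * to be EF BB BF is returned untouched. */
FILE *fopen_skip_bom(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);

    if (fp != NULL && mode[0] == 'r' && strchr(mode, 'b') == NULL)
        (void) skip_utf8_bom(fp);
    return fp;
}
```

As the message points out, a wrapper like this only helps for files opened through it; input that never passes through the wrapper is unaffected.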
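For the "strip it when transferring to unix/linux" approach, here is a sketch of a stream filter in C that drops a leading UTF-8 BOM and turns CR LF line endings (the extra ^M) into plain LF. The function name strip_bom_crlf() is hypothetical. A trivial main() calling strip_bom_crlf(stdin, stdout) would turn it into a command-line filter. It is meant for text scripts only, since rewriting CR LF pairs would corrupt a binary data file.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical filter, not part of gnuplot: copy `in` to `out`, dropping a
 * UTF-8 BOM (EF BB BF) at the very start and turning each CR LF pair into
 * a bare LF.  A CR not followed by LF is copied unchanged.  Intended for
 * text scripts only.  Returns 0 on success, -1 on a write error. */
int strip_bom_crlf(FILE *in, FILE *out)
{
    static const int bom[3] = { 0xEF, 0xBB, 0xBF };
    int matched = 0;
    int c;

    assert(in != NULL && out != NULL);

    /* Match the BOM byte by byte from the start of the input. */
    c = getc(in);
    while (matched < 3 && c == bom[matched]) {
        matched++;
        c = getc(in);
    }
    if (matched < 3) {
        /* Only a prefix matched: those bytes were real data, keep them. */
        for (int i = 0; i < matched; i++)
            if (putc(bom[i], out) == EOF)
                return -1;
    }

    /* c holds the first byte not yet written; copy the rest. */
    while (c != EOF) {
        if (c == '\r') {
            int next = getc(in);
            if (next != '\n') {
                /* Lone CR: keep it.  For CR LF the CR is dropped. */
                if (putc('\r', out) == EOF)
                    return -1;
            }
            c = next;           /* LF (or whatever followed) handled next */
            continue;
        }
        if (putc(c, out) == EOF)
            return -1;
        c = getc(in);
    }
    return fflush(out) == 0 ? 0 : -1;
}
```

Because it works on streams, such a filter can sit in front of gnuplot in a pipeline, so the script arrives already stripped even when gnuplot reads it from standard input.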