From: Allin C. <cot...@wf...> - 2017-03-24 19:24:39
On Thu, 23 Mar 2017, sfeam wrote:

> On Thursday, 23 March 2017 08:09:14 PM Allin Cottrell wrote:
>> Sorry, this is quite ticklish but I'll try to explain it as best I
>> can.
>>
>> I'm not sure, from reading the gnuplot help on "encoding", of the
>> exact scope and effect of giving a "set encoding XXX" command in a
>> plot file.
>>
>> Here's the context: my program writes a gnuplot command file,
>> designed to produce PNG output via the pngcairo "terminal", and
>> among the users of the program are people working on Windows in
>> Russian. There are two possible non-ASCII elements in the plot file:
>>
>> 1) the name of the output file (as in "set output 'OOO'"), which for
>> MS Windows in Russian will be encoded in CP1251; and
>>
>> 2) strings occurring in titles, labels or whatever in the body of
>> the plot: by default these will be in UTF-8, which is what pngcairo
>> expects.
>>
>> At present I'm sticking a line into the plot file:
>>
>> set encoding utf8
>>
>> which I hope is going to tell gnuplot, "Whatever you might think
>> based on the fact that you're working on Windows in Russian, please
>> interpret titles/labels as being in UTF-8."
>
> That much is fine. It also has the effect, for the png terminal and
> some others, that when you specify a font by name it will try to find
> a version of it that uses your specified encoding.

OK so far!

>> So here's the question: given that the output filename is in CP1251,
>> is my "set encoding" line liable to interfere with gnuplot's output
>> routine (for example, such that output cannot be written because
>> some non-ASCII component of the path is non-existent, if the bytes
>> are interpreted as UTF-8), or is gnuplot's I/O mechanism separate
>> and insulated from "set encoding"?
>
> Gnuplot does not care what is in the string used as a file name.
> Linux/unix also does not care what is in the string used as a file name.
> Any sequence of bytes is a legal filename even if it is not printable.
> Windows - I'm not so sure.
> There are two ways that it might go wrong
> on windows that I have heard of, and I suppose they might interact
> badly.
> Caveat: I don't use Windows myself, so I'm only repeating what I have
> seen mentioned elsewhere.
>
> (1) Windows filesystems only allow certain encodings for file
> names, and UTF-8 is not one of the allowed encodings.
> https://msdn.microsoft.com/en-us/library/windows/desktop/dd317748(v=vs.85).aspx
>
> (2) At least some incarnations of Windows used a magic byte sequence
> known as BOM to indicate the encoding used by a text file. If your gnuplot
> script file contains UTF-8 anything, some Windows machines are unhappy
> if it does not start with BOM. On the other hand if it _does_ start with BOM
> then strings in the script file that are really CP1251 rather than UTF-8
> might (I am guessing) be converted inappropriately.
>
> So I think your question is actually a Windows + script file format question
> rather than anything specific to gnuplot. I doubt that "set encoding"
> matters, but mixing UTF-8 and CP1251 in the same script file may
> be intrinsically problematic on Windows.

I ran an experiment to try to assess this. Booted Windows 8 (ugh)
and created a directory named Beauté (that's with an e-acute) on my
Desktop. I then created two copies of a simple gnuplot script to
produce a PNG file. Each included the line

  set output 'c:/users/cottrell/desktop/Beauté/test.png'

(encoded in cp1251). The two files were identical except that one of
them included the line

  set encoding utf8

before the "set output" line. (And the accented character in the
output filename was the only non-ASCII character in the files.)

I then called wgnuplot.exe on the two scripts from the command line
in a cmd.exe window. The one without "set encoding utf8" worked to
produce the PNG, the other didn't.

To see what was happening I then tried opening wgnuplot interactively
and using the "load" command to run the scripts.
The variant without "set encoding" again worked fine; the other one
gave:

  set output 'c:/users/cottrell/desktop/Beaut?/test.png'
  cannot open file; output not changed

(note that in gnuplot's error message echoing the "set output" line
the e-acute has been changed to a question mark, actually not an
ASCII question mark but an "unrecognized glyph" symbol).

It therefore seems that "set encoding" has somehow altered gnuplot's
reading of the bytes in the output filename. (Once again, those
bytes are identical in the two files.) If gnuplot had simply passed
the incoming cp1251 bytes to the OS, surely the output file would
have been opened OK in both cases.

Allin Cottrell
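
As an illustration of the symptom reported above, here is a minimal Python sketch of what can happen when a path written in a single-byte Windows code page is read as if it were UTF-8. It rests on an assumption not stated in the post: that the e-acute was stored as the single byte 0xE9, as in Windows-1252. The helper name is_valid_utf8 is hypothetical, and nothing here reproduces gnuplot's actual internals; it only shows that such a byte is not valid UTF-8 and that a lenient reader would turn it into a replacement glyph.

```python
# Assumption: the e-acute is the single byte 0xE9, as in Windows-1252.
# This is an illustration only, not a reconstruction of gnuplot's code.
raw = "c:/users/cottrell/desktop/Beaut\u00e9/test.png".encode("cp1252")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A lenient decode stands in for a reader that substitutes a placeholder
# (U+FFFD, the "unrecognized glyph") for bytes it cannot interpret.
shown = raw.decode("utf-8", errors="replace")

print(is_valid_utf8(raw))  # the single-byte path is not valid UTF-8
print(shown)               # the e-acute shows up as a replacement character
```

If a program substitutes a placeholder in this way and then hands the altered string to the OS, the path no longer names the existing directory, which would be consistent with the "cannot open file" message.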