From: Allin C. <cot...@wf...> - 2020-10-25 18:08:24
|
I tried googling this but didn't find an answer -- sorry if I should have just tried harder! My question is: can gnuplot on Windows handle a unicode filename argument passed in UTF-16? As in path/to/wgnuplot.exe <UTF-16 input filename> TIA. -- Allin Cottrell Department of Economics Wake Forest University |
From: Allin C. <cot...@wf...> - 2020-11-01 00:36:31
Attachments:
misc.c.diff
winmain.c.diff
|
On Sun, 25 Oct 2020, Allin Cottrell wrote:

> I tried googling this but didn't find an answer -- sorry if I should have
> just tried harder! My question is: can gnuplot on Windows handle a unicode
> filename argument passed in UTF-16? As in
>
> path/to/wgnuplot.exe <UTF-16 input filename>

OK, that question was under-researched, but now I've done my homework. Sorry, this is a bit long but I hope I can arouse some interest in the topic.

Why bother with UTF-16 filename arguments? Nowadays a fair number of Windows users construct paths (directory names or filenames) which are "out of codepage" -- that is, Unicode names which cannot be represented in the (retro) "system codepage", which is typically just an 8-bit encoding. Since Windows has supported Unicode since NT came out, it's a reasonable expectation that any filename one can construct on the platform should be accessible via any program of interest. But a program that restricts itself to the "ANSI" form of filenames simply cannot access files with out-of-codepage paths. (Sane modern OSes don't have this problem because they use UTF-8 throughout.)

So what about gnuplot? I may be wrong but it seems to me that gnuplot on Windows is stuck with "ANSI" filenames at present. Even with UNICODE and _UNICODE defined when compiling the program, the command-line arguments are retrieved in winmain.c using either _argv or __argv (depending on the compiler), and these get the ANSI-form arguments (as opposed to __wargv, which gets the arguments in UTF-16 form).

It would be easy to swap out __argv for __wargv but by itself this would be very disruptive. The subsequent code in winmain.c, and then the code in plot.c (gnu_main) to which the args array is passed, all assume that the elements of argv are plain "char *", not "wide char" arrays. Handling UTF-16, which is chock-full of NUL bytes, would require lots of messy "ifdefs".

I have a proposal for fixing this. I realise it may not be acceptable as it stands but maybe someone else might want to take it up. I'm attaching patches for src/win/winmain.c and src/misc.c for reference but here I'll try to explain the strategy.

1) In winmain.c, grab the command-line arguments as UTF-16 but immediately convert them to UTF-8, so they can be handled by the regular string.h APIs, both here and in plot.c (gnu_main).

2) When we actually go to open a command-line file argument (loadpath_fopen, in misc.c, called from gnu_main), we first try opening the file using the filename as-is, but if that fails (and the filename validates as UTF-8) we convert it to UTF-16 and try again.

Since UTF-8 is a superset of ASCII, ASCII filename arguments should pass through transparently. Within-codepage non-ASCII filenames should get converted back to UTF-16 and opened OK. And the bonus is that out-of-codepage arguments should also be converted and opened OK.

I've tested this on Windows 10, with the system codepage set to Windows 1252 ("Western Europe"), and have successfully opened files with names in Russian and Greek. (I think this should also work if the user has the system codepage set to UTF-8 (65001), which is a "beta" option on Windows.)

My implementation uses GLib APIs (nice and simple) to convert from UTF-16 to UTF-8 and back again (if needed). GLib is required anyway if one is building the Cairo-based terminals. I suppose one could use native Windows APIs to the same purpose but I suspect it would be a lot more bother.

In my test setup this whole deal is triggered by the CFLAGS define

    -DWIDE_ARGS

which is respected only when building for Windows -- and admittedly has only been tested when cross-compiling for Windows from Linux using Mingw-w64. In my mingw Makefile, I have:

    WIDE_ARGS = 1
    ...
    ifdef WIDE_ARGS
    CFLAGS += -DWIDE_ARGS
    CFLAGS += $(shell pkg-config --cflags glib-2.0)
    endif

--
Allin Cottrell
Department of Economics
Wake Forest University |
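To illustrate step 1 of the proposal (take the arguments as UTF-16 and convert them to UTF-8 straight away), here is a minimal sketch in portable C. It is not the code from the attached patches, which use GLib for the conversion; the function name `utf16_to_utf8` is hypothetical and the sketch assumes the input is a NUL-terminated array of 16-bit code units.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not from the gnuplot sources or the patches):
 * convert a NUL-terminated UTF-16 string to a newly allocated,
 * NUL-terminated UTF-8 string. Returns NULL on an unpaired surrogate
 * or on allocation failure. The caller frees the result. */
char *utf16_to_utf8(const uint16_t *in)
{
    size_t n = 0, o = 0, i;
    unsigned char *out;

    assert(in != NULL);
    while (in[n] != 0)
        n++;

    /* One UTF-16 code unit yields at most 3 UTF-8 bytes; a surrogate
     * pair (2 units) yields 4 bytes, so 3*n + 1 is always enough. */
    out = malloc(3 * n + 1);
    if (out == NULL)
        return NULL;

    for (i = 0; i < n; i++) {
        uint32_t cp = in[i];

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: must be followed by a low surrogate. */
            if (i + 1 >= n || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF) {
                free(out);
                return NULL;
            }
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            i++;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            free(out);              /* stray low surrogate */
            return NULL;
        }

        if (cp < 0x80) {
            out[o++] = (unsigned char)cp;
        } else if (cp < 0x800) {
            out[o++] = (unsigned char)(0xC0 | (cp >> 6));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out[o++] = (unsigned char)(0xE0 | (cp >> 12));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (unsigned char)(0xF0 | (cp >> 18));
            out[o++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    out[o] = '\0';
    return (char *)out;
}
```

Because the result contains no embedded NUL bytes, it can be passed through code that expects ordinary `char *` arguments, which is the point of converting early. On Windows the same job could presumably be done with WideCharToMultiByte() and the CP_UTF8 code page.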
From: Ethan A M. <me...@uw...> - 2020-11-01 17:44:10
|
On Saturday, 31 October 2020 17:36:03 PST Allin Cottrell wrote:

> OK, that question was under-researched, but now I've done my
> homework. Sorry, this is a bit long but I hope I can arouse some
> interest in the topic.

I don't have any direct insight into this issue other than to note that the filesystem itself may be an issue. The standard Windows filesystems impose an encoding on filenames, whereas Linux filesystems are agnostic to encoding; any null-terminated byte sequence not containing '/' is a legal file name.

The following entry from the R developer blog is of interest:

https://developer.r-project.org/Blog/public/2020/05/02/utf-8-support-on-windows/

I gather from the discussion there that Windows 10 can be made to support UTF-8 as a native encoding, calling it "extended ASCII". In that mode R (and I suppose gnuplot) can use the existing generic Linux code paths rather than multiple layers of text conversion.

Ethan |
From: Allin C. <cot...@wf...> - 2020-11-01 20:24:37
|
On Sun, 1 Nov 2020, Ethan A Merritt wrote:

> I don't have any direct insight into this issue other than to note
> that the filesystem itself may be an issue.

In some contexts, no doubt. But if we set aside exotica such as surrogate pairs, NTFS filenames are UTF-16 to a very good approximation. As such they are easily converted to UTF-8 to permit handling with good old C char * APIs, and easily converted back to UTF-16 for _wfopen() if required.

> The following entry from the R developer blog is of interest
>
> https://developer.r-project.org/Blog/public/2020/05/02/utf-8-support-on-windows/
>
> I gather from the discussion there that Windows-10 can be made to
> support UTF-8 as a native encoding, calling it "extended ASCII".

Interesting, yes, but at this point kinda science fiction. The practical issue at present is whether gnuplot wants to support out-of-codepage UTF-16 filenames on the Windows command line. It's not terribly difficult, as I tried to show.

Sorry if I'm being repetitive, but right now if I create, say, a Russian-language filename on Windows and pass it as a command-line argument to gnuplot, gnuplot will not be able to open the file because its name cannot be represented in my "system codepage". A program that reads the command line as UTF-16, however, will have no problem opening the file.

Allin Cottrell |
From: Bastian M. <bma...@we...> - 2020-11-02 09:49:12
|
Right now, gnuplot is able to "load" files with Unicode-encoded names, i.e. the sequence

    set encoding utf8
    load 'абвгдежзийклмнопрстуфхцчшщъыьэюяѐёђѓєѕіїјљњћќѝўџ.plt'

will work just fine. Only loading via the command line does not work, and that should indeed be improved. (I don't agree with using GLib for that purpose, but that is easy to change.)

Bastian |
From: Bastian M. <bma...@we...> - 2020-11-02 10:24:29
|
Allin,

Thank you for this nice contribution. On Windows, we already replace fopen() (and many other functions), though. In particular, win_fopen() in winmain.c already handles encodings, including UTF-8. There are also two functions, AnsiText() and UnicodeText(), to convert to/from UTF-16 according to gnuplot's encoding. They use the simple-to-use Windows functions WideCharToMultiByte() and MultiByteToWideChar(), so there is no need for GLib.

That really raises the question of whether we should change the default internal encoding from "ANSI" (whatever that may be depends on the Windows locale) to UTF-8. I agree with that (and in fact my personal gnuplot.ini has included "set encoding utf8" for a long time).

But how do we make this backward compatible, since that will inevitably break load commands in old "ANSI"-encoded user scripts? (As does your current patch, by the way.) Possible solutions include a new command-line option (like -u / --utf8) or a wgnuplot.ini setting.

Bastian |
From: Allin C. <cot...@wf...> - 2020-11-02 13:30:59
|
On Mon, 2 Nov 2020, Bastian Märkisch wrote:

> Allin,
>
> Thank you for this nice contribution. On Windows, we already replace
> fopen() (and many other functions), though. In particular, win_fopen() in
> winmain.c already handles encodings, including UTF-8. There are also two
> functions AnsiText() and UnicodeText() to convert to/from UTF16 according to
> gnuplot's encoding.

Yes, I'm aware of that and it's a very nice feature.

> They use the simple-to-use Windows functions
> WideCharToMultiByte() MultiByteToWideChar(), so no need for glib.

Ah, OK.

> That really poses the question if we should change the default internal
> encoding from "ANSI" (whatever that may be depends on the Windows locale) to
> UTF-8. I agree with that (and in fact my personal gnuplot.ini includes "set
> encoding utf8" since a long time).
>
> But how do we make this backward compatible since that will inevitably break
> load commands in old "ANSI" encoded user scripts? (as does your current
> patch btw)

Are you sure about that breakage, Bastian? I may be missing something, but here's my thinking:

(1) I convert UTF-16 -> UTF-8 only for filenames coming off the Windows command-line in winmain.c, and

(2) in my modified loadpath_fopen() I first try fopen() on the filename as-is, converting UTF-8 -> UTF-16 and calling _wfopen() only if fopen() fails and the filename validates as UTF-8.

So I don't _think_ my patch is going to touch filenames read via the load command in a gnuplot script.

Allin Cottrell |
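The "try as-is, then fall back" logic in point (2) can be sketched as follows. This is an illustration under stated assumptions, not the patched loadpath_fopen(): the names `is_valid_utf8` and `fopen_with_utf8_fallback` are hypothetical, the validation is hand-written rather than GLib's, and the Windows branch uses MultiByteToWideChar() as suggested earlier in the thread. On other platforms the wrapper reduces to a plain fopen().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: return 1 if s is well-formed UTF-8, else 0.
 * Rejects overlong forms, surrogates and values above U+10FFFF. */
int is_valid_utf8(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    assert(s != NULL);
    while (*p) {
        unsigned long cp, min;
        int extra;

        if (*p < 0x80) { p++; continue; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; min = 0x80; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; min = 0x800; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; min = 0x10000; }
        else return 0;              /* invalid lead byte */
        p++;

        while (extra-- > 0) {
            /* A terminating NUL is not a continuation byte, so a
             * truncated sequence is rejected here as well. */
            if ((*p & 0xC0) != 0x80)
                return 0;
            cp = (cp << 6) | (*p & 0x3F);
            p++;
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;
    }
    return 1;
}

/* Hypothetical wrapper: try the name as-is first; only if that fails
 * and the name is valid UTF-8, retry with a UTF-16 name on Windows. */
FILE *fopen_with_utf8_fallback(const char *name, const char *mode)
{
    FILE *fp = fopen(name, mode);
#ifdef _WIN32
    if (fp == NULL && is_valid_utf8(name)) {
        wchar_t wname[MAX_PATH], wmode[16];

        if (MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                name, -1, wname, MAX_PATH) > 0 &&
            MultiByteToWideChar(CP_UTF8, 0, mode, -1, wmode, 16) > 0)
            fp = _wfopen(wname, wmode);
    }
#endif
    return fp;
}
```

Trying the unmodified name first is what keeps existing "ANSI" names working in this scheme: a CP1252 name containing, say, the single byte 0xF4 followed by ASCII is not valid UTF-8, so it never reaches the conversion step.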
From: Allin C. <cot...@wf...> - 2020-11-03 15:42:24
|
On Mon, 2 Nov 2020, Allin Cottrell wrote:

> On Mon, 2 Nov 2020, Bastian Märkisch wrote:
[...]
>> That really poses the question if we should change the default
>> internal encoding from "ANSI" (whatever that may be depends on
>> the Windows locale) to UTF-8. I agree with that (and in fact my
>> personal gnuplot.ini includes "set encoding utf8" since a long
>> time).
>>
>> But how do we make this backward compatible since that will
>> inevitably break load commands in old "ANSI" encoded user
>> scripts? (as does your current patch btw)
>
> Are you sure about that breakage, Bastian?
[...]

I'm not sure I've understood the circumstances under which you reckon my patch would break load commands, but I just tried an experiment which I think is relevant. This is on Windows 10 with system codepage 1252, using my patched wgnuplot.exe.

I placed a gnuplot script in a directory named with Cyrillic characters (so not representable in CP1252), reading as follows:

    # loader script
    set encoding cp1252
    load '<CP1252 filename>'

where <CP1252 filename> is the full path to a second script, in a directory with a non-ASCII name representable in CP1252. (It contains an o-circumflex.) The second script reads thus:

    # plot script
    set term pngcairo
    set output '<CP1252 output>'
    plot sin(x)

where <CP1252 output> is again a full path including the non-ASCII but CP-friendly directory name.

I then called wgnuplot.exe on the "loader" script, passing its filename as UTF-16, and it all went fine: the loader was read OK, from its Cyrillic location; the load command worked; and the PNG was written successfully.

--
Allin Cottrell
Department of Economics
Wake Forest University |
From: Ethan A M. <me...@uw...> - 2020-11-02 19:04:43
|
On Monday, 2 November 2020 01:48:42 PST Bastian Märkisch wrote:

> Right now, gnuplot is able to "load" file names with Unicode encoded names, i.e. the sequence
>     set encoding utf8
>     load 'абвгдежзийклмнопрстуфхцчшщъыьэюяѐёђѓєѕіїјљњћќѝўџ.plt'
> will work just fine.

But only if the encoding is set to utf8, right? Which is a bit counter-intuitive since on Windows the filename is not actually utf8.

I had not realized before this that the encoding could affect how files are opened. The documentation under "encoding" and/or "utf8" could be expanded. Maybe also "set print|out|table" or add some general statement about filenames?

Ethan

> Only loading via the command line does not work and that should indeed be improved. (I don't agree to use glib for that purpose - but that is easy to change).
>
> Bastian

--
Ethan A Merritt
Biomolecular Structure Center, K-428 Health Sciences Bldg
MS 357742, University of Washington, Seattle 98195-7742 |
From: Allin C. <cot...@wf...> - 2020-11-02 20:37:43
|
On Mon, 2 Nov 2020, Ethan A Merritt wrote: > On Monday, 2 November 2020 01:48:42 PST Bastian Märkisch wrote: >> Right now, gnuplot is able to "load" file names with Unicode encoded names, i.e. the sequence >> set encoding utf8 >> load 'абвгдежзийклмнопрстуфхцчшщъыьэюяѐёђѓєѕіїјљњћќѝўџ.plt' >> will work just fine. > > But only if the encoding is set to utf8, right? > Which is a bit counter-intuitive since on Windows the > filename is not actually utf8. Counter-intuitive maybe, but it's very convenient. "set encoding utf8" announces that filenames in the gnuplot script will be UTF-8 encoded (and the script will be readable cross-platform), but thanks to Bastian gnuplot knows they need to be recoded in the background to UTF-16 for reading from disk on Windows. -- Allin Cottrell Department of Economics Wake Forest University |
From: Bastian M. <bma...@we...> - 2020-11-03 15:57:31
|
> Sent: Monday, 02 November 2020 at 21:37
> From: "Allin Cottrell" <cot...@wf...>
> To: "Ethan A Merritt" <me...@uw...>
> Cc: "Bastian Märkisch" <bma...@we...>, "gnuplot-beta" <gnu...@li...>
> Subject: Re: AW: filenames on MS Windows
>
> On Mon, 2 Nov 2020, Ethan A Merritt wrote:
>
> > On Monday, 2 November 2020 01:48:42 PST Bastian Märkisch wrote:
> >> Right now, gnuplot is able to "load" file names with Unicode encoded names, i.e. the sequence
> >>     set encoding utf8
> >>     load 'абвгдежзийклмнопрстуфхцчшщъыьэюяѐёђѓєѕіїјљњћќѝўџ.plt'
> >> will work just fine.
> >
> > But only if the encoding is set to utf8, right?
> > Which is a bit counter-intuitive since on Windows the
> > filename is not actually utf8.
>
> Counter-intuitive maybe, but it's very convenient. "set encoding
> utf8" announces that filenames in the gnuplot script will be UTF-8
> encoded (and the script will be readable cross-platform), but thanks
> to Bastian gnuplot knows they need to be recoded in the background
> to UTF-16 for reading from disk on Windows.

Windows mostly uses UTF-16 to support Unicode. gnuplot uses char-based (byte) encodings only and, like many other programs from the *nix world, uses UTF-8 for Unicode. Internally, gnuplot will use whatever encoding it is told to use by the user via "set encoding". The translation from and to UTF-16 for input, output, file names, pipes, clipboard interaction etc. is transparent to the user. This translation is not yet done for command-line arguments, as pointed out by Allin.

The scheme itself is common practice for "porting" applications and was introduced in 2016. (Note that as of now, file _content_ cannot be read or written in UTF-16 encoding.)

Bastian |