Originally created by: *anonymous
Originally created by: dak@gnu.org
<http://permalink.gmane.org/gmane.comp.gnu.lilypond.devel/43243>
Lilypond talks utf-8. If the operating system decides to talk something else on its command line or its file names, appropriate conversions need to be put in place whenever Lilypond displays or generates a file name.
This apparently concerns Windows and likely also other systems.
Originally posted by: dak@gnu.org
We have
static void
setup_localisation ()
{
#if HAVE_GETTEXT
  /* Enable locales */
  setlocale (LC_ALL, "");
  /* FIXME: check if this is still true.
     Disable localisation of float values. */
  setlocale (LC_NUMERIC, "C");
  string localedir = LOCALEDIR;
  if (char const *env = getenv ("LILYPOND_LOCALEDIR"))
    localedir = env;
  bindtextdomain ("lilypond", localedir.c_str ());
  textdomain ("lilypond");
#endif
}
in main.cc. This is arguably wrong if LilyPond requires use of a
UTF-8 locale.
<URL:http://stackoverflow.com/questions/2050973/what-encoding-are-filenames-in-ntfs-stored-as>
weakly suggests that Windows might be using the currently set locale
for encoding/decoding file names with the usual function calls. To
check if that is the case: Wilbert, can you check what possibilities
you have for setting environment variables and/or other things to suggest
to Windows that you are running with a particular character encoding?
I remember that on Linux, perl is rather annoyed when getting bad
locales, so it might be worth trying whether it is useful as a litmus
test on Windows as well:
dak@lola:~$ LC_ALL=gdg perl
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = "en",
        LC_ALL = "gdg",
        LC_MESSAGES = "en_US.UTF-8",
        LC_COLLATE = "en_US.UTF-8",
        LC_CTYPE = "en_US.UTF-8",
        LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
Maybe we really don't need much more than the right call to setlocale
to have everything peachy.
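To make the question about the "right call to setlocale" concrete, here is a minimal sketch of how a program could check whether the locale selected by the environment names a UTF-8 codeset. Both function names are hypothetical and not part of LilyPond. The check relies on the common "language_TERRITORY.codeset@modifier" naming and is only a heuristic: the string returned by setlocale is implementation-defined, and Windows locale names such as "English_United States.1252" use a different scheme.

```cpp
#include <cctype>
#include <clocale>
#include <string>

// Hypothetical helper (not LilyPond code): returns true if a locale name
// such as "en_US.UTF-8" or "de_DE.utf8@euro" names a UTF-8 codeset.
bool
locale_name_is_utf8 (std::string const &name)
{
  // The codeset, if present, follows the first '.' and may itself be
  // followed by an '@modifier'.
  std::string::size_type dot = name.find ('.');
  if (dot == std::string::npos)
    return false;
  std::string codeset = name.substr (dot + 1);
  codeset = codeset.substr (0, codeset.find ('@'));

  // Normalise: lowercase and drop '-' so "UTF-8" and "utf8" compare equal.
  std::string norm;
  for (unsigned char c : codeset)
    if (c != '-')
      norm += static_cast<char> (std::tolower (c));
  return norm == "utf8";
}

// Hypothetical helper: selects the LC_CTYPE locale from the environment
// (this changes the program's locale) and reports whether its name
// looks like a UTF-8 locale.
bool
environment_locale_is_utf8 ()
{
  char const *name = std::setlocale (LC_CTYPE, "");
  return name && locale_name_is_utf8 (name);
}
```

A caller could warn when environment_locale_is_utf8 () returns false, much like the perl diagnostic quoted above, instead of silently producing file names in the wrong encoding.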
Originally posted by: dak@gnu.org
<URL:http://msdn.microsoft.com/en-us/library/x99tb11d%28v=VS.100%29.aspx> appears to suggest that Windows can't do UTF-8. If that's true, we will need to recode UTF-8 into UTF-16 (as the easiest option, probably not supporting 21-bit Unicode with surrogates, as that would be of comparable complexity to UTF-8 itself) and back again for accessing file names. Or writing to the terminal. Or pretty much anything else.
I have a hard time really believing that. But then I tend to have a hard time believing a lot of things about Windows.
Anybody with an actual Windows system who could corroborate one way or the other?
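For a sense of how much work the UTF-8 to UTF-16 recoding mentioned above would be, here is a minimal, portable sketch. The function name utf8_to_utf16 is hypothetical and this is not LilyPond code. Unlike the BMP-only option considered above, it also emits surrogate pairs, which adds only a few lines. For brevity it does not reject overlong encodings or UTF-8-encoded surrogates, so it is not a validating decoder. On Windows itself, the system call MultiByteToWideChar with CP_UTF8 would likely be the more natural choice.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 and re-encode the code points as UTF-16.
// Throws std::invalid_argument on truncated or structurally invalid input.
std::u16string
utf8_to_utf16 (std::string const &in)
{
  std::u16string out;
  for (std::size_t i = 0; i < in.size ();)
    {
      unsigned char lead = static_cast<unsigned char> (in[i++]);
      char32_t cp;
      int trailing;
      // The lead byte determines the sequence length and the top bits.
      if (lead < 0x80)
        { cp = lead; trailing = 0; }
      else if ((lead & 0xE0) == 0xC0)
        { cp = lead & 0x1F; trailing = 1; }
      else if ((lead & 0xF0) == 0xE0)
        { cp = lead & 0x0F; trailing = 2; }
      else if ((lead & 0xF8) == 0xF0)
        { cp = lead & 0x07; trailing = 3; }
      else
        throw std::invalid_argument ("invalid UTF-8 lead byte");

      // Each continuation byte contributes six more bits.
      for (; trailing > 0; --trailing)
        {
          if (i >= in.size ())
            throw std::invalid_argument ("truncated UTF-8 sequence");
          unsigned char c = static_cast<unsigned char> (in[i++]);
          if ((c & 0xC0) != 0x80)
            throw std::invalid_argument ("invalid UTF-8 continuation byte");
          cp = (cp << 6) | (c & 0x3F);
        }

      if (cp < 0x10000)
        out += static_cast<char16_t> (cp);
      else if (cp <= 0x10FFFF)
        {
          // Code points beyond the BMP become a high/low surrogate pair.
          cp -= 0x10000;
          out += static_cast<char16_t> (0xD800 + (cp >> 10));
          out += static_cast<char16_t> (0xDC00 + (cp & 0x3FF));
        }
      else
        throw std::invalid_argument ("code point out of Unicode range");
    }
  return out;
}
```

The reverse direction (UTF-16 back to UTF-8) is of similar size, so the recoding itself is probably less of a burden than finding every place where a file name crosses the system boundary.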
Originally posted by: PhilEHol...@googlemail.com
Windows certainly supports UTF-8 characters. As a general rule, adding an accented character to a file and saving as non-UTF-8 gives a "(lilypond-windows.exe:4840): Pango-WARNING **: Invalid UTF-8 string passed to pango_layout_set_text()" error. However, saving as UTF-8 processes the text correctly. If there are no accented (i.e. non-ASCII) characters, then it's irrelevant on Windows whether it's UTF-8 or ASCII - Lilypond processes both equally well.
Originally posted by: dak@gnu.org
That just means that Windows supports byte streams. Pango is not a Windows component.
UTF-8 support would mean supporting UTF-8 on the command line, in console input and output, and, regarding this issue, in filenames. The _names_ of the files are the problem, not the contents. How to interpret the contents is up to the application, and LilyPond uses Pango for typesetting them.
The question is whether the error messages on the console look sensible, and whether file names are input and output as UTF-8 in a readable (and identical) manner both on the console and in log files.
Originally posted by: k-ohara5...@oco.net
1) Probably not the issue: In Windows the encoding used by the console is chosen by the user. This is analogous to setting the file encoding of a text editor.
We can use `chcp 65001` (set code-page to 65001) to set the console encoding to UTF-8. Then if we ask LilyPond to process this input from a UTF-8-encoded file,
#(display "schöne weiß Łodz ы́Бʁ فينا ζ غ")
we will see all the characters that our chosen font can print, and boxes for the rest.
Redirected output from `lilypond test.ly > out.txt` goes byte-by-byte into the file *regardless* of console settings. So, I just set the editor or receiving program to expect UTF-8 and never bother with `chcp 65001`.
2) I confirm Wilbert's observations in the email linked at the first post.
Although Windows utilities like 'dir > out.txt' encode file names according to the `chcp` setting, Lilypond messages containing filenames remain in what looks like Latin1 encoding (even when some characters are not in Latin1).
It might be nice to use UTF-8 for any text LilyPond outputs. Maybe MINGW, with the proper LOCALE, implements a translation between UTF-8 and the bytes it needs to send to the file system...
However, Windows users with various native languages are making LilyPond work, so caution, or laziness, seems wise.
Labels: Type-Enhancement
Originally posted by: simon.al...@mail.de
See issue 4317. Should this be merged?
Related Issues: #4317
Originally posted by: dak@gnu.org
Looks like it.
I’m closing issue 4317 right now, merely because this one came first. Both have some amount of analysis and discussion in them, but it doesn’t make sense to have two open issues.
Related Issues: #4317
Add Unicode filename support for Windows 10 1903+
On Windows 10 1903 and above, -A APIs support UTF-8 encodings.
This commit enables it by adding an application manifest.
https://docs.microsoft.com/en-us/windows/uwp/design/globalizing/use-utf8-code-page
On older versions of Windows,
it has no effect and the behavior is the same as before.
Cygwin environments already support UTF-8 natively,
so this commit doesn't add the application manifest there.
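As a rough way to confirm at run time that the manifest took effect, a program can compare the active ANSI code page against 65001, the identifier Windows uses for UTF-8 (CP_UTF8). The sketch below is an illustration, not part of this commit. Both helper names are hypothetical, and the non-Windows branch simply assumes that file names are passed through as bytes.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Code page identifier that Windows uses for UTF-8 (CP_UTF8 in <windows.h>).
constexpr unsigned utf8_code_page = 65001;

// Hypothetical helper: does this code page identifier denote UTF-8?
bool
is_utf8_code_page (unsigned code_page)
{
  return code_page == utf8_code_page;
}

// Hypothetical helper: true if the -A APIs of this process are expected
// to interpret strings as UTF-8.
bool
ansi_apis_use_utf8 ()
{
#ifdef _WIN32
  // GetACP () reports the active ANSI code page of the current process.
  return is_utf8_code_page (GetACP ());
#else
  // Assumption: outside Windows there is no ANSI code page and file
  // names are handed to the system as plain bytes.
  return true;
#endif
}
```

On a Windows release older than 1903, or without the manifest, ansi_apis_use_utf8 () would be expected to return false, which matches the "same as before" behavior described above.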
http://codereview.appspot.com/575600044
Passes make, make test-baseline and a full make doc.
Patch on countdown for Feb 4th
Patch counted down - please push.
I've pushed.
commit 62635fe1155fba0c91569d6785906100a8f2e88c