#2173 Deal with file names not encoded in UTF-8

Google Importer - 2012-01-03

Originally posted by: dak@gnu.org

We have

static void
setup_localisation ()
{
#if HAVE_GETTEXT
  /* Enable locales */
  setlocale (LC_ALL, "");

  /* FIXME: check if this is still true.
     Disable localisation of float values. */
  setlocale (LC_NUMERIC, "C");

  string localedir = LOCALEDIR;
  if (char const *env = getenv ("LILYPOND_LOCALEDIR"))
    localedir = env;

  bindtextdomain ("lilypond", localedir.c_str ());
  textdomain ("lilypond");
#endif
}

in main.cc. This is arguably wrong if LilyPond requires use of a
UTF-8 locale.

<URL:http://stackoverflow.com/questions/2050973/what-encoding-are-filenames-in-ntfs-stored-as>
weakly suggests that Windows might be using the currently set locale
for encoding/decoding file names with the usual function calls. To
check if that is the case: Wilbert, can you check what possibilities
you have for setting environment variables and/or other things to suggest
to Windows that you are running with a particular character encoding?

I remember that on Linux, perl is rather annoyed when getting bad
locales, so it might be worth trying whether it is useful as a litmus
test on Windows as well:

dak@lola:~$ LC_ALL=gdg perl
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
    LANGUAGE = "en",
    LC_ALL = "gdg",
    LC_MESSAGES = "en_US.UTF-8",
    LC_COLLATE = "en_US.UTF-8",
    LC_CTYPE = "en_US.UTF-8",
    LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").

Maybe we really don't need much more than the right call to setlocale
to have everything peachy.
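
To make the "right call to setlocale" idea concrete, here is a minimal sketch of checking both that setlocale succeeded and that the selected locale names a UTF-8 codeset. Both helper names are hypothetical and not part of LilyPond. Parsing the locale name is only a heuristic; on POSIX systems nl_langinfo (CODESET) would be the more robust query.

```cpp
#include <cctype>
#include <clocale>
#include <string>

// Hypothetical helper (not LilyPond code): true if the codeset part of a
// locale name such as "en_US.UTF-8" or "de_DE.utf8@euro" denotes UTF-8.
bool
locale_name_is_utf8 (std::string const &name)
{
  std::string::size_type dot = name.find ('.');
  if (dot == std::string::npos)
    return false; // no codeset given, e.g. "C" or "POSIX"

  // Keep the codeset and drop any "@modifier" suffix.
  std::string codeset = name.substr (dot + 1);
  codeset = codeset.substr (0, codeset.find ('@'));

  // Normalise spelling: lowercase, and ignore '-' so "UTF-8" matches "utf8".
  std::string norm;
  for (char c : codeset)
    if (c != '-')
      norm += static_cast<char> (std::tolower (static_cast<unsigned char> (c)));
  return norm == "utf8";
}

// Hypothetical helper: select the user's LC_CTYPE locale and report whether
// it is usable and UTF-8.  A null return from setlocale is the same failure
// that perl warns about in the transcript above.
bool
setup_utf8_ctype ()
{
  char const *name = std::setlocale (LC_CTYPE, "");
  if (!name)
    return false;
  return locale_name_is_utf8 (name);
}
```

A caller could warn, or fall back to a known UTF-8 locale, when setup_utf8_ctype () returns false.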


Google Importer - 2012-01-03

Originally posted by: dak@gnu.org

<URL:http://msdn.microsoft.com/en-us/library/x99tb11d%28v=VS.100%29.aspx> appears to suggest that Windows can't do UTF-8. If that's true, we will need to recode UTF-8 into UTF-16 and back again for accessing file names (as the easiest option, probably without supporting 21-bit Unicode via surrogates, since that would be comparable in complexity to UTF-8 itself). Or for writing to the terminal. Or pretty much anything else.

I have a hard time really believing that. But then I tend to have a hard time believing a lot of things about Windows.

Anybody with an actual Windows system who could corroborate one way or the other?
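
For reference, the recoding step mentioned above can be sketched portably as follows. This is an illustration, not LilyPond code: the function name utf8_to_utf16 is hypothetical, overlong encodings are not rejected, and the sketch includes surrogate pairs to show roughly what that support costs. On Windows itself, MultiByteToWideChar with CP_UTF8 performs this conversion.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not LilyPond code): decode UTF-8 into UTF-16 code
// units, emitting a surrogate pair for each code point above U+FFFF.
// Simplification: overlong encodings are not detected.
std::u16string
utf8_to_utf16 (std::string const &in)
{
  std::u16string out;
  std::size_t i = 0;
  while (i < in.size ())
    {
      unsigned char lead = static_cast<unsigned char> (in[i]);
      char32_t cp;
      std::size_t len;
      // The lead byte determines the sequence length and the top bits.
      if (lead < 0x80)
        { cp = lead; len = 1; }
      else if ((lead & 0xE0) == 0xC0)
        { cp = lead & 0x1F; len = 2; }
      else if ((lead & 0xF0) == 0xE0)
        { cp = lead & 0x0F; len = 3; }
      else if ((lead & 0xF8) == 0xF0)
        { cp = lead & 0x07; len = 4; }
      else
        throw std::invalid_argument ("invalid UTF-8 lead byte");

      if (i + len > in.size ())
        throw std::invalid_argument ("truncated UTF-8 sequence");

      // Each continuation byte contributes six more bits.
      for (std::size_t k = 1; k < len; ++k)
        {
          unsigned char cont = static_cast<unsigned char> (in[i + k]);
          if ((cont & 0xC0) != 0x80)
            throw std::invalid_argument ("invalid UTF-8 continuation byte");
          cp = (cp << 6) | (cont & 0x3F);
        }
      i += len;

      if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        throw std::invalid_argument ("code point not representable");

      if (cp < 0x10000)
        out += static_cast<char16_t> (cp);
      else
        {
          // Split the 20 bits above U+FFFF into a high and a low surrogate.
          cp -= 0x10000;
          out += static_cast<char16_t> (0xD800 + (cp >> 10));
          out += static_cast<char16_t> (0xDC00 + (cp & 0x3FF));
        }
    }
  return out;
}
```

In this sketch the surrogate handling is only the final else branch, so leaving it out would not save much.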


Google Importer - 2012-01-03

Originally posted by: PhilEHol...@googlemail.com

Windows certainly supports UTF-8 characters. As a general rule, adding an accented character to a file and saving as non-UTF-8 gives a "(lilypond-windows.exe:4840): Pango-WARNING **: Invalid UTF-8 string passed to pango_layout_set_text()" error. However, when the file is saved as UTF-8, the text is processed correctly. If there are no accented (i.e. non-ASCII) characters, then it's irrelevant on Windows whether it's UTF-8 or ASCII - LilyPond processes both equally well.
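
The behaviour described here follows from how UTF-8 validation works: pure ASCII is always valid UTF-8, while a single Latin-1 byte such as 0xF6 ("ö") is not a complete UTF-8 sequence. The following sketch illustrates such a check. It is not Pango's implementation, and the name is_valid_utf8 is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not Pango or LilyPond code): true if the byte string
// is well-formed UTF-8, rejecting stray bytes, truncated sequences,
// overlong forms, surrogates and values above U+10FFFF.
bool
is_valid_utf8 (std::string const &s)
{
  std::size_t i = 0;
  while (i < s.size ())
    {
      unsigned char b = static_cast<unsigned char> (s[i]);
      if (b < 0x80)
        {
          ++i; // ASCII byte: always valid on its own
          continue;
        }

      std::size_t extra;      // number of continuation bytes expected
      unsigned long cp;       // code point being assembled
      unsigned long smallest; // smallest value needing this length
      if ((b & 0xE0) == 0xC0)
        { extra = 1; cp = b & 0x1F; smallest = 0x80; }
      else if ((b & 0xF0) == 0xE0)
        { extra = 2; cp = b & 0x0F; smallest = 0x800; }
      else if ((b & 0xF8) == 0xF0)
        { extra = 3; cp = b & 0x07; smallest = 0x10000; }
      else
        return false; // stray continuation byte or invalid lead byte

      if (s.size () - i <= extra)
        return false; // sequence runs past the end of the string

      for (std::size_t k = 1; k <= extra; ++k)
        {
          unsigned char c = static_cast<unsigned char> (s[i + k]);
          if ((c & 0xC0) != 0x80)
            return false;
          cp = (cp << 6) | (c & 0x3F);
        }

      if (cp < smallest || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return false;
      i += extra + 1;
    }
  return true;
}
```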


Google Importer - 2012-01-03

Originally posted by: dak@gnu.org

That just means that Windows supports byte streams. Pango is not a Windows component.

UTF-8 support would mean supporting UTF-8 on the command line, in console input and output, and, regarding this issue, in filenames. The _names_ of the files are the problem, not the contents. How to interpret the contents is up to the application, and LilyPond uses Pango for typesetting them.

The question is whether the error messages on the console look sensible, and whether file names are input and output as UTF-8 in a readable (and identical) manner both on the console and in log files.


Google Importer - 2012-01-03

Originally posted by: k-ohara5...@oco.net

1) Probably not the issue: In Windows the encoding used by the console is chosen by the user. This is analogous to setting the file encoding of a text editor.

We can use `chcp 65001` (set code page to 65001) to set the console encoding to UTF-8. Then if we ask LilyPond to process this input from a UTF-8-encoded file,
#(display "schöne weiß Łodz ы́Бʁ فينا ζ غ")
we will see all the characters that our chosen font can print, and boxes for the rest.

Redirected output from `lilypond test.ly > out.txt` goes byte-by-byte into the file *regardless* of console settings. So, I just set the editor or receiving program to expect UTF-8 and never bother with `chcp 65001`.

2) I confirm Wilbert's observations in the email linked at the first post.
Although Windows utilities like 'dir > out.txt' encode file names according to the `chcp` setting, LilyPond messages containing file names remain in what looks like Latin1 encoding (even when some characters are not in Latin1).

It might be nice to use UTF-8 for any text LilyPond outputs. Maybe MINGW, with the proper LOCALE, implements a translation between UTF-8 and the bytes it needs to send to the file system...

However, Windows users with various native languages are making LilyPond work, so caution, or laziness, seems wise.

Labels: Type-Enhancement


Google Importer - 2015-03-11

Originally posted by: simon.al...@mail.de

See issue 4317. Should this be merged?

Related

Issues: ~~#4317~~


Google Importer - 2015-03-11

Originally posted by: dak@gnu.org

Looks like it.


Simon Albrecht - 2016-12-17

I’m closing issue 4317 right now, merely because this one came first. Both have some amount of analysis and discussion in them, but it doesn’t make sense to have two open issues.

Related

Issues: ~~#4317~~


Masamichi Hosoda - 2020-02-01

Add Unicode filename support for Windows 10 1903+

On Windows 10 1903 and above, -A APIs support UTF-8 encodings.
This commit enables it by adding an application manifest.
https://docs.microsoft.com/en-us/windows/uwp/design/globalizing/use-utf8-code-page

On older Windows versions,
it has no effect and the behavior is the same as before.

Cygwin environments can already use UTF-8 natively,
so this commit doesn't add the application manifest for them.

http://codereview.appspot.com/575600044
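
One way to check at run time whether the manifest took effect is to query the process's active code page, which should be 65001 (UTF-8) when it did. The sketch below is not part of the commit and both helper names are hypothetical. GetACP is the real Win32 call; the non-Windows branch rests on a stated assumption.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Windows code page identifier for UTF-8 (the number used by `chcp 65001`).
constexpr unsigned utf8_code_page = 65001;

// Hypothetical helper (not LilyPond code): true if a Windows code page
// identifier denotes UTF-8.
bool
code_page_is_utf8 (unsigned code_page)
{
  return code_page == utf8_code_page;
}

// Hypothetical helper: true if narrow (-A) API calls in this process are
// expected to interpret strings as UTF-8.
bool
narrow_apis_use_utf8 ()
{
#ifdef _WIN32
  // With the UTF-8 manifest active on Windows 10 1903+, the process's
  // ANSI code page should be reported as 65001.
  return code_page_is_utf8 (GetACP ());
#else
  // Assumption: elsewhere file names are byte strings that the system
  // passes through unchanged, so UTF-8 names need no recoding.
  return true;
#endif
}
```

On older Windows versions narrow_apis_use_utf8 () would return false, which matches the "same as before" behaviour described in the commit message.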


Anonymous - 2020-02-01

Description has changed:

Diff:

assigned_to: Masamichi Hosoda

Needs: -->

Type: -->

Anonymous - 2020-02-01

Patch: new --> review

Anonymous - 2020-02-01

Passes make, make test-baseline and a full make doc.


Anonymous - 2020-02-02

Patch: review --> countdown

Anonymous - 2020-02-02

Patch on countdown for Feb 4th


Anonymous - 2020-02-04

Patch: countdown --> push

Anonymous - 2020-02-04

Patch counted down - please push.


Masamichi Hosoda - 2020-02-04

labels: --> Fixed_2_21_0

status: Started --> Fixed

Patch: push -->

Masamichi Hosoda - 2020-02-04

I've pushed.

commit 62635fe1155fba0c91569d6785906100a8f2e88c

