On Thu, May 23, 2013 at 3:43 PM, Stanislav Frolov <frolosofsky@gmail.com> wrote:
On Thursday 23 May 2013 08:01:51 Matthew Mondor wrote:
> POSIX filenames may contain bytes which are often used to hold UTF-8
> characters on filesystems which allow this, but that too is only one of
> the available encoding options
I understand OS and locale specifics, but this solution seems an ugly low-level
hack for a cross-platform high-level language. Am I wrong? Information about the
OS is available at compile time, and about the locale at run time.

ECL does not do locales: it only does Unicode or Latin-1, depending on how you build it. You may use other codepages for data and external representations for streams, but you have to tell ECL explicitly which representation to use (except for the terminal on Windows).

However, filenames are not just about locales. The problem is that filenames with extended characters have no unique representation, given that all filesystems define names in terms of bytes. If you read the link Matthew passed you, I described a very specific problem there: on Windows, OS X and Linux, programs encode filenames using UTF-8, but that encoding is not unique, because it may or may not include reordering of characters (normalization). Thus you may create a file with a Unicode name, retrieve that name from the operating system, and get back a string that is not EQUALP to the original (though it is equal under Unicode's normalized string comparison).
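To make the normalization issue concrete, here is a small sketch in Python rather than Lisp, using only the standard `unicodedata` module. The file name and the helper `same_filename` are made up for illustration; nothing here is ECL code. It shows two spellings of the same visible name that differ as strings and as UTF-8 bytes, yet compare equal once both are normalized to the same form.

```python
import unicodedata

def same_filename(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two names after bringing both to one normalization form.

    Hypothetical helper for illustration only.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

# The same visible name, spelled two ways:
composed = "caf\u00e9.txt"      # precomposed e-acute (one code point)
decomposed = "cafe\u0301.txt"   # plain e followed by a combining acute accent

# A plain comparison sees two different strings.
print(composed == decomposed)

# The UTF-8 byte sequences differ too, so a byte-based filesystem
# would treat them as two distinct names.
print(composed.encode("utf-8"))
print(decomposed.encode("utf-8"))

# After normalization the two names compare equal.
print(same_filename(composed, decomposed))
```

A program that stores a name in one form and gets the other form back from the operating system would hit exactly this mismatch unless it normalizes both sides before comparing.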

I have not yet solved this problem and you may complain, but it will not help much :-)


Instituto de Física Fundamental, CSIC
c/ Serrano, 113b, Madrid 28006 (Spain)