OK, that's a fair use case. Please add it to the wiki.

It is pretty horrible that the OS allows such shenanigans, but I guess that's the price we have to pay for building OSes before Unicode existed.

Nothing prevents you from adding a raw \x80 byte at the end of the file name in the YAML file, using vi or Notepad or whatever. I bet that some YAML libraries will even load it into an invalid UTF-8 "string" in memory, "illegal" though it may be.

But does this mean we need to mandate that all YAML implementations silently create invalid UTF-8 strings in memory? That seems a bit excessive...
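
To make the two behaviours concrete, here is a minimal Python sketch of a strict loader versus a lenient one. It is only an illustration of the trade-off, not how any particular YAML library works; the names RAW, strict_load and lenient_load are made up for this example, and the lenient path assumes Python's "surrogateescape" error handler as the way to smuggle bad bytes into a string.

```python
# Hypothetical file-name bytes: a lone 0x80 is not valid UTF-8.
RAW = b"BadName\x80"


def strict_load(raw: bytes):
    """Return the decoded string, or None if the bytes are not valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


def lenient_load(raw: bytes) -> str:
    """Keep undecodable bytes by mapping them to lone surrogates."""
    return raw.decode("utf-8", errors="surrogateescape")


name = lenient_load(RAW)

# The lenient result round-trips back to the original bytes...
assert name.encode("utf-8", errors="surrogateescape") == RAW

# ...but it is not a well-formed string: a strict encoder rejects it.
try:
    name.encode("utf-8")
    well_formed = True
except UnicodeEncodeError:
    well_formed = False
```

The lenient string works as long as every consumer knows the trick; any code that encodes it strictly gets the exception later, far from where the bad data came in.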

If I had to deal with this use case, I'd use something like:

    filename: !badstr BadName\x80

Which is perfectly valid YAML. The application is fairly warned that the scalar is an invalid string and is free to deal with it as it sees fit - up to and including loading it into a normal string object, if that works for it. At the same time, "innocent" YAML applications are not exposed to random exceptions raised by their too-strict string libraries.
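
A rough sketch of what the application side might look like, in Python. In a plain scalar YAML does no escape processing, so the loader hands over the literal characters `BadName\x80` and the file itself stays valid UTF-8; turning `\x80` into a byte is up to whatever handles `!badstr`. The function name decode_badstr and the choice of `\xNN` as the only recognised escape are assumptions made for this example, and hooking it into a specific YAML library's tag mechanism is left out.

```python
import re

# Matches a literal backslash, 'x', and two hex digits, e.g. \x80.
_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})")


def decode_badstr(scalar: str) -> bytes:
    """Turn the text of a !badstr scalar into raw bytes.

    Ordinary characters are encoded as UTF-8; each \\xNN escape
    becomes the single byte 0xNN. (Hypothetical convention.)
    """
    out = bytearray()
    pos = 0
    for match in _ESCAPE.finditer(scalar):
        out += scalar[pos:match.start()].encode("utf-8")
        out.append(int(match.group(1), 16))
        pos = match.end()
    out += scalar[pos:].encode("utf-8")
    return bytes(out)


# The scalar text as a YAML loader would deliver it (literal backslash).
raw_name = decode_badstr(r"BadName\x80")
assert raw_name == b"BadName\x80"

# The application decides what to do next: keep the bytes, or load them
# into a normal string object if its string library tolerates that.
as_string = raw_name.decode("utf-8", errors="surrogateescape")
```

Applications that never register a handler for the tag only ever see an ordinary, valid string.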

Have fun,

    Oren Ben-Kiki

On Mon, Oct 31, 2011 at 8:31 PM, William Spitzak <spitzak@rhythm.com> wrote:
1. Set your locale to a UTF-8 one, as is default on all modern systems

2. Create GoodFile with non-ASCII UTF-8 characters in the filename

3. Using low-level code, create BadFile with invalid UTF-8 in the filename...

4. Now imagine a YAML file that has a structure that has a "filename" member...