It is pretty horrible that the OS allows such shenanigans, but I guess that's the price we pay for having built OSes before Unicode existed.
Nothing prevents you from adding a raw \x80 byte at the end of the file name in the YAML file, using vi or Notepad or whatever. I bet that some YAML libraries will even load it into an invalid UTF-8 "string" in memory, "illegal" though it may be.
But does this mean we need to mandate that all YAML implementations silently create invalid UTF-8 strings in memory? That seems a bit excessive...
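To make the trade-off concrete, here is a minimal Python sketch of the two behaviours a loader could have when it meets that byte. The function names strict_load and lenient_load are made up for illustration and do not correspond to any real YAML library; the lenient variant uses Python's "surrogateescape" error handler as one possible way to carry the bad byte around.

```python
raw = b"BadName\x80"  # a lone 0x80 continuation byte is not valid UTF-8


def strict_load(data: bytes) -> str:
    """Hypothetical strict loader: raises UnicodeDecodeError on bad input."""
    return data.decode("utf-8")


def lenient_load(data: bytes) -> str:
    """Hypothetical lenient loader: bad bytes become lone surrogates."""
    return data.decode("utf-8", errors="surrogateescape")


try:
    strict_load(raw)
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# The lenient result is a str object, but it contains a lone surrogate,
# so it cannot be strictly encoded back to UTF-8 by unsuspecting code.
name = lenient_load(raw)
```

The lenient string round-trips only if every consumer also opts in to the same error handler, which is exactly the burden "innocent" applications would inherit.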
If I had to deal with this use case, I'd use something like:
filename: !badstr BadName\x80
Which is perfectly valid YAML. The application is given fair warning that the scalar is not a valid string, and is free to deal with it as it sees fit, up to and including loading it into a normal string object, if that works for it. At the same time, "innocent" YAML applications are not exposed to random exceptions raised by their too-strict string libraries.
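As a rough sketch of what an application might do with such a node, the Python below turns the scalar text of a !badstr value into raw bytes. Everything here is an assumption: !badstr is not a standard tag, the \xNN escape convention is just one possible reading of the example above, and decode_badstr is a made-up helper rather than part of any YAML library.

```python
import re

# Matches a literal backslash, "x", and two hex digits in the scalar text.
_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})")


def decode_badstr(text: str) -> bytes:
    """Convert the text of a hypothetical !badstr scalar into raw bytes.

    Assumed convention: \\xNN stands for the single byte 0xNN and all
    other characters are encoded as UTF-8.
    """
    out = bytearray()
    pos = 0
    for match in _ESCAPE.finditer(text):
        out += text[pos:match.start()].encode("utf-8")  # ordinary text
        out.append(int(match.group(1), 16))             # escaped raw byte
        pos = match.end()
    out += text[pos:].encode("utf-8")                   # trailing text
    return bytes(out)


# In a plain YAML scalar the backslash is literal, so the application
# would receive exactly these characters from the parser.
raw_name = decode_badstr(r"BadName\x80")
```

The result is a bytes object that can be handed to byte-oriented file APIs, so no invalid string ever has to exist in memory. A real convention would also need a way to spell a literal backslash followed by "x", which this sketch does not provide.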
On Mon, Oct 31, 2011 at 8:31 PM, William Spitzak <email@example.com> wrote:
1. Set your locale to a UTF-8 one, as is default on all modern systems
2. Create GoodFile with non-ASCII UTF-8 characters in the filename
3. Using low-level code, create BadFile with invalid UTF-8 in the filename...
4. Now imagine a YAML file that has a structure that has a "filename" member...