From: John H. <joh...@or...> - 2009-12-04 09:15:30
On 03/12/09 22:58, Matt Turner wrote:
> I was wondering about dealing with non-ascii characters in paths. Simply
> put, in what encoding are paths passed to a libfuse client?

Unless something has changed since I last looked (and I don't think it has), the kernel and fuse neither know nor care what encoding is used for a file name, with two exceptions: the octets 0x2f (/) and 0x00 are used to separate pathname components and to terminate pathnames, respectively. Of course, you would also have trouble creating a file called "." or ".." (because they already exist), and if you had a file system that didn't have those, all kinds of things would break in interesting ways.

So, I can create a file called "\x88\x68", "\x68\x88", "\xe8\xa1\xa8" or "\x95\x5c" (representations of U+8868 in UTF-16BE, UTF-16LE, UTF-8 and SJIS), and how they look depends on the locale that the terminal uses to interpret the bytes. The last one is interesting because it ends with 0x5c (\), but the file system doesn't much care about that.

It is possible to create a file system that, for example, stores its pathnames as vectors of UTF-16 strings, but you would have to decide how to convert what is passed into the kernel into UTF-16. That would have to be a mount option or something that you derive from somewhere (e.g. nl_langinfo(CODESET)). Even so, it wouldn't stop someone creating a file called "\x95\x5c" because they happen to want to use SJIS when you were expecting UTF-8.

Does that make sense?

jch
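
P.S. To make the byte-level view concrete, here is a minimal Python sketch, not anything taken from fuse itself. It encodes U+8868 in the four encodings mentioned above, checks each result against the only two octets the kernel treats specially, and then tries to read each name back as UTF-8. The helper names (encode_all, is_valid_component, decode_or_none) are made up for illustration, and the sketch assumes that Python's "shift_jis" codec stands in for SJIS.

```python
# U+8868 encoded four ways; every result is just a byte string to the kernel.
CHAR = "\u8868"
ENCODINGS = ["utf-16-be", "utf-16-le", "utf-8", "shift_jis"]


def encode_all(ch):
    """Return {encoding name: encoded bytes} for a single character."""
    return {enc: ch.encode(enc) for enc in ENCODINGS}


def is_valid_component(name):
    """Byte-level rule only: non-empty, no '/' (0x2f), no NUL (0x00).

    The special names "." and ".." are a separate matter and are not
    checked here.
    """
    return len(name) > 0 and b"/" not in name and b"\x00" not in name


def decode_or_none(name, encoding):
    """Decode a file name the way a file system that assumes `encoding`
    would have to; return None if the bytes are not valid in it."""
    try:
        return name.decode(encoding)
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    for enc, raw in encode_all(CHAR).items():
        print(enc, raw.hex(), is_valid_component(raw),
              decode_or_none(raw, "utf-8") is not None)
```

All four byte strings pass the byte-level check, but only the UTF-8 one decodes as UTF-8. A file system that assumes UTF-8 therefore has to decide what to do with the other three (reject them, escape them, or store them as raw bytes); the kernel will not make that decision for it.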