Got it... yes, I was afraid there could be some inconsistencies regarding how the name is encoded on the file system, which is why I decided to push the detection problem to someone else by providing an override environment variable. :)
In any case, yes, the double conversion you pointed out is part of what I am looking at.
Thanks for pointing out the need to retain the plain-char version for readlink (et al.). I will make sure to do this, as I don't think I need the wide-character value until after that code runs anyway.
From: Thomas Kluyver [takowl@...]
Sent: Monday, July 29, 2013 12:43 PM
To: primary discussion list for use and development of cx_Freeze
Subject: Re: [cx-freeze-users] Problems launching Frozen Python 3.3 application located in path with international characters.
On 29 July 2013 16:48, Steven Velez <steven.velez@...> wrote:
Linux is my weakest platform, but if the filenames are stored as byte strings on disk (and I have no reason to believe they are not), then how that byte string is interpreted depends on the encoding applied to those bytes.
However, as I understand it (which admittedly isn't well), the 'interpretation' as a unicode string is dependent on the host system, not something stored in the filesystem. So if I name a file with a € character on a system using UTF-8, its name contains the bytes \xe2\x82\xac, and I still need to use those bytes if I access it on a system using Latin 1.
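That behaviour is easy to check with a quick Python snippet (just an illustration of the encoding point, not cx_Freeze code):

```python
# A file named "€" on a UTF-8 system stores these bytes in the directory entry:
stored = "€".encode("utf-8")
assert stored == b"\xe2\x82\xac"

# A host whose locale is Latin-1 would interpret those same bytes differently:
misread = stored.decode("latin-1")
assert misread != "€"  # same on-disk bytes, mojibake name
```

The bytes are the only thing the filesystem actually records; the unicode name is purely an interpretation by the host.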
Anyway, looking at the code: in the base, we get a wide char filename (on Py3) from Py_GetProgramFullPath, which is then converted to a plain char filename using wcstombs, then converted back in SetExecutableFilename by cxString_FromString, which is a macro calling PyUnicode_Decode. Is this the issue you're looking at?
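In Python terms, the round trip looks roughly like this (a hedged analogy — the real code is C calling wcstombs and then PyUnicode_Decode, and the names below are illustrative, not cx_Freeze API):

```python
# Sketch of the C-level double conversion, assuming a UTF-8 locale.
wide = "/opt/app-\u20ac"           # what Py_GetProgramFullPath hands back (wchar_t* in C)
plain = wide.encode("utf-8")       # wcstombs analogue: wide chars -> plain char bytes
recovered = plain.decode("utf-8")  # cxString_FromString / PyUnicode_Decode analogue

# The round trip is lossless only if both steps agree on the encoding;
# if the locale encoding differs from what the decode assumes, the name breaks.
assert recovered == wide
```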
I agree that the double conversion seems superfluous. We need the plain-char version to call system functions like stat and readlink, but it should be possible to get it with a single conversion, so long as there's a neat way to do so.
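One way to sketch that single-conversion idea (in Python rather than the launcher's C, and with a hypothetical helper name — this is not how cx_Freeze is written, just an illustration of the shape): keep the unicode path as the canonical value and derive the byte string once for the byte-level system calls.

```python
import os

def resolve_executable(path_str):
    """Illustrative sketch: derive the byte form of the path exactly once,
    do all the byte-level syscalls on it, and convert back only at the end."""
    path_bytes = os.fsencode(path_str)        # the single str -> bytes conversion
    if os.path.islink(path_bytes):
        # readlink accepts and returns bytes; assumes an absolute target for brevity
        path_bytes = os.readlink(path_bytes)
    os.stat(path_bytes)                       # stat works on the byte form too
    return os.fsdecode(path_bytes)            # back to str only at the boundary
```

os.fsencode/os.fsdecode use the filesystem encoding with surrogateescape, so even undecodable byte names round-trip; the C equivalent would similarly hold one char* alongside the original wide string instead of converting back and forth.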