From: Colin P. A. <co...@co...> - 2007-03-17 08:39:08
>>>>> "Colin" == Colin Paul Adams <co...@co...> writes:

    Colin> xgespr%C3%A4ch.xml

    Colin> The percent encoding equates to lower case a-umlaut.

    Colin> Now on my linux system, file name encoding is UTF-8 (I
    Colin> think), but the file supplied with the test suite exists on
    Colin> my system with the a-umlaut as a Latin-1 character. I am
    Colin> not sure if this is a bug with the unzip command, or what
    Colin> (in fact, I am very unsure about everything to do with this
    Colin> - the one thing I am certain about is that the percent
    Colin> encoding MUST be interpreted as UTF-8, and the file name
    Colin> decoded accordingly).

    Colin> Not surprisingly, the test fails on my system.

    Colin> But I begin to wonder if the test would not also have
    Colin> failed if the file name were correct as a UTF-8 file name.

    Colin> The resolver uses UT_FILE_URI_ROUTINES, which in turn makes
    Colin> use of

    Colin> file_system.pathname_to_string (uri_to_pathname (a_uri))

    Colin> As far as I can tell, the net result of this is to pass a
    Colin> UTF-8 byte string to the Eiffel runtime as a
    Colin> STRING/STRING_8.

    Colin> I suspect the runtime expects a Latin-1 name, but I don't
    Colin> know. What about on a windows system with the file system
    Colin> using UTF-16?

I did some experimenting.

Changing the LANG environment variable on my system from en_GB.UTF-8
to en_GB and then unzipping the distribution afresh didn't help.

So instead I renamed the file to give it its proper Unicode name (in
UTF-8). Now the test works on Linux.

This confirms that the byte sequence is being passed straight to
fopen, and apparently fopen isn't doing any translation on it.

I shall try to find the time to repeat the tests on Windows.

Does anyone have a Linux system where ISO-8859-1 is the standard
encoding? If so, perhaps we can try the test there. But I bet it
fails.

It looks like we can't do any better within Gobo, though.
-- 
Colin Adams
Preston Lancashire
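
For illustration only, here is a minimal Python sketch (not Eiffel, and not what Gobo or UT_FILE_URI_ROUTINES actually does) of the two byte sequences discussed above. It assumes the percent-encoded URI segment is decoded as UTF-8, and that the unzipped file carries the a-umlaut as a single Latin-1 byte. The variable names are hypothetical.

```python
from urllib.parse import unquote_to_bytes

# Percent-encoded file name as it appears in the URI.
encoded_name = "xgespr%C3%A4ch.xml"

# Undo the percent encoding only; the result is the raw byte string
# that would be handed to fopen if no further translation is done.
utf8_bytes = unquote_to_bytes(encoded_name)

# Interpreting those bytes as UTF-8 gives the intended Unicode name.
unicode_name = utf8_bytes.decode("utf-8")

# The same name written with a-umlaut as one Latin-1 byte (0xE4),
# which is how the unzipped file is assumed to be named on disk.
latin1_bytes = unicode_name.encode("latin-1")

# The byte strings differ, so a byte-for-byte lookup of one name
# cannot find a file stored under the other.
names_match = utf8_bytes == latin1_bytes
```

Linux treats file names as opaque byte strings, so under these assumptions the two sequences name different files. That is consistent with the test passing only after the file was renamed to its UTF-8 form.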