[Pyobjc-dev] unicode strings to system calls
From: Bill B. <bb...@co...> - 2003-02-03 22:09:57
On Monday, Feb 3, 2003, at 16:48 US/Eastern, Just van Rossum wrote:
> Python has only limited support for unicode file names and I believe
> it's highly platform dependent. Right now it doesn't work with unicode
> strings on OSX, but it does work with 8-bit strings encoded as utf-8:
>
> >>> os.stat('a\xcc\x8a')
> (33188, 1685956L, 234881029L, 1, 501, 20, 0L, 1044307510, 1044307510,
> 1044307510)
> >>> os.stat(unicode('a\xcc\x8a', "utf-8"))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeEncodeError: 'ascii' codec can't encode character '\u30a' in
> position 1: ordinal not in range(128)
> >>>
>
> This seems pretty broken, but I don't know enough of the internals to
> see what it would take to fix this.

Broken, yes, but the behavior makes sense.

The entire [I think entire, that was the goal] BSD layer can accept and
use UTF-8 encoded strings. It "just works". As such, Python probably
isn't doing anything with the strings before passing 'em into the
underlying API.

The following lends support to that theory:

>>> x = 'a\xcc\x8a'
>>> type(x)
<type 'str'>

In effect, -x- in the above example is just a regular string-- not
unicode-- and is passed into the stat() function as the filename.
stat() parses its first argument in the same fashion as file()/open():
via the 'et' format sequence.

So, in theory, I should be able to create an object that implements the
character buffer interface, contains the NSString reference, and
provides immutable access to the NSString's contents through the
character buffer interface. The NSString's contents will be encoded
into the buffer via UTF8String.

We'll see how far I get....

b.bum
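
A minimal sketch of the workaround implied above: encode the filename to UTF-8 yourself and hand the resulting 8-bit string to the system call, so no implicit 'ascii' encoding step can fail. This is an illustration only, written for a current Python 3 interpreter rather than the Python 2 sessions quoted in this thread. The helper names stat_utf8 and demo are hypothetical, and nothing here is part of PyObjC.

```python
import os
import tempfile


def stat_utf8(name, directory):
    """Encode a text filename as UTF-8 bytes, then stat it by its byte path.

    Hypothetical helper for illustration; it assumes the filesystem accepts
    UTF-8 encoded names, as the BSD layer discussed above does.
    """
    raw = name.encode("utf-8")
    path = os.path.join(os.fsencode(directory), raw)
    return raw, os.stat(path)


def demo():
    # 'a' followed by U+030A COMBINING RING ABOVE, the name used in the thread.
    name = "a\u030a"
    with tempfile.TemporaryDirectory() as tmp:
        raw = name.encode("utf-8")
        # Create an empty file using the byte path directly.
        with open(os.path.join(os.fsencode(tmp), raw), "wb"):
            pass
        encoded, info = stat_utf8(name, tmp)
        return encoded, info.st_size


if __name__ == "__main__":
    encoded, size = demo()
    print(repr(encoded), size)
```

The encoded name comes out as the same byte sequence, 'a\xcc\x8a', that the working os.stat() call in the quoted session uses.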