[Pyobjc-dev] unicode strings to system calls

On Monday, Feb 3, 2003, at 16:48 US/Eastern, Just van Rossum wrote:
> Python has only limited support for unicode file names and I believe
> it's highly platform dependent. Right now it doesn't work with unicode
> strings on OSX, but it does work with 8-bit strings encoded as utf-8:
>
> >>> os.stat('a\xcc\x8a')
> (33188, 1685956L, 234881029L, 1, 501, 20, 0L, 1044307510, 1044307510,
> 1044307510)
> >>> os.stat(unicode('a\xcc\x8a', "utf-8"))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeEncodeError: 'ascii' codec can't encode character '\u30a' in
> position 1: ordinal not in range(128)
> >>>
>
> This seems pretty broken, but I don't know enough of the internals to
> see what it would take to fix this.

Broken, yes, but the behavior makes sense.

The entire BSD layer [I think all of it; that was the goal] can accept 
and use UTF-8 encoded strings.   It "just works".

As such, Python probably isn't doing anything with the strings before 
passing 'em into the underlying API.   The following lends support to 
that theory:

 >>> x = 'a\xcc\x8a'
 >>> type(x)
<type 'str'>

In effect, x in the above example is just a regular string, not 
unicode, and is passed into stat() as the filename.   stat() parses its 
first argument in the same fashion as file()/open(): via the 'et' 
format sequence, which hands 8-bit strings through without recoding 
them.
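
For anyone who wants to poke at this, here is a minimal sketch of the 
same idea. It assumes a current Python 3 interpreter (not the Python of 
this thread), where os.stat() accepts a bytes path and hands it to the 
OS untouched. The helper name stat_both_ways is made up for 
illustration.

```python
import os
import tempfile


def stat_both_ways(utf8_name=b"a\xcc\x8a"):
    """Create a file named by raw UTF-8 bytes, then stat it twice:
    once by the bytes path and once by the decoded text path.
    (Hypothetical helper, not part of Python or PyObjC.)"""
    with tempfile.TemporaryDirectory() as tmpdir:
        # Build the path entirely out of bytes so nothing gets recoded.
        byte_path = os.path.join(os.fsencode(tmpdir), utf8_name)
        with open(byte_path, "wb"):
            pass
        # os.fsdecode() round-trips the bytes through the filesystem encoding.
        text_path = os.fsdecode(byte_path)
        by_bytes = os.stat(byte_path)
        by_text = os.stat(text_path)
        return by_bytes.st_ino, by_text.st_ino


ino_bytes, ino_text = stat_both_ways()
print(ino_bytes == ino_text)
```

If both calls reach the same file, the two inode numbers match, which 
is the "bytes go straight through" behavior described above.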

So, in theory, I should be able to create an object that implements the 
character buffer interface, contains the NSString reference, and 
provides immutable access to the NSString's contents through the 
character buffer interface.   The NSString's contents will be encoded 
into the buffer as UTF-8, via UTF8String.
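
Roughly, the shape of the object I have in mind looks like the sketch 
below. This is a pure-Python stand-in written for a current Python 3, 
not the actual PyObjC implementation: the class name UTF8View is 
hypothetical, a Python str stands in for the NSString reference, and a 
read-only memoryview stands in for the character buffer interface.

```python
import os


class UTF8View:
    """Hypothetical wrapper: holds a text value and exposes its UTF-8
    encoding read-only. A Python str stands in for the NSString here."""

    def __init__(self, text):
        self._text = text
        # Analogue of asking the NSString for its UTF8String.
        self._utf8 = text.encode("utf-8")

    def buffer(self):
        # A memoryview over bytes is read-only, so callers cannot mutate it.
        return memoryview(self._utf8)

    def __fspath__(self):
        # Lets path-taking functions accept the wrapper as a bytes path.
        return self._utf8


view = UTF8View("a\u030a")
buf = view.buffer()
print(buf.readonly, bytes(buf), os.fspath(view))
```

The point of the design is that the wrapper never hands out anything 
but the encoded bytes, so the system call layer only ever sees UTF-8.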

We'll see how far I get....

b.bum