Re: [Pyobjc-dev] unicode strings to system calls

Bill Bumgarner wrote:

> Broken, yes, but the behavior makes sense.
> 
> The entire [I think entire, that was the goal] BSD layer can accept
> and use UTF-8 encoded strings.   It "just works".
> 
> As such, Python probably isn't doing anything with the strings before 
> passing 'em into the underlying API.   The following lends support to 
> that theory:
> 
>  >>> x = 'a\xcc\x8a'
>  >>> type(x)
> <type 'str'>

Unicode strings can always be recognized by the leading u char in the
repr:

  >>> u'a'
  u'a'
  >>> type(u'a')
  <type 'unicode'>

If the repr doesn't start with a u, it's an 8-bit string. Unicode
strings are internally represented by 16-bit chars by default (there's
a build option that makes them 32 bits wide).
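
To make the distinction concrete, here is a small sketch. It is written
against present-day Python 3, which is an assumption on my part: there
the old 8-bit str is called bytes and the old unicode type is called
str. The helper name kind() is made up for illustration, and the decode
step assumes the bytes really are UTF-8.

```python
# Python 3 naming: bytes is the old 8-bit str, str is the old unicode type.
raw = b'a\xcc\x8a'          # the 8-bit string from the quoted example
text = raw.decode('utf-8')  # assumption: the bytes are UTF-8 encoded


def kind(value):
    """Hypothetical helper: tell 8-bit strings from unicode strings."""
    if isinstance(value, bytes):
        return '8-bit'
    if isinstance(value, str):
        return 'unicode'
    raise TypeError('not a string: %r' % (value,))


print(kind(raw), len(raw))    # 8-bit 3
print(kind(text), len(text))  # unicode 2
```

The same data is three bytes as an 8-bit string but two characters
('a' plus a combining ring) once it has been decoded.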

> In effect, -x- in the above example is just a regular string-- not
> unicode-- and is passed into the stat() [which parses the first
> argument in the same fashion as file()/open();  via the 'et' format
> sequence] function as the filename in the same fashion as
> file()/open().
> 
> So, in theory, I should be able to create an object that implements
> the character buffer interface, contains the NSString reference, and
> provides immutable access to the NSString's contents through the
> character buffer interface.   The NSString's contents will be encoded
> as UTF8String into the buffer.

But to do anything _meaningful_ with unicode in Python (other than
working with the file system), you are going to need an actual unicode
object (or one that acts just like it, and I'm not sure that is
possible).
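
As a rough illustration of what "meaningful" covers, the sketch below
decodes the UTF-8 bytes from the example above and then uses the
standard unicodedata module on the result. It again assumes Python 3
naming (bytes and str), and the variable names are mine. None of these
operations work on the raw UTF-8 bytes.

```python
import unicodedata

raw = b'a\xcc\x8a'          # UTF-8 bytes, as handed to the file system
text = raw.decode('utf-8')  # an actual unicode object

# Character-level inspection needs decoded code points.
names = [unicodedata.name(ch) for ch in text]
print(names)  # ['LATIN SMALL LETTER A', 'COMBINING RING ABOVE']

# Normalization composes the pair into the single code point U+00E5.
composed = unicodedata.normalize('NFC', text)
print(len(composed), unicodedata.name(composed))
# 1 LATIN SMALL LETTER A WITH RING ABOVE

# Case mapping is unicode-aware only on the decoded string.
print(composed.upper() == '\u00c5')  # True
print(raw.upper())                   # b'A\xcc\x8a' (only the ASCII byte changes)
```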

Just