Re: [Pyobjc-dev] unicode strings to system calls
From: Just v. R. <ju...@le...> - 2003-02-03 22:32:20
Bill Bumgarner wrote:

> Broken, yes, but the behavior makes sense.
>
> The entire [I think entire, that was the goal] BSD layer can accept
> and use UTF-8 encoded strings. It "just works".
>
> As such, Python probably isn't doing anything with the strings before
> passing 'em into the underlying API. The following lends support to
> that theory:
>
> >>> x = 'a\xcc\x8a'
> >>> type(x)
> <type 'str'>

Unicode strings can always be recognized by the leading u char in the repr:

  >>> u'a'
  u'a'
  >>> type(u'a')
  <type 'unicode'>

If the repr doesn't start with a u, it's an 8-bit string. Unicode strings
are internally represented by 16-bit chars (but there's a build option
that makes this 32 bits).

> In effect, -x- in the above example is just a regular string-- not
> unicode-- and is passed into the stat() [which parses the first
> argument in the same fashion as file()/open(); via the 'et' format
> sequence] function as the filename in the same fashion as
> file()/open().
>
> So, in theory, I should be able to create an object that implements
> the character buffer interface, contains the NSString reference, and
> provides immutable access to the NSString's contents through the
> character buffer interface. The NSString's contents will be encoded
> as UTF8String into the buffer.

But to do anything _meaningful_ with unicode in Python (other than
working with the file system), you are going to need an actual unicode
object (or one that acts just like it, which is what I'm not sure is
possible).

Just
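
A minimal sketch of the 8-bit vs. unicode distinction discussed above. It is written in Python 3 syntax, which is an assumption on my part (the thread predates it): there `bytes` plays the role of the 8-bit `str` and `str` plays the role of `unicode`. The variable names are illustrative only.

```python
import unicodedata

# The 8-bit string from the quoted example, as a Python 3 bytes literal.
raw = b'a\xcc\x8a'

# Decoding as UTF-8 gives a real unicode object with two code points:
# 'a' followed by U+030A (COMBINING RING ABOVE).
text = raw.decode('utf-8')
names = [unicodedata.name(ch) for ch in text]

# NFC normalization composes the pair into the single code point U+00E5.
composed = unicodedata.normalize('NFC', text)

print(len(raw), len(text), names, len(composed))
```

The raw bytes are what a UTF-8-aware file system call would see; the decoded object is what character-level operations such as normalization need.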