[Pyobjc-dev] Re: [ pyobjc-Bugs-679748 ] NSMutableString gets converted to Python string

Bill Bumgarner wrote:

> On Monday, Feb 3, 2003, at 15:45 US/Eastern, Just van Rossum wrote:
> > Bill Bumgarner wrote:
> >> - a python object that provides a character buffer style interface
> >> to the contents of an NSString.
> >
> > How would this work for NSStrings containing unicode?
> 
> I have no clue yet.

What works nicely now is that the conversion of unicode strings to
NSStrings and vice versa is really transparent: pass a Python unicode
string to an ObjC call expecting an NSString and it works. The other way
works too: if the NSString is representable in 7-bit ASCII you get a str,
if not you get a unicode string. I worry that Python users will have to
convert to a unicode string themselves after all when this conversion
_doesn't_ take place. I have no idea how to make an object that behaves
_like_ a unicode string and have it work everywhere. Maybe time for a
post to c.l.py...
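
To make the rule above concrete, here is a minimal sketch in plain
Python. It assumes a modern Python, where the 8-bit type is bytes and the
unicode type is str. The name bridge_string is hypothetical and is not
part of PyObjC; the sketch only illustrates "ASCII gives an 8-bit string,
anything else stays unicode".

```python
def bridge_string(text):
    """Hypothetical helper, NOT PyObjC API: mimic the conversion rule.

    Return an 8-bit string when the text fits in 7-bit ASCII,
    otherwise return the unicode text unchanged.
    """
    try:
        # Succeeds only if every character is below 128.
        return text.encode("ascii")
    except UnicodeEncodeError:
        # Not representable in 7-bit ASCII: keep it as unicode.
        return text


plain = bridge_string("hello")        # 8-bit string
accented = bridge_string("a\u030a")   # stays unicode
```

The caller gets back two different types depending on the contents, which
is exactly why code that needs a unicode string every time has to convert
explicitly.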

> NSString provides a rich set of API for converting from whatever the
> internal representation is to whatever Unicode representation you
> might want.   As such, it will be easy to produce a character buffer
> full of, say, UTF8 characters.
> 
> What can be done with this in the context of the Python API --
> whether it can be wrapped into a python object that is actually
> useful -- remains to be seen.   Given that file()/open() only looks
> for a character buffer and, I believe, can handle a UTF8 path gives
> me hope.

Python has only limited support for unicode file names and I believe
it's highly platform dependent. Right now it doesn't work with unicode
strings on OSX, but it does work with 8-bit strings encoded as utf-8:

>>> os.stat('a\xcc\x8a')
(33188, 1685956L, 234881029L, 1, 501, 20, 0L, 1044307510, 1044307510,
1044307510)
>>> os.stat(unicode('a\xcc\x8a', "utf-8"))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character '\u30a' in
position 1: ordinal not in range(128)
>>> 

This seems pretty broken, but I don't know enough of the internals to
see what it would take to fix this.
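
Until that is fixed, one possible workaround is to encode the name to
utf-8 by hand before handing it to os.stat(). A minimal sketch, assuming
a modern Python where 8-bit strings are bytes and assuming the file
system stores names as utf-8; utf8_path is a hypothetical helper name,
not an existing API.

```python
import os


def utf8_path(path):
    """Hypothetical helper: return the path as utf-8 encoded bytes."""
    if isinstance(path, str):
        # Encode unicode text explicitly rather than relying on a default codec.
        return path.encode("utf-8")
    # Already an 8-bit string: pass it through untouched.
    return path


# 'a' followed by U+030A (combining ring above) becomes the same
# three bytes used in the first os.stat() call above.
encoded = utf8_path("a\u030a")

# The encoded form can be handed straight to os.stat().
info = os.stat(utf8_path("."))
```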

Just