Thread: [Pyobjc-dev] Re: [ pyobjc-Bugs-679748 ] NSMutableString gets converted to Python string
From: Bill B. <bb...@co...> - 2003-02-03 20:30:50
On Monday, Feb 3, 2003, at 15:26 US/Eastern, SourceForge.net wrote:
>> Comment By: Just van Rossum (jvr)
> Date: 2003-02-03 21:26
>
> Message:
> Logged In: YES
> user_id=92689
>
> How's this: let's _not_ convert any NS{Mutable}String as an
> experiment, and see what it breaks in the Examples area and in our own
> respective code bases. We can always go back and do it
> differently.

This is my next Train Hack(tm) for the ride home this evening. I'm going to implement two things:

- an OC_NSString subclass of NSString that can encapsulate a python string in an NSString compatible fashion

- a python object that provides a character buffer style interface to the contents of an NSString.

The former *should* be easy. The latter will be difficult, but I have experience with the character buffer APIs from doing the NSData/NSBitmapImageRep API support.

Thank GOODNESS for unit tests. It is going to be *really* easy to get a feel for what breaks.

If anyone has a test rolling around in their head that they think would be a good thing to support/do, please whip off a test. Just copy one of the test cases that are already in Lib/Foundation/test [has more tests than the other two modules] and modify it for your needs. I don't really care if the test passes or fails-- but, if it fails, make sure it fails because it is demonstrating something you *want* to work.

b.bum
From: Just v. R. <ju...@le...> - 2003-02-03 20:46:21
Bill Bumgarner wrote:
> - a python object that provides a character buffer style interface to
> the contents of an NSString.

How would this work for NSStrings containing unicode?

Just
From: Bill B. <bb...@co...> - 2003-02-03 20:56:33
On Monday, Feb 3, 2003, at 15:45 US/Eastern, Just van Rossum wrote:
> Bill Bumgarner wrote:
>> - a python object that provides a character buffer style interface to
>> the contents of an NSString.
>
> How would this work for NSStrings containing unicode?

I have no clue yet.

NSString provides a rich set of APIs for converting from whatever the internal representation is to whatever Unicode representation you might want. As such, it will be easy to produce a character buffer full of, say, UTF-8 characters.

What can be done with this in the context of the Python API -- whether it can be wrapped into a python object that is actually useful -- remains to be seen. The fact that file()/open() only looks for a character buffer and, I believe, can handle a UTF-8 path gives me hope.

b.bum
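The idea of handing a file API nothing but a UTF-8 encoded buffer can be tried out directly. What follows is a minimal sketch in present-day Python 3 rather than the Python 2.3 of this thread, and it assumes a filesystem that accepts UTF-8 byte names; the file name is made up for illustration.

```python
import os
import tempfile

# 'a' followed by U+030A COMBINING RING ABOVE, as text and as UTF-8 bytes.
name_text = "a\u030a"
name_utf8 = name_text.encode("utf-8")

with tempfile.TemporaryDirectory() as tmp:
    # Build the path purely from bytes, the way a UTF-8 character
    # buffer would be handed to the underlying BSD call.
    path = os.path.join(os.fsencode(tmp), name_utf8)
    with open(path, "wb") as f:
        f.write(b"hello")
    # stat() receives the raw bytes and never sees a text object.
    size = os.stat(path).st_size
```

Because the path stays a byte string from start to finish, no codec is consulted on the Python side, which is the property the character buffer proposal relies on.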
From: Just v. R. <ju...@le...> - 2003-02-03 21:49:02
Bill Bumgarner wrote:
> On Monday, Feb 3, 2003, at 15:45 US/Eastern, Just van Rossum wrote:
> > Bill Bumgarner wrote:
> >> - a python object that provides a character buffer style interface
> >> to the contents of an NSString.
> >
> > How would this work for NSStrings containing unicode?
>
> I have no clue yet.

What works nicely now is that the conversion of unicode strings to NSStrings and vice versa is really transparent: pass a Python unicode string to an ObjC call expecting an NSString and it works. The other way works too: if the NSString is representable in 7-bit ascii you get a str, if not you get a unicode string. I worry that Python users will have to convert to a unicode string after all when this conversion _doesn't_ take place. I have no idea how to make an object that can behave _like_ a unicode string and have it work everywhere. Maybe time for a post to c.l.py...

> NSString provides a rich set of API for converting from whatever the
> internal representation is to whatever Unicode representation you
> might want. As such, it will be easy to produce a character buffer
> full of, say, UTF8 characters.
>
> What can be done with this in the context of the Python API --
> whether it can be wrapped into a python object that is actually
> useful -- remains to be seen. Given that file()/open() only looks
> for a character buffer and, I believe, can handle a UTF8 path gives
> me hope.

Python has only limited support for unicode file names and I believe it's highly platform dependent. Right now it doesn't work with unicode strings on OSX, but it does work with 8-bit strings encoded as utf-8:

>>> os.stat('a\xcc\x8a')
(33188, 1685956L, 234881029L, 1, 501, 20, 0L, 1044307510, 1044307510, 1044307510)
>>> os.stat(unicode('a\xcc\x8a', "utf-8"))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character '\u30a' in position 1: ordinal not in range(128)
>>>

This seems pretty broken, but I don't know enough of the internals to see what it would take to fix this.

Just
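The conversion rule described in this message (7-bit ascii comes back as a str, everything else as a unicode string) can be sketched as follows. This is an illustration in Python 3, where `bytes` stands in for the Python 2 `str`; the function name `bridge_string` is hypothetical and is not part of PyObjC.

```python
def bridge_string(text):
    """Mimic the rule described above: ASCII-only text comes back as a
    byte string, anything else stays a text (unicode) string."""
    try:
        return text.encode("ascii")
    except UnicodeEncodeError:
        return text

plain = bridge_string("hello")        # fits in 7 bits
accented = bridge_string("a\u030a")   # contains U+030A, so it does not
```

The caller cannot know in advance which of the two types will come back, which is why code on the Python side ends up normalising the result before using it.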
From: Just v. R. <ju...@le...> - 2003-02-03 22:03:46
Just van Rossum wrote:
> I have no idea how to make an object can behave _like_ a
> unicode string and have it work everywhere. Maybe time for a post to
> c.l.py...

Question posted. Something with "Unicode" in the subject...

Just
From: Bill B. <bb...@co...> - 2003-02-03 22:09:57
On Monday, Feb 3, 2003, at 16:48 US/Eastern, Just van Rossum wrote:
> Python has only limited support for unicode file names and I believe
> it's highly platform dependent. Right now it doesn't work with unicode
> strings on OSX, but it does work with 8-bit strings encoded as utf-8:
>
> >>> os.stat('a\xcc\x8a')
> (33188, 1685956L, 234881029L, 1, 501, 20, 0L, 1044307510, 1044307510,
> 1044307510)
> >>> os.stat(unicode('a\xcc\x8a', "utf-8"))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeEncodeError: 'ascii' codec can't encode character '\u30a' in
> position 1: ordinal not in range(128)
> >>>
>
> This seems pretty broken, but I don't know enough of the internals to
> see what it would take to fix this.

Broken, yes, but the behavior makes sense.

The entire [I think entire, that was the goal] BSD layer can accept and use UTF-8 encoded strings. It "just works".

As such, Python probably isn't doing anything with the strings before passing 'em into the underlying API. The following lends support to that theory:

>>> x = 'a\xcc\x8a'
>>> type(x)
<type 'str'>

In effect, -x- in the above example is just a regular string-- not unicode-- and is passed into stat() as the filename. stat() parses its first argument in the same fashion as file()/open(): via the 'et' format sequence.

So, in theory, I should be able to create an object that implements the character buffer interface, contains the NSString reference, and provides immutable access to the NSString's contents through the character buffer interface. The NSString's contents will be encoded as UTF8String into the buffer.

We'll see how far I get....

b.bum
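The object proposed here (holds the string reference, exposes its contents read-only as UTF-8) can be approximated in pure Python. The sketch below is only an illustration in Python 3: `Utf8BufferView` is a hypothetical name, a plain Python string stands in for the NSString, and a real bridge object would implement the buffer protocol in C rather than return a `memoryview`.

```python
class Utf8BufferView:
    """Hypothetical stand-in for the proposed object: holds a reference
    to the original string and exposes its UTF-8 bytes read-only."""

    def __init__(self, source):
        self._source = source                 # stands in for the NSString
        self._utf8 = source.encode("utf-8")   # encoded once, up front

    def buffer(self):
        # A memoryview over bytes is read-only, so the contents
        # cannot be changed through it.
        return memoryview(self._utf8)

    def __bytes__(self):
        return self._utf8

view = Utf8BufferView("a\u030a")
buf = view.buffer()
```

Anything that only needs raw bytes, such as a file system call, can consume `buf` directly; anything that needs character semantics still has to decode it first.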
From: Just v. R. <ju...@le...> - 2003-02-03 22:32:20
Bill Bumgarner wrote:
> Broken, yes, but the behavior makes sense.
>
> The entire [I think entire, that was the goal] BSD layer can accept
> and use UTF-8 encoded strings. It "just works".
>
> As such, Python probably isn't doing anything with the strings before
> passing 'em into the underlying API. The following lends support to
> that theory:
>
> >>> x = 'a\xcc\x8a'
> >>> type(x)
> <type 'str'>

Unicode strings can always be recognized by the leading u char in the repr:

>>> u'a'
u'a'
>>> type(u'a')
<type 'unicode'>

If the repr doesn't start with a u, it's an 8-bit string. Unicode strings are internally represented by 16-bit chars (but there's a build option that makes this 32 bits).

> In effect, -x- in the above example is just a regular string-- not
> unicode-- and is passed into the stat() [which parses the first
> argument in the same fashion as file()/open(); via the 'et' format
> sequence] function as the filename in the same fashion as
> file()/open().
>
> So, in theory, I should be able to create an object that implements
> the character buffer interface, contains the NSString reference, and
> provides immutable access to the NSString's contents through the
> character buffer interface. The NSString's contents will be encoded
> as UTF8String into the buffer.

But to do anything _meaningful_ with unicode in Python (other than working with the file system), you are going to need an actual unicode object (or one that acts just like it, which is what I'm not sure is possible).

Just
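The difference between a UTF-8 buffer and an actual unicode object shows up as soon as the contents are measured or sliced. A small sketch in Python 3 (where `str` plays the role of the Python 2 `unicode` type), reusing the sample string from earlier in the thread:

```python
text = "a\u030a" + "b"          # a, COMBINING RING ABOVE, b
utf8 = text.encode("utf-8")

chars = len(text)               # counts code points
octets = len(utf8)              # counts encoded bytes

# Slicing the byte buffer can cut a character in half...
broken = utf8[:2]
try:
    broken.decode("utf-8")
    split_ok = True
except UnicodeDecodeError:
    split_ok = False

# ...whereas slicing the text object always yields whole code points.
first_two = text[:2]
```

Lengths, indices and slices all mean something different on the two representations, so a byte buffer cannot simply be passed to code that expects text.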
From: Bob I. <bo...@re...> - 2003-02-03 22:16:49
On Monday, Feb 3, 2003, at 16:48 America/New_York, Just van Rossum wrote:
> Bill Bumgarner wrote:
>
>> On Monday, Feb 3, 2003, at 15:45 US/Eastern, Just van Rossum wrote:
>>> Bill Bumgarner wrote:
>>>> - a python object that provides a character buffer style interface
>>>> to the contents of an NSString.
>>>
>>> How would this work for NSStrings containing unicode?
>>
>> I have no clue yet.
>
> What works nicely now is that the conversion of unicode strings to
> NSStrings and vice versa is really transparent: pass Python unicode
> strings to ObjC call expecting an NSString and it works. The other way
> also: if the NSString is representable in 7-bit ascii you get a str, if
> not you get a unicode string. I worry about that Python users will have
> to convert to a unicode string after all when this conversion _doesn't_
> take place. I have no idea how to make an object can behave _like_ a
> unicode string and have it work everywhere. Maybe time for a post to
> c.l.py...

What about this:

class UnicodeNSStringWrapper(unicode):
    def __new__(clazz, myNSString):
        # unicode.__new__ takes the class first, then the initial value
        s = unicode.__new__(clazz, somethingToConvertNSStringInstanceToPyUnicodeObject(myNSString))
        # keep the original NSString for calls back into ObjC
        s._objc = myNSString
        return s

    def __getattr__(self, attr):
        # only reached when normal lookup fails; forward to the NSString
        try:
            return getattr(self._objc, attr)
        except AttributeError:
            raise AttributeError, '%r object has no attribute %r' % (self.__class__.__name__, attr)

It should do anything that unicode() will do, just like the str subclass I posted a bit ago.. and you don't lose any of the NSString functionality.

-bob
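For readers who want to run the wrapper idea today, here is a rough Python 3 rendering. It is a sketch under assumptions: `FakeNSString` is a hypothetical stand-in for a bridged NSString (only `length` and `UTF8String` are modelled), and `NSStringWrapper` subclasses `str`, the Python 3 counterpart of `unicode`.

```python
class FakeNSString:
    """Hypothetical stand-in for a bridged NSString."""
    def __init__(self, value):
        self._value = value

    def length(self):
        return len(self._value)

    def UTF8String(self):
        return self._value.encode("utf-8")


class NSStringWrapper(str):
    def __new__(cls, ns_string):
        # Copy the contents into a real Python string...
        self = super().__new__(cls, ns_string.UTF8String().decode("utf-8"))
        # ...and keep the original object for ObjC-side calls.
        self._objc = ns_string
        return self

    def __getattr__(self, attr):
        # Only called when normal lookup fails; forward to the NSString.
        try:
            return getattr(self._objc, attr)
        except AttributeError:
            raise AttributeError(
                "%r object has no attribute %r" % (type(self).__name__, attr)
            ) from None


s = NSStringWrapper(FakeNSString("a\u030a"))
```

`s.upper()` is answered by `str`, while `s.length()` falls through to the wrapped object, so both APIs are reachable from one value. Note that string methods such as `upper()` return plain `str` objects, not wrappers.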
From: Bill B. <bb...@co...> - 2003-02-03 22:21:03
On Monday, Feb 3, 2003, at 17:16 US/Eastern, Bob Ippolito wrote:
> What about this:
> class UnicodeNSStringWrapper(unicode):
>     def __new__(clazz, myNSString):
>         s =
> unicode.__new__(somethingToConvertNSStringInstanceToPyUnicodeObject(myN
> SString))
>         s._objc = myNSString
>         return s
>     def __getattr__(self, attr):
>         try:
>             return getattr(self._objc, attr)
>         except:
>             raise AttributeError, '%r object has no attribute %r' %
> (self.__class__.__name__, self._objc)
>
> It should do anything that unicode() will do, just like the str
> subclass I posted a bit ago.. and you don't lose any of the NSString
> functionality.

Given that I have to write somethingToConvertNSStringInstanceToPyUnicodeObject() anyway, I'll do so first, plug it into this and see what happens. (This old-hand ObjC programmer sometimes has to be beaten around the head with the obvious elegant path that only Python can offer.)

thanks!

b.bum
From: Bob I. <bo...@re...> - 2003-02-03 22:35:58
On Monday, Feb 3, 2003, at 17:20 America/New_York, Bill Bumgarner wrote:
> On Monday, Feb 3, 2003, at 17:16 US/Eastern, Bob Ippolito wrote:
>> What about this:
>> class UnicodeNSStringWrapper(unicode):
>>     def __new__(clazz, myNSString):
>>         s =
>> unicode.__new__(somethingToConvertNSStringInstanceToPyUnicodeObject(my
>> NSString))
>>         s._objc = myNSString
>>         return s
>>     def __getattr__(self, attr):
>>         try:
>>             return getattr(self._objc, attr)
>>         except:
>>             raise AttributeError, '%r object has no attribute %r' %
>> (self.__class__.__name__, self._objc)
>>
>> It should do anything that unicode() will do, just like the str
>> subclass I posted a bit ago.. and you don't lose any of the NSString
>> functionality.
>
> Given that I have to write
> somethingToConvertNSStringInstanceToPyUnicodeObject() anyway, I'll do
> so first, plug it into this and see what happens. (This old had ObjC
> programmer sometimes has to be beaten around the head with the obvious
> elegant path that only Python can offer.)

PyObject *somethingToConvertNSStringInstanceToPyUnicodeObject(NSString *myNSString)
{
    const char *s;

    if (myNSString == nil) {
        /* Returning NULL without setting a Python exception is an
           error, so map nil to None instead. */
        Py_INCREF(Py_None);
        return Py_None;
    }
    /* UTF8String is NUL-terminated, so strlen() gives the byte count. */
    s = [myNSString UTF8String];
    /* A NULL errors argument selects the default "strict" handling. */
    return PyUnicode_Decode(s, strlen(s), "utf-8", NULL);
}

not sure about the NULL for errors, but that just about does it!

-bob
From: Just v. R. <ju...@le...> - 2003-02-03 22:36:33
Bob Ippolito wrote:
> What about this:
> class UnicodeNSStringWrapper(unicode):
>     def __new__(clazz, myNSString):
>         s =
> unicode.__new__(somethingToConvertNSStringInstanceToPyUnicodeObject(myNS
> String))
>         s._objc = myNSString
>         return s
>     def __getattr__(self, attr):
>         try:
>             return getattr(self._objc, attr)
>         except:
>             raise AttributeError, '%r object has no attribute %r' %
> (self.__class__.__name__, self._objc)
>
> It should do anything that unicode() will do, just like the str
> subclass I posted a bit ago.. and you don't lose any of the NSString
> functionality.

This seems the worst of both worlds, performance wise: allocate new storage *and* keep the old object? Hm...

Just
From: Bill B. <bb...@co...> - 2003-02-03 22:42:15
On Monday, Feb 3, 2003, at 17:36 US/Eastern, Just van Rossum wrote:
> This seems the worst of both worlds, performance wise: allocate new
> storage *and* keep the old object? Hm...

If implemented correctly, the allocation happens only once, the first time the object crosses the bridge from ObjC->Python. From then on, the bridge should be able to use the already existing instance of NSString when going from Python->ObjC. The challenge will be when going from ObjC->Python after the first invocation. I'm hoping the weak reference code that is already present in the bridge will provide some kind of a solution.

The truth will be revealed in the code, I suppose.

b.bum
From: Bob I. <bo...@re...> - 2003-02-03 23:14:20
On Monday, Feb 3, 2003, at 17:42 America/New_York, Bill Bumgarner wrote:
> On Monday, Feb 3, 2003, at 17:36 US/Eastern, Just van Rossum wrote:
>> This seems the worst of both worlds, performance wise: allocate new
>> storage *and* keep the old object? Hm...
>
> If implemented correctly, the allocation only happens once the first
> time the object crosses the bridge from ObjC->Python. From then on,
> the bridge should be able to use the already existing instance of
> NSString when going from Python->ObjC. The challenge will be when
> going from ObjC->Python after the first invocation. I'm hoping the
> weak reference code that is already present in the bridge will provide
> some kind of a solution.
>
> The truth will be revealed in the code, I suppose.

Is it really the worst of both worlds, performance wise? If you have *both* available, you have a native object to work with on both sides of the bridge. Since you're keeping the NSString around, when/if it needs to get passed back you don't need anything special on the ObjC end. It also has the potential to save a lot of programmer hours (for everyone using pyobjc), I think, which is more important for Python users IMHO.

It might use twice the memory, but how often do you pass gigantic NSStrings around over a bridge? If you really wanted to garbage collect the NSString (assuming it has no references on the ObjC side) you could do myUnicodeNSStringWrapper = unicode(myUnicodeNSStringWrapper) or myNSStringWrapper = str(myNSStringWrapper).

In any case, as far as I can tell, you still need to have both allocated at the same time at one point *if* you want something that can act like a PyString or PyUnicode without the pyobjc user knowing too much about it. You might as well keep both around as long as you need them.

-bob
From: David E. <epp...@ic...> - 2003-02-03 23:08:47
On 2/3/03 10:48 PM +0100 Just van Rossum <ju...@le...> wrote:
> What works nicely now is that the conversion of unicode strings to
> NSStrings and vice versa is really transparent: pass Python unicode
> strings to ObjC call expecting an NSString and it works. The other way
> also: if the NSString is representable in 7-bit ascii you get a str, if
> not you get a unicode string.

My code certainly depends on this (at least, the part about sending unicode strings to objc and getting unicode strings back).

> I worry about that Python users will have
> to convert to a unicode string after all when this conversion _doesn't_
> take place.

Currently, because of the 7-bit possibility, if you want a unicode string from a value s that came from the objc side, you need to call unicode(s). I hope and assume that whatever happens with strings, unicode(s) will still work.

> Python has only limited support for unicode file names and I believe
> it's highly platform dependent. Right now it doesn't work with unicode
> strings on OSX, but it does work with 8-bit strings encoded as utf-8:
>
> >>> os.stat('a\xcc\x8a')
> (33188, 1685956L, 234881029L, 1, 501, 20, 0L, 1044307510, 1044307510,
> 1044307510)
> >>> os.stat(unicode('a\xcc\x8a', "utf-8"))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeEncodeError: 'ascii' codec can't encode character '\u30a' in
> position 1: ordinal not in range(128)
> >>>
>
> This seems pretty broken, but I don't know enough of the internals to
> see what it would take to fix this.

There seems to be a thread going now on c.l.py about unicode filenames...

--
David Eppstein
UC Irvine Dept. of Information & Computer Science
epp...@ic...
http://www.ics.uci.edu/~eppstein/
From: Bob I. <bo...@re...> - 2003-02-03 23:19:37
On Monday, Feb 3, 2003, at 18:08 America/New_York, David Eppstein wrote:
> On 2/3/03 10:48 PM +0100 Just van Rossum <ju...@le...> wrote:
>> What works nicely now is that the conversion of unicode strings to
>> NSStrings and vice versa is really transparent: pass Python unicode
>> strings to ObjC call expecting an NSString and it works. The other way
>> also: if the NSString is representable in 7-bit ascii you get a str,
>> if not you get a unicode string.
>
> My code certainly depends on this (at least, the part about sending
> unicode strings to objc and getting unicode strings back).
>
>> I worry about that Python users will have
>> to convert to a unicode string after all when this conversion
>> _doesn't_ take place.
>
> Currently, because of the 7-bit possibility, if you want a unicode
> string from a value s that came from the objc side, you need to call
> unicode(s). I hope and assume that whatever happens with strings,
> unicode(s) will still work.

unicode(s) works for any str or unicode instance, or any instance that otherwise implements __str__ and/or __unicode__. If it's a str or __str__ that has 8-bit characters, you have to specify an encoding. Optionally you may also specify a way to handle errors ('strict', 'ignore', or 'replace').

-bob
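The three error-handling modes mentioned here behave as follows. The sketch uses Python 3, where `bytes.decode(encoding, errors)` takes the place of the Python 2 call `unicode(s, encoding, errors)`; the sample bytes are made up for illustration.

```python
raw = b"caf\xe9"      # Latin-1 bytes; not valid UTF-8

# With the right encoding named, the 8-bit data decodes cleanly.
as_latin1 = raw.decode("latin-1")

# With the wrong encoding, the errors argument decides what happens.
ignored = raw.decode("utf-8", "ignore")      # bad byte dropped
replaced = raw.decode("utf-8", "replace")    # bad byte -> U+FFFD
try:
    raw.decode("utf-8", "strict")            # the default: raise
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```

Only 'strict' reports the problem; the other two modes silently lose or alter data, which is worth keeping in mind before choosing them for bridged strings.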
From: David E. <epp...@ic...> - 2003-02-04 00:12:09
On 2/3/03 6:19 PM -0500 Bob Ippolito <bo...@re...> wrote:
>> Currently, because of the 7-bit possibility, if you want a unicode
>> string from a value s that came from the objc side, you need to call
>> unicode(s). I hope and assume that whatever happens with strings,
>> unicode(s) will still work.
>
> unicode(s) works for any str or unicode instance, or any instance that
> otherwise implements __str__ and/or __unicode__. If it's a str or
> __str__ that has 8-bit characters, you have to specify an encoding.
> Optionally you may also specify a way to handle errors ('strict',
> 'ignore', or 'replace').

So I guess my point is that if NSStrings with 8-bit characters stopped being converted to unicodes automatically, an appropriate implementation of __unicode__ should be added. It would be bad to go through __str__ and force an encoding to be specified explicitly, because an encoding is already determined for this kind of object.

On 2/4/03 12:24 AM +0100 Just van Rossum <ju...@le...> wrote:
>> Currently, because of the 7-bit possibility, if you want a unicode
>> string from a value s that came from the objc side, you need to call
>> unicode(s). I hope and assume that whatever happens with strings,
>> unicode(s) will still work.
>
> I'm curious: if the string is known to be 7-bit ascii, in what situation
> does an 8-bit string not work where a unicode string does?

I had some code of the form unicode(s).encode('utf8') because I thought it didn't make sense to encode something that wasn't already unicode. But looking at it again, I guess encode works equally well on 7-bit strings...

> Btw. while peeking around I came across the thing Bill warned for:
> [NSString stringWithCString:] returns an instance of NSCFString, which
> is a subclass of -- tada -- NSMutableString. This doesn't mean it _is_
> mutable: I get an exception if I try. So _inheritance_ can't be used to
> determine mutability. Does anyone know off hand what can?

Any idea whether it reuses the storage of the C string, or copies it? If it reuses it, it could be mutable by the C code...

--
David Eppstein
UC Irvine Dept. of Information & Computer Science
epp...@ic...
http://www.ics.uci.edu/~eppstein/
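The suggestion that an unconverted NSString should carry its own conversion hook, so that callers never have to name an encoding, could look roughly like this. The sketch is in Python 3, where `__str__` plays the role that `__unicode__` had in Python 2; `StringProxy` is a hypothetical class, and a UTF-8 byte string stands in for the NSString contents.

```python
class StringProxy:
    """Hypothetical proxy for an unconverted NSString."""

    def __init__(self, utf8_bytes):
        self._utf8 = utf8_bytes      # stands in for the NSString contents

    def __str__(self):
        # The proxy knows its own encoding, so callers never name one.
        return self._utf8.decode("utf-8")

p = StringProxy(b"a\xcc\x8a")
text = str(p)                        # no encoding argument needed
round_trip = text.encode("utf-8")
```

The design point is that the knowledge of the encoding lives inside the object, which matches the observation above that an encoding is already determined for this kind of value.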
From: Just v. R. <ju...@le...> - 2003-02-03 23:25:41
David Eppstein wrote:
> Currently, because of the 7-bit possibility, if you want a unicode
> string from a value s that came from the objc side, you need to call
> unicode(s). I hope and assume that whatever happens with strings,
> unicode(s) will still work.

I'm curious: if the string is known to be 7-bit ascii, in what situation does an 8-bit string not work where a unicode string does?

> There seems to be a thread going now on c.l.py about unicode
> filenames...

I only see one about source encodings...

Btw. while peeking around I came across the thing Bill warned about: [NSString stringWithCString:] returns an instance of NSCFString, which is a subclass of -- tada -- NSMutableString. This doesn't mean it _is_ mutable: I get an exception if I try. So _inheritance_ can't be used to determine mutability. Does anyone know off hand what can?

Just