Thread: [Pyobjc-dev] Re: [ pyobjc-Bugs-679748 ] NSMutableString gets converted to Python string

[Pyobjc-dev] Re: [ pyobjc-Bugs-679748 ] NSMutableString gets converted to Python string

From: Bill B. <bb...@co...> - 2003-02-03 20:30:50

On Monday, Feb 3, 2003, at 15:26 US/Eastern, SourceForge.net wrote:
>> Comment By: Just van Rossum (jvr)
> Date: 2003-02-03 21:26
>
> Message:
> Logged In: YES
> user_id=92689
>
> How's this: let's _not_ convert any NS{Mutable}String as an 
> experiment, and see what it breaks in the Examples area and in our own 
> respective code bases. We can always go back and do it 
> differently.

This is my next Train Hack(tm) for the ride home this evening.  I'm 
going to implement two things:

- an OC_NSString subclass of NSString that can encapsulate a python 
string in an NSString-compatible fashion

- a python object that provides a character buffer style interface to 
the contents of an NSString.

The former *should* be easy.  The latter will be difficult, but I have 
experience with the character buffer APIs from doing the 
NSData/NSBitmapImageRep API support.

Thank GOODNESS for unit tests.   It is going to be *really* easy to get 
a feel for what breaks.

If anyone has a test rolling around in their head that they think would be 
a good thing to support/do, please whip off a test.  Just copy one of 
the test cases that are already in Lib/Foundation/test [has more tests 
than the other two modules] and modify it for your needs.   I don't 
really care if the test passes or fails-- but, if it fails, make sure 
it fails because it is demonstrating something you *want* to work.

b.bum

[Pyobjc-dev] Re: [ pyobjc-Bugs-679748 ] NSMutableString gets converted to Python string

From: Just v. R. <ju...@le...> - 2003-02-03 20:46:21

Bill Bumgarner wrote:

> - a python object that provides a character buffer style interface to 
> the contents of an NSString.

How would this work for NSStrings containing unicode?

Just

[Pyobjc-dev] Re: [ pyobjc-Bugs-679748 ] NSMutableString gets converted to Python string

From: Bill B. <bb...@co...> - 2003-02-03 20:56:33

On Monday, Feb 3, 2003, at 15:45 US/Eastern, Just van Rossum wrote:
> Bill Bumgarner wrote:
>> - a python object that provides a character buffer style interface to
>> the contents of an NSString.
>
> How would this work for NSStrings containing unicode?

I have no clue yet.

NSString provides a rich set of APIs for converting from whatever the 
internal representation is to whatever Unicode representation you might 
want.   As such, it will be easy to produce a character buffer full of, 
say, UTF8 characters.

What can be done with this in the context of the Python API -- whether 
it can be wrapped into a python object that is actually useful -- 
remains to be seen.   Given that file()/open() only looks for a 
character buffer and, I believe, can handle a UTF8 path gives me hope.

b.bum

[Pyobjc-dev] Re: [ pyobjc-Bugs-679748 ] NSMutableString gets converted to Python string

From: Just v. R. <ju...@le...> - 2003-02-03 21:49:02

Bill Bumgarner wrote:

> On Monday, Feb 3, 2003, at 15:45 US/Eastern, Just van Rossum wrote:
> > Bill Bumgarner wrote:
> >> - a python object that provides a character buffer style interface
> >> to the contents of an NSString.
> >
> > How would this work for NSStrings containing unicode?
> 
> I have no clue yet.

What works nicely now is that the conversion of unicode strings to
NSStrings and vice versa is really transparent: pass Python unicode
strings to an ObjC call expecting an NSString and it works. The other way
also: if the NSString is representable in 7-bit ascii you get a str, if
not you get a unicode string. I worry that Python users will have
to convert to a unicode string after all when this conversion _doesn't_
take place. I have no idea how to make an object that can behave _like_ a
unicode string and have it work everywhere. Maybe time for a post to
c.l.py...

> NSString provides a rich set of APIs for converting from whatever the
> internal representation is to whatever Unicode representation you
> might want.   As such, it will be easy to produce a character buffer
> full of, say, UTF8 characters.
> 
> What can be done with this in the context of the Python API --
> whether it can be wrapped into a python object that is actually
> useful -- remains to be seen.   Given that file()/open() only looks
> for a character buffer and, I believe, can handle a UTF8 path gives
> me hope.

Python has only limited support for unicode file names and I believe
it's highly platform dependent. Right now it doesn't work with unicode
strings on OSX, but it does work with 8-bit strings encoded as utf-8:

>>> os.stat('a\xcc\x8a')
(33188, 1685956L, 234881029L, 1, 501, 20, 0L, 1044307510, 1044307510,
1044307510)
>>> os.stat(unicode('a\xcc\x8a', "utf-8"))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character '\u30a' in
position 1: ordinal not in range(128)
>>> 

This seems pretty broken, but I don't know enough of the internals to
see what it would take to fix this.

Just
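
A small sketch of the byte string used in the session above, written in
present-day Python 3 rather than the Python 2 of this thread (an assumption
made only so the snippet runs today). It shows that 'a\xcc\x8a' is the UTF-8
encoding of a decomposed "a" followed by U+030A COMBINING RING ABOVE, and that
normalizing to NFC collapses the pair into a single character:

```python
import unicodedata

decomposed = "a\u030a"                  # 'a' followed by COMBINING RING ABOVE
utf8_bytes = decomposed.encode("utf-8")
print(utf8_bytes)                       # b'a\xcc\x8a'

# NFC normalization composes the two code points into one (U+00E5).
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed))   # 2 1
```

So the 8-bit string that os.stat() accepted is the same text as the unicode
string it rejected; only the representation handed to the call differs.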

Re: [Pyobjc-dev] Re: [ pyobjc-Bugs-679748 ] NSMutableString gets converted to Python string

From: Just v. R. <ju...@le...> - 2003-02-03 22:03:46

Just van Rossum wrote:

> I have no idea how to make an object that can behave _like_ a
> unicode string and have it work everywhere. Maybe time for a post to
> c.l.py...

Question posted. Something with "Unicode" in the subject...

Just

[Pyobjc-dev] unicode strings to system calls

From: Bill B. <bb...@co...> - 2003-02-03 22:09:57

On Monday, Feb 3, 2003, at 16:48 US/Eastern, Just van Rossum wrote:
> Python has only limited support for unicode file names and I believe
> it's highly platform dependent. Right now it doesn't work with unicode
> strings on OSX, but it does work with 8-bit strings encoded as utf-8:
>
>>>> os.stat('a\xcc\x8a')
> (33188, 1685956L, 234881029L, 1, 501, 20, 0L, 1044307510, 1044307510,
> 1044307510)
>>>> os.stat(unicode('a\xcc\x8a', "utf-8"))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeEncodeError: 'ascii' codec can't encode character '\u30a' in
> position 1: ordinal not in range(128)
>>>>
>
> This seems pretty broken, but I don't know enough of the internals to
> see what it would take to fix this.

Broken, yes, but the behavior makes sense.

The entire [I think entire, that was the goal] BSD layer can accept and 
use UTF-8 encoded strings.   It "just works".

As such, Python probably isn't doing anything with the strings before 
passing 'em into the underlying API.   The following lends support to 
that theory:

 >>> x = 'a\xcc\x8a'
 >>> type(x)
<type 'str'>

In effect, x in the above example is just a regular string, not 
unicode, and is passed to stat() as the filename.  stat() parses its 
first argument in the same fashion as file()/open(), via the 'et' 
format sequence.

So, in theory, I should be able to create an object that implements the 
character buffer interface, contains the NSString reference, and 
provides immutable access to the NSString's contents through the 
character buffer interface.   The NSString's contents will be encoded 
as UTF8String into the buffer.

We'll see how far I get....

b.bum
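
A minimal sketch of the idea described above, in present-day Python 3 and
without PyObjC: an object that keeps a reference to the original string and
hands out read-only access to its UTF-8 encoding. The class name
UTF8BufferView and its buffer() method are hypothetical, and a plain Python
string stands in for the NSString reference:

```python
class UTF8BufferView:
    """Hypothetical stand-in for the proposed wrapper (not PyObjC API)."""

    def __init__(self, text):
        self._text = text                    # stands in for the NSString reference
        self._utf8 = text.encode("utf-8")    # contents encoded as UTF-8, once

    def buffer(self):
        # A memoryview over a bytes object is read-only, which matches the
        # "immutable access" described above.
        return memoryview(self._utf8)


view = UTF8BufferView("a\u030a").buffer()
print(view.readonly, bytes(view))            # True b'a\xcc\x8a'
```

Anything that only needs the raw bytes (a path handed to the file system, for
example) could consume such a view; it does not by itself behave like a
unicode string.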

Re: [Pyobjc-dev] unicode strings to system calls

From: Just v. R. <ju...@le...> - 2003-02-03 22:32:20

Bill Bumgarner wrote:

> Broken, yes, but the behavior makes sense.
> 
> The entire [I think entire, that was the goal] BSD layer can accept
> and use UTF-8 encoded strings.   It "just works".
> 
> As such, Python probably isn't doing anything with the strings before 
> passing 'em into the underlying API.   The following lends support to 
> that theory:
> 
>  >>> x = 'a\xcc\x8a'
>  >>> type(x)
> <type 'str'>

Unicode strings can always be recognized by the leading u char in the
repr:

  >>> u'a'
  u'a'
  >>> type(u'a')
  <type 'unicode'>

If the repr doesn't start with a u, it's an 8-bit string. Unicode
strings are internally represented by 16-bit chars (but there's a build
option that makes this 32 bits).

> In effect, x in the above example is just a regular string, not
> unicode, and is passed to stat() as the filename.  stat() parses its
> first argument in the same fashion as file()/open(), via the 'et'
> format sequence.
> 
> So, in theory, I should be able to create an object that implements
> the character buffer interface, contains the NSString reference, and
> provides immutable access to the NSString's contents through the
> character buffer interface.   The NSString's contents will be encoded
> as UTF8String into the buffer.

But to do anything _meaningful_ with unicode in Python (other than
working with the file system), you are going to need an actual unicode
object (or one that acts just like it, which is what I'm not sure is
possible).

Just

Re: [Pyobjc-dev] Re: [ pyobjc-Bugs-679748 ] NSMutableString gets converted to Python string

From: Bob I. <bo...@re...> - 2003-02-03 22:16:49

On Monday, Feb 3, 2003, at 16:48 America/New_York, Just van Rossum  
wrote:

> Bill Bumgarner wrote:
>
>> On Monday, Feb 3, 2003, at 15:45 US/Eastern, Just van Rossum wrote:
>>> Bill Bumgarner wrote:
>>>> - a python object that provides a character buffer style interface
>>>> to the contents of an NSString.
>>>
>>> How would this work for NSStrings containing unicode?
>>
>> I have no clue yet.
>
> What works nicely now is that the conversion of unicode strings to
> NSStrings and vice versa is really transparent: pass Python unicode
> strings to an ObjC call expecting an NSString and it works. The other way
> also: if the NSString is representable in 7-bit ascii you get a str, if
> not you get a unicode string. I worry that Python users will have
> to convert to a unicode string after all when this conversion _doesn't_
> take place. I have no idea how to make an object that can behave _like_ a
> unicode string and have it work everywhere. Maybe time for a post to
> c.l.py...

What about this:
class UnicodeNSStringWrapper(unicode):
    def __new__(clazz, myNSString):
        # Build the unicode value from the NSString, then keep the original.
        value = somethingToConvertNSStringInstanceToPyUnicodeObject(myNSString)
        s = unicode.__new__(clazz, value)
        s._objc = myNSString
        return s
    def __getattr__(self, attr):
        # Only called when normal lookup fails: fall back to the NSString.
        try:
            return getattr(self._objc, attr)
        except AttributeError:
            raise AttributeError('%r object has no attribute %r' % (self.__class__.__name__, attr))

It should do anything that unicode() will do, just like the str  
subclass I posted a bit ago.. and you don't lose any of the NSString  
functionality.

-bob
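
For readers on present-day Python, a runnable sketch of the same delegation
pattern, using str in place of Python 2's unicode. FakeNSString is a
hypothetical stand-in for a bridged NSString (its text() and length() methods
are made up for illustration), so this shows only the Python side of the idea,
not PyObjC itself:

```python
class FakeNSString:
    """Hypothetical stand-in for a bridged NSString."""

    def __init__(self, text):
        self._text = text

    def text(self):
        return self._text

    def length(self):
        return len(self._text)


class NSStringWrapper(str):
    def __new__(cls, native):
        # Copy the contents into a real str, and keep the native object too.
        s = str.__new__(cls, native.text())
        s._objc = native
        return s

    def __getattr__(self, attr):
        # Reached only when normal str lookup fails; delegate to the native object.
        if attr == "_objc":
            raise AttributeError(attr)
        return getattr(self._objc, attr)


w = NSStringWrapper(FakeNSString("hello"))
print(w.upper(), w.length())   # HELLO 5
```

String methods come from str, and anything str lacks is looked up on the
wrapped object, which is the "don't lose any of the NSString functionality"
part of the proposal.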

Re: [Pyobjc-dev] Re: [ pyobjc-Bugs-679748 ] NSMutableString gets converted to Python string

From: Bill B. <bb...@co...> - 2003-02-03 22:21:03

On Monday, Feb 3, 2003, at 17:16 US/Eastern, Bob Ippolito wrote:
> What about this:
> class UnicodeNSStringWrapper(unicode):
>     def __new__(clazz, myNSString):
>         # Build the unicode value from the NSString, then keep the original.
>         value = somethingToConvertNSStringInstanceToPyUnicodeObject(myNSString)
>         s = unicode.__new__(clazz, value)
>         s._objc = myNSString
>         return s
>     def __getattr__(self, attr):
>         # Only called when normal lookup fails: fall back to the NSString.
>         try:
>             return getattr(self._objc, attr)
>         except AttributeError:
>             raise AttributeError('%r object has no attribute %r' % (self.__class__.__name__, attr))
>
> It should do anything that unicode() will do, just like the str  
> subclass I posted a bit ago.. and you don't lose any of the NSString  
> functionality.

Given that I have to write  
somethingToConvertNSStringInstanceToPyUnicodeObject() anyway, I'll do  
so first, plug it into this and see what happens.   (This old-hand ObjC  
programmer sometimes has to be beaten around the head with the obvious  
elegant path that only Python can offer.)

thanks!
b.bum

Re: [Pyobjc-dev] Re: [ pyobjc-Bugs-679748 ] NSMutableString gets converted to Python string

From: Bob I. <bo...@re...> - 2003-02-03 22:35:58

On Monday, Feb 3, 2003, at 17:20 America/New_York, Bill Bumgarner wrote:

> On Monday, Feb 3, 2003, at 17:16 US/Eastern, Bob Ippolito wrote:
>> What about this:
>> class UnicodeNSStringWrapper(unicode):
>>     def __new__(clazz, myNSString):
>>         # Build the unicode value from the NSString, then keep the original.
>>         value = somethingToConvertNSStringInstanceToPyUnicodeObject(myNSString)
>>         s = unicode.__new__(clazz, value)
>>         s._objc = myNSString
>>         return s
>>     def __getattr__(self, attr):
>>         # Only called when normal lookup fails: fall back to the NSString.
>>         try:
>>             return getattr(self._objc, attr)
>>         except AttributeError:
>>             raise AttributeError('%r object has no attribute %r' % (self.__class__.__name__, attr))
>>
>> It should do anything that unicode() will do, just like the str  
>> subclass I posted a bit ago.. and you don't lose any of the NSString  
>> functionality.
>
> Given that I have to write  
> somethingToConvertNSStringInstanceToPyUnicodeObject() anyway, I'll do  
> so first, plug it into this and see what happens.   (This old-hand ObjC  
> programmer sometimes has to be beaten around the head with the obvious  
> elegant path that only Python can offer.)

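PyObject *somethingToConvertNSStringInstanceToPyUnicodeObject(NSString *myNSString) {
	const char *s;
	if (myNSString == nil) {
		/* Returning NULL without setting an exception is an error in the
		   Python C API, so map nil to None instead. */
		Py_INCREF(Py_None);
		return Py_None;
	}
	s = [myNSString UTF8String];
	/* NULL for the errors argument selects the default "strict" handling. */
	return PyUnicode_Decode(s, strlen(s), "utf-8", NULL);
}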

not sure about the NULL for errors, but that just about does it!

-bob

Re: [Pyobjc-dev] Re: [ pyobjc-Bugs-679748 ] NSMutableString gets converted to Python string

From: Just v. R. <ju...@le...> - 2003-02-03 22:36:33

Bob Ippolito wrote:

> What about this:
> class UnicodeNSStringWrapper(unicode):
>     def __new__(clazz, myNSString):
>         # Build the unicode value from the NSString, then keep the original.
>         value = somethingToConvertNSStringInstanceToPyUnicodeObject(myNSString)
>         s = unicode.__new__(clazz, value)
>         s._objc = myNSString
>         return s
>     def __getattr__(self, attr):
>         # Only called when normal lookup fails: fall back to the NSString.
>         try:
>             return getattr(self._objc, attr)
>         except AttributeError:
>             raise AttributeError('%r object has no attribute %r' % (self.__class__.__name__, attr))
> 
> It should do anything that unicode() will do, just like the str  
> subclass I posted a bit ago.. and you don't lose any of the NSString  
> functionality.

This seems the worst of both worlds, performance wise: allocate new
storage *and* keep the old object? Hm...

Just

Re: [Pyobjc-dev] Re: [ pyobjc-Bugs-679748 ] NSMutableString gets converted to Python string

From: Bill B. <bb...@co...> - 2003-02-03 22:42:15

On Monday, Feb 3, 2003, at 17:36 US/Eastern, Just van Rossum wrote:
> This seems the worst of both worlds, performance wise: allocate new
> storage *and* keep the old object? Hm...

If implemented correctly, the allocation only happens once the first 
time the object crosses the bridge from ObjC->Python.   From then on, 
the bridge should be able to use the already existing instance of 
NSString when going from Python->ObjC.   The challenge will be when 
going from ObjC->Python after the first invocation.  I'm hoping the 
weak reference code that is already present in the bridge will provide 
some kind of a solution.

The truth will be revealed in the code, I suppose.

b.bum
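
A rough sketch of the "convert once, then reuse" idea, in present-day Python 3.
This is not the bridge's actual weak reference code; Native, Proxy and
to_python() are hypothetical names, and the cache is an ordinary
weakref.WeakValueDictionary keyed by the identity of the native object:

```python
import weakref


class Native:
    """Hypothetical stand-in for an Objective-C string object."""

    def __init__(self, text):
        self.text = text


class Proxy:
    """Hypothetical Python-side proxy holding the converted value."""

    conversions = 0  # counts how often a conversion actually happens

    def __init__(self, native):
        Proxy.conversions += 1
        self.native = native           # strong reference keeps id(native) stable
        self.value = str(native.text)  # the one-time conversion


# Entries disappear on their own once no one else references the proxy.
_proxies = weakref.WeakValueDictionary()


def to_python(native):
    key = id(native)
    proxy = _proxies.get(key)
    if proxy is None:
        proxy = Proxy(native)
        _proxies[key] = proxy
    return proxy


n = Native("hello")
first = to_python(n)
second = to_python(n)
print(first is second, Proxy.conversions)   # True 1
```

Going back the other way is then just a matter of reading proxy.native, with
no second conversion.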

Re: [Pyobjc-dev] Re: [ pyobjc-Bugs-679748 ] NSMutableString gets converted to Python string

From: Bob I. <bo...@re...> - 2003-02-03 23:14:20

On Monday, Feb 3, 2003, at 17:42 America/New_York, Bill Bumgarner wrote:

> On Monday, Feb 3, 2003, at 17:36 US/Eastern, Just van Rossum wrote:
>> This seems the worst of both worlds, performance wise: allocate new
>> storage *and* keep the old object? Hm...
>
> If implemented correctly, the allocation only happens once the first 
> time the object crosses the bridge from ObjC->Python.   From then on, 
> the bridge should be able to use the already existing instance of 
> NSString when going from Python->ObjC.   The challenge will be when 
> going from ObjC->Python after the first invocation.  I'm hoping the 
> weak reference code that is already present in the bridge will provide 
> some kind of a solution.
>
> The truth will be revealed in the code, I suppose.

Is it really the worst of both worlds, performance wise?  If you have 
*both* available, you have a native object to work with on both sides 
of the bridge.  Since you're keeping the NSString around, when/if it 
needs to get passed back you don't need anything special on the ObjC 
end.  It also has the potential to save a lot of programmer hours (for 
everyone using pyobjc), I think, which is more important for Python 
users IMHO.

It might use twice the memory, but how often do you pass gigantic 
NSStrings around over a bridge?  If you really wanted to garbage 
collect the NSString (assuming it has no references on the ObjC side) 
you could do myUnicodeNSStringWrapper = 
unicode(myUnicodeNSStringWrapper) or myNSStringWrapper = 
str(myNSStringWrapper).

In any case, as far as I can tell, you still need to have both 
allocated at the same time at one point *if* you want something that 
can act like a PyString or PyUnicode without the pyobjc user knowing 
too much about it.  You might as well keep both around as long as you 
need them.

-bob

Re: [Pyobjc-dev] Re: [ pyobjc-Bugs-679748 ] NSMutableString gets converted to Python string

From: David E. <epp...@ic...> - 2003-02-03 23:08:47

On 2/3/03 10:48 PM +0100 Just van Rossum <ju...@le...> wrote:
> What works nicely now is that the conversion of unicode strings to
> NSStrings and vice versa is really transparent: pass Python unicode
> strings to an ObjC call expecting an NSString and it works. The other way
> also: if the NSString is representable in 7-bit ascii you get a str, if
> not you get a unicode string.

My code certainly depends on this (at least, the part about sending unicode 
strings to objc and getting unicode strings back).

> I worry that Python users will have
> to convert to a unicode string after all when this conversion _doesn't_
> take place.

Currently, because of the 7-bit possibility, if you want a unicode string 
from a value s that came from the objc side, you need to call unicode(s). 
I hope and assume that whatever happens with strings, unicode(s) will still 
work.

> Python has only limited support for unicode file names and I believe
> it's highly platform dependent. Right now it doesn't work with unicode
> strings on OSX, but it does work with 8-bit strings encoded as utf-8:
>
>>>> os.stat('a\xcc\x8a')
> (33188, 1685956L, 234881029L, 1, 501, 20, 0L, 1044307510, 1044307510,
> 1044307510)
>>>> os.stat(unicode('a\xcc\x8a', "utf-8"))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeEncodeError: 'ascii' codec can't encode character '\u30a' in
> position 1: ordinal not in range(128)
>>>>
>
> This seems pretty broken, but I don't know enough of the internals to
> see what it would take to fix this.

There seems to be a thread going now on c.l.py about unicode filenames...
--
David Eppstein       UC Irvine Dept. of Information & Computer Science
epp...@ic... http://www.ics.uci.edu/~eppstein/

Re: [Pyobjc-dev] Re: [ pyobjc-Bugs-679748 ] NSMutableString gets converted to Python string

From: Bob I. <bo...@re...> - 2003-02-03 23:19:37

On Monday, Feb 3, 2003, at 18:08 America/New_York, David Eppstein wrote:

> On 2/3/03 10:48 PM +0100 Just van Rossum <ju...@le...> wrote:
>> What works nicely now is that the conversion of unicode strings to
>> NSStrings and vice versa is really transparent: pass Python unicode
>> strings to an ObjC call expecting an NSString and it works. The other way
>> also: if the NSString is representable in 7-bit ascii you get a str,
>> if not you get a unicode string.
>
> My code certainly depends on this (at least, the part about sending 
> unicode strings to objc and getting unicode strings back).
>
>> I worry that Python users will have
>> to convert to a unicode string after all when this conversion
>> _doesn't_ take place.
>
> Currently, because of the 7-bit possibility, if you want a unicode 
> string from a value s that came from the objc side, you need to call 
> unicode(s). I hope and assume that whatever happens with strings, 
> unicode(s) will still work.

unicode(s) works for any str or unicode instance, or any instance that 
otherwise implements __str__ and/or __unicode__.  If it's a str or 
__str__ that has 8-bit characters, you have to specify an encoding.  
Optionally you may also specify a way to handle errors ('strict', 
'ignore', or 'replace').

-bob
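
The encoding and error-handling arguments described above survive in
present-day Python 3 as bytes.decode(). A short sketch, using Python 3 (an
assumption, since the thread is about Python 2's unicode()) and the same byte
string that appears earlier in the thread:

```python
raw = b"a\xcc\x8a"                            # UTF-8 bytes, not plain ASCII

decoded = raw.decode("utf-8")                 # correct encoding: works
ignored = raw.decode("ascii", "ignore")       # drops the undecodable bytes
replaced = raw.decode("ascii", "replace")     # substitutes U+FFFD for each one

try:
    raw.decode("ascii")                       # 'strict' is the default
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

print(repr(ignored), repr(replaced), strict_failed)
```

Only the first call recovers the original text; 'ignore' and 'replace' avoid
the exception at the cost of losing data.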

Re: [Pyobjc-dev] Re: [ pyobjc-Bugs-679748 ] NSMutableString gets converted to Python string

From: David E. <epp...@ic...> - 2003-02-04 00:12:09

On 2/3/03 6:19 PM -0500 Bob Ippolito <bo...@re...> wrote:
>> Currently, because of the 7-bit possibility, if you want a unicode
>> string from a value s that came from the objc side, you need to call
>> unicode(s). I hope and assume that whatever happens with strings,
>> unicode(s) will still work.
>
> unicode(s) works for any str or unicode instance, or any instance that
> otherwise implements __str__ and/or __unicode__.  If it's a str or
> __str__ that has 8-bit characters, you have to specify an encoding.
> Optionally you may also specify a way to handle errors ('strict',
> 'ignore', or 'replace').

So I guess my point is that if NSStrings with 8-bit characters stopped 
being converted to unicodes automatically, an appropriate implementation of 
__unicode__ should be added.  It would be bad to go through __str__ and 
force an encoding to be specified explicitly, because an encoding is 
already determined for this kind of object.
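
A sketch of that point in present-day Python 3, where __unicode__ no longer
exists and __str__ plays its role. EncodedText is a hypothetical class, not
anything from PyObjC; it only illustrates an object that already knows its own
encoding, so callers never have to name one:

```python
class EncodedText:
    """Hypothetical object that already knows its own encoding."""

    def __init__(self, raw, encoding):
        self._raw = raw
        self._encoding = encoding

    def __str__(self):
        # The object supplies the encoding, so the caller does not have to.
        return self._raw.decode(self._encoding)


s = EncodedText(b"a\xcc\x8a", "utf-8")
text = str(s)          # no encoding argument needed at the call site
print(len(text))       # 2
```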

On 2/4/03 12:24 AM +0100 Just van Rossum <ju...@le...> wrote:
>> Currently, because of the 7-bit possibility, if you want a unicode
>> string from a value s that came from the objc side, you need to call
>> unicode(s). I hope and assume that whatever happens with strings,
>> unicode(s) will still work.
>
> I'm curious: if the string is known to be 7-bit ascii, in what situation
> does an 8-bit string not work where a unicode string does?

I had some code of the form unicode(s).encode('utf8') because I thought it 
didn't make sense to encode something that wasn't already unicode.  But 
looking at it again, I guess encode works equally well on 7-bit strings...

> Btw. while peeking around I came across the thing Bill warned about:
> [NSString stringWithCString:] returns an instance of NSCFString, which
> is a subclass of -- tada -- NSMutableString. This doesn't mean it _is_
> mutable: I get an exception if I try. So _inheritance_ can't be used to
> determine mutability. Does anyone know off hand what can?

Any idea whether it reuses the storage of the C string, or copies it?
If it reuses it, it could be mutable by the C code...

--
David Eppstein       UC Irvine Dept. of Information & Computer Science
epp...@ic... http://www.ics.uci.edu/~eppstein/

Re: [Pyobjc-dev] Re: [ pyobjc-Bugs-679748 ] NSMutableString gets converted to Python string

From: Just v. R. <ju...@le...> - 2003-02-03 23:25:41

David Eppstein wrote:

> Currently, because of the 7-bit possibility, if you want a unicode
> string from a value s that came from the objc side, you need to call
> unicode(s). I hope and assume that whatever happens with strings,
> unicode(s) will still work.

I'm curious: if the string is known to be 7-bit ascii, in what situation
does an 8-bit string not work where a unicode string does?

> There seems to be a thread going now on c.l.py about unicode
> filenames...

I only see one about source encodings...

Btw. while peeking around I came across the thing Bill warned about:
[NSString stringWithCString:] returns an instance of NSCFString, which
is a subclass of -- tada -- NSMutableString. This doesn't mean it _is_
mutable: I get an exception if I try. So _inheritance_ can't be used to
determine mutability. Does anyone know off hand what can?

Just