On 13 Jun, 2012, at 2:22, Ken Thomases wrote:
> I can't claim to follow all of the details of how PyObjC works, but here are my impressions:
> On Jun 12, 2012, at 10:13 AM, Ronald Oussoren wrote:
>> trim_chars = u"*"
>> #trim_chars = Cocoa.CFStringCreateWithCString(None, trim_chars, Cocoa.kCFStringEncodingASCII)
>> Cocoa.CFStringTrim(s, trim_chars)
> Here, there are two paths, depending on whether that line above is commented out. Either a unicode object is implicitly converted to a CFString/NSString, or it's an objc.pyobjc_unicode object wrapping what's already a CFString/NSString. What are the code paths used in the two cases? A lot of it is implicit and I'm not able to follow it.
Case 1: trim_chars = u"*" -> CFStringTrim's second argument is a Python unicode object, which is proxied as an OC_PythonUnicode object, and that proxy is passed to the C function
Case 2: trim_chars = Cocoa.CFString... -> CFStringTrim's second argument is an instance of pyobjc_unicode, and the embedded NSString* value is passed to the C function
pyobjc_unicode is used to proxy an Objective-C string to Python. I'd prefer not to use a custom class for that, but dropping it would increase friction: you'd have to explicitly convert NSString instances to a Python unicode object when using numerous Python APIs implemented in C. The pyobjc_unicode class shouldn't really be an issue for this problem, though: the only instance of that class in the script is "s", and that's released as soon as "s.nsstring()" is called.
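To make the two cases concrete, here is a toy Python 3 model of the argument dispatch. It is an illustration under stated assumptions, not PyObjC's bridge (which is written in C and Objective-C): NSStringModel, PyObjCUnicodeModel, OCPythonUnicodeModel and to_objc_argument are hypothetical stand-ins for a native NSString, objc.pyobjc_unicode, OC_PythonUnicode and the conversion step. In this model PyObjCUnicodeModel subclasses str, mirroring the friction argument above, so the check for it has to come before the plain-str check.

```python
# Toy model only: every class and function here is hypothetical and merely
# mimics the two conversion paths described in the thread.

class NSStringModel:
    """Stand-in for a native NSString/CFString."""
    def __init__(self, text):
        self.text = text


class PyObjCUnicodeModel(str):
    """Stand-in for objc.pyobjc_unicode: a str that remembers its NSString."""
    def __new__(cls, nsstring):
        self = super().__new__(cls, nsstring.text)
        self._nsstring = nsstring
        return self

    def nsstring(self):
        return self._nsstring


class OCPythonUnicodeModel(NSStringModel):
    """Stand-in for OC_PythonUnicode: an NSString subclass proxying a str."""
    def __init__(self, pystr):
        super().__init__(pystr)
        self.wrapped = pystr


def to_objc_argument(value):
    """Return what the C function would receive for a string argument."""
    if isinstance(value, PyObjCUnicodeModel):
        # Case 2: already backed by an NSString; pass the embedded object.
        return value.nsstring()
    if isinstance(value, str):
        # Case 1: plain Python string; wrap it in a proxy object.
        return OCPythonUnicodeModel(value)
    raise TypeError("unsupported argument type: %r" % type(value))


if __name__ == "__main__":
    native = NSStringModel("*")
    print(to_objc_argument(PyObjCUnicodeModel(native)) is native)  # Case 2: True
    print(type(to_objc_argument("*")).__name__)  # Case 1: OCPythonUnicodeModel
```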
>> For me the problem only occurs when I run this code with a 64-bit build of python ("arch -x86_64 python2.7 ...") and works fine in a 32-bit build ("arch -i386 python2.7 ..."). I have only tested on OSX 10.7, I'm currently traveling and cannot easily test on other releases. To make life even more interesting, the problem only occurs when "PyObjC_UNICODE_FAST_PATH" is active.
> I note that PyObjC_UNICODE_FAST_PATH affects PyObjCUnicode_New() as well as the implementation of OC_PythonUnicode. I recommend that, instead of disabling it globally by tweaking the header, you disable it in each translation unit independently to isolate which is the problem. I'm sort of suspecting it's PyObjCUnicode_New() rather than OC_PythonUnicode, especially given that you tried disabling the implementation of -getCharacters:range:.
For now I've effectively disabled PyObjC_UNICODE_FAST_PATH for OC_PythonUnicode, and that fixes the issue as well. PyObjCUnicode_New should be in the clear.
My new implementation of OC_PythonUnicode always uses the __realObject__ trick and only optimizes the implementation of __realObject__: when sizeof(unichar) == sizeof(Py_UNICODE) I create the NSString with the "NoCopy" variant of the NSString initializer. This avoids duplicating the string contents, although the additional, unnecessary object still uses memory.
I'll probably revisit this in the future; I'd prefer to get rid of the additional object where possible. For now my current implementation works, and there are more important things to work on right now (not least getting an actual release out).
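The NoCopy initializer can skip the copy only because, when sizeof(unichar) == sizeof(Py_UNICODE), both sides agree on 16-bit code units, so the existing buffer can be reinterpreted in place. As a rough pure-Python analogy (Python 3.3+; memoryview.cast stands in for the NoCopy initializer and list() for the copying path; the helper names are hypothetical and none of this is PyObjC code), the sketch shows that a write through a zero-copy view is visible in the original buffer while a copy stays independent.

```python
import sys

# Encode in native byte order so the "H" (unsigned 16-bit) memoryview format
# reads each UTF-16 code unit correctly, as a unichar array would.
UTF16_NATIVE = "utf-16-le" if sys.byteorder == "little" else "utf-16-be"


def utf16_units_view(buf):
    """Zero-copy (hypothetical helper): reinterpret buf as 16-bit units."""
    return memoryview(buf).cast("H")


def utf16_units_copy(buf):
    """Copying path (hypothetical helper): an independent list of the units."""
    return list(memoryview(buf).cast("H"))


if __name__ == "__main__":
    buf = bytearray("hi".encode(UTF16_NATIVE))
    view = utf16_units_view(buf)
    copy = utf16_units_copy(buf)
    view[0] = ord("H")               # write through the shared view
    print(buf.decode(UTF16_NATIVE))  # the original buffer changed: Hi
    print(copy[0] == ord("h"))       # the copy did not: True
```

The trade-off of sharing storage is the usual NoCopy caveat: the underlying buffer has to outlive the object that views it.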
> I also note that PyObjCUnicode_New() uses the deprecated method -getCharacters: (no range) when the Unicode fast path is enabled. Other than that, I see no obvious problems with either piece of code.
Good catch. This should be harmless, but -getCharacters:range: is safer. I'll update the code.
> Given the 32-/64-bit difference, I was for a while suspecting that Py_UNICODE might be 4-byte UCS-4 under 64-bit, but I see that PyObjC_UNICODE_FAST_PATH would not be enabled in that case. *shrug*
Luckily the size of Py_UNICODE is a configure-time constant, and it is 16 bits in the default builds of Python 2.x. In Python 3.3 and later (PEP 393) the narrow/wide build distinction is gone: every build supports the full UCS-4 range, and the unicode object uses UCS-1, UCS-2 or UCS-4 as its backing store as appropriate. After some optimization, Python 3.3's unicode strings are now as memory-efficient and fast as Python 2.7's byte strings for most code (according to discussions on python-dev; I haven't done any benchmarking myself).
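To check these points on a given interpreter, here is a small stdlib-only diagnostic (CPython assumed; the function names are hypothetical). build_info() reports the pointer width, which tells an "arch -i386" run from an "arch -x86_64" run, and sys.maxunicode, which is 0xFFFF on a narrow Python 2 build and 0x10FFFF on a wide one. bytes_per_char() shows the PEP 393 backing-store choice on Python 3.3+ by differencing sys.getsizeof, which relies on a CPython implementation detail rather than a language guarantee.

```python
import struct
import sys


def build_info():
    """Report pointer width and the largest code point this build supports."""
    return {
        # 32 under "arch -i386", 64 under "arch -x86_64"
        "pointer_bits": struct.calcsize("P") * 8,
        # 0xFFFF on a narrow (16-bit Py_UNICODE) Python 2 build,
        # 0x10FFFF on a wide build and on every Python 3.3+ build
        "maxunicode": sys.maxunicode,
    }


def bytes_per_char(ch, n=1000):
    """Storage bytes per character for a string made only of ch.

    Differencing two sizes cancels the fixed object header.
    CPython 3.3+ only: relies on the PEP 393 compact representation.
    """
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n


if __name__ == "__main__":
    print(build_info())
    for ch in ("a", "\u00e9", "\u20ac", "\U0001F600"):
        print(repr(ch), bytes_per_char(ch))
```

On CPython 3.3+ the loop reports 1, 1, 2 and 4 bytes per character: ASCII and Latin-1 text use UCS-1 storage, other BMP characters UCS-2, and characters outside the BMP UCS-4.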
Anyway, thanks for your help.
I also talked to an Apple engineer at WWDC: the custom NSString subclass should just work. At one point I was worried that CFString's APIs weren't guaranteed to work with custom NSString subclasses.