Re: [Pyobjc-dev] depythonify_c_value rejects non-ascii, non-unicode strings
From: Bob I. <bo...@re...> - 2004-01-21 18:20:31
On Jan 21, 2004, at 12:57 PM, Glenn Andreas wrote:

> At 12:31 PM -0500 1/21/04, Bob Ippolito wrote:
>> If you want/need to exchange arbitrary data you're going to have to
>> explicitly put it in NSData. I would almost vote to *disable* the
>> str<->NSString bridge in PyObjC, or make it bridge NSData instead,
>> but that would just be terribly inconvenient for many people.
>
> What about doing both? If the conversion works, it creates an
> NSString. This will handle all the current ASCII cases as well as
> cases where the default encoding is explicitly set (and all the str's
> are handled accordingly).
>
> If the conversion doesn't work, it creates NSData. Obviously, this
> will push the error somewhere else, which may not be able to handle it
> any better, but at least there is a chance. (The current problem was
> doing something like "NSText insertText:", which would then fail with
> some other error, which might even be more confusing).

Oh god no! What if you wanted an NSData that happened to not have any high bits set? This sounds more like how I'd imagine unicode support to work (or not work) in a Perl ObjC bridge ;)

And yes, at least at this point the error predictably happens exactly when you're doing something evil/lazy.

> I suppose a more general solution is to allow for custom conversion
> handlers that can be installed, but that seems to open another can of
> worms... (more like a 55 gallon drum)

There are custom conversion handlers: Python's unicode support. You can make file-like objects that spew unicode, and you can convert any string of known encoding to a unicode string. The problem with "conversion handlers" is that you don't know where the str came from, and without that information you can't register a conversion handler that does anything beyond what sys.setdefaultencoding can do.
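As a rough illustration of the objection above, here is a minimal sketch in present-day Python 3, where `bytes` plays the role that `str` played in 2004 and `str` plays the role of `unicode`. The function names `naive_bridge` and `explicit_text` are hypothetical and are not part of PyObjC; the "NSString"/"NSData" labels are just tags standing in for the bridged types.

```python
def naive_bridge(raw: bytes):
    """Hypothetical fallback bridge: text if ASCII-decodable, else opaque data."""
    try:
        return ("NSString", raw.decode("ascii"))
    except UnicodeDecodeError:
        return ("NSData", raw)


def explicit_text(raw: bytes, encoding: str) -> str:
    """Decode bytes whose encoding the caller actually knows."""
    return raw.decode(encoding)


# Binary payload that happens to have no high bits set.
payload = bytes([0x50, 0x4B, 0x03, 0x04])
# Text bytes in a known, non-ASCII encoding.
latin = "café".encode("latin-1")

# The fallback guesses from content, so binary data is silently treated as text,
# and the result type changes depending on what the bytes happen to contain.
payload_kind = naive_bridge(payload)[0]
latin_kind = naive_bridge(latin)[0]

# With the encoding stated by the caller there is nothing to guess.
decoded = explicit_text(latin, "latin-1")
```

In this sketch the all-ASCII binary payload comes back tagged as a string, while the Latin-1 text comes back tagged as data: the type of the result depends on the content of the bytes rather than on what the caller meant, which is the unpredictability being objected to.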
I think that the reason sys.setdefaultencoding is only settable by the end user (or any other mechanism for starting the Python interpreter) is that it's evil for a module to change the system encoding, because it can break totally unrelated code, or end user preferences, in a hard to debug way.

> Another possibility is to just make the system default encoding be
> UTF8 instead of ASCII, but I'm guessing if that were a good idea it
> would have already been done (and would certainly cause other problems
> with "str is a collection of bytes", "no str is string of characters",
> "no, it's a desert topping").

setdefaultencoding doesn't ever affect str, it only affects unicode (creating unicode and coercing unicode to str). str is always a collection of bytes that happens to be convenient at times to use as a collection of characters. It does, typically, make sense for the system default encoding to be UTF8 *on OS X*, but that is a decision that affects any Python code, and that decision needs to be made by the end user (or vendor, I suppose).

-bob
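The "collection of bytes" point can be sketched in present-day Python 3, assuming `bytes` as the analogue of the 2004-era `str` and `str` as the analogue of `unicode`. The variable names are illustrative only.

```python
text = "né"                  # two characters
raw = text.encode("utf-8")   # the same text as a collection of bytes

char_count = len(text)
byte_count = len(raw)

# Slicing bytes cuts at byte boundaries, not character boundaries:
# this slice ends in the middle of the encoded "é".
partial = raw[:2]

# Getting characters back requires naming the encoding explicitly.
round_trip = raw.decode("utf-8")
```

Here the byte count exceeds the character count, and the two-byte slice is not valid UTF-8 on its own, so treating bytes as characters is only a convenience that holds for ASCII content.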