Re: [Pyobjc-dev] depythonify_c_value rejects non-ascii, non-unicode strings
From: Bob I. <bo...@re...> - 2004-01-21 17:28:42
On Jan 21, 2004, at 12:20 PM, Marc-Antoine Parent wrote:

>> We had this (short) discussion before:
>> http://sourceforge.net/mailarchive/message.php?msg_id=6595522
>
> Thank you for pointing it out; I had not seen it.
>
>> I've come to the conclusion that if the Python program doesn't handle
>> all text as unicode, then it's broken. This is really just PyObjC
>> telling you to fix your code.
>
> I only partially agree. It is true that internally, a Python program
> should use unicode all the way; but nobody should force me to use
> unicode on the output. The case I am raising is that I have a Python
> program with Latin-1 output, which is picked up by another Python
> program, which is encoding-agnostic, and transfers it to the bridge.
> The two programs are totally disconnected, except through I/O, and
> that I/O may use another encoding.
>
> Now, maybe what you are saying amounts to the suggestion that the
> second program should know (or be told) about the encoding of the
> first program's output; and that makes sense. However, there may be
> cases, such as mine, where it makes sense for the Python program to
> use encoded (non-unicode) data internally, and not to care about it,
> and (supposing I know the encoding) I should not have to convert to
> unicode before calling the bridge at every point.
> (Granted, in this case, we could convert to unicode at the interface
> between both programs, but that may not always be the case...)
> So let me then make a plea for an API so that a PyObjC program can
> tell the bridge to use an encoding other than the system default, if
> specified, even if the default behaviour remains identical, i.e. throw
> exceptions upon non-ascii strings.
> That way, only a program that knows what it is doing will modify the
> behaviour, and no data will be lost by default; but a program that has
> good architectural reasons to do so might still use another encoding
> internally.
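To make the proposal concrete, here is a minimal sketch of what such an opt-in setting could look like. It is written in modern Python 3, where `bytes` plays the role of 2004's `str` and `str` that of `unicode`. PyObjC has no such API: `set_bridge_encoding()` and `to_nsstring_text()` are hypothetical names, and the code only models the behaviour described (strict ASCII by default, another encoding only when a program explicitly asks for it).

```python
import codecs

# HYPOTHETICAL: PyObjC has no set_bridge_encoding() or to_nsstring_text();
# these names only model the API proposed in the quoted message.
_bridge_encoding = "ascii"  # default stays strict ASCII, as today


def set_bridge_encoding(encoding):
    """Opt in to decoding byte strings with `encoding` (hypothetical API)."""
    global _bridge_encoding
    codecs.lookup(encoding)  # raises LookupError for an unknown codec name
    _bridge_encoding = encoding


def to_nsstring_text(value):
    """Return the text the bridge would place in an NSString (hypothetical)."""
    if isinstance(value, str):
        return value  # already text: nothing to decode
    if isinstance(value, bytes):
        # Strict decoding: undecodable bytes raise instead of being dropped,
        # so no data is silently lost under the default setting.
        return value.decode(_bridge_encoding, errors="strict")
    raise TypeError("expected str or bytes, got " + type(value).__name__)


if __name__ == "__main__":
    data = "café".encode("latin-1")  # b'caf\xe9'
    try:
        to_nsstring_text(data)
    except UnicodeDecodeError:
        print("rejected under the ASCII default")
    set_bridge_encoding("latin-1")  # a program that knows what it is doing
    print(to_nsstring_text(data))   # café
```

The setting is global here only for brevity; a real design would have to decide whether such a switch is process-wide or scoped, which is part of why a global default change is risky.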
The simple fact of the matter is that NSString is the equivalent of Python's unicode. If you call unicode() on a str containing Latin-1 (non-ASCII) bytes, you will get an exception, because the default codec is ASCII. There is no reason whatsoever to put arbitrary data in an NSString unless you know its encoding. If you want or need to exchange arbitrary data, you're going to have to put it explicitly in an NSData. I would almost vote to *disable* the str<->NSString bridge in PyObjC, or make it bridge to NSData instead, but that would just be terribly inconvenient for many people.

-bob
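As a rough illustration of the alternative suggested above (decode once, where the other program's output enters, so everything that reaches NSString is already text), here is a minimal Python 3 sketch. `text_from_peer()` is a hypothetical helper, and Latin-1 as the peer's encoding is an assumption that must be known or communicated out of band; the code does not call PyObjC itself.

```python
# Sketch of decoding once at the program boundary (Python 3).
# text_from_peer() is a hypothetical helper; the peer's encoding (Latin-1
# here) is an assumption that must be known or communicated out of band.


def text_from_peer(raw: bytes, encoding: str) -> str:
    """Decode bytes from another program using its known encoding.

    Decoding is strict, so bytes that are invalid in `encoding` raise
    UnicodeDecodeError. Note that Latin-1 accepts every byte value, so
    mislabelling data as Latin-1 is never detected here.
    """
    return raw.decode(encoding)


if __name__ == "__main__":
    raw = "Montréal".encode("latin-1")  # what the first program writes

    # An implicit ASCII decode, like Python 2's unicode(raw), fails:
    try:
        raw.decode("ascii")
    except UnicodeDecodeError as exc:
        print("ASCII decode failed:", exc.reason)

    # Decoding with the known encoding succeeds; from here on it is text.
    print(text_from_peer(raw, "latin-1"))  # Montréal
```

Data whose encoding is genuinely unknown should not go through this path at all; it stays as bytes and belongs in an NSData rather than an NSString.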