Re: [Pyobjc-dev] depythonify_c_value rejects non-ascii, non-unicode strings


On Jan 21, 2004, at 12:20 PM, Marc-Antoine Parent wrote:

>> We had this (short) discussion before:
>> http://sourceforge.net/mailarchive/message.php?msg_id=6595522
>
> Thank you for pointing it out; I had not seen it.
>
>> I've come to the conclusion that if the Python program doesn't handle 
>> all text as unicode, then it's broken.  This is really just PyObjC 
>> telling you to fix your code.
>
> I only partially agree. It is true that internally, a Python program 
> should use unicode all the way; but nobody should force me to use 
> unicode on the output. The case I am raising is that I have a Python 
> program with Latin-1 output, which is picked up by another Python 
> program, which is encoding-agnostic, and transfers it to the bridge. 
> The two programs are totally disconnected, except through I/O, and 
> that I/O may use another encoding.
>
> Now, maybe what you are saying amounts to the suggestion that the 
> second program should know (or be told) about the encoding of the 
> first program's output; and that makes sense. However, there may be 
> cases, such as mine, where it makes sense for the Python program to 
> use encoded (non-unicode) data internally, and not to care about it, 
> and (supposing I know the encoding) I should not have to convert to 
> unicode before calling the bridge at every point.
> (Granted, in this case, we could convert to unicode at the interface 
> between both programs, but that may not always be the case...)
> So let me then make a plea for an API so that a PyObjC program can 
> tell the bridge to use an encoding other than the system default, if 
> specified, even if the default behaviour remains identical, i.e. throw 
> exceptions upon non-ascii strings.
> That way, only a program that knows what it is doing will modify the 
> behaviour, and no data will be lost by default; but a program that has 
> good architectural reasons to do so might still use another encoding 
> internally.

The simple fact of the matter is that NSString is the equivalent of 
Python's unicode.  If you call unicode() on a str containing Latin-1 
bytes, you will get an exception, because the default encoding is 
ASCII.  There is no reason whatsoever to put arbitrary data in an 
NSString unless you know its encoding.
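In Python 2, unicode(s) decodes with the ASCII default codec, so Latin-1 bytes fail unless the encoding is stated explicitly.  A minimal sketch of the same behavior in Python 3 terms, assuming `bytes` stands in for the Python 2 `str` discussed here (it does not touch PyObjC at all):

```python
# Python 3 sketch: `bytes` plays the role of the Python 2 `str` type.
latin1_output = "café".encode("latin-1")  # b'caf\xe9', as the first program writes it

try:
    # Analogous to unicode(s) under an ASCII default encoding.
    latin1_output.decode("ascii")
except UnicodeDecodeError as exc:
    print("ASCII decode failed:", exc.reason)

# The receiving program states the encoding it knows, and decodes explicitly.
text = latin1_output.decode("latin-1")
print(text)
```

Decoding once, at the point where the encoding is actually known, is what turns "arbitrary bytes" into text that can safely become an NSString.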

If you want/need to exchange arbitrary data, you're going to have to 
explicitly put it in NSData.  I would almost vote to *disable* the 
str<->NSString bridge in PyObjC, or make it bridge NSData instead, but 
that would just be terribly inconvenient for many people.
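The rule above (text with a known encoding goes over as NSString, anything else as NSData) can be sketched as a small routing helper.  `prepare_for_bridge` is hypothetical, not part of PyObjC; it only decides which kind of object a value should become and leaves the actual bridging (for example, building an NSData with Foundation's `dataWithBytes:length:`) to the caller:

```python
from typing import Optional, Tuple, Union


def prepare_for_bridge(
    value: Union[str, bytes], encoding: Optional[str] = None
) -> Tuple[str, Union[str, bytes]]:
    """Hypothetical helper: choose between text (NSString) and raw data (NSData).

    Returns ("NSString", str) when the value is already Unicode or its
    encoding is known, and ("NSData", bytes) when no encoding is declared.
    """
    if isinstance(value, str):
        # Already Unicode: maps naturally onto NSString.
        return ("NSString", value)
    if isinstance(value, bytes):
        if encoding is None:
            # Unknown encoding: keep the bytes opaque instead of guessing.
            return ("NSData", value)
        # Known encoding: decode explicitly; bad bytes raise UnicodeDecodeError.
        return ("NSString", value.decode(encoding))
    raise TypeError(f"unsupported type: {type(value).__name__}")


print(prepare_for_bridge(b"caf\xe9", "latin-1"))
print(prepare_for_bridge(b"\x00\xff"))
```

The design choice mirrors the argument in this thread: the caller, not the bridge, is the one that knows the encoding, so the decision is made explicitly at that boundary.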

-bob