Thread: [Pyobjc-dev] depythonify_c_value rejects non-ascii, non-unicode strings

[Pyobjc-dev] depythonify_c_value rejects non-ascii, non-unicode strings

From: Marc-Antoine P. <map...@ac...> - 2004-01-21 16:24:15

Good day, all!

I am writing some Python code that has to output Latin-1 text.
Some of that output makes its way through other (python) code to a text 
widget through insertText_. The other code does not know about my 
encoding choice, as it is not my code, but Glenn Andreas' PyOxide IDE; 
it should not know about encoding. So it simply passes along my Latin-1 
strings to the insertText_ method of a text widget, where the PyObjC 
bridge tries to make it into a NSString.

In objc_support.c, in int depythonify_c_value(const char *type,
PyObject *argument, void *datum), we have the following code
(currently around line 1300):
			as_unicode = PyUnicode_Decode(
				strval,
				len,
				PyUnicode_GetDefaultEncoding(),
				"strict");
			if (as_unicode == NULL) {
				PyErr_Format(PyExc_UnicodeError,
					"depythonifying 'id', got "
					"a string with a non-default "
					"encoding");
				return -1;
			}
It turns out that the default encoding is ASCII unless it is changed
with PyUnicode_SetDefaultEncoding (see
/System/Library/Frameworks/Python.framework/Headers/unicodeobject.h).
That means that in many cases the decode fails, I get the UnicodeError
raised just below it, and no output appears at all.
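A rough Python 3 sketch of the behaviour described above, for readers without a 2004-era interpreter: bytes plays the role of the old str, and str the role of unicode. The name implicit_bridge_decode is hypothetical; it only approximates the strict decode in depythonify_c_value and is not PyObjC's actual code path.

```python
# Python 3 sketch: bytes stands in for the 2004-era str, str for unicode.
# HYPOTHETICAL helper approximating the strict decode quoted above.

def implicit_bridge_decode(raw: bytes, default_encoding: str = "ascii") -> str:
    """Decode with the default encoding in 'strict' mode, like the C snippet."""
    try:
        return raw.decode(default_encoding, "strict")
    except UnicodeDecodeError as exc:
        # The C code replaces the codec error with its own UnicodeError message.
        raise UnicodeError(
            "depythonifying 'id', got a string with a non-default encoding"
        ) from exc


latin1_output = "café".encode("latin-1")  # b'caf\xe9'

print(implicit_bridge_decode(b"plain ascii"))  # plain ascii
try:
    implicit_bridge_decode(latin1_output)
except UnicodeError as err:
    print("rejected:", err)
```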

It is fairly easy to set the default encoding at startup (thanks to 
Glenn for pointing this out to me) using 
sys.setdefaultencoding('iso-8859-1') in a sitecustomize.py.
However, this can only be done at Python startup, and I fear many users 
of the bridge may not know about this limitation.
I propose that the PyObjC bridge use a less restrictive encoding than 
the current (bizarre) platform default, so as to allow Python to output 
encoded text to Cocoa widgets.
(Maybe the bridge should have a hook to set the platform default when 
the Python subsystem is started?)
I suggest Latin-1, as it is the most common encoding and the one most 
likely to be used by most (Unix-written) Python code. Even if the 
Python code uses another encoding, Latin-1 lets every byte pass through 
to the widget unchanged, so if the user sees gibberish, it will at 
least be familiar gibberish. But I am sure a case could be made for 
MacRoman as well.
Another solution (Glenn's suggestion) is at least not to decode 
'strict'ly, but to use 'ignore' or, at worst, 'replace', so that some 
of the text reaches the user...
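The options in the last two paragraphs can be compared on a small Latin-1 sample. This is a Python 3 sketch (bytes standing in for the old str); it shows that 'ignore' and 'replace' both lose information without raising, and that decoding UTF-8 bytes as Latin-1 never fails but produces the "familiar gibberish" instead.

```python
raw = "déjà vu".encode("latin-1")  # b'd\xe9j\xe0 vu'

ignored = raw.decode("ascii", "ignore")    # non-ASCII bytes silently dropped
replaced = raw.decode("ascii", "replace")  # each bad byte becomes U+FFFD
as_latin1 = raw.decode("latin-1")          # right codec: exact round trip

# Latin-1 maps every byte to a character, so wrong input yields mojibake,
# not an error:
mojibake = "é".encode("utf-8").decode("latin-1")

print(repr(ignored))     # 'dj vu'
print(ascii(replaced))   # 'd\ufffdj\ufffd vu'
print(repr(as_latin1))   # 'déjà vu'
print(repr(mojibake))    # 'Ã©'
```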

Whatever the correct solution, I feel that the current situation 
(rejecting any encoded non-ascii text) is overly restrictive.

Thank you for your attention,
Marc-Antoine Parent

Re: [Pyobjc-dev] depythonify_c_value rejects non-ascii, non-unicode strings

From: Bob I. <bo...@re...> - 2004-01-21 16:49:17

On Jan 21, 2004, at 11:24 AM, Marc-Antoine Parent wrote:

> I am writing some Python code that has to output Latin-1 text.
> Some of that output makes its way through other (python) code to a 
> text widget through insertText_. The other code does not know about my 
> encoding choice, as it is not my code, but Glenn Andreas' PyOxide IDE; 
> it should not know about encoding. So it simply passes along my 
> Latin-1 strings to the insertText_ method of a text widget, where the 
> PyObjC bridge tries to make it into a NSString.

We had this (short) discussion before:
http://sourceforge.net/mailarchive/message.php?msg_id=6595522

I've come to the conclusion that if the Python program doesn't handle 
all text as unicode, then it's broken.  This is really just PyObjC 
telling you to fix your code.

Here's some important snippets that helped me come to this conclusion:

[Just van Rossum]
  Strongly disagree. This leads to silent errors, possibly even data
  loss.
  You _have_ to know the encoding, and you _have_ to deal with it. If
  there's no way you can know the encoding, you have to explicitly tell
  which encoding or behavior to use.

  Btw. it's not so much PyObjC's behavior, but Python's default str ->
  unicode coercion behavior. Perhaps it's "fixable" in the bridge, but I
  think it's a bad idea to deviate from Python's behavior (in addition to
  that I find it a bad idea to begin with).

[Ronald Oussoren]
  BTW. You should convert all input to unicode instead of waiting for
  problems with the implicit conversion to unicode that is performed by
  PyObjC. You're more likely to know the right encoding while reading the
  data.
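Ronald's advice amounts to decoding once, at the point where the data is read and the encoding is still known. A minimal Python 3 sketch, assuming the producer writes Latin-1; read_lines_as_text and the in-memory pipe are hypothetical stand-ins for real input.

```python
import io


def read_lines_as_text(byte_stream, encoding):
    """Decode once at the input boundary; everything after this is text."""
    text_stream = io.TextIOWrapper(byte_stream, encoding=encoding)
    return [line.rstrip("\n") for line in text_stream]


# Hypothetical input: bytes written by a program that emits Latin-1.
pipe = io.BytesIO("résumé\nnaïve\n".encode("latin-1"))
lines = read_lines_as_text(pipe, "latin-1")
print(lines)  # ['résumé', 'naïve']
```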

-bob

Re: [Pyobjc-dev] depythonify_c_value rejects non-ascii, non-unicode strings

From: Marc-Antoine P. <map...@ac...> - 2004-01-21 17:20:08

> We had this (short) discussion before:
> http://sourceforge.net/mailarchive/message.php?msg_id=6595522

Thank you for pointing it out; I had not seen it.

> I've come to the conclusion that if the Python program doesn't handle 
> all text as unicode, then it's broken.  This is really just PyObjC 
> telling you to fix your code.

I only partially agree. It is true that internally, a Python program 
should use unicode all the way; but nobody should force me to use 
unicode on the output. The case I am raising is that I have a Python 
program with Latin-1 output, which is picked up by another Python 
program, which is encoding-agnostic, and transfers it to the bridge. 
The two programs are totally disconnected, except through I/O, and that 
I/O may use another encoding.

Now, maybe what you are saying amounts to the suggestion that the 
second program should know (or be told) about the encoding of the first 
program's output; and that makes sense. However, there may be cases, 
such as mine, where it makes sense for the Python program to use 
encoded (non-unicode) data internally, and not to care about it, and 
(supposing I know the encoding) I should not have to convert to unicode 
before calling the bridge at every point.
(Granted, in this case, we could convert to unicode at the interface 
between both programs, but that may not always be the case...)
So let me then make a plea for an API so that a PyObjC program can tell 
the bridge to use an encoding other than the system default, if 
specified, even if the default behaviour remains identical, i.e. throw 
exceptions upon non-ascii strings.
That way, only a program that knows what it is doing will modify the 
behaviour, and no data will be lost by default; but a program that has 
good architectural reasons to do so might still use another encoding 
internally.

Marc-Antoine Parent

Re: [Pyobjc-dev] depythonify_c_value rejects non-ascii, non-unicode strings

From: Bob I. <bo...@re...> - 2004-01-21 17:28:42

On Jan 21, 2004, at 12:20 PM, Marc-Antoine Parent wrote:

>> We had this (short) discussion before:
>> http://sourceforge.net/mailarchive/message.php?msg_id=6595522
>
> Thank you for pointing it out; I had not seen it.
>
>> I've come to the conclusion that if the Python program doesn't handle 
>> all text as unicode, then it's broken.  This is really just PyObjC 
>> telling you to fix your code.
>
> I only partially agree. It is true that internally, a Python program 
> should use unicode all the way; but nobody should force me to use 
> unicode on the output. The case I am raising is that I have a Python 
> program with Latin-1 output, which is picked up by another Python 
> program, which is encoding-agnostic, and transfers it to the bridge. 
> The two programs are totally disconnected, except through I/O, and 
> that I/O may use another encoding.
>
> Now, maybe what you are saying amounts to the suggestion that the 
> second program should know (or be told) about the encoding of the 
> first program's output; and that makes sense. However, there may be 
> cases, such as mine, where it makes sense for the Python program to 
> use encoded (non-unicode) data internally, and not to care about it, 
> and (supposing I know the encoding) I should not have to convert to 
> unicode before calling the bridge at every point.
> (Granted, in this case, we could convert to unicode at the interface 
> between both programs, but that may not always be the case...)
> So let me then make a plea for an API so that a PyObjC program can 
> tell the bridge to use an encoding other than the system default, if 
> specified, even if the default behaviour remains identical, i.e. throw 
> exceptions upon non-ascii strings.
> That way, only a program that knows what it is doing will modify the 
> behaviour, and no data will be lost by default; but a program that has 
> good architectural reasons to do so might still use another encoding 
> internally.

The simple fact of the matter is that NSString is the equivalent to 
python's unicode.  If you unicode('something-with-latin-1') then you 
will get an exception.  There is no reason whatsoever to put arbitrary 
data in a NSString unless you know its encoding.

If you want/need to exchange arbitrary data you're going to have to 
explicitly put it in NSData.  I would almost vote to *disable* the 
str<->NSString bridge in PyObjC, or make it bridge NSData instead, but 
that would just be terribly inconvenient for many people.

-bob

Re: [Pyobjc-dev] depythonify_c_value rejects non-ascii, non-unicode strings

From: Marc-Antoine P. <map...@ac...> - 2004-01-21 17:49:45

> The simple fact of the matter is that NSString is the equivalent to 
> python's unicode.  If you unicode('something-with-latin-1') then you 
> will get an exception.  There is no reason whatsoever to put arbitrary 
> data in a NSString unless you know its encoding.

That sentence agrees with my point the second time: What if I _do_ know 
the encoding, and I want to tell the bridge about it?
Your point is that I should convert strings to unicode before the 
bridge; my point is that I may be calling the bridge in quite a few 
places, and converting there may not be practical.
Whereas if the bridge had a simple API, viz.
PyObjC.setStringEncoding(str)
PyObjC.getStringEncoding()
getting and setting a variable which defaults to the system's default 
encoding,
then it would be easy to still use (single-byte) strings in Python if 
so desired (again, do realize that one is often dealing with someone 
else's code, and reengineering it is not always practical.)
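To make the proposal concrete, here is what such a process-wide setting might look like. This is purely hypothetical: PyObjC never shipped setStringEncoding/getStringEncoding, and the snake_case names below, plus the use of Python 3 str as a stand-in for NSString, are assumptions for illustration.

```python
import codecs

# HYPOTHETICAL: not a PyObjC API; names are invented for illustration.
_string_encoding = "ascii"  # stands in for the interpreter's default encoding


def set_string_encoding(encoding: str) -> None:
    """Change the process-wide encoding used for implicit conversions."""
    global _string_encoding
    codecs.lookup(encoding)  # raises LookupError for unknown encodings
    _string_encoding = encoding


def get_string_encoding() -> str:
    return _string_encoding


def to_text(value) -> str:
    """What the bridge would do with a value headed for an NSString."""
    if isinstance(value, str):
        return value  # already text: pass through untouched
    return value.decode(_string_encoding, "strict")


set_string_encoding("latin-1")
print(get_string_encoding())  # latin-1
print(to_text(b"caf\xe9"))    # café
```

Note that the setting is module-global, which is exactly the property the following replies object to.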

> If you want/need to exchange arbitrary data you're going to have to 
> explicitly put it in NSData.

That would be valid for arbitrary data; but strings of a _known_ 
encoding are not arbitrary data.

> I would almost vote to *disable* the str<->NSString bridge in PyObjC, 
> or make it bridge NSData instead, but that would just be terribly 
> inconvenient for many people.

Indeed.

Marc-Antoine Parent

Re: [Pyobjc-dev] depythonify_c_value rejects non-ascii, non-unicode strings

From: Bob I. <bo...@re...> - 2004-01-21 18:08:28

On Jan 21, 2004, at 12:50 PM, Marc-Antoine Parent wrote:

>> The simple fact of the matter is that NSString is the equivalent to 
>> python's unicode.  If you unicode('something-with-latin-1') then you 
>> will get an exception.  There is no reason whatsoever to put 
>> arbitrary data in a NSString unless you know its encoding.
>
> That sentence agrees with my point the second time: What if I _do_ 
> know the encoding, and I want to tell the bridge about it?
> Your point is that I should convert strings to unicode before the 
> bridge; my point is that I may be calling the bridge in quite a few 
> places, and converting there may not be practical.
> Whereas if the bridge had a simple API, viz.
> PyObjC.setStringEncoding(str)
> PyObjC.getStringEncoding()
> getting and setting a variable which defaults to the system's default 
> encoding,
> then it would be easy to still use (single-byte) strings in Python if 
> so desired (again, do realize that one is often dealing with someone 
> else's code, and reengineering it is not always practical.)

The problem with this proposal is that you want a function to change 
the encoding related to *your* code, the proposed API changes the 
encoding for *all* code that uses the bridge.  If you had control over 
all of the code then it would be fine, but in that case you would also 
be able to just change Python's default encoding.

>> If you want/need to exchange arbitrary data you're going to have to 
>> explicitly put it in NSData.
>
> That would be valid for arbitrary data; but strings of a _known_ 
> encoding are not arbitrary data.

Yeah they are, they're arbitrary data until they're combined with the 
encoding metadata -- which is the unicode type.
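The claim that bytes are arbitrary data until combined with the encoding metadata is easy to demonstrate: the same single byte decodes to three different characters under three plausible encodings, and is not valid UTF-8 at all. A Python 3 sketch; the codec names are those of Python's standard codecs module.

```python
raw = b"\x8e"  # one byte, with no encoding information attached

for encoding in ("mac_roman", "cp1252", "latin-1"):
    print(encoding, ascii(raw.decode(encoding)))
# mac_roman '\xe9'   (é)
# cp1252 '\u017d'    (Ž)
# latin-1 '\x8e'     (an invisible C1 control character)

try:
    raw.decode("utf-8")
except UnicodeDecodeError:
    print("utf-8: not valid at all")
```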

In any case, this really just isn't going to happen.  There are too 
many extremely good reasons not to do it.

-bob

Re: [Pyobjc-dev] depythonify_c_value rejects non-ascii, non-unicode strings

From: Marc-Antoine P. <map...@ac...> - 2004-01-21 19:27:13

>> That sentence agrees with my point the second time: What if I _do_ 
>> know the encoding, and I want to tell the bridge about it?
>> Your point is that I should convert strings to unicode before the 
>> bridge; my point is that I may be calling the bridge in quite a few 
>> places, and converting there may not be practical.
>> Whereas if the bridge had a simple API, viz.
>> PyObjC.setStringEncoding(str)
>> PyObjC.getStringEncoding()
>> getting and setting a variable which defaults to the system's default 
>> encoding,
>> then it would be easy to still use (single-byte) strings in Python if 
>> so desired (again, do realize that one is often dealing with someone 
>> else's code, and reengineering it is not always practical.)
>
> The problem with this proposal is that you want a function to change 
> the encoding related to *your* code, the proposed API changes the 
> encoding for *all* code that uses the bridge.

Do you mean that this global would be shared by two different python 
programs using the bridge? (i.e. in different processes...)
That would be indeed very dangerous and fully justify your reluctance. 
Otherwise, see my point in another post about uniqueness of GUI.

>  If you had control over all of the code then it would be fine, but in 
> that case you would also be able to just change Python's default 
> encoding.

Remember that I cannot do it after startup.

>>> If you want/need to exchange arbitrary data you're going to have to 
>>> explicitly put it in NSData.
>>
>> That would be valid for arbitrary data; but strings of a _known_ 
>> encoding are not arbitrary data.
>
> Yeah they are, they're arbitrary data until they're combined with the 
> encoding metadata -- which is the unicode type.

My point was to allow for more than one way to combine them. Unicode is 
one solution, and my favoured solution in most cases, but not always 
the best solution, and sometimes not practically available.

> In any case, this really just isn't going to happen.  There are too 
> many extremely good reasons not to do it.

Well, I will stop here; it is clear you do not find my arguments 
compelling, and that, unfortunately, is that.
We still disagree, but thank you for taking the time to give me your 
reasons.

Regards,

Marc-Antoine Parent

Re: [Pyobjc-dev] depythonify_c_value rejects non-ascii, non-unicode strings

From: Bob I. <bo...@re...> - 2004-01-21 19:51:33

On Jan 21, 2004, at 2:27 PM, Marc-Antoine Parent wrote:

>>> That sentence agrees with my point the second time: What if I _do_
>>> know the encoding, and I want to tell the bridge about it?
>>> Your point is that I should convert strings to unicode before the
>>> bridge; my point is that I may be calling the bridge in quite a few
>>> places, and converting there may not be practical.
>>> Whereas if the bridge had a simple API, viz.
>>> PyObjC.setStringEncoding(str)
>>> PyObjC.getStringEncoding()
>>> getting and setting a variable which defaults to the system's
>>> default encoding,
>>> then it would be easy to still use (single-byte) strings in Python
>>> if so desired (again, do realize that one is often dealing with
>>> someone else's code, and reengineering it is not always practical.)
>>
>> The problem with this proposal is that you want a function to change
>> the encoding related to *your* code, the proposed API changes the
>> encoding for *all* code that uses the bridge.
>
> Do you mean that this global would be shared by two different python
> programs using the bridge? (i.e. in different processes...)
> That would be indeed very dangerous and fully justify your reluctance.
> Otherwise, see my point in another post about uniqueness of GUI.
>
>>  If you had control over all of the code then it would be fine, but
>> in that case you would also be able to just change Python's default
>> encoding.
>
> Remember that I cannot do it after startup.
>
>>>> If you want/need to exchange arbitrary data you're going to have to
>>>> explicitly put it in NSData.
>>>
>>> That would be valid for arbitrary data; but strings of a _known_
>>> encoding are not arbitrary data.
>>
>> Yeah they are, they're arbitrary data until they're combined with the
>> encoding metadata -- which is the unicode type.
>
> My point was to allow for more than one way to combine them. Unicode
> is one solution, and my favoured solution in most cases, but not
> always the best solution, and sometimes not practically available.

I think I understand your problem now, you have a console program that
is interacting with a GUI application via a pipe.  This GUI
application is trying to display the output of your program, but since
it does not know the encoding of your text it is passing on NSString
and crossing its fingers.  The correct solution is, of course, to fix
the GUI application; the way it is handling text is broken.

Solution:
Possibly use a configuration panel for the GUI to choose the encoding
of incoming pipes
Use codecs.getreader(your_encoding) on the pipe, and use that to create
NSStrings.

 >>> import sys
 >>> import codecs
 >>> input = codecs.getreader('utf8')(sys.stdin)
 >>> input.readline()
é
u'\xe9\uf8ff\n'
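The transcript above is Python 2 and depends on the terminal. A self-contained Python 3 version of the same idea, with an in-memory BytesIO standing in for the pipe (an assumption for testing), might look like this; the decoded text is what the GUI would then hand to NSString.

```python
import codecs
import io

# Stand-in for the pipe from the console program (assumption for testing).
pipe = io.BytesIO("é\uf8ff\n".encode("utf-8"))

reader = codecs.getreader("utf-8")(pipe)  # StreamReader that yields text
line = reader.readline()
print(ascii(line))  # '\xe9\uf8ff\n'
```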

-bob

Re: [Pyobjc-dev] depythonify_c_value rejects non-ascii, non-unicode strings

From: Marc-Antoine P. <map...@ac...> - 2004-01-21 20:17:00

Attachments: smime.p7s

> I think I understand your problem now, you have a console program that
> is interacting with a GUI application via a pipe.  This GUI
> application is trying to display the output of your program, but since
> it does not know the encoding of your text it is passing on NSString
> and crossing its fingers.

That is indeed my case.
I was trying to make a more general argument, about third-party
non-unicode libraries in general, but I will admit it is theoretical. I
still feel that the fact that there is a single point of conversion in
the PyObjC bridge makes it a very practical point of control. But I
will now try to restrict myself to my current problem.

>  The correct solution is, of course, to fix the GUI application; the
> way it is handling text is broken.
>
> Solution:
> Possibly use a configuration panel for the GUI to choose the encoding
> of incoming pipes
> Use codecs.getreader(your_encoding) on the pipe, and use that to
> create NSStrings....

Yes, in this case, we can ask Glenn about it (I have) and/or make the
change (I may.)
If the application were closed source, I would be in more trouble.
Hence my request.

On 04-01-21, at 14:57, Ronald Oussoren wrote:

>> The fact that setdefaultencoding can only be set at startup is a
>> major limitation, and the reason that I argue for a separate value in
>> the bridge.
>
> And the fact that setdefaultencoding exists and is removed early
> during startup is an important reason for not adding a similar
> function to PyObjC.

I am arguing it is not similar, as it controls a single point of
conversion (communication with the Cocoa code) as opposed to Python
behaviour as a whole.
I assume it makes sense, in that (in my limited experience) the Cocoa
interface is mostly used to talk with the UI, which is a well-defined
subset of the API.
Though I admit that this would also affect other parts of the Cocoa
bridge, if used, which is as bad as changing Python as a whole.

> If you really want to change the encoding after startup you should
> probably file a bug report for Python, or ask around on
> comp.lang.python.

Fair, but I still think that my case is slightly different.

> BTW. If you build .app bundles you can completely replace the site.py
> inside your application

Ah? How, out of curiosity?

Marc-Antoine

Re: [Pyobjc-dev] depythonify_c_value rejects non-ascii, non-unicode strings

From: Bob I. <bo...@re...> - 2004-01-21 20:36:51

On Jan 21, 2004, at 3:17 PM, Marc-Antoine Parent wrote:

>> I think I understand your problem now, you have a console program
>> that is interacting with a GUI application via a pipe.  This GUI
>> application is trying to display the output of your program, but
>> since it does not know the encoding of your text it is passing on
>> NSString and crossing its fingers.
>
> That is indeed my case.
> I was trying to make a more general argument, about third-party
> non-unicode libraries in general, but I will admit it is theoretical.
> I still feel that the fact that there is a single point of conversion
> in the PyObjC bridge makes it a very practical point of control. But I
> will now try to restrict myself to my current problem.

Encodings are serialization formats; beyond that, you need to be using
unicode.  This is by far one of the worst things about Python: we have
this AWESOME unicode support, but we forget to use it most of the time
because it requires us to put a u in front of our text.  Hopefully,
someday Python str will be crippled to the point where nobody will
want to use it for anything but raw data.

>>  The correct solution is, of course, to fix the GUI application; the
>> way it is handling text is broken.
>>
>> Solution:
>> Possibly use a configuration panel for the GUI to choose the encoding
>> of incoming pipes
>> Use codecs.getreader(your_encoding) on the pipe, and use that to
>> create NSStrings....
>
> Yes, in this case, we can ask Glenn about it (I have) and/or make the
> change (I may.)
> If the application were closed source, I would be in more trouble.
> Hence my request.

The truth of the matter is that the application is broken, whether it's
open source or closed.

<offtopic>
Because it's open source, and you're a developer, you have this
wonderful i-can-fix-it-if-i-have-to power over your software.  That's
what I really like about open source.  I don't particularly care for
the rest of it (especially annoyances like the GPL and even LGPL).  If
everyone just used Python/BSD/MIT-style licenses, then we could all
share code and not have to hire a lawyer to see if we can reuse
something in another open source project with a different license.
</offtopic>

> On 04-01-21, at 14:57, Ronald Oussoren wrote:
>
>>> The fact that setdefaultencoding can only be set at startup is a
>>> major limitation, and the reason that I argue for a separate value
>>> in the bridge.
>>
>> And the fact that setdefaultencoding exists and is removed early
>> during startup is an important reason for not adding a similar
>> function to PyObjC.
>
> I am arguing it is not similar, as it controls a single point of
> conversion (communication with the Cocoa code) as opposed to Python
> behaviour as a whole.
> I assume it makes sense, in that (in my limited experience) the Cocoa
> interface is mostly used to talk with the UI, which is a well-defined
> subset of the API.
> Though I admit that this would also affect other parts of the Cocoa
> bridge, if used, which is as bad as changing Python as a whole.
>
>> If you really want to change the encoding after startup you should
>> probably file a bug report for Python, or ask around on
>> comp.lang.python.
>
> Fair, but I still think that my case is slightly different.
>
>> BTW. If you build .app bundles you can completely replace the site.py
>> inside your application
>
> Ah? How, out of curiosity?

http://pythonmac.org/wiki/BundleBuilder

The bootstrap script sets your PYTHONPATH to the Resources folder, so=20
you can put a sitecustomize.py there and it will just work

-bob

Re: [Pyobjc-dev] depythonify_c_value rejects non-ascii, non-unicode strings

From: Marc-Antoine P. <map...@ac...> - 2004-01-21 20:52:50

Attachments: smime.p7s

>>> BTW. If you build .app bundles you can completely replace the 
>>> site.py inside your application
>>
>> Ah? How, out of curiosity?
>
> http://pythonmac.org/wiki/BundleBuilder
>
> The bootstrap script sets your PYTHONPATH to the Resources folder, so 
> you can put a sitecustomize.py there and it will just work

OK, I did not realize this. I had tried in one case, but the Python had 
been segregated in a subfolder, so it failed for me. I should have 
tried harder.
Thanks

Re: [Pyobjc-dev] depythonify_c_value rejects non-ascii, non-unicode strings

From: Glenn A. <gan...@ma...> - 2004-01-21 17:58:14

At 12:31 PM -0500 1/21/04, Bob Ippolito wrote:
>If you want/need to exchange arbitrary data you're going to have to 
>explicitly put it in NSData.  I would almost vote to *disable* the 
>str<->NSString bridge in PyObjC, or make it bridge NSData instead, 
>but that would just be terribly inconvenient for many people.

What about doing both?  If the conversion works, it creates an 
NSString. This will handle all the current ASCII cases as well as 
cases where the default encoding is explicitly set (and all the str's 
are handled accordingly).

If the conversion doesn't work, it creates NSData.  Obviously, this 
will push the error somewhere else, which may not be able to handle 
it any better, but at least there is a chance.  (The current problem 
was doing something like "NSText insertText:", which would then fail 
with some other error, which might even be more confusing).

I suppose a more general solution is to allow for custom conversion 
handlers that can be installed, but that seems to open another can of 
worms... (more like a 55 gallon drum)

Another possibility is to just make the system default encoding be 
UTF8 instead of ASCII, but I'm guessing if that were a good idea it 
would have already been done (and it would certainly cause other 
problems with "str is a collection of bytes", "no, str is a string of 
characters", "no, it's a dessert topping").

Based on the number of Google Groups hits for "+python 
+setdefaultencodings", these sorts of issues bite those using IDLE, 
etc...

-- 
Glenn Andreas                      gan...@de... 
Theldrow, Blobbo, Cythera, oh my!
Be good, and you will be lonesome

Re: [Pyobjc-dev] depythonify_c_value rejects non-ascii, non-unicode strings

From: Bob I. <bo...@re...> - 2004-01-21 18:20:31

On Jan 21, 2004, at 12:57 PM, Glenn Andreas wrote:

> At 12:31 PM -0500 1/21/04, Bob Ippolito wrote:
>> If you want/need to exchange arbitrary data you're going to have to 
>> explicitly put it in NSData.  I would almost vote to *disable* the 
>> str<->NSString bridge in PyObjC, or make it bridge NSData instead, 
>> but that would just be terribly inconvenient for many people.
>
> What about doing both?  If the conversion works, it creates an 
> NSString. This will handle all the current ASCII cases as well as 
> cases where the default encoding is explicitly set (and all the str's 
> are handled accordingly).
>
> If the conversion doesn't work, it creates NSData.  Obviously, this 
> will push the error somewhere else, which may not be able to handle it 
> any better, but at least there is a chance.  (The current problem was 
> doing something like "NSText insertText:", which would then fail with 
> some other error, which might even be more confusing).

Oh god no!  What if you wanted an NSData that happened to not have any 
high bits set?  This sounds more like how I'd imagine unicode support 
to work (or not work) in a Perl ObjC bridge ;)

And yes, at least at this point the error predictably happens exactly 
when you're doing something evil/lazy.
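The objection to the NSString/NSData fallback is that the type of the result would depend on the content rather than on the caller's intent. A hypothetical Python 3 sketch, with str standing in for NSString and bytes for NSData; convert_with_fallback is an invented name.

```python
def convert_with_fallback(raw: bytes):
    """HYPOTHETICAL: text if ASCII-decodable, otherwise raw data."""
    try:
        return raw.decode("ascii")  # would become an NSString
    except UnicodeDecodeError:
        return raw                  # would become an NSData


binary_low = bytes([0, 1, 2, 127])  # binary data that happens to be 7-bit
binary_high = bytes([0, 255])

print(type(convert_with_fallback(binary_low)).__name__)   # str
print(type(convert_with_fallback(binary_high)).__name__)  # bytes
```

Both inputs are binary data, yet only one arrives as data; code receiving the result would have to check its type on every call.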

> I suppose a more general solution is to allow for custom conversion 
> handlers that can be installed, but that seems to open another can of 
> worms... (more like a 55 gallon drum)

There already are custom conversion handlers: Python's unicode support.
You can make file-like objects that spew unicode, and you can convert any
string of known encoding to a unicode string.  The problem with
"conversion handlers" is that you don't know where the str came from,
and without that information you can't register a conversion handler
that does anything beyond what sys.setdefaultencoding can do.

I think that the reason sys.setdefaultencoding is only settable by the 
end user (or any other mechanism for starting the python interpreter) 
is that it's evil for a module to change the system encoding, because 
it can break totally unrelated code, or end user preferences, in a hard 
to debug way.

> Another possibility is to just make the system default encoding be 
> UTF8 instead of ASCII, but I'm guessing if that were a good idea it 
> would have already been done (and it would certainly cause other 
> problems with "str is a collection of bytes", "no, str is a string of 
> characters", "no, it's a dessert topping").

setdefaultencoding never affects str itself; it only affects unicode 
(creating unicode from str and coercing unicode to str).  str is always a 
collection of bytes that happens to be convenient at times to use as a 
collection of characters.  It does, typically, make sense for the 
system default encoding to be UTF8 *on OS X*, but that is a decision 
that affects all Python code, and that decision needs to be made by the 
end user (or vendor, I suppose).
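A UTF-8 default would not by itself have helped the original Latin-1 case, because Latin-1 bytes with accented characters are generally not valid UTF-8. A Python 3 sketch of the difference (the byte values are standard; the sample word is arbitrary).

```python
latin1_bytes = "café".encode("latin-1")  # b'caf\xe9'
utf8_bytes = "café".encode("utf-8")      # b'caf\xc3\xa9'

print(utf8_bytes.decode("utf-8"))        # café

try:
    latin1_bytes.decode("utf-8")
    rejected = False
except UnicodeDecodeError:
    rejected = True
print("Latin-1 bytes rejected by UTF-8:", rejected)  # True
```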

-bob

Re: [Pyobjc-dev] depythonify_c_value rejects non-ascii, non-unicode strings

From: Ronald O. <ous...@ci...> - 2004-01-21 17:38:08

On 21 jan 2004, at 18:20, Marc-Antoine Parent wrote:
[...]
> So let me then make a plea for an API so that a PyObjC program can 
> tell the bridge to use an encoding other than the system default, if 
> specified, even if the default behaviour remains identical, i.e. throw 
> exceptions upon non-ascii strings.

I don't like introducing global switches like this, libraries may 
modify the switch and change the behaviour of other code.

Too bad that sitecustomize.py cannot be in the same directory as a 
script (dirname(sys.argv[0]) is added to sys.path after site.py 
finishes). BTW, does anyone know why sys.setdefaultencoding is removed 
in site.py? I.e., why is it good that users cannot change the default 
encoding after the interpreter has initialized?

> That way, only a program that knows what it is doing will modify the 
> behaviour, and no data will be lost by default; but a program that has 
> good architectural reasons to do so might still use another encoding 
> internally.

Unicode should be good enough for this. The strings used by Cocoa are 
Unicode strings; there's not much you can do about this.

Ronald

Re: [Pyobjc-dev] depythonify_c_value rejects non-ascii, non-unicode strings

From: Bob I. <bo...@re...> - 2004-01-21 17:41:03

On Jan 21, 2004, at 12:38 PM, Ronald Oussoren wrote:

>
> On 21 jan 2004, at 18:20, Marc-Antoine Parent wrote:
> [...]
>> So let me then make a plea for an API so that a PyObjC program can 
>> tell the bridge to use an encoding other than the system default, if 
>> specified, even if the default behaviour remains identical, i.e. 
>> throw exceptions upon non-ascii strings.
>
> I don't like introducing global switches like this, libraries may 
> modify the switch and change the behaviour of other code.
>
> Too bad that sitecustomize.py cannot be in the same directory as a 
> script (dirname(sys.argv[0]) is added to sys.path after site.py 
> finishes). BTW, does anyone know why sys.setdefaultencoding is 
> removed in site.py? I.e., why is it good that users cannot change the 
> default encoding after the interpreter has initialized?

sys.setdefaultencoding is probably removed in site.py for the same 
reason you don't like global switches: someone could call 
sys.setdefaultencoding in a module that you use, for example.

-bob