[Pyobjc-dev] depythonify_c_value rejects non-ascii, non-unicode strings
From: Marc-Antoine P. <map...@ac...> - 2004-01-21 16:24:15
Good day, all!

I am writing some Python code that has to output Latin-1 text. Some of that output makes its way through other (Python) code to a text widget through insertText_. That other code is not mine but Glenn Andreas' PyOxide IDE, so it does not know about my encoding choice, and it should not need to. It simply passes my Latin-1 strings along to the insertText_ method of a text widget, where the PyObjC bridge tries to turn them into an NSString.

In objc_support.c, in

    int depythonify_c_value (const char *type, PyObject *argument, void *datum)

we have the following code (currently around line 1300):

    as_unicode = PyUnicode_Decode(
            strval, len,
            PyUnicode_GetDefaultEncoding(),
            "strict");
    if (as_unicode == NULL) {
        PyErr_Format(PyExc_UnicodeError,
            "depythonifying 'id', got "
            "a string with a non-default "
            "encoding");
        return -1;
    }

Now, it turns out that the default encoding is ascii unless specified otherwise with PyUnicode_SetDefaultEncoding (see /System/Library/Frameworks/Python.framework/Headers/unicodeobject.h). That means that in many cases I get the error raised just after the decode, and no output at all.

It is fairly easy to set the default encoding at startup (thanks to Glenn for pointing this out to me) by calling

    sys.setdefaultencoding('iso-8859-1')

in a sitecustomize.py. However, this can only be done at Python startup, and I fear many users of the bridge may not know about this limitation.

I propose that the PyObjC bridge use a less restrictive encoding than the current (bizarre) platform default, so that Python can output encoded text to Cocoa widgets. (Maybe the bridge should have a hook to set the platform default when the Python subsystem is started?) I suggest Latin-1, as it is the most common encoding and the one most likely to be used by (Unix-written) Python code. Even if the Python code uses another encoding, Latin-1 passes every byte through to the widget unchanged, so if the user sees gibberish, it will at least be familiar gibberish.
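A minimal Python 3 sketch of the decode step quoted above, with the proposed Latin-1 fallback added. It is not the bridge's code and not a PyObjC API. The helper name `to_unicode` and its `fallback` parameter are hypothetical, and Python 3 `bytes` stand in for the byte strings (`str`) of the Python 2 era.

```python
from typing import Optional


def to_unicode(raw: bytes,
               default_encoding: str = "ascii",
               fallback: Optional[str] = "latin-1") -> str:
    """Hypothetical helper: decode bytes that are headed for an NSString.

    First tries a strict decode with the default encoding, like the quoted
    PyUnicode_Decode(..., "strict") call. If that fails and a fallback is
    given, decodes with the fallback instead. With fallback=None the error
    propagates, which matches the current behaviour.
    """
    try:
        return raw.decode(default_encoding, "strict")
    except UnicodeDecodeError:
        if fallback is None:
            raise  # current behaviour: reject the string outright
        # With the default fallback (Latin-1), each byte 0x00-0xFF maps to
        # U+0000-U+00FF, so this decode cannot fail.
        return raw.decode(fallback, "strict")


if __name__ == "__main__":
    latin1_text = "caf\u00e9".encode("latin-1")  # b'caf\xe9'

    try:
        to_unicode(latin1_text, fallback=None)
    except UnicodeDecodeError:
        print("strict ascii decode rejected the string")

    # With the fallback, the Latin-1 text gets through intact.
    assert to_unicode(latin1_text) == "caf\u00e9"

    # Every possible byte value survives a Latin-1 round trip.
    all_bytes = bytes(range(256))
    assert to_unicode(all_bytes).encode("latin-1") == all_bytes
```

Because Latin-1 assigns a character to every byte value, the fallback never raises. The trade-off is that text in some other encoding arrives mis-decoded rather than rejected, which is the "familiar gibberish" case.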
Admittedly, a case could also be made for mac-roman instead of Latin-1.

Another solution (Glenn's suggestion) is at least not to decode 'strict'ly, and to use 'ignore' or, at worst, 'replace', so that at least some of the text reaches the user.

Whatever the correct solution, I feel that the current situation (rejecting any encoded non-ASCII text) is overly restrictive.

Thank you for your attention,
Marc-Antoine Parent