From: SourceForge.net <no...@so...> - 2011-01-27 02:23:05
Feature Requests item #3090894, was opened at 2010-10-19 17:41
Message generated for change (Comment added) made by srparish
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=351645&aid=3090894&group_id=1645

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Priority: 5
Private: No
Submitted By: Jim (jimfcarroll)
Assigned to: Nobody/Anonymous (nobody)
Summary: Python 2.x unicode handling for std::string

Initial Comment:
I'm not sure why but, while SWIG handles unicode in python 3, it doesn't handle it in python 2.x. The following test case should work:

C++ code
-----------------
...
void func(std::string val)
{
  printf ("%s",val.c_str());
}
...
-----------------

python 2.x code:
-----------------
...
module.func(unicode('hello world'))
...

I have attached a patch that allows this to work. The patch applies to [src]/Lib/python/pystrings.swg.

----------------------------------------------------------------------

Comment By: Scott Parish (srparish)
Date: 2011-01-26 20:23

Message:
Not having something like this was a huge pain for a project I've been working on. A growing number of python libraries are returning unicode for everything (for example json and pymongo) and it's a continual pain to catch and convert every string that comes in before passing it through swig.

For what it's worth I independently developed the following patch before seeing this feature request. The patch works with python 2.6 and 2.7 and converts the string if it only contained ascii characters. Otherwise it falls back on the existing behavior of throwing an exception.
(I'm adding it here as I can't find how to upload additional files to this request):

diff --git a/include/pystrings.swg b/include/pystrings.swg
index 1983037..29cbf3c 100644
--- a/include/pystrings.swg
+++ b/include/pystrings.swg
@@ -5,13 +9,13 @@
 SWIGINTERN int
 SWIG_AsCharPtrAndSize(PyObject *obj, char** cptr, size_t* psize, int *alloc)
 {
+  char *cstr; Py_ssize_t len;
 %#if PY_VERSION_HEX>=0x03000000
   if (PyUnicode_Check(obj))
 %#else
   if (PyString_Check(obj))
 %#endif
   {
-    char *cstr; Py_ssize_t len;
 %#if PY_VERSION_HEX>=0x03000000
     if (!alloc && cptr) {
       /* We can't allow converting without allocation, since the internal
@@ -62,6 +66,16 @@ SWIG_AsCharPtrAndSize(PyObject *obj, char** cptr, size_t* psize, int *alloc)
     Py_XDECREF(obj);
 %#endif
     return SWIG_OK;
+  } else if (PyUnicode_Check(obj)) {
+    obj = PyUnicode_AsASCIIString(obj);
+    if (obj == NULL)
+      return SWIG_TypeError;
+    PyString_AsStringAndSize(obj, &cstr, &len);
+    if (cptr) *cptr = %new_copy_array(cstr, len + 1, char);
+    if (alloc) *alloc = SWIG_NEWOBJ;
+    if (psize) *psize = len + 1;
+    Py_XDECREF(obj);
+    return SWIG_OK;
   } else {
     swig_type_info* pchar_descriptor = SWIG_pchar_descriptor();
     if (pchar_descriptor) {

----------------------------------------------------------------------

Comment By: Jim (jimfcarroll)
Date: 2010-10-28 06:16

Message:
I believe nitrogenycs has valid points both about using std::string as a data container and about the choice of UTF-8.

I currently have a typemap that causes the behavior I'm looking for. My biggest concern is that my typemap is fragile: it depends too much on the existing functionality remaining structured the way it currently is, and it may break if the SWIG default python string handling structure is modified significantly in a future release. Hence my reason for submitting a patch.

A middle ground would be to allow those developing multi-lingual applications to select this behavior (or 'behaviour' as nitrogenycs would say :-) ) somehow.
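
For comparison with the explicit-encoding approach discussed in this thread, here is a minimal Python-level sketch of the policy the ASCII-only patch above appears to implement: byte strings pass through untouched, pure-ASCII text is converted, and anything else is rejected. It is an illustration only, not part of SWIG. The helper name `to_std_string_bytes` is hypothetical, and the sketch is written for Python 3 so that it runs today, with `str` standing in for Python 2's `unicode` and `bytes` standing in for Python 2's `str`.

```python
def to_std_string_bytes(value):
    """Hypothetical helper: mimic the patch's ASCII-only conversion policy."""
    if isinstance(value, bytes):
        # Byte strings are passed through unchanged, as before the patch.
        return value
    if isinstance(value, str):
        try:
            # Only pure-ASCII text is converted implicitly.
            return value.encode("ascii")
        except UnicodeEncodeError:
            # Non-ASCII text is rejected; this roughly corresponds to the
            # patch returning SWIG_TypeError when the ASCII encode fails.
            raise TypeError("non-ASCII text must be encoded explicitly") from None
    raise TypeError("expected bytes or str")


print(to_std_string_bytes("hello world"))
```

Under this policy a caller with non-ASCII text still has to choose an encoding explicitly, for example `value.encode('utf-8')`, before calling the wrapped function.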
----------------------------------------------------------------------

Comment By: Jim (jimfcarroll)
Date: 2010-10-28 06:00

Message:
ah, we cross posted - see below.

----------------------------------------------------------------------

Comment By: nitro (nitrogenycs)
Date: 2010-10-28 05:58

Message:
I think raising the exception is the proper behaviour. The choice of utf-8 is entirely arbitrary. The user should call the function like myfunc( myunicodestring.encode('utf-8') ) explicitly in my opinion.

I have not yet investigated the swig and python 3 behaviour. However, I think python 3 unicode strings should not automatically map to utf-8 std::strings. Take a look at http://docs.python.org/release/3.0.1/whatsnew/3.0.html#text-vs-data-instead-of-unicode-vs-8-bit . The idea is to separate data and text. std::strings can be used for data and for text. That site says "any attempt to mix text and data in Python 3.0 raises TypeError". So the swig + python3 behaviour is wrong here in my opinion. It should raise a TypeError.

However, I can see that it's useful for your specific use case to have automatic conversion to utf-8. So maybe this could be made optional somehow? A dirty way would be to use something like %define STD_STRINGS_UTF_8 and then check for that in the pystrings.swg typemaps.

Maybe you can make a post about this on the swig mailing list as well to get a few more opinions.

----------------------------------------------------------------------

Comment By: Jim (jimfcarroll)
Date: 2010-10-28 05:56

Message:
nitrogenycs,

I re-read your comment and I see what you're saying. You WANT the exception if someone passes a Unicode object to a method that takes a std::string.

Acknowledging nitrogenycs' concern, I can offer several reasons that I think this should be taken:

1) nitrogenycs' case is broken in Python 3 and this patch makes Python2 and Python3 more consistent.
2) I can't imagine what a Python writer that passes a Unicode object to a C++ method that takes a std::string would want, other than that conversion.
3) The conversion operates the same way it used to if you pass a Python string rather than a Unicode object. In other words, no existing working code (i.e., code that doesn't throw a type exception when you call a C++ method from Python that takes a std::string) should break.

An alternative would be to allow this to be optional somehow. Still, I cannot believe that the vast majority of people writing multi-lingual applications that use std::string wouldn't want this to be usable more naturally from Python 2, or at least have it consistent with Python 3.

----------------------------------------------------------------------

Comment By: Jim (jimfcarroll)
Date: 2010-10-28 05:41

Message:
s/this patch will now work/this patch will allow that to now work/1

----------------------------------------------------------------------

Comment By: Jim (jimfcarroll)
Date: 2010-10-28 05:40

Message:
nitrogenycs,

I do not believe you are correct: this patch doesn't change existing behavior. It doesn't convert the input to UTF-8 UNLESS the input is a Unicode object. Currently, if you pass a Unicode object to a C++ parameter that takes a std::string you get an exception. With the patch, this will now work.

Everything else should remain the same. That is, if you pass a python string (no matter what it contains) it operates as before. Note:

[code]
if (PyUnicode_Check(obj))
{
  ...
}
else
  PyString_AsStringAndSize(obj, &cstr, &len);
[/code]

Also, this is much more consistent with the Python 3 behavior, which handles Unicode strings ONLY and handles them the same way as the patch allows Python 2.x to behave.

----------------------------------------------------------------------

Comment By: nitro (nitrogenycs)
Date: 2010-10-27 20:04

Message:
Hmm, this patch seems to try and convert the input argument to utf-8 implicitly.
For my own projects I do not want such behaviour. Sometimes std::strings are used as containers for binary data. Passing a unicode object into such a function should fail. There might be more examples where implicit conversion to utf-8 is not desirable, e.g. somebody expecting their function to take a different encoding than utf-8. The patch will cause subtle breakage if it performs implicit encoding conversions.

----------------------------------------------------------------------

Comment By: Jim (jimfcarroll)
Date: 2010-10-27 18:06

Message:
It's in 1.6 (Sept 2000).

http://docs.python.org/release/1.6/api/unicodeObjects.html

It's not in 1.5. 1.5 was the first real release (or at least the oldest one referenced on the Python official site other than 0.9.2, which they only have through a third-party reference).

Do we need to check if it's been available for that long?

----------------------------------------------------------------------

Comment By: Jim (jimfcarroll)
Date: 2010-10-27 17:54

Message:
I'll do some digging. I'm using 2.6 but I know it was available in 2.4.

Thanks

----------------------------------------------------------------------

Comment By: William Fulton (wsfulton)
Date: 2010-10-27 17:25

Message:
Your patch assumes PyUnicode_Check is available. Which version of Python did PyUnicode_Check first become available? Probably any use of it needs to be within a #ifdef PY_VERSION_HEX. If it has always been in Python, then we can probably apply this patch.

----------------------------------------------------------------------
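
The "subtle breakage" concern raised by nitrogenycs (comment of 2010-10-27 20:04) can be illustrated at the Python level. The sketch below assumes a hypothetical C++ consumer that expects Latin-1 bytes; it only shows that the same text produces different byte sequences under different encodings, so an implicit UTF-8 conversion would silently hand such a consumer the wrong data. It is written for Python 3 and does not involve SWIG itself.

```python
text = "caf\u00e9"  # the word 'café'; the last character is non-ASCII

# What an implicit UTF-8 conversion would pass to the C++ side.
utf8_bytes = text.encode("utf-8")

# What a (hypothetical) Latin-1 consumer would expect to receive.
latin1_bytes = text.encode("latin-1")

print(utf8_bytes)    # b'caf\xc3\xa9'
print(latin1_bytes)  # b'caf\xe9'

# The two differ in both content and length (5 bytes vs 4 bytes).
print(len(utf8_bytes), len(latin1_bytes))
```

With explicit encoding at the call site, as in myfunc( myunicodestring.encode('utf-8') ), the choice of encoding stays visible to the caller.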