From: Richie H. <ri...@en...> - 2005-10-06 16:28:59
Hello jython-dev,

I've just been bitten by what to me looks like a misfeature in Jython.

I have a Python client and a Jython server on the same machine, talking over a localhost socket. I'm using repr() and eval() as my wire protocol (because there's no possibility of anyone else being able to connect to this socket; though I'm sufficiently paranoid that I'm pre-parsing the repr()'d packets anyway to ensure that they're safe).

My problem is this: when repr()'ing a string, Jython only adds the u'' if the string contains char values > 255. So when I send a list of strings over the socket from my Jython server to my Python client, some of them come out as plain strings and others come out as unicode strings.

This had me swearing for several hours yesterday (I've calmed down now 8-) because it never occurred to me that a list that went in one end homogeneous could come out of the other end heterogeneous.

This looks to me like a terrible violation of the Principle of Least Surprise. Everything works perfectly until a character over 255 comes along and suddenly your code breaks with an encoding error. It also makes Jython significantly different from CPython.

Any comments? Can I consider this a bug and enter a bug report / patch? Or am I simply abusing repr()?

--
Richie Hindle
ri...@en...
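For readers following the repr()/eval() wire protocol described above, here is a minimal sketch of one way to pre-parse packets so that only literals are accepted. It is not Richie's code: it assumes a current Python 3 interpreter and uses ast.literal_eval, which postdates this thread (it arrived in Python 2.6). The helper name decode_packet is hypothetical.

```python
import ast


def decode_packet(packet_text):
    """Parse a repr()'d packet, accepting only Python literals.

    Hypothetical helper for illustration. ast.literal_eval evaluates
    strings, numbers, tuples, lists, dicts, sets, booleans and None,
    and refuses anything else, such as function calls.
    """
    try:
        return ast.literal_eval(packet_text)
    except (ValueError, SyntaxError, TypeError) as exc:
        raise ValueError("rejected packet: %r" % (packet_text,)) from exc


# A list of strings survives the round trip unchanged.
packet = repr(["alpha", "beta"])
print(decode_packet(packet))

# An expression that is not a plain literal is refused.
try:
    decode_packet("__import__('os').getcwd()")
except ValueError as err:
    print("refused:", err)
```

This only addresses the safety of eval(); it does not change which strings arrive with a u'' prefix.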
From: Frank W. <fwi...@gm...> - 2005-10-06 18:09:51
> My problem is this: when repr()'ing a string, Jython only adds the u'' if
> the string contains char values > 255.

I'm afraid this is the behavior for Jython 2.1. Jython 2.2 should behave better, as long as the strings are initialized as type unicode. So, in 2.1 u"abc" would translate to "abc"; now in 2.2 u"abc" stays u"abc".

The reason for this odd behavior is: in Jython *all* strings are represented by unicode strings internally, and the Python unicode support is just a compatibility wrapper. In 2.1 this wrapper merely consists of a check for character values > 255 in the string. In 2.2 there is a separate PyUnicode type that (hopefully) provides better compatibility.

Some day (at least according to http://www.python.org/peps/pep-3000.html) Python will come around to a more jythony point of view :) and this problem should go away. Then again, the time-frame for Python3000 is somewhat undefined last I heard (much less the time-frame for a presumed Jython3000).

-Frank
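The content-dependent check described above can be emulated roughly as follows. This is a sketch in plain Python 3, not Jython's actual implementation: the function name jython21_style_repr is hypothetical, and the only rule taken from the thread is "add the u prefix when some character value is above 255".

```python
def jython21_style_repr(text):
    """Emulate a repr() whose u'' prefix depends on the string's content.

    Hypothetical helper: the prefix is added only when at least one
    character has a code point above 255, as described in the thread.
    """
    # unicode_escape turns non-ASCII characters into \xNN or \uNNNN escapes.
    body = text.encode("unicode_escape").decode("ascii")
    body = body.replace("'", "\\'")
    prefix = "u" if any(ord(ch) > 255 for ch in text) else ""
    return "%s'%s'" % (prefix, body)


# Three strings of the same type go in...
names = ["plain", "caf\xe9", "\u20ac5"]
wire = "[" + ", ".join(jython21_style_repr(name) for name in names) + "]"

# ...and only the one containing a character above 255 gets the prefix.
print(wire)  # prints: ['plain', 'caf\xe9', u'\u20ac5']
```

A CPython 2 client that eval()s this text would get two byte strings and one unicode string, which is the heterogeneous list from the original report.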
From: Richie H. <ri...@en...> - 2005-10-06 19:18:32
Hi Frank,

Thanks for the response:

> I'm afraid this is the behavior for Jython 2.1. Jython 2.2 should behave
> better, as long as the strings are initialized as type unicode. So, in
> 2.1 u"abc" would translate to "abc", now in 2.2 u"abc" stays u"abc".

It doesn't seem to work like that in 2.2a1:

    Jython 2.2a1 on java1.4.2_08 (JIT: null)
    Type "copyright", "credits" or "license" for more information.
    >>> x = u'unicode string'
    >>> print repr(x)
    'unicode string'
    >>>

and looking at the code for PyString.encode_UnicodeEscape, it's clear that it only adds the u'' when there's a character > 255 in there. Nothing seems to be different in CVS HEAD, unless I'm missing something...? (Besides, my strings are being read over a socket, so they're all created the same way.)

> The reason for this odd behavior is: in Jython *all* strings are represented
> by unicode strings internally

I know - that's why I was expecting repr() to always put the u'' there.

> In 2.2 there is a separate PyUnicode
> type that (hopefully) provides better compatibility.

How does that square with the fact that all Java strings are Unicode? Is there any documentation that discusses the relationship between Jython, Java and Unicode? Some sort of Best Practice document?

--
Richie Hindle
ri...@en...