Thanks for the response:
> I'm afraid this is the behavior for Jython 2.1. Jython 2.2 should behave
> better, as long as the strings are initialized as type unicode. So, in
> 2.1 u"abc" would translate to "abc", now in 2.2 u"abc" stays u"abc".
It doesn't seem to work like that in 2.2a1:
Jython 2.2a1 on java1.4.2_08 (JIT: null)
Type "copyright", "credits" or "license" for more information.
>>> x = u'unicode string'
>>> print repr(x)
Looking at the code for PyString.encode_UnicodeEscape, it appears to add
the u'' prefix only when the string contains a character > 255. Nothing
seems to be different in CVS HEAD, unless I'm missing something...?
(Besides, my strings are being read over a socket, so they're all created
the same way.)
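
To make the rule I'm describing concrete, here is a rough model of it. This
is not the Jython source: sketch_repr is a hypothetical name, the quoting
and escaping are simplified, and it is written in plain Python 3 purely for
illustration. The only part taken from my reading of encode_UnicodeEscape
is the "prefix only if some character is > 255" test.

```python
def sketch_repr(s):
    """Hypothetical model of the repr() rule described above.

    Assumption: the u'' prefix is emitted only when the string holds at
    least one character with a code point above 255.
    """
    needs_prefix = any(ord(ch) > 255 for ch in s)
    # Escape non-ASCII characters and backslashes (simplified).
    body = s.encode("unicode_escape").decode("ascii")
    quoted = "'" + body.replace("'", "\\'") + "'"
    return ("u" if needs_prefix else "") + quoted


if __name__ == "__main__":
    for sample in ["unicode string", "caf\xe9", "\u20ac"]:
        print(sketch_repr(sample))
```

Under this model a string that was created as unicode but happens to hold
only characters up to 255 comes out without the u'' prefix, which is the
behavior I find surprising.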
> The reason for this odd behavior is: in Jython *all* strings are represented
> by unicode strings internally
I know - that's why I was expecting repr() to always put the u'' there.
> In 2.2 there is a separate PyUnicode
> type that (hopefully) provides better compatibility.
How does that square with the fact that all Java strings are Unicode? Is
there any documentation that discusses the relationship between Jython,
Java and Unicode? Some sort of Best Practice document?