[Jython-users] Unicode difference between Python and Jython (2.2a1)


Hello all,

In our project we are implementing a test automation framework that runs 
on both Python and Jython. We recently found some differences in how 
these two platforms handle Unicode. The following examples should 
demonstrate the issue well.

Jython 2.2a1 on java1.5.0_04 (JIT: null)
Type "copyright", "credits" or "license" for more information.
 >>> u = u'Hyv\u00E4'
 >>> u
'Hyv\xE4'
 >>> type(u)
<type 'str'>
 >>> unicode(u)
Traceback (innermost last):
  File "<console>", line 1, in ?
UnicodeError: ascii decoding error: ordinal not in range(128)
 >>> str(u)
'Hyv\xE4'

Python 2.4.3 (#69, Mar 29 2006, 17:35:34) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
 >>> u = u'Hyv\u00E4'
 >>> u
u'Hyv\xe4'
 >>> type(u)
<type 'unicode'>
 >>> unicode(u)
u'Hyv\xe4'
 >>> str(u)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 3: ordinal not in range(128)
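To compare the two interpreters side by side, a small script along these lines (a sketch added for illustration, not part of the original transcripts) can be run unchanged on both. It prints the interpreter version, the type the literal gets and its repr, which is the information a bug report needs:

```python
import sys

# The same literal as in the transcripts above: "Hyvä", 4 characters.
u = u'Hyv\u00E4'

print(sys.version)   # which interpreter/version produced the output
print(type(u))       # 'str' on Jython 2.2a1, 'unicode' on CPython 2.4
print(repr(u))       # how the interpreter displays the value
```

The single-argument print(...) form is used so the script also parses as a print statement on the older interpreters.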

We want to convert all test data we get from users into Unicode 
internally, so this difference causes us problems. As a workaround we are 
planning to use the following utility function.

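import os

def unic(text):
    # On Jython os.name is 'java' and str already holds Unicode text,
    # while unicode() fails on non-ASCII input as shown above.
    if os.name == 'java':
        return str(text)
    else:
        return unicode(text)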

Is this a Jython bug (we submitted a bug report [1] anyway), or are we 
doing something wrong? Also, do you think our workaround utility really 
works?

[1] http://sourceforge.net/tracker/index.php?func=detail&aid=1538001&group_id=12867&atid=112867

Cheers,
     .peke