Re: [Jython-users] Unicode

[Paul Prescod]

>Brian Quinlan reported this strange result to me:
>
>Jython 2.1 on java1.3.1_02 (JIT: null)
>Type "copyright", "credits" or "license" for more information.
>>>> x=u'\x81'
>>>> unicode(x)
>Traceback (innermost last):
>  File "<console>", line 1, in ?
>UnicodeError: ascii decoding error: ordinal not in range(128)

In CPython the unicode() builtin function will either:

- decode a byte string, or
- return a unicode argument unmodified.

In Jython there is no difference between the strings u"\x81" and "\x81",
so we can only do one of these two things. Jython 2.1 decodes, which is
why the call above fails for a character outside the ASCII range.
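
A rough sketch of the two options may help. It is written in Python 3
syntax only so that the bytes/str split can stand in for CPython's
str/unicode split. The function names are made up for illustration, and
this is an assumption-laden model, not the actual code of either
implementation.

```python
# Hypothetical model, not Jython or CPython source code.
# Python 3 `bytes` stands in for the CPython byte string,
# and Python 3 `str` stands in for the CPython unicode string.

def unicode_cpython_model(obj):
    """Mimic the two branches of the CPython unicode() builtin."""
    if isinstance(obj, bytes):
        # Byte string: decode it with the default ASCII codec.
        return obj.decode("ascii")
    # Unicode string: return the argument unmodified.
    return obj


def unicode_decode_only_model(text):
    """Mimic a runtime with one string type that always decodes."""
    # There is no type to branch on, so every argument is treated as
    # byte data: any character outside the ASCII range is an error.
    return text.encode("latin-1").decode("ascii")
```

With two types, unicode_cpython_model("\x81") hands the string back
untouched. With one type and the decode choice,
unicode_decode_only_model("\x81") raises a UnicodeError, which matches
the behaviour Brian saw.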

>If Jython is going to unify 8-bit strings and Unicode strings (as Java
>does) 

IMO, there is very little unification between Java byte arrays and Java
strings, and the methods that existed initially to convert between them
have since been deprecated. That is A Good Thing because it clearly
separates the use of bytes from the use of characters.

>then it should probably treat them all as Unicode strings, not as
>8-bit.

Then unicode() would be a no-op. It would just return the argument
without doing anything.

It is unfortunate that there are such differences between CPython and
Jython, but it is a natural consequence of our design where we decided
to work without a byte string type.

regards,
finn