From: <bc...@wo...> - 2002-01-10 19:26:37
[Paul Prescod]

>Brian Quinlan reported this strange result to me:
>
>Jython 2.1 on java1.3.1_02 (JIT: null)
>Type "copyright", "credits" or "license" for more information.
>>>> x=u'\x81'
>>>> unicode(x)
>Traceback (innermost last):
>  File "<console>", line 1, in ?
>UnicodeError: ascii decoding error: ordinal not in range(128)

In CPython the unicode() builtin function will either:

- decode a byte string, or
- return a unicode argument unmodified.

In Jython there is no difference between the strings u"\x81" and "\x81",
so we can only do one of these two things.

>If Jython is going to unify 8-bit strings and Unicode strings (as Java
>does)

IMO, there is very little unification between Java byte arrays and Java
strings, and the methods that existed initially to convert between them
have since been deprecated. That is A Good Thing because it clearly
separates the use of bytes from the use of characters.

>then it should probably treat them all as Unicode strings, not as
>8-bit. Then unicode() would be a no-op.

It would just return the argument without doing anything.

It is unfortunate that there are such differences between CPython and
Jython, but it is a natural consequence of our design, where we decided
to work without a byte string type.

regards,
finn
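The two behaviours of unicode() described above can be sketched as follows. This is only an illustration, written for a modern Python 3 interpreter where byte strings (bytes) and text strings (str) are distinct types; it is not Jython's or CPython's actual code. The function name to_text and its encoding parameter are hypothetical, and the ASCII default is an assumption based on the error message in the traceback.

```python
def to_text(value, encoding="ascii"):
    """Hypothetical stand-in for the two behaviours of CPython's unicode()."""
    if isinstance(value, bytes):
        # Byte string: decode it. Bytes outside the codec's range
        # (for example 0x81 under ASCII) raise UnicodeDecodeError.
        return value.decode(encoding)
    if isinstance(value, str):
        # Already text: return the argument unmodified.
        return value
    raise TypeError("expected bytes or str, got %s" % type(value).__name__)


# The text branch is a no-op, even for characters above 127.
print(repr(to_text("\x81")))

# The byte-string branch decodes, and fails for the same ordinal.
try:
    to_text(b"\x81")
except UnicodeDecodeError as exc:
    print("decode failed:", exc.reason)
```

The isinstance() check is exactly what an implementation with a single string type cannot perform, so it has to pick one of the two branches for every string.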