Re: [Jython-users] Unicode

[dman]
> On Thu, Jan 10, 2002 at 04:16:14PM -0800, Brian Quinlan wrote:
> | Paul Prescod wrote:
>
> | > Is that a problem? If the user specifies an encoding then you could
> | > decode. If they don't, I would suggest to just do a no-op. Under what
> | > circumstances would the current exception be more helpful?
> |
> | Because you are specifically looking for the exception to see if the
> | string can be converted to a Unicode object using the default encoding?
>
> Is it supposed to be an error when trying to convert a unicode object
> to a unicode object?  I don't think so.  I can convert an int to an
> int.
>
> >>> x =  u"\u20ac"
> >>> x = unicode( u"\u20ac" )
> Traceback (innermost last):
>   File "<console>", line 1, in ?
> UnicodeError: ascii decoding error: ordinal not in range(128)
> >>>
>
> (I used assignment so I won't get the error of printing non-ascii
> characters on an ascii display)
>

I'm starting to think that Paul Prescod is right here. In CPython:

Python 2.1 (#15, Apr 16 2001, 18:25:49) [MSC 32 bit (Intel)] on win32
Type "copyright", "credits" or "license" for more information.
>>> unicode(u"\xe9")
u'\xe9'
>>>

whereas in Jython:

Jython 2.1 on java1.3.0 (JIT: null)
Type "copyright", "credits" or "license" for more information.
>>> unicode(u"\xe9")
Traceback (innermost last):
  File "<console>", line 1, in ?
UnicodeError: ascii decoding error: ordinal not in range(128)
>>>

The question is: is it better to fail where CPython does not fail, or to
fail only where CPython fails and succeed where CPython succeeds? Maybe I'm
missing something subtle, but I prefer the latter, so unicode() applied to a
unicode object without an encoding should be a no-op.
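To make the proposal concrete, here is a minimal sketch of the semantics it implies. It is written in modern Python 3, where `str` stands in for 2.x `unicode` and `bytes` for the 2.x byte string. The helper name `to_unicode` is hypothetical and is not a Jython or CPython API. The ASCII fallback assumes the default encoding shown in the tracebacks above, and rejecting an explicit encoding for text input is an assumed design choice.

```python
def to_unicode(obj, encoding=None, errors="strict"):
    """Hypothetical model of the proposed unicode(obj[, encoding]) behaviour.

    - Text with no encoding given: returned unchanged (the proposed no-op).
    - Bytes: decoded with the given encoding, else ASCII (assumed default).
    - Anything else: converted with str(obj).
    """
    if isinstance(obj, str):
        if encoding is not None:
            # Assumption: asking to *decode* text is an error rather than
            # something to ignore silently.
            raise TypeError("decoding text is not supported")
        return obj  # no-op: text to text never touches a codec
    if isinstance(obj, (bytes, bytearray)):
        # Only byte strings go through a decoder, so only they can raise
        # UnicodeDecodeError.
        return bytes(obj).decode(encoding or "ascii", errors)
    if encoding is not None:
        raise TypeError("decoding requires a bytes-like object")
    return str(obj)


if __name__ == "__main__":
    print(repr(to_unicode("\xe9")))             # text in, same text out
    print(repr(to_unicode(b"\xe9", "latin-1")))  # explicit decode
    try:
        to_unicode(b"\xe9")                      # ASCII fallback fails
    except UnicodeDecodeError as exc:
        print("UnicodeDecodeError:", exc.reason)
```

Under this model, an ASCII decoding error can only come from a byte string. That matches the CPython 2.1 session above, where unicode(u"\xe9") simply returns its argument.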

regards.