Re: [Jython-users] [jython-users] UTF-8 support for interactive input

[Joey]
> I cannot get Jython 2.5.2 to correctly handle UTF characters when executing
> the following sort of lines. I can get Kanji to output correctly if I call one
> of my Java routines that returns a string that contains Kanji – so it seems
> like an “input” issue. Any ideas?
>
> >>> print 'test kanji - こんにちは'
>
> test kanji - ã “ã‚“ã «ã ¡ã ¯

There are a few questions we need to answer before we can get to the
bottom of this.

1. By the look of the above, you're specifying this Kanji string
in the interactive interpreter. If so,
 - What java version are you using?
 - What (precise) jython version are you using?
 - What operating system are you using?
 - What is the input encoding of your console?
 - What is the output encoding of your console?
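
Most of these answers can be collected from inside the interpreter itself. The following is only a sketch: the name `describe_environment` is made up for illustration, the `java.lang.System` lookup only works when running under Jython, and some stream objects may not expose an `encoding` attribute at all, hence the `getattr` fallbacks.

```python
import sys

def describe_environment():
    """Collect the version and encoding details asked about above."""
    info = {
        'python_version': sys.version,
        'platform': sys.platform,
        # Console encodings as the interpreter sees them; either may be None.
        'stdin_encoding': getattr(sys.stdin, 'encoding', None),
        'stdout_encoding': getattr(sys.stdout, 'encoding', None),
        'default_encoding': sys.getdefaultencoding(),
    }
    try:
        # Only importable when running under Jython.
        from java.lang import System
    except ImportError:
        pass
    else:
        info['java_version'] = System.getProperty('java.version')
        info['os_name'] = System.getProperty('os.name')
        info['file_encoding'] = System.getProperty('file.encoding')
    return info

for key, value in sorted(describe_environment().items()):
    sys.stdout.write('%s: %s\n' % (key, value))
```

Pasting the output into a reply would cover most of the list above. What the interpreter reports and what the console really uses can differ, so it is still worth checking the console settings directly.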

The encoding of your console matters.
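
To illustrate why: the garbled output in your message looks like UTF-8 bytes that were read as a Windows-1252-style single-byte encoding. That your console did exactly this is my assumption, not something I have confirmed. The sketch below only shows that the byte patterns line up.

```python
# The five kanji from the test string, written with escapes so that this
# snippet does not itself depend on the console encoding.
kanji = u'\u3053\u3093\u306b\u3061\u306f'

# What a UTF-8 console would send: three bytes per character.
utf8_bytes = kanji.encode('utf-8')

# Reading those bytes one at a time as cp1252 gives one junk character
# per byte. Bytes cp1252 does not define become U+FFFD here.
garbled = utf8_bytes.decode('cp1252', 'replace')

# Decoding with the encoding that was really used recovers the text.
recovered = utf8_bytes.decode('utf-8')
```

Every kanji turns into a group starting with "ã", which is the pattern in the output you posted.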

If I enter your test string in an encoding-independent manner, e.g.

>>> s = u'test kanji \u3053\u3093\u306b\u3061\u306f'

Then I can't print it on my Windows CP437 console:

>>> print s
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\jython252\Lib\encodings\cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u3053'
in position 11: character maps to <undefined>

But I can print it by escaping the contents so that they display in
an ASCII-friendly encoding:

>>> print s.encode("ascii", "xmlcharrefreplace")
test kanji &#12371;&#12435;&#12395;&#12385;&#12399;
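
The same trick can be wrapped in a small helper so that printing never raises. This is a sketch: `console_safe` is a hypothetical name, and falling back to ASCII when the output stream reports no encoding is my own assumption.

```python
import sys

def console_safe(text, encoding=None):
    """Encode text for the console, escaping what the console can't show."""
    if encoding is None:
        # Assumed fallback: plain ASCII when stdout reports no encoding.
        encoding = getattr(sys.stdout, 'encoding', None) or 'ascii'
    # Unencodable characters become &#NNNN; references instead of raising.
    return text.encode(encoding, 'xmlcharrefreplace')

s = u'test kanji \u3053\u3093\u306b\u3061\u306f'
escaped = console_safe(s, 'cp437')
```

Under Jython 2.5 the result is a byte string that can be handed straight to `print`. Characters the console can display pass through unchanged, and only the rest are escaped.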

> >>> execfile('C:/temp/こんにちは/test.py')

This case is different again, because you're using kanji in a
filename. Operating systems treat Unicode filenames differently
(although I'm guessing from your pathname that you're using Windows).

Try this

>>> execfile(u'C:/temp/\u3053\u3093\u306b\u3061\u306f/test.py')
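
If that still fails, it may help to check whether the path can be represented at all in the encoding the runtime reports for filenames. This is a rough sketch: `encodable_in` is a hypothetical helper, `sys.getfilesystemencoding()` may return None on some platforms (I fall back to the default encoding in that case), and a negative result is only a hint, since the operating system may handle filenames differently from what the interpreter reports.

```python
import sys

def encodable_in(path, encoding):
    """Return True if every character of path exists in the given encoding."""
    try:
        path.encode(encoding)
    except UnicodeError:
        return False
    return True

path = u'C:/temp/\u3053\u3093\u306b\u3061\u306f/test.py'
# Assumed fallback when no filesystem encoding is reported.
fs_encoding = sys.getfilesystemencoding() or sys.getdefaultencoding()
sys.stdout.write('%s can hold this path: %s\n'
                 % (fs_encoding, encodable_in(path, fs_encoding)))
```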

Alan.