From: Joey J. <jo...@ca...> - 2011-11-14 18:11:31
Hi Alan,

Java 6u17
sys.version_info == (2, 5, 2, 'final', 0)
Windows XP

I am actually using the interpreter embedded in my Java application. I have verified in my debugger that I am passing the UTF-8 characters down to the interpreter without losing anything. If I access an attribute that I exposed from the Java level that returns a string, it indeed returns things correctly, as shown below.

>>> p.notes
'\u3053\u3093\u306b\u3061\u306f'
>>> print p.notes
こんにちは

But if I import the following function, it does not work.

# -*- coding: UTF-8 -*-
def testUTF8():
    print 'こんにちは'

I can open the above file in several different text editors that support UTF-8, and they display it correctly. I am getting myself really confused.

-----Original Message-----
From: ala...@gm... [mailto:ala...@gm...] On Behalf Of Alan Kennedy
Sent: Friday, November 11, 2011 10:40 AM
To: Joey Jarosz
Cc: jyt...@li...
Subject: Re: [Jython-users] [jython-users] UTF-8 support for interactive input

[Joey]
> I cannot get Jython 2.5.2 to correctly handle UTF characters when executing
> the following sort of lines. I can get Kanji to output correctly if I call one
> of my Java routines that returns a string that contains Kanji – so it seems
> like an “input” issue. Any ideas?
>
> >>> print 'test kanji - こんにちは'
>
> test kanji - ã “ã‚“ã «ã ¡ã ¯

A few questions we need to find the answers to before getting to the bottom of it.

1. By the look of the above, you're specifying this Kanji string in the interactive interpreter. If so,

 - What java version are you using?
 - What (precise) jython version are you using?
 - What operating system are you using?
 - What is the input encoding of your console?
 - What is the output encoding of your console?

The encoding of your console matters. If I enter your test string in an encoding independent manner, e.g.
>>> s = u'test kanji \u3053\u3093\u306b\u3061\u306f'

Then I can't print it on my windows CP437 console

>>> print s
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\jython252\Lib\encodings\cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u3053' in position 11: character maps to <undefined>

But I can print it by escaping the contents so that they display in an ascii-friendly encoding

>>> print s.encode("ascii", "xmlcharrefreplace")
test kanji &#12371;&#12435;&#12395;&#12385;&#12399;

> >>> execfile('C:/temp/こんにちは/test.py')

And this case is different again, because you're using kanji in a filename. Operating systems treat unicode filenames differently. (Although I'm guessing from your pathname that you're using windows).

Try this

>>> execfile(u'C:/temp/\u3053\u3093\u306b\u3061\u306f/test.py')

Alan.
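
For reference, a minimal sketch of how garbled output like that in the quoted message can arise. It assumes the input bytes are UTF-8 and are being decoded with cp1252 (the codec named in the traceback); whether that is what happens in the embedded interpreter is not confirmed here. The variable names are illustrative only.

```python
# -*- coding: utf-8 -*-
# Sketch only: the cp1252 choice is an assumption taken from the traceback,
# not a confirmed diagnosis of the embedded-interpreter problem.

# The greeting, written in an encoding-independent way.
text = u'\u3053\u3093\u306b\u3061\u306f'

# The bytes a UTF-8 console or a UTF-8 source file would deliver.
utf8_bytes = text.encode('utf-8')

# Wrong: read those bytes as cp1252. 'replace' substitutes U+FFFD for any
# byte that cp1252 leaves undefined, so the call does not raise.
garbled = utf8_bytes.decode('cp1252', 'replace')

# Right: decode with the encoding the bytes were actually written in.
restored = utf8_bytes.decode('utf-8')

# ASCII-only rendering that any console can display.
safe = restored.encode('ascii', 'xmlcharrefreplace')
print(safe.decode('ascii'))
```

If this is the cause, one thing to try is writing the literal in the imported module as a unicode literal (u'...') rather than a plain string, so that it is decoded using the declared source encoding instead of being passed through as raw bytes.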