Not sure if this helps, but we have done a few things to enable UTF-8 support that you might want to try. First, we created a sitecustomize.py file containing:
import sys
sys.setdefaultencoding("utf8")

Then, after creating the Jython interpreter instance, we call `imp.load("site");`

Hope that helps in some way,

-Brandon



On Fri, Dec 7, 2012 at 1:53 AM, Jeff Allen <ja...py@farowl.co.uk> wrote:
I tried something similar, and I can get my non-8-bit characters into the parser, and a result out:

        PythonInterpreter interpreter = new PythonInterpreter() {
            {
                cflags = new CompilerFlags(CompilerFlags.PyCF_SOURCE_IS_UTF8);
            }
        };

        String s = "\u0153";
        String line = "text = '" + s + "'";
        byte[] line8 = line.getBytes("UTF-8");
        InputStream stream8 = new ByteArrayInputStream(line8);

        interpreter.execfile(stream8);

        PyObject text = interpreter.get("text");
        String t = text.toString();


Everything is fine until the parser builds its parse tree; by then the "\u0153" has been turned into a PyString holding a single \u00c5 (which is the first byte of the correct string in UTF-8). I don't know why, but it has to be a bug in tokenizing. That's probably all I've got for now.
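
The byte arithmetic behind that observation can be checked with the JDK alone. This is a minimal sketch, not Jython code: the class and method names are made up for illustration, and it assumes Java 7 or later for StandardCharsets. It shows that U+0153 encodes to the two bytes c5 93 in UTF-8, and that widening the first byte to a char gives \u00c5.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class Utf8FirstByteDemo {

    /** Returns the UTF-8 encoding of s as space-separated lowercase hex bytes. */
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /** Widens the first UTF-8 byte of s to a char, as a plain byte-to-char copy would. */
    static char firstByteAsChar(String s) {
        return (char) (s.getBytes(StandardCharsets.UTF_8)[0] & 0xff);
    }

    public static void main(String[] args) {
        String s = "\u0153";
        System.out.println(utf8Hex(s));                        // c5 93
        System.out.println(firstByteAsChar(s) == '\u00c5');    // true
    }
}
```

This only confirms which byte ends up in the PyString; it does not show where in the tokenizer the second byte (0x93) is lost.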

Jeff Allen

On 07/12/2012 04:56, Akshay Kini wrote:
Hi Jeff and Chris,

I tried declaring UTF-16 in the coding declaration in Python (Jython?) and it throws exceptions saying it cannot recognize that.

I am able to get the code to work by passing in bytes using the execfile() method.

Code Snippet:
String japaneseString = "\'ジャパ寝せ\'";
String printJapanese = "print " + japaneseString;
// Note: getBytes() with no argument encodes using the platform default charset.
ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(printJapanese.getBytes());
interp.execfile(byteArrayInputStream);

Unfortunately, I _need_ to use the eval() function and that does not have a file input equivalent.

Can anyone help me here? Any suggestions?

I agree with Jeff: I am passing UTF-16, since they are Java Strings. Sorry for the confusion.
To summarize my problem with the above correction:
I need to pass either UTF-8 or UTF-16 "code" to the PythonInterpreter via the eval() function. How do I do that in Java?
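
To make the UTF-8 versus UTF-16 distinction concrete, here is a small JDK-only sketch (no Jython calls). The class and method names are hypothetical, and it assumes Java 7 or later for StandardCharsets. The same Java String produces different byte sequences depending on the charset passed to getBytes(), and the no-argument form depends on the platform default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class EncodingSizes {

    /** Returns how many bytes s occupies when encoded with the given charset. */
    static int encodedLength(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // The snippet's source text, written with escapes so that the
        // compiler's source-file encoding does not matter.
        String code = "print '\u30b8\u30e3\u30d1\u5bdd\u305b'";

        System.out.println(code.length());                                   // 13
        System.out.println(encodedLength(code, StandardCharsets.UTF_8));     // 23
        System.out.println(encodedLength(code, StandardCharsets.UTF_16BE));  // 26

        // getBytes() without an argument uses this charset, which varies by machine.
        System.out.println(Charset.defaultCharset());
    }
}
```

Naming the charset explicitly keeps the bytes handed to execfile() the same on every machine; whether the interpreter then decodes them as intended is a separate question.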

Thanks to Jeff and Chris,
Regards,
KGA

On Fri, Dec 7, 2012 at 4:37 AM, Jeff Allen <ja...py@farowl.co.uk> wrote:
On 06/12/2012 16:59, Chris Clark wrote:
> On Thursday 2012-12-06 08:53 (-0800), Akshay Kini
> <kga.official@gmail.com>  wrote:
>> I was using Jython 2.2 and it defaulted to UTF-8 I suppose and when I
>> passed text to PythonInterpreter.exec() or .eval() it was encoded
>> correctly.
>>
>> Since I recently moved to Jython 2.5, I realise that Python has
>> changed its encoding to ASCII by default. *I need this encoding
>> changed to UTF-8 before my first call to .eval("<some python code
>> here>").*
>>
>> Doing:
>> interpreter.exec("# coding=utf-8");
>> interpreter.exec("print " + japaneseString);
>>
>> *is not working.*
>>
>> I thought this is a Jython bug and filed
>> http://bugs.jython.org/issue1992 (It might be as well?)
>> But if you need a sample program, screenshots etc. you can refer to
>> the bug.
>>
> That doesn't look correct to me. Java strings are UTF16, there is no way
> you could be sending in utf8 "strings" from java. Unless you are sending
> in byte arrays (I checked the bug test case for 1992 and it is a String,
> not a byte array).
> ...
If they really are UTF-8, that is to say the char values in the string
are all 0..255, and these bytes encode characters as UTF-8, as they
might be if read from a stream using Jython's io, then I think you
can address this by creating the interpreter like this:

         interpreter = new PythonInterpreter() {
             {
                 cflags = new CompilerFlags(CompilerFlags.PyCF_SOURCE_IS_UTF8);
             }
         };

This is untried by me. But looking at the code, the protected cflags
member seems to be the thing that controls how the compiler reads the text.

If the japaneseString is actually a java.lang.String containing the
characters, then I think it would have worked as expected. Failing that,
recode it as UTF-8. :-(
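
For what "recode it as UTF-8" could look like on the Java side, here is a minimal JDK-only sketch. The class and method names are hypothetical, and it assumes Java 7 or later for StandardCharsets. It turns an ordinary String into one whose chars are all in 0..255 and hold the UTF-8 bytes, and back again. Whether a string in this form is then read correctly with PyCF_SOURCE_IS_UTF8 set is not verified here.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class Utf8ByteString {

    /** Encodes s as UTF-8 and stores each resulting byte in one char (value 0..255). */
    static String toUtf8ByteString(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    /** Reverses toUtf8ByteString: reads each char as one byte and decodes the bytes as UTF-8. */
    static String fromUtf8ByteString(String byteString) {
        return new String(byteString.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "text = '\u0153'";
        String recoded = toUtf8ByteString(original);

        System.out.println(original.length());                             // 10
        System.out.println(recoded.length());                              // 11
        System.out.println(fromUtf8ByteString(recoded).equals(original));  // true
    }
}
```

ISO-8859-1 is used for the byte-to-char step because it maps every byte value 0..255 to the char with the same numeric value, so the round trip loses nothing.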

Jeff Allen

------------------------------------------------------------------------------
LogMeIn Rescue: Anywhere, Anytime Remote support for IT. Free Trial
Remotely access PCs and mobile devices and provide instant support
Improve your efficiency, and focus on delivering more value-add services
Discover what IT Professionals Know. Rescue delivers
http://p.sf.net/sfu/logmein_12329d2d
_______________________________________________
Jython-users mailing list
Jython-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/jython-users

