[Jython-dev] Cyrillic string

Dear developers,

I'm trying to use Jython (version 2.1) to execute a Python program
from Java, and I have a problem with Cyrillic strings.

I run the program as follows:

import org.python.util.PythonInterpreter;
import org.python.core.*;

public class TestPy {
  public static void main( String args[] ) {
    System.out.println("Start");

    PythonInterpreter interp = new PythonInterpreter();
    interp.exec("str1='Это строка1. This is string1  - it is bad'");
    interp.set("str2", new PyString("Это строка2. This is string2  - it is OK"));
    interp.exec("print str1\nprint str2");

    System.out.println("Stop");
  }
}

I get the following result:

Start
-B> AB@>:01. This is string1  - it is bad
Это строка2. This is string2  - it is OK
Stop

str1 has the wrong value in Python.
str2 has the right value in Python.

I have done some research and found out what causes the error.
It happens because, in the classes
\org\python\core\parser.java
\org\python\core\Py.java

Jython uses the class
java.io.StringBufferInputStream(String s)

This class is deprecated and does not properly convert characters into
bytes: it uses only the low eight bits of each character.
I replaced it with
new java.io.ByteArrayInputStream(s.getBytes())
and got the correct result.
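
The difference between the two classes can be reproduced outside Jython. The following is only a minimal sketch, not Jython's code: the class name LowByteDemo and its two helper methods are made up for illustration, and it encodes with an explicit UTF-8 charset (an assumption chosen to make the result independent of the platform default that s.getBytes() would use). The Cyrillic text is written with Unicode escapes so the source compiles under any file encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Jython.
public class LowByteDemo {

    // Reads the string through the deprecated StringBufferInputStream.
    // Each char is reduced to its low eight bits, so Cyrillic is lost.
    @SuppressWarnings("deprecation")
    static String viaStringBufferInputStream(String s) {
        java.io.StringBufferInputStream in = new java.io.StringBufferInputStream(s);
        StringBuilder out = new StringBuilder();
        for (int b = in.read(); b != -1; b = in.read()) {
            out.append((char) b);
        }
        return out.toString();
    }

    // Encodes the string to bytes first, reads the bytes back through
    // ByteArrayInputStream, and decodes them with the same charset.
    // UTF-8 is an assumption made here for a deterministic round trip.
    static String viaByteArrayInputStream(String s) {
        ByteArrayInputStream in =
            new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int b = in.read(); b != -1; b = in.read()) {
            out.write(b);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The Cyrillic prefix of str1 from the test program, as Unicode escapes.
        String text = "\u042d\u0442\u043e \u0441\u0442\u0440\u043e\u043a\u04301";

        // prints "-B> AB@>:01", the same damage seen in str1
        System.out.println(viaStringBufferInputStream(text));

        // The round trip through bytes preserves every character.
        System.out.println(viaByteArrayInputStream(text).equals(text));
    }
}
```

The first call yields exactly the garbled prefix from the result above, which suggests the low-eight-bits truncation is what damages str1.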

Jython 2.2 alpha 0 has the same problem.

Cyrillic users of Jython (including myself) would really appreciate
it if the class
java.io.StringBufferInputStream(String s)
were replaced with
new java.io.ByteArrayInputStream(s.getBytes())

That would allow non-Latin strings to be processed correctly.

Sincerely, Pavel
pa...@ui...