From: Stuart D. G. <st...@bm...> - 2007-10-16 20:39:53
>> The only change I need to make is:
>>
>> Index: src/org/python/core/PyString.java
>> ===================================================================
>> --- src/org/python/core/PyString.java (revision 3607)
>> +++ src/org/python/core/PyString.java (working copy)
>> @@ -2027,10 +2027,7 @@
>>       * the low-order bits of its corresponding char.
>>       */
>>      public static byte[] to_bytes(String s) {
>> -        int len = s.length();
>> -        byte[] b = new byte[len];
>> -        s.getBytes(0, len, b, 0);
>> -        return b;
>> +        return s.getBytes();
>>      }
>
> The only trouble here is that getBytes() uses the default char set for
> the platform -- unlike the deprecated method we are currently using
> which uses "raw" bytes -- with a little work we could probably use the
> getBytes() method that takes an encoding -- anybody know how that
> could be done effectively?

As of Java 1.4, you can create a custom java.nio.charset.Charset
implementation. We could call the Charset "LSB", since when encoding it
returns the least significant 8 bits of each character (which
corresponds to the deprecated behaviour).

Why is the deprecated behaviour needed? I am assuming it is to reuse
the Java String class for python 8-bit strings without writing special
8-bit "String8" classes.

Thus, when converting a python byte string to a python unicode string,
we first have to "encode" to a byte[] using LSB, then decode the byte[]
to a String using the specified charset/encoding. When converting a
python unicode string to a byte string, we first have to encode to a
byte[] with the specified charset/encoding, then "decode" using the LSB
charset again.

The "unicode" Strings used to represent python byte strings would
compare, search (indexOf), slice (substring), etc. correctly (because
the high-order byte of every char is always zero), but operations like
toUpperCase might not be exactly correct, and the string would need to
be converted to real unicode first. I can't think offhand how to
optimize this.
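A minimal sketch of the LSB charset and the two round trips described above, assuming the Java 1.4 java.nio.charset API. The names LsbStrings, LsbCharset, "x-LSB", toBytes, encode and decode are hypothetical illustrations, not existing Jython code. The Charset object is called directly because String.getBytes(String) only finds charsets by name, which for a custom charset would mean registering it through a java.nio.charset.spi.CharsetProvider. Note that Charset.encode/decode replace malformed input rather than raising an error as Python's strict mode would.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;

public class LsbStrings {

    /**
     * Hypothetical "x-LSB" charset. Encoding keeps the low-order 8 bits of
     * each char (like the deprecated String.getBytes(int, int, byte[], int));
     * decoding maps each byte to the char 0..255 with the same value.
     */
    static final class LsbCharset extends Charset {
        LsbCharset() {
            super("x-LSB", null); // no aliases
        }

        @Override
        public boolean contains(Charset cs) {
            return cs instanceof LsbCharset;
        }

        @Override
        public CharsetEncoder newEncoder() {
            return new CharsetEncoder(this, 1.0f, 1.0f) {
                @Override
                protected CoderResult encodeLoop(CharBuffer in, ByteBuffer out) {
                    while (in.hasRemaining()) {
                        if (!out.hasRemaining()) {
                            return CoderResult.OVERFLOW; // caller supplies more room
                        }
                        out.put((byte) in.get()); // drop the high-order byte
                    }
                    return CoderResult.UNDERFLOW; // all input consumed
                }
            };
        }

        @Override
        public CharsetDecoder newDecoder() {
            return new CharsetDecoder(this, 1.0f, 1.0f) {
                @Override
                protected CoderResult decodeLoop(ByteBuffer in, CharBuffer out) {
                    while (in.hasRemaining()) {
                        if (!out.hasRemaining()) {
                            return CoderResult.OVERFLOW;
                        }
                        out.put((char) (in.get() & 0xFF)); // high-order byte is zero
                    }
                    return CoderResult.UNDERFLOW;
                }
            };
        }
    }

    static final Charset LSB = new LsbCharset();

    /** Possible replacement for to_bytes: raw low-order bytes, no platform charset. */
    static byte[] toBytes(String s) {
        ByteBuffer bb = LSB.encode(s);
        byte[] b = new byte[bb.remaining()];
        bb.get(b);
        return b;
    }

    /** Byte string (one char per byte) -> unicode string. */
    static String decode(String byteStr, String encoding) {
        ByteBuffer raw = LSB.encode(byteStr);                    // 1. "encode" with LSB
        return Charset.forName(encoding).decode(raw).toString(); // 2. real decode (replaces bad input)
    }

    /** Unicode string -> byte string (one char per byte). */
    static String encode(String uni, String encoding) {
        ByteBuffer raw = Charset.forName(encoding).encode(uni); // 1. real encode
        return LSB.decode(raw).toString();                      // 2. "decode" with LSB
    }

    public static void main(String[] args) {
        String uni = "caf\u00e9";
        String bytes = encode(uni, "UTF-8");                    // 5 chars, each < 0x100
        System.out.println(bytes.length());                     // prints 5
        System.out.println(decode(bytes, "UTF-8").equals(uni)); // prints true
    }
}
```

Decoding with LSB gives the same chars as decoding with ISO-8859-1; the two differ only when encoding chars above 0xFF, which ISO-8859-1 replaces with '?' instead of truncating.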
You can't keep python byte strings in any encoding other than LSB, or
byte strings will not compare properly (at which point you might as well
write the String8 classes).

--
Stuart D. Gathman <st...@bm...>
Business Management Systems Inc.  Phone: 703 591-0911 Fax: 703 591-6154
"Confutatis maledictis, flammis acribus addictis" - background song for a
Microsoft sponsored "Where do you want to go from here?" commercial.