From: Charlie G. <cha...@gm...> - 2006-07-12 17:15:48
Hi Kent,

Jython uses the codecs module when you call encode. utf-8, ascii, latin1 and a couple of others are built in. Beyond that, the items in the encodings package are used. It looks like shift-jis and gb2312 were added to Python in 2.4. Unfortunately they're implemented in C, so it's not as simple as copying over a couple of Python modules.

I can think of three options:

1. Port the C implementations of the codecs to Java and expose them to Jython.
2. Write a wrapper around the Java character encoding stuff using the codecs methods.
3. Just use Java's built-in stuff like you suggested.

1 isn't too bad. I just did that for a couple of C codec implementations on the 2.3 branch in an afternoon. You'd just have to rewrite the C in Java and then register a new function with codecs.register that returns your codecs. I can give you some help with that if you go that route.

2 would be really cool, since it means all of the Java encodings would just work for Jython like you expected, but I'm not sure it's possible. It would require all of the Python expectations from codecs to be satisfied by calling the Java encoding classes. I took a quick look at the Java character encoding classes and it looks like they're roughly amenable to being driven from codecs, but I worry that something sticky would present itself halfway through the implementation.

You already know about 3. I guess it boils down to how much you want to avoid using InputStreamReader.

Be sure to let us know if you go with 1 or 2. Someone would have to port those codecs for Jython 2.4 anyway, and having 2 would be a feather in Jython's cap.

Charlie

On 7/12/06, Kent Johnson <ke...@td...> wrote:
> I have always assumed that jython supported all the character encodings
> supported by the underlying Java VM but that doesn't seem to be the
> case. In particular Shift-JIS and gb2312 don't seem to be supported in
> Jython.
> For example, gb2312 works with String.getBytes() but not with
> str.encode():
>
> >>> import java
> >>> s=java.lang.String('a')
> >>> s.getBytes('gb2312')
> array([97], byte)
> >>> u'a'.encode('gb2312')
> Traceback (innermost last):
>   File "<console>", line 1, in ?
> LookupError: unknown encoding gb2312
>
> Am I missing something? Is the workaround to use native Java methods? My
> particular use case is to read a file in sjis or gb2312 encoding, do I
> have to use an InputStreamReader to do this?
>
> Thanks,
> Kent
>
> _______________________________________________
> Jython-users mailing list
> Jyt...@li...
> https://lists.sourceforge.net/lists/listinfo/jython-users
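
For reference, here is a minimal sketch of the codecs.register step mentioned under option 1 of the reply (option 2 would need the same hook). It is not the code from the 2.3 branch. The helper make_search_function, the encoding name "xdemo", and the latin-1 stand-in backend are all hypothetical and exist only for illustration; in a real port the two callables would be the Java rewrite of the C codec, or calls into Java's own encoders. The sketch is written against the current CPython codecs API (codecs.CodecInfo); Python versions before 2.5 expect the search function to return a plain 4-tuple of (encoder, decoder, stream reader, stream writer) instead.

```python
import codecs


def make_search_function(name, encode_text, decode_bytes):
    """Build a codec search function that answers for one encoding name.

    encode_text(text) -> bytes and decode_bytes(data) -> text are supplied
    by the caller. Both are placeholders for the real conversion routines.
    """

    def encode(text, errors="strict"):
        # Stateless encoders return (output, number of input items consumed).
        # This sketch ignores the `errors` argument.
        return encode_text(text), len(text)

    def decode(data, errors="strict"):
        data = bytes(data)  # accept any bytes-like input
        return decode_bytes(data), len(data)

    info = codecs.CodecInfo(encode=encode, decode=decode, name=name)

    def search(requested):
        # The codecs machinery passes in the normalized (lower-cased) name.
        # Returning None lets the other registered search functions try.
        if requested == name:
            return info
        return None

    return search


# Hypothetical encoding name; latin-1 stands in for the ported codec.
codecs.register(
    make_search_function(
        "xdemo",
        lambda text: text.encode("latin-1"),
        lambda data: data.decode("latin-1"),
    )
)

encoded = u"abc".encode("xdemo")
decoded = encoded.decode("xdemo")
```

Once the search function is registered, str.encode and the other codecs entry points find the new name the same way they find the built-in ones, which is why the LookupError in the quoted session would go away for any encoding handled this way.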