On 9/18/07, Gert-Jan <jython@...> wrote:
> > This is "working" because __str__ on unicode is just calling getBytes
> > on its underlying java.lang.String object. That encodes the String
> > using the JVM's default charset, which I'm guessing is utf-8 for you as
> > well, and prints fine on your terminal. It should actually encode
> > using the default encoding on your install of Jython, which unless
> > you've changed the defaults is ascii.
> >> Unicodestring converted to UTF8:
> >> =CE(c)=CE=B4=E2=CF=CE=B5
> Actually, I encountered this behaviour when trying to write the unicode
> object to utf8 and utf16-encoded textfiles using jython2.2. So I used
> the file.write() method instead of a print statement.
file.write uses the same code to get the bytes out of a str object as
print, so it's going to double encode in the same fashion.
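
As a rough sketch of that double encoding, the snippet below imitates it in
plain Python. It assumes the already-encoded str is held one character per
byte and then encoded a second time with a utf-8 default charset; the sample
text and variable names are made up for illustration, and this is a model of
the behaviour, not Jython's actual code path.

```python
# Hypothetical sample text: GREEK CAPITAL LETTER OMEGA, GREEK SMALL LETTER DELTA.
text = u"\u03a9\u03b4"

# What you intend to end up in the file: the text encoded once as utf-8.
encoded_once = text.encode("utf-8")

# Assumption: each byte of the encoded string is stored as one character.
# latin-1 maps bytes 0-255 to code points 0-255 one to one, so decoding with
# it models that storage without changing any values.
one_char_per_byte = encoded_once.decode("latin-1")

# Encoding those characters again with a utf-8 default charset doubles up
# every byte above 0x7f.
encoded_twice = one_char_per_byte.encode("utf-8")

print(repr(encoded_once))
print(repr(encoded_twice))
print("%d bytes intended, %d bytes after the second pass"
      % (len(encoded_once), len(encoded_twice)))
```

Every non-ASCII byte from the first pass turns into two bytes in the second,
which is why the result reads as garbage in a utf-8 viewer.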
> I tried the same program on my XP-box at the office today (also using
> jython2.2) and it seemed to produce correct utf8 and utf16 files. I'll
> check these files again later this week.
I'll bet Java on your XP box uses Windows-1252 as its default charset,
so encoding to that doesn't mangle things.
> > I'll fix this for Jython 2.2.1, but to get the output you expect once
> > that fix is in, you'll either have to set your default encoding
> > in Jython's site.py or encode your unicode objects to the JVM's
> > default charset.
> Does this also mean I won't be able to create utf8 and utf16 encoded
> textfiles from the same jython installation?
Nope, I was trying to explain how you'd need to line things up to use
the defaults. You can (and probably should) specify the encoding and
decoding you want so you're not at the mercy of the JVM and Python
defaults. That can be anything the codecs module supports.
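
A minimal sketch of naming the encoding explicitly, using codecs.open so that
neither default gets a say. The sample text, the scratch directory, and the
file names are hypothetical, and the snippet is written against CPython's
standard library; I have not checked that it runs unchanged on Jython 2.2.

```python
import codecs
import os
import tempfile

# Hypothetical sample text: GREEK CAPITAL LETTER OMEGA, GREEK SMALL LETTER DELTA.
text = u"\u03a9\u03b4"

# Scratch directory so the example does not touch real files.
out_dir = tempfile.mkdtemp()

round_trips = {}
for encoding in ("utf-8", "utf-16"):
    path = os.path.join(out_dir, "sample-%s.txt" % encoding)

    # Write unicode objects directly; the stream writer does the encoding,
    # so the text is encoded exactly once.
    f = codecs.open(path, "w", encoding=encoding)
    try:
        f.write(text)
    finally:
        f.close()

    # Read the file back with the same encoding to check the round trip.
    f = codecs.open(path, "r", encoding=encoding)
    try:
        round_trips[encoding] = f.read()
    finally:
        f.close()

print(round_trips["utf-8"] == text and round_trips["utf-16"] == text)
```

The point is to hand unicode objects to a writer that was told the encoding,
instead of writing pre-encoded str objects through the plain file object.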
> Thanks for this elaborate explanation!
Hope I didn't veer too far off into the realm of uninteresting
minutiae; I wrote my response as I was figuring out what was going