2011/4/17 Alex Grönholm <alex.gronholm@nextday.fi>:
> 18.04.2011 03:22, Dan Stromberg wrote:
>> How does one, in Jython 2.5.2, convert from a byte string to a text
>> string, and vice versa?
> The same way as you do in all other Pythons: 'blah'.decode(encoding).
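For reference, here is a minimal round trip in the spelling Alex describes. It assumes CPython 2.6+ or 3.x: the b'' prefix used here does not exist in Python 2.5 (so not in Jython 2.5 either), and on 3.x only bytes objects have .decode(), not str.

```python
# Bytes -> text: decode using the encoding the bytes were written in.
raw = b'caf\xc3\xa9'          # UTF-8 encoding of u"caf\xe9"
text = raw.decode('utf-8')

# Text -> bytes: encode with an explicit encoding.
back = text.encode('utf-8')

print(repr(text), repr(back))
```

The encoding must be stated on both sides. Nothing in the byte string records which encoding produced it.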

I'm writing a deduplicating backup program that so far works well on CPython 2.x, CPython 3.x, PyPy 1.4.1 and recent PyPy trunk builds, but on Jython 2.5.2 it fails with a traceback that looked related to string semantics.

However, the real issue appears to be which type is returned by open(fn, 'r').read(length) and os.read(os.open(fn, os.O_RDONLY), length).

I made some progress by adding 'b' to the modes in my open() calls, but when reading with os.read(), how does one convince Jython to return a str instead of a unicode object?  It seems to return a unicode object most of the time, but sometimes a str object, even from the same os.open() descriptor.  Jython on Linux doesn't appear to have an os.O_BINARY.

I've been using os.open() plus os.read() because that combination appears to return bytes on CPython 2.x, PyPy and CPython 3.x alike, but that doesn't appear to be the case in Jython 2.5.2.
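Until there's a proper answer, one workaround is to normalize whatever os.read() hands back. The sketch below is an assumption-laden stopgap and not a known Jython API. It assumes that a unicode result from Jython 2.5.2 holds one byte per character (code points 0-255, high bytes zeroed), so that latin-1 maps it back to the original byte values. The helper names as_bytes and read_bytes are hypothetical. The demo uses b'' literals and the with statement, so it needs Python 2.6+ or 3.x.

```python
import os
import tempfile

try:
    BYTES_TYPE = bytes  # Python 2.6+ (an alias of str) and 3.x
except NameError:
    BYTES_TYPE = str    # older 2.x runtimes such as Jython 2.5, where str holds bytes


def as_bytes(data):
    """Return data as a byte string (hypothetical helper).

    Assumption: a text object here stores one byte per character (code
    points 0-255), and latin-1 maps those code points back to the same
    byte values. Real text containing code points above 255 raises
    UnicodeEncodeError instead of being silently mangled.
    """
    if isinstance(data, BYTES_TYPE):
        return data
    return data.encode('latin-1')


def read_bytes(path, length):
    """Read up to length bytes from path with os.open()/os.read()."""
    # os.O_BINARY only exists on Windows; fall back to 0 elsewhere (e.g. Linux).
    flags = os.O_RDONLY | getattr(os, 'O_BINARY', 0)
    fd = os.open(path, flags)
    try:
        return as_bytes(os.read(fd, length))
    finally:
        os.close(fd)


# Demo: write a few non-ASCII byte values to a temporary file and read them back.
payload = b'\x00\x7f\x80\xff'
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(payload)
try:
    sample = read_bytes(handle.name, len(payload))
finally:
    os.remove(handle.name)
```

Failing loudly on code points above 255 is deliberate. If Jython ever returns real text from os.read(), the latin-1 assumption is wrong and should not be papered over.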

>> I'd like to support Jython in my opensource python2x3 module, but
>> Jython's string handling seems different enough from that of other
>> Pythons that I'm not clear on how to do so.  I found an article saying
>> that if you do a binary read in Jython, you'll get a binary str that
>> just keeps the high bytes zeroed
> Link? Sounds a little odd.

Finding the link I originally read is proving somewhat time-consuming, but here's something similar that sounds more promising than what I read before.  Apparently str behavior changed in Jython 2.5, so perhaps the article I originally read was out of date:


Prior to the 2.5.0 release of Jython, there was only one string type. The string type in Jython supported full two-byte Unicode characters and all functions contained in the string module are Unicode-aware. If the u'' string modifier was specified, it was ignored by Jython. Since the release of 2.5.0, strings in Jython are treated just like those in CPython, so the same rules will apply to both implementations. It is also worth noting that Jython uses character properties from the Java platform. Therefore properties such as isupper and islower, which we will discuss later in the section, are based upon the Java properties.

>> , but I didn't notice anything about
>> converting from one (always zero high bytes to nonzero high bytes, for
>> example) to the other.
>> Python2x3's at http://stromberg.dnsalias.org/svn/python2x3/trunk - and
>> I'm including a copy at the bottom of this message.
> The worst problem in writing cross-version code is entering unicode/byte
> literals.
> Does Python2x3 solve this somehow?

python2x3.string_to_binary() addresses this to some extent.  You give it a str literal (or any other str), and it converts it to bytes on 3.x (encoding with latin-1) and leaves it as a str on 2.x.  It's more typing than adding a b prefix, but it seems to work fine.
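A sketch of the behavior described above. This is a guess at the semantics, not python2x3's actual source, and the name string_to_binary_sketch is hypothetical. Latin-1 maps each code point below 256 to the byte of the same value, so a literal written with \xNN escapes comes out as exactly those bytes on 3.x and is already a byte string on 2.x.

```python
import sys


def string_to_binary_sketch(text):
    """Hypothetical stand-in for python2x3.string_to_binary() as described:
    encode to bytes with latin-1 on Python 3.x, return str unchanged on 2.x.
    """
    if sys.version_info[0] >= 3:
        # latin-1 maps code points 0-255 one-to-one onto byte values 0-255.
        return text.encode('latin-1')
    # On 2.x a str literal is already a byte string.
    return text


# Usage: a byte "literal" spelled without the b prefix, so it also parses on 2.5.
MAGIC = string_to_binary_sketch('\x89PNG\r\n')
```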