[Jeff]
> Would it be fair to say that the character codes in a Jython
> PyString.string member should always be in the range 0..255 inclusive?

If the string holds encoded data, i.e. text that has been encoded into a series of bytes for storage or some other form of IO, then yes, every value will be in the range 0..255.
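As a rough illustration (Python 3 syntax, with a made-up sample string that is not from the original discussion), the characters of a text string can have code points well above 255, but once the text is encoded every resulting value fits in 0..255:

```python
# Hypothetical sample text; the euro sign has code point 8364.
text = "caf\u00e9 \u20ac"

# Code points of the unencoded characters: not limited to 0..255.
code_points = [ord(ch) for ch in text]

# Encoding produces a series of bytes; iterating over bytes yields ints.
encoded = text.encode("utf-8")
byte_values = list(encoded)

print(max(code_points))                        # 8364
print(all(0 <= b <= 255 for b in byte_values)) # True
```

UTF-8 is only one possible choice of encoding here; the 0..255 bound holds for any encoding, since the output is always a series of bytes.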

You may find this email, which I wrote back in the WSGI days, useful.

http://mail.python.org/pipermail/web-sig/2004-September/000858.html

[Jeff]
> Apart from having to forego Java's lovely String methods, we wish we'd
> used an array of bytes implementation for PyString: right?

Right.

Jython's use of a java.lang.String to contain bytes is a hangover from emulating CPython 1.x and 2.x, where strings have a dual nature and can contain either characters or bytes.

Since this was a great source of confusion for users, CPython 3.x did away with the dual nature and introduced separate string and bytes types; converting one into the other requires an explicit encode or decode operation.

http://docs.python.org/3.0/whatsnew/3.0.html#text-vs-data-instead-of-unicode-vs-8-bit
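A minimal sketch of that separation, in Python 3 syntax (the sample string and variable names are made up for illustration):

```python
text = "na\u00efve"          # str: a sequence of Unicode characters
data = text.encode("utf-8")  # bytes: a sequence of values in 0..255

# The only way back from bytes to str is an explicit decode.
round_tripped = data.decode("utf-8")
print(round_tripped == text)  # True

# Mixing the two types without a conversion is rejected in 3.x.
try:
    text + data
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)  # False
```

Under the 2.x dual-nature model the equivalent concatenation could silently succeed or fail depending on the contents, which is the confusion the 3.x split removes.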

So when Jython moves to 3.x, we'll have to do the same.

Alan.