From: Dan S. <drs...@gm...> - 2011-04-18 00:22:55
|
How does one, in Jython 2.5.2, convert from a byte string to a text string, and vice versa?

I'd like to support Jython in my opensource python2x3 module, but Jython's string handling seems different enough from that of other Pythons that I'm not clear on how to do so. I found an article saying that if you do a binary read in Jython, you'll get a binary str that just keeps the high bytes zeroed, but I didn't notice anything about converting from one (always zero high bytes to nonzero high bytes, for EG) to the other.

Python2x3's at http://stromberg.dnsalias.org/svn/python2x3/trunk - and I'm including a copy at the bottom of this message.

TIA!

#!/usr/bin/python

'''Provides code and data to facilitate writing python code that runs on 2.x and 3.x, including pypy'''

# I'm afraid pylint won't like this one...

import sys
import platform

def python_major(result = int(platform.python_version_tuple()[0])):
    '''Return an integer corresponding to the major version # of the python interpreter we're running on'''
    return result

if python_major() == 2:
    empty_bytes = ''
    null_byte = '\0'
    bytes_type = str
    def intlist_to_binary(intlist):
        '''Convert a list of integers to a binary string type'''
        return ''.join(chr(byte) for byte in intlist)
    def string_to_binary(string):
        '''Convert a text string to a binary string type'''
        return string
    def binary_to_intlist(binary):
        '''Convert a binary string to a list of integers'''
        return [ ord(character) for character in binary ]
    def binary_to_string(binary):
        '''Convert a binary string to a text string'''
        return binary
elif python_major() == 3:
    empty_bytes = ''.encode('utf-8')
    null_byte = bytes([ 0 ])
    bytes_type = bytes
    def intlist_to_binary(intlist):
        '''Convert a list of integers to a binary string type'''
        return bytes(intlist)
    def string_to_binary(string):
        '''Convert a text string (or binary string type) to a binary string type'''
        if isinstance(string, str):
            return string.encode('latin-1')
        else:
            return string
    def binary_to_intlist(binary):
        '''Convert a binary string to a list of integers'''
        return binary
    def binary_to_string(binary):
        '''Convert a binary string to a text string'''
        return binary.decode('latin-1')
else:
    sys.stderr.write('%s: Python < 2 or > 3 not (yet) supported\n' % sys.argv[0])
    sys.exit(1)
|
From: Alex G. <ale...@ne...> - 2011-04-18 06:40:46
|
On 18.04.2011 03:22, Dan Stromberg wrote:
> How does one, in Jython 2.5.2, convert from a byte string to a text
> string, and vice versa?

The same way as you do in all other Pythons: 'blah'.decode(encoding).

> I'd like to support Jython in my opensource python2x3 module, but
> Jython's string handling seems different enough from that of other
> Pythons that I'm not clear on how to do so. I found an article saying
> that if you do a binary read in Jython, you'll get a binary str that
> just keeps the high bytes zeroed

Link? Sounds a little odd.

> , but I didn't notice anything about
> converting from one (always zero high bytes to nonzero high bytes, for
> EG) to the other.
>
> Python2x3's at http://stromberg.dnsalias.org/svn/python2x3/trunk - and
> I'm including a copy at the bottom of this message.

The worst problem in writing cross-version code is entering unicode/byte literals. Does Python2x3 solve this somehow?
|
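A minimal sketch of the decode/encode round trip that the reply points to. The sample bytes and the choice of UTF-8 are assumptions for illustration only. The `b''` prefix was introduced with Python 2.6, so it may not parse on Jython 2.5; there a plain `'...'` literal is already a byte str.

```python
# Byte string -> text string -> byte string, always with an explicit encoding.
raw = b'caf\xc3\xa9'                # five bytes: UTF-8 for "cafe" with an accented e (sample data)
text = raw.decode('utf-8')          # text string: unicode on 2.x, str on 3.x
round_trip = text.encode('utf-8')   # back to the original five bytes

print(len(raw), len(text))
```

The encoding has to match the data: decoding with the wrong codec either raises an error or silently yields different text.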
From: Dan S. <drs...@gm...> - 2011-04-18 18:02:07
|
2011/4/17 Alex Grönholm <ale...@ne...>:
> On 18.04.2011 03:22, Dan Stromberg wrote:
>> How does one, in Jython 2.5.2, convert from a byte string to a text
>> string, and vice versa?
> The same way as you do in all other Pythons: 'blah'.decode(encoding).

I'm writing a deduplicating backup program that works well so far on CPython 2.x, CPython 3.x, PyPy 1.4.1 and recent PyPy trunk builds, but it tracebacks on Jython 2.5.2 with something that felt related to string semantics.

However, it appears to really be an issue of what type is returned by open(fn, 'r').read(length) and os.read(os.open(fn, O_RDONLY), length).

I made some progress by adding 'b' to my python open()'s, but when reading using os.read(), how does one convince jython to return a str instead of a unicode type? It seems to mostly return a unicode object, but sometimes to return a str object - from the same open. Jython on Linux doesn't appear to have an os.O_BINARY.

I've been using os.open+os.read, because it appears to return bytes on both CPython 2.x (including PyPy) and CPython 3.x, but that doesn't appear to be the case in Jython 2.5.2.

>> I'd like to support Jython in my opensource python2x3 module, but
>> Jython's string handling seems different enough from that of other
>> Pythons that I'm not clear on how to do so. I found an article saying
>> that if you do a binary read in Jython, you'll get a binary str that
>> just keeps the high bytes zeroed
> Link? Sounds a little odd.

Finding the original link I read is proving somewhat time consuming, but here's something a bit similar that sounds more promising than what I read before. Apparently str behavior changed in jython 2.5, so perhaps the original link I read was out of date:

http://jythonpodcast.hostjava.net/jythonbook/chapter2.html

    Prior to the 2.5.0 release of Jython, there was only one string type. The string type in Jython supported full two-byte Unicode characters and all functions contained in the string module are Unicode-aware. If the u'' string modifier was specified, it was ignored by Jython. Since the release of 2.5.0, strings in Jython are treated just like those in CPython, so the same rules will apply to both implementations. It is also worth noting that Jython uses character properties from the Java platform. Therefore properties such as isupper and islower, which we will discuss later in the section, are based upon the Java properties.

>> , but I didn't notice anything about
>> converting from one (always zero high bytes to nonzero high bytes, for
>> EG) to the other.
>>
>> Python2x3's at http://stromberg.dnsalias.org/svn/python2x3/trunk - and
>> I'm including a copy at the bottom of this message.
> The worst problem in writing cross-version code is entering unicode/byte
> literals.
> Does Python2x3 solve this somehow?

python2x3.string_to_binary() addresses this to some extent. You give it a str literal (or other str), and it converts it to bytes on 3.x (assuming latin-1), and leaves it as str on 2.x. It's more typing than adding a b prefix, but it seems to work fine.
|
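To illustrate the literal workaround described above, here is a standalone sketch modelled on the 3.x branch of the posted module. It is not the python2x3 source itself, and `MAGIC` is a hypothetical name chosen for the example.

```python
import sys

def string_to_binary(string):
    '''Return a byte string: unchanged on 2.x, latin-1 encoded on 3.x.'''
    if sys.version_info[0] >= 3 and isinstance(string, str):
        # latin-1 maps code points 0-255 one-to-one onto byte values 0-255.
        return string.encode('latin-1')
    return string

# Used where a b'...' literal would otherwise be written.
MAGIC = string_to_binary('\x89PNG')
print(repr(MAGIC))
```

Because latin-1 only covers code points below 256, this approach suits byte-valued literals such as file signatures; a literal containing other characters would raise UnicodeEncodeError on 3.x.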
From: Dan S. <drs...@gm...> - 2011-04-18 18:55:24
|
2011/4/18 Dan Stromberg <drs...@gm...>:
>
> 2011/4/17 Alex Grönholm <ale...@ne...>:
>> On 18.04.2011 03:22, Dan Stromberg wrote:
>>> How does one, in Jython 2.5.2, convert from a byte string to a text
>>> string, and vice versa?
>> The same way as you do in all other Pythons: 'blah'.decode(encoding).
>
> I'm writing a deduplicating backup program that works well so far on CPython
> 2.x, CPython 3.x, PyPy 1.4.1 and recent PyPy trunk builds, but it tracebacks
> on Jython 2.5.2 with something that felt related to string semantics.
>
> However, it appears to really be an issue of what type is returned by
> open(fn, 'r').read(length) and os.read(os.open(fn, O_RDONLY), length).
>
> I made some progress by adding 'b' to my python open()'s, but when reading
> using os.read(), how does one convince jython to return a str instead of a
> unicode type? It seems to mostly return a unicode object, but sometimes to
> return a str object - from the same open. Jython on Linux doesn't appear to
> have an os.O_BINARY.
>
> I've been using os.open+os.read, because it appears to return bytes on both
> CPython 2.x (including PyPy) and CPython 3.x, but that doesn't appear to be
> the case in Jython 2.5.2.

Supporting detail follows - I believe Jython isn't conformant on this, but perhaps the standard (if such we can call it at this time) is flexible on what is returned?

$ for i in /usr/local/*/bin/python /usr/local/pypy-1.4.1/bin/pypy /usr/local/pypy-trunk-2011-04-10/bin/pypy /usr/local/jython-2.5.2/bin/jython; do echo $i $($i -c 'import os; print(type(os.read(os.open("/etc/protocols", os.O_RDONLY), 1)))'); done
/usr/local/cpython-2.5/bin/python <type 'str'>
/usr/local/cpython-2.6/bin/python <type 'str'>
/usr/local/cpython-2.7/bin/python <type 'str'>
/usr/local/cpython-3.0/bin/python <class 'bytes'>
/usr/local/cpython-3.1/bin/python <class 'bytes'>
/usr/local/cpython-3.2/bin/python <class 'bytes'>
/usr/local/pypy-1.4.1/bin/pypy <type 'str'>
/usr/local/pypy-trunk-2011-04-10/bin/pypy <type 'str'>
/usr/local/jython-2.5.2/bin/jython <type 'unicode'>
|
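The same check can be written as a small script that does not depend on /etc/protocols existing. This is a sketch only: `os_read_type` is a hypothetical helper name, and the `b` prefix assumes Python 2.6 or later, so it would need to be dropped to try this on Jython 2.5.

```python
import os
import tempfile

def os_read_type():
    '''Return the type of the object os.read() hands back on this interpreter.'''
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b'x')              # one byte of sample data
        os.close(fd)
        fd = os.open(path, os.O_RDONLY)
        try:
            return type(os.read(fd, 1))
        finally:
            os.close(fd)
    finally:
        os.remove(path)

print(os_read_type())
```

Running it under each interpreter in turn gives one line per interpreter, comparable to the table above.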
From: Chris C. <Chr...@in...> - 2011-04-18 19:19:54
|
Dan Stromberg wrote:
> I believe Jython isn't conformant on this,
> but perhaps the standard (if such we can call it at this time) is
> flexible on what is returned?
>
> $ for i in /usr/local/*/bin/python /usr/local/pypy-1.4.1/bin/pypy
> /usr/local/pypy-trunk-2011-04-10/bin/pypy
> /usr/local/jython-2.5.2/bin/jython; do echo $i $($i -c 'import os;
> print(type(os.read(os.open("/etc/protocols", os.O_RDONLY), 1)))');
> done
> /usr/local/cpython-2.5/bin/python <type 'str'>
> /usr/local/cpython-2.6/bin/python <type 'str'>
> /usr/local/cpython-2.7/bin/python <type 'str'>
> /usr/local/cpython-3.0/bin/python <class 'bytes'>
> /usr/local/cpython-3.1/bin/python <class 'bytes'>
> /usr/local/cpython-3.2/bin/python <class 'bytes'>
> /usr/local/pypy-1.4.1/bin/pypy <type 'str'>
> /usr/local/pypy-trunk-2011-04-10/bin/pypy <type 'str'>
> /usr/local/jython-2.5.2/bin/jython <type 'unicode'>

My 2 cents: I suspect most focus has been on the builtin open() (like the POSIX fopen function). I would suggest sticking with open() rather than os.open(), since with open() the Jython 2.5.x results are like CPython's. As has already been discussed, using 'rb' is usually the ideal thing to do unless you explicitly use the codecs module.

Out of idle curiosity, what are you using os.open for that open can't handle?

Chris
|
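A short sketch of the suggestion above: reading a block through the builtin open() in binary mode. `read_block` is a hypothetical helper name; try/finally is used instead of a with statement so the same source also parses on 2.5-level interpreters.

```python
def read_block(path, length):
    '''Read up to length bytes from path using the builtin open() in binary mode.'''
    handle = open(path, 'rb')   # 'b' asks for bytes with no newline translation
    try:
        return handle.read(length)
    finally:
        handle.close()
```

On CPython this returns str on 2.x and bytes on 3.x, which are the byte-string types of their respective versions.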
From: Dan S. <drs...@gm...> - 2011-04-18 21:11:47
|
On Mon, Apr 18, 2011 at 12:19 PM, Chris Clark <Chr...@in...> wrote:
> Dan Stromberg wrote:
>>
>> I believe Jython isn't conformant on this,
>> but perhaps the standard (if such we can call it at this time) is
>> flexible on what is returned?
>>
>> $ for i in /usr/local/*/bin/python /usr/local/pypy-1.4.1/bin/pypy
>> /usr/local/pypy-trunk-2011-04-10/bin/pypy
>> /usr/local/jython-2.5.2/bin/jython; do echo $i $($i -c 'import os;
>> print(type(os.read(os.open("/etc/protocols", os.O_RDONLY), 1)))');
>> done
>> /usr/local/cpython-2.5/bin/python <type 'str'>
>> /usr/local/cpython-2.6/bin/python <type 'str'>
>> /usr/local/cpython-2.7/bin/python <type 'str'>
>> /usr/local/cpython-3.0/bin/python <class 'bytes'>
>> /usr/local/cpython-3.1/bin/python <class 'bytes'>
>> /usr/local/cpython-3.2/bin/python <class 'bytes'>
>> /usr/local/pypy-1.4.1/bin/pypy <type 'str'>
>> /usr/local/pypy-trunk-2011-04-10/bin/pypy <type 'str'>
>> /usr/local/jython-2.5.2/bin/jython <type 'unicode'>
>
> My 2 cents, I suspect most focus has been on the builtin "open()" (like the
> posix fopen function). I would suggest sticking with open() rather than
> os.open() where the Jython 2.5.x results are like Cpython. As has already
> been discussed using 'rb' is usually the ideal thing to do unless you
> explicitly use the codec module.

I may special-case my code to use os.open on CPython and PyPy, and open with 'b' on Jython.

> Out of idle curiosity what are you using os.open for that open can't handle?

os.open always returns bytes on CPython 2.x and CPython 3.x. open returns bytes on 2.x and unicode on 3.x. Until I started experimenting with Jython, it seemed that os.open was a way to get some easy portability, since in my current project I'm mostly interested in byte strings rather than text - a backup program shouldn't break if a file doesn't fit its encoding assumptions.

Also, use of os.open just seems a more direct route to use of os.fstat as a security measure, to deter symlink races - though I've never found a way of doing fstat in Java short of JNI, JNA or similar, despite some asking around; it seems like Java is mostly used for user applications, and not systems programming. Or maybe I'm overlooking something - how do Java developers deal with the possibility of symlink races?

And os.open is a little faster for "transfer lots of blocks" sorts of operations. But that's a pretty minor concern.

FWIW, it appears that Jython 2.5.2's os.read and os.write are taking extra steps to convert in an unexpected way... Would the corresponding tiny patch be well received?

Though I'm now realizing, that's not as significant as the lack of an os.fstat for my current application. It seems like os.fstat is in Jython's documentation, but upon import os, I don't see it.

There appear to be a few mentions of JNA in the current code... Is there a policy governing its use in Jython?

BTW, do you know Jim Gramling and/or Karl Schendel?
|
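The os.open plus os.fstat pattern mentioned above can be sketched as follows for CPython on a POSIX system. `open_regular_file` is a hypothetical name, and treating "regular file only" as the policy is an assumption made for the example, not something stated in the thread.

```python
import os
import stat

def open_regular_file(path):
    '''Open path read-only and return its descriptor, refusing non-regular files.'''
    # O_NOFOLLOW is not defined on every platform, so fall back to no extra flag.
    flags = os.O_RDONLY | getattr(os, 'O_NOFOLLOW', 0)
    fd = os.open(path, flags)
    try:
        # fstat() inspects the object that was actually opened, so a path that is
        # swapped after the open cannot change the answer.
        if not stat.S_ISREG(os.fstat(fd).st_mode):
            raise OSError('not a regular file: %r' % (path,))
    except Exception:
        os.close(fd)
        raise
    return fd
```

The point of the design is that the check and the later reads use the same descriptor, rather than checking a path with os.stat and then opening it in a second step.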
From: Chris C. <Chr...@in...> - 2011-04-18 21:38:35
|
Dan Stromberg wrote:
> os.open always returns bytes on CPython 2.x and CPython 3.x. open
> returns bytes on 2.x and unicode on 3.x. Until I started
> experimenting with Jython, it seemed that os.open was a way to get
> some easy portability, since in my current project I'm mostly
> interested in byte strings rather than text - a backup program
> shouldn't break if a file doesn't fit its encoding assumptions.
>
> Also, use of os.open just seems a more direct route to use of os.fstat
> as a security measure, to deter symlink races - though I've never
> found a way of doing fstat in java short of JNI, JNA or similar,
> despite some asking around;

Java (and ergo Jython) does seem to be missing a few (what I would consider to be basic) unix/posix routines (cross platform, e.g. try using subprocess which needs fork under VMS). I don't have a solution, this is just a "me too!" comment :-(

One slightly long-winded option would be to use the ctypes support that is new in 2.5.2 to load libc - potentially easier than straight JNI. A lot of the boilerplate can be generated with http://code.google.com/p/ctypesgen/ - NOTE I've only used ctypes with CPython; I've not sat down and used ctypes with Jython.

> it seems like java is mostly used for user
> applications, and not systems programming. Or maybe I'm overlooking
> something - how do Java developers deal with the possibility of
> symlink races?
>
> And os.open is a little faster for "transfer lots of blocks" sorts of
> operations. But that's a pretty minor concern.
>
> FWIW, it appears that Jython 2.5.2's os.read and os.write are taking
> extra steps to convert in an unexpected way... Would the
> corresponding tiny patch be well received?
>

I'm not a Jython dev, we'd have to see what Jim's response is. From my perspective, the current behavior is a bug.

There have been discussions at PyCon on a project for a shared test suite and standard libraries between the different implementations (http://www.boredomandlaziness.org/2011/03/python-vm-summit-rough-notes.html). This seems like a great addition if that gets started. I'm not sure who to raise this with other than joining the python-dev mailing list. This is probably more than you were thinking of :-)

RE your compatibility library for versions/implementations, take a look at http://packages.python.org/six/

> Though I'm now realizing, that's not as significant as the lack of an
> os.fstat for my current application. It seems like os.fstat is in
> Jython's documentation, but upon import os, I don't see it.
>
> There appears to be a few mentions of JNA in the current code... Is
> there a policy governing its use in Jython?
>
> BTW, do you know Jim Gramling and/or Karl Schendel?
>

I know of Jim but I do not know him (I'm not sure if he is still at Microsoft). I know Karl, he's at Ingres :-) Small world!

Chris
|
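For the ctypes route suggested above, a minimal sketch of loading the C library and calling into it, tried here only in terms of CPython on a POSIX system; whether the same lines behave identically under Jython's ctypes support is not established in this thread. getpid() is used as a stand-in because calling fstat() this way would also need a platform-specific struct stat definition, which is the kind of boilerplate a generator would produce.

```python
import ctypes
import ctypes.util
import os

# Locate the C library; if it cannot be found by name, fall back to the symbols
# already loaded into the running process (CDLL(None) on POSIX).
libc = ctypes.CDLL(ctypes.util.find_library('c') or None)

# getpid() takes no arguments and returns an int, so the ctypes defaults suffice.
pid_from_libc = libc.getpid()
print(pid_from_libc == os.getpid())
```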
From: Alan K. <jyt...@xh...> - 2011-04-18 18:45:22
|
[Dan]
> How does one, in Jython 2.5.2, convert from a byte string to a text
> string, and vice versa?

You may find this discussion from 2004, about bytes, strings and WSGI on Jython, to be informative.

[Web-SIG] bytes, strings, and Unicode in Jython, IronPython, and CPython 3.0
http://mail.python.org/pipermail/web-sig/2004-September/000858.html

Regards,

Alan.
|
From: Alan K. <jyt...@xh...> - 2011-04-18 18:54:28
|
[Dan]
> I'm writing a deduplicating backup program that works well so far on CPython
> 2.x, CPython 3.x, PyPy 1.4.1 and recent PyPy trunk builds, but it tracebacks
> on Jython 2.5.2 with something that felt related to string semantics.
>
> However, it appears to really be an issue of what type is returned by
> open(fn, 'r').read(length) and os.read(os.open(fn, O_RDONLY), length).
>
> I made some progress by adding 'b' to my python open()'s, but when reading
> using os.read(), how does one convince jython to return a str instead of a
> unicode type? It seems to mostly return a unicode object, but sometimes to
> return a str object - from the same open. Jython on Linux doesn't appear to
> have an os.O_BINARY.

If you're expecting to read bytes from a file, then you *must* specify the 'b' flag, for binary.

If you don't, then you will get line-ending translation on some platforms, e.g. Windows will translate "\n" to "\r\n", MacOS will translate "\n" to "\r". This is because the default mode for files, if you don't specify binary, is text.

But you won't see any line-ending translation on *nix, because the line ending on such platforms is "\n": no translation required.

Builtin functions: open
http://docs.python.org/release/2.5/lib/built-in-funcs.html#l2h-54

So if you open your file with "r", you won't see any problems on linux, but you will on other platforms that use a different line ending, i.e. MacOS and Windows.

Alan.
|
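A small sketch of why the 'b' flag matters for byte-exact reads: data written with mixed line endings comes back unchanged in binary mode. The payload is sample data chosen for the example, and the `b` prefix assumes Python 2.6 or later.

```python
import os
import tempfile

payload = b'one\r\ntwo\rthree\n'   # sample bytes mixing CRLF, CR and LF endings

fd, path = tempfile.mkstemp()
os.write(fd, payload)
os.close(fd)

handle = open(path, 'rb')          # binary mode: no line-ending translation
try:
    data = handle.read()
finally:
    handle.close()
os.remove(path)

print(data == payload)
```

For a backup program this byte-for-byte property is what matters; text mode is only appropriate when the content is known to be text.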
From: Dan S. <drs...@gm...> - 2011-04-18 19:00:07
|
On Mon, Apr 18, 2011 at 11:54 AM, Alan Kennedy <jyt...@xh...> wrote:
> [Dan]
>> I'm writing a deduplicating backup program that works well so far on CPython
>> 2.x, CPython 3.x, PyPy 1.4.1 and recent PyPy trunk builds, but it tracebacks
>> on Jython 2.5.2 with something that felt related to string semantics.
>>
>> However, it appears to really be an issue of what type is returned by
>> open(fn, 'r').read(length) and os.read(os.open(fn, O_RDONLY), length).
>>
>> I made some progress by adding 'b' to my python open()'s, but when reading
>> using os.read(), how does one convince jython to return a str instead of a
>> unicode type? It seems to mostly return a unicode object, but sometimes to
>> return a str object - from the same open. Jython on Linux doesn't appear to
>> have an os.O_BINARY.
>
> If you're expecting to read bytes from a file, then you *must* specify
> the 'b' flag, for binary.
>
> If you don't, then you will get line-ending translation on some
> platforms, e.g. Windows will translate "\n" to "\r\n", MacOS will
> translate "\n" to "\r". This is because the default mode for files, if
> you don't specify binary, is text.
>
> But you won't see any line-ending translation on *nix, because the
> line ending on such platforms is "\n": no translation required.
>
> Builtin functions: open
> http://docs.python.org/release/2.5/lib/built-in-funcs.html#l2h-54
>
> So if you open your file with "r", you won't see any problems on
> linux, but you will on other platforms that use a different line
> ending, i.e. MacOS and Windows.

Yes, I'm aware. I'm developing on Linux; MacOS and then Windows may come later if I have time.

BTW, does MacOS X still use carriage returns as newlines? Or is it closer to the rest of *ix now - using a line feed?

I've mostly moved on from the above part of the issue - the part that remains is the return type of os.read(os.open(), length).
|
From: Philip J. <pj...@un...> - 2011-04-19 01:51:50
|
On Apr 18, 2011, at 12:00 PM, Dan Stromberg wrote:
> I've mostly moved on from the above part of the issue - the part that
> remains is the return type of os.read(os.open(), length).

Please log a ticket for this, os.read should probably never return unicode. That's definitely a bug.

--
Philip Jenvey
|
From: Dan S. <drs...@gm...> - 2011-04-19 02:15:24
|
On Mon, Apr 18, 2011 at 6:45 PM, Philip Jenvey <pj...@un...> wrote:
>
> On Apr 18, 2011, at 12:00 PM, Dan Stromberg wrote:
>
>> I've mostly moved on from the above part of the issue - the part that
>> remains is the return type of os.read(os.open(), length).
>
> Please log a ticket for this, os.read should probably never return unicode. That's definitely a bug

I've submitted tickets for:
1) os.read() returns unicode
2) os.fstat() missing
3) os.major() and os.minor() missing

I put them under 2.5.2rc, because I didn't see 2.5.2 in the list of versions one can file tickets under.
|