From: Philip J. <pj...@un...> - 2012-09-20 20:32:21
On Sep 18, 2012, at 1:07 AM, Jeff Allen wrote:

> I've been looking at the io module with a view to getting test.test_io
> to pass, and bringing the io module as seen from Python as close as can
> be to the CPython picture. test_io has a high level of failures (25),
> errors (52) and skips (82) at the moment. A copy of the Python
> implementation of PEP-3116 (_pyio.py) masquerades as _io by occupying
> _io.py, whereas the CPython io.py thinks it is delegating to a C
> implementation when it imports _io.
>
> I know a number of you have contributed to a Java implementation close
> to PEP-3116 that resides in org.python.core.io . I intend to build on
> that. You understand it better than I do, so I'd like to check mine is a
> viable plan.
>
> 1. The package org.python.modules._fileio is my starting point, but for
> strict correspondence with CPython, ought in version 2.7 to be _io not
> _fileio.
>
> 2. The new Java package org.python.modules._io will contain exposed
> classes corresponding to Python's _io.* classes. These classes will have
> the signatures Python expects, but delegate the work to corresponding
> org.python.core.io classes. The existing PyFileIO can be my exemplar in
> how to do this.

Hey Jeff,

I haven't thought about all this in too much detail but I should point
out (as the author of core.io):

o It's "loosely" based on PEP 3116 because there are some differences
  between py2 and py3 file:

  - universal newlines mode is more configurable (you can choose the
    'newline' to use) and it now supports writing

  - since the buffer/raw layers aren't exposed in Py 2 file, I didn't
    bother making them threadsafe (PyFile is responsible for the locking)

  - no 'encoding' arg to open() functionality was needed for Py 2

  - other small things (like some of the exceptions IOBase raises should
    be different in Py 3)

  and probably other things I've forgotten

o It was written before CPython's _io and obviously before your new
  buffer stuff.
  IIRC the lower layers of CPython's _io heavily use Py_Buffer whereas
  core.io works on java.nio.Buffer. I'm not sure how much of it might
  benefit from the new Buffer stuff (I haven't followed the buffer work
  very much unfortunately).

o I was hoping PyString might eventually be based on bytes instead of
  char. I also thought that *possibly* future Py_Buffer support in Jython
  might be based on, or somehow integrate with, java.nio.Buffer (I'm not
  sure that's even a great goal though, you might have some insight.
  Integrating with a ByteBuffer is simple if you have an underlying Java
  byte[] array somewhere).

  - So note that core.io is well optimized right now, though it could
    actually gain a slight speedup in Py2 if we got rid of the extra
    bytes->String (for PyString) conversion

Basically, adapting core.io to _io will take some doing and I'm not sure
how to handle the 2 vs 3 differences. We should also keep in mind that
the work shouldn't affect Py2 file performance negatively as that's
ultimately more important to Py 2 code. Adding locks to all of the layers
could hurt (though maybe Java 6 escape analysis/lock coarsening helps
here).

In fact, I'm not sure the io module is very heavily used in Py 2 code at
all (probably just in some cases of Py3 compat)? You might want to
consider doing the bare minimum to get it working for now, and leave
optimizing it until later (maybe even until Jython 3). Then you can
basically defer on all the points I'm worrying about =]

> 3. There should be a static open() function in
> org.python.modules._io._io.java .
>
> 4. fileno() should return something the Python user treats as an opaque
> handle, and that open() and the constructors of streams will have to
> accept, where currently their CPython implementations expect an int. I
> read the discussion around the proper return type fileno()
> (http://comments.gmane.org/gmane.comp.lang.jython.devel/3994 and refs
> therein).

We should have this already unless I'm missing something

>
> 5. I can make these changes progressively by ditching _io.py (clone of
> _pyio.py) and replacing the current CPython io.py with one that
> delegates to _pyio.py initially. Then class by class, I change its
> delegation from _pyio to _io (Java implementation). In the end, we go
> back to the CPython io.py.

It was a little simpler in 2.6 in that the bare minimum you needed to
implement for the pure Python version of io to work was _fileio.FileIO.
The 2.7 _pyio is a little strange in that it refers to
io.IO/RawIO/Buffered/TextIOBase to register as ABCs (which requires _io).

You can probably get away with implementing just io.FileIO and
SEEK_SET/CUR/END (as a builtin that'd replace io.py). Then comment out
the ABC registration calls in _pyio.

--
Philip Jenvey
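
For illustration only, the class-by-class switch-over described in point 5
could look roughly like the sketch below. It is not taken from the Jython
sources: pyio_standin and javaio_standin are hypothetical stand-in modules
built on the spot (they are not the real _pyio or _io), and pick() is a
made-up helper name. The only assumption about the real modules is that a
transitional io.py would prefer a name from the Java-backed module when it
exists and otherwise fall back to the pure Python one.

```python
import types

# Hypothetical stand-in for the pure Python implementation (_pyio).
pyio_standin = types.ModuleType("pyio_standin")
pyio_standin.FileIO = type("FileIO", (object,), {"impl": "python"})
pyio_standin.BufferedReader = type("BufferedReader", (object,), {"impl": "python"})

# Hypothetical stand-in for a partly finished Java-backed _io module:
# so far it only offers FileIO and the seek constants.
javaio_standin = types.ModuleType("javaio_standin")
javaio_standin.FileIO = type("FileIO", (object,), {"impl": "java"})
javaio_standin.SEEK_SET = 0
javaio_standin.SEEK_CUR = 1
javaio_standin.SEEK_END = 2


def pick(name, preferred=javaio_standin, fallback=pyio_standin):
    """Return `name` from `preferred` if it defines it, else from `fallback`."""
    if hasattr(preferred, name):
        return getattr(preferred, name)
    return getattr(fallback, name)


# Names a transitional io.py might export while the migration is under way.
FileIO = pick("FileIO")                  # already migrated
BufferedReader = pick("BufferedReader")  # still the pure Python class
SEEK_SET = pick("SEEK_SET")
```

As each further class gains a Java-backed implementation, it would be picked
up without touching the export list, which is the appeal of doing the
fallback by name rather than editing imports one at a time.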