From: Jeff A. <ja...@fa...> - 2017-05-19 08:19:11
Hi Darjus.

On inclusion, I'm happy to go with the community view, as always. On one of the related tickets (http://bugs.jython.org/issue1839), Jim said we'd get it in if timing allowed and there was some user support.

I'm very keen to see a 2.7.1 too. The last (soft) RC was unsuccessful, and we're still making changes, so I assume we're talking about another RC first rather than a release?

The UTF-8 work is nearly there, but not quite: there is one Linux defect to fix, as noted on the same issue by James against the "latin-1" version. After all the additions in the last couple of weeks (to get full BMP support), I'm happy to find from my Linux laptop that it is still the only thing I have to do. It looks trivial. I've been unable to code at all for a few days, so haven't looked into a solution, but now I'm back I expect to nail it for us today or tomorrow.

I can, of course, merge all this myself, and will. I shared your hesitancy initially, hence the fork repository, but it has turned out so well that I feel it's now low risk, as long as we still have a few days.

I will now dive under the desk and wire up my Linux dev box.

Jeff Allen

On 16/05/2017 21:46, Darjus Loktevic wrote:
> Hey Jeff,
>
> It seems your last commit to this branch is from three days ago. Is this
> ready for review? BTW, your changes look good to me.
> I'm a little hesitant to merge this since we've had an RC and REALLY
> have to release 2.7.1. It's miles better than 2.7.0.
>
> Cheers,
> Darjus
>
> On Mon, May 1, 2017 at 6:34 AM Jeff Allen <ja...@fa...> wrote:
>
> I went for sys.getfilesystemencoding() == 'utf-8' and it works pretty
> well. Rather than just push directly I have published to here:
>
> https://bitbucket.org/tournesol/jython-utf8
>
> I write to ask for a second or third pair of eyes on it. Please tell me
> you can see it and whether it breaks things you care about.
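The choice of 'utf-8' as the file system encoding can be pictured with a small sketch. It is an assumption about the policy, not Jython's code: use whatever sys.getfilesystemencoding() reports if the runtime really has that codec, and otherwise fall back to 'utf-8' (for instance where 'mbcs' is reported but unavailable, as the April message in this thread describes). The helper names are hypothetical, and the code is written for Python 3 so it runs on a current interpreter.

```python
import codecs
import sys


def usable_encoding(name, fallback="utf-8"):
    """Return the canonical name of codec `name`, or of `fallback` if unusable."""
    try:
        return codecs.lookup(name).name
    except (LookupError, TypeError):
        # LookupError: the codec is not available in this runtime.
        # TypeError: the reported name was None rather than a string.
        return codecs.lookup(fallback).name


def fs_encoding():
    """Hypothetical policy: the reported file system encoding if usable, else utf-8."""
    return usable_encoding(sys.getfilesystemencoding())


def decode_path(raw):
    """Turn a bytes file name into unicode using the chosen encoding."""
    return raw.decode(fs_encoding())


print(usable_encoding("no-such-codec"))  # falls back to the utf-8 codec
print(decode_path(b"Epreuve"))
```

Settling on one codec that can represent every code point means a file name obtained as unicode can always be turned into bytes and back without loss.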
>
> I touched a lot of files in the core and import system: quite a lot of
> tricky stuff with loaders and search paths has been adjusted. I think it
> a good sign that I changed hardly anything in the standard library we
> inherit from CPython that we hadn't already specialised.
>
> By "works pretty well" above, I mean that the regression tests run
> cleanly for me when my user name is "Épreuve", where previously Jython
> died horribly. The launcher works from a Chinese user name too, as long
> as I localise Windows to China (CPython 2.7 feature). I can use the
> prompt and run some tests with that setup, but I can't run the
> regression tests yet, and printing a stack dump is fatal, so there's a
> bit more to do for Chinese.
>
> I think this means we have solid support for "latin-1" languages, but
> there are still places where we fatally assume bytes are Unicode code
> points.
>
> Jeff Allen
>
> On 05/04/2017 08:57, Jeff Allen wrote:
> > I've been working on http://bugs.jython.org/issue2356, which I'd like to
> > get into 2.7.1 -- it seems rather poor that Jython simply does not run
> > for users whose names have an un-American character ;). I know this
> > issue is not a blocker in most minds.
> >
> > I've made pretty good progress by allowing file names to be unicode
> > objects more often than they would be in CPython 2, which usually
> > returns them as bytes in some encoding that we may not know. I've got
> > the launcher to work properly, and straightened the logic in our
> > printing of trace-backs and exceptions from Java. Unicode file names
> > seem the way to go for Jython because:
> >
> > 1. Java gives us competently decoded unicode file names, from
> >    java.io.File, etc. Re-encoding the result will be a pain (and
> >    overlooked).
> > 2. We appear not to have the codec we need ('mbcs') that CPython
> >    reports on Windows via sys.getfilesystemencoding().
> > 3. We do this already. In 2.7.0, os.getcwd() returns unicode if
> >    necessary.
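The "bytes are Unicode code points" assumption mentioned above can be demonstrated in a few lines. This is a hypothetical illustration in Python 3, not code taken from Jython: treating each byte value as a code point is exactly a latin-1 decode, so it happens to work when the bytes really are latin-1, garbles the same name stored as utf-8, and can never produce a character above U+00FF, which rules out every Chinese character.

```python
def bytes_as_code_points(data):
    """The faulty assumption: every byte value is taken as a code point."""
    return "".join(chr(b) for b in bytearray(data))


name = "\u00c9preuve"  # the user name "Épreuve"

# Stored as latin-1, each byte really is the code point, so this works.
assert bytes_as_code_points(name.encode("latin-1")) == name

# Stored as utf-8, 'É' becomes two bytes (0xC3 0x89), so the result is garbled.
utf8 = name.encode("utf-8")
assert bytes_as_code_points(utf8) == "\u00c3\u0089preuve"
assert utf8.decode("utf-8") == name  # decoding with the real encoding is correct

# The assumption is just a latin-1 decode under another name ...
assert bytes_as_code_points(utf8) == utf8.decode("latin-1")

# ... and it can never yield a Chinese character, all of which lie above U+00FF.
assert all(ord(c) > 0xFF for c in "\u7528\u6237")
```

This is one way to see why latin-1 user names can get through while a Chinese user name still hits fatal paths.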
> >
> > Most regression tests pass. However, I'm struggling with test_doctest.
> > Problems arise when mixing unicode and bytes where any byte is 128 or
> > over. This happens in ''.join(list) and in formatted output like
> > "%s %s" % (ustr, bstr). The behaviour of these is identical to CPython:
> > they raise UnicodeDecodeError because the bytes are promoted to
> > characters with a strict ascii interpretation. This happens a lot in
> > doctest.py and traceback.py, for example, where file paths, and stack
> > dumps that include them, are now frequently unicode, while other inputs
> > are byte data containing file paths presented in the console encoding.
> >
> > I can beat this into submission with enough customisation of the stdlib
> > modules, but that always makes me uncomfortable. I usually see that as a
> > hint that user code might also need to change. This may be unfounded. I
> > can probably ensure no impact on users of only ascii paths, and the
> > others seem unable to run Jython at all (in the scope of this issue).
> > However, I'm seriously wondering if I should pursue the approach where
> > file names from Java are re-encoded to bytes (maybe as utf-8
> > everywhere), but that's grim.
> >
> > Thoughts?
>
> _______________________________________________
> Jython-dev mailing list
> Jyt...@li...
> https://lists.sourceforge.net/lists/listinfo/jython-dev
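The test_doctest failures described in the April message come from Python 2 implicitly decoding byte strings as strict ASCII whenever they meet unicode in ''.join(list) or "%s %s" % (ustr, bstr). Python 3 no longer coerces this way, so the sketch below models the Python 2 coercion explicitly with hypothetical helpers (py2_promote, py2_join) and contrasts it with one possible workaround, to_text, which decodes byte paths with a known encoding before mixing. Treating the console encoding as utf-8 is an assumption made only for illustration; on Windows it may well be something else.

```python
def py2_promote(value):
    """Model Python 2's implicit str-to-unicode coercion: strict ASCII."""
    if isinstance(value, bytes):
        return value.decode("ascii")  # UnicodeDecodeError for any byte >= 128
    return value


def py2_join(parts):
    """Rough model of ''.join(parts) in Python 2 when unicode and bytes mix."""
    return "".join(py2_promote(p) for p in parts)


def to_text(value, encoding="utf-8"):
    """One possible workaround: decode bytes with a known encoding first."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


upath = "C:\\Users\\\u00c9preuve\\test.py"  # unicode path, as from java.io.File
bpath = "\u00c9preuve".encode("utf-8")        # bytes path (console encoding assumed utf-8)

try:
    py2_join([upath, b" ", bpath])
except UnicodeDecodeError as exc:
    print("strict ascii promotion failed:", exc.reason)

# Decoding first avoids the problem for both join and % formatting.
joined = " ".join([upath, to_text(bpath)])
formatted = "%s %s" % (upath, to_text(bpath))
```

The catch, as the message says, is that a workaround like to_text has to be applied wherever bytes and unicode meet, which is why it tends to spread into stdlib modules and possibly into user code.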