From: Darjus L. <da...@gm...> - 2017-05-16 20:46:37
Hey Jeff,

It seems your last commit to this branch was three days ago. Is this ready for review?

BTW, your changes look good to me. I'm a little hesitant to merge this since we've had an RC and REALLY have to release 2.7.1. It's miles better than 2.7.0.

Cheers,
Darjus

On Mon, May 1, 2017 at 6:34 AM Jeff Allen <ja...@fa...> wrote:

> I went for sys.getfilesystemencoding() == 'utf-8' and it works pretty
> well. Rather than just push directly I have published to here:
>
> https://bitbucket.org/tournesol/jython-utf8
>
> I write to ask for a second or third pair of eyes on it. Please tell me
> you can see it and whether it breaks things you care about.
>
> I touched a lot of files in the core and import system: quite a lot of
> tricky stuff with loaders and search paths has been adjusted. I think it
> a good sign that I changed hardly anything in the standard library we
> inherit from CPython, that we hadn't already specialised.
>
> By "works pretty well" above, I mean that the regression tests run
> cleanly for me when my user name is "Épreuve", where previously Jython
> died horribly. The launcher works from a Chinese user name too, as long
> as I localise Windows to China (CPython 2.7 feature). I can use the
> prompt and run some tests with that setup, but I can't run the
> regression test yet, and printing a stack dump is fatal, so there's a
> bit more to do for Chinese.
>
> I think this means we have solid support for "latin-1" languages, but
> there are still places where we fatally assume bytes are Unicode code
> points.
>
> Jeff Allen
>
> On 05/04/2017 08:57, Jeff Allen wrote:
> > I've been working on http://bugs.jython.org/issue2356 which I'd like to
> > get in 2.7.1 -- it seems rather poor that Jython simply does not run for
> > users whose names have an un-American character ;). I know this issue is
> > not a blocker in most minds.
> >
> > I've made pretty good progress by allowing file names to be unicode
> > objects more often than they would be in CPython 2, which usually
> > returns them as bytes in some encoding that we may not know. I've got
> > the launcher to work properly, and straightened the logic in our
> > printing of trace-backs and exceptions from Java. Unicode file names
> > seem the way to go for Jython because:
> >
> > 1. Java gives us competently decoded unicode file names, from
> >    java.io.File, etc. Re-encoding the result will be a pain (and
> >    overlooked).
> > 2. We appear not to have the codec we need ('mbcs'), that CPython
> >    reports on Windows via sys.getfilesystemencoding().
> > 3. We do this already. In 2.7.0, os.getcwd() returns unicode if
> >    necessary.
> >
> > Most regression tests pass. However, I'm struggling with test_doctest.
> > Problems arise when mixing unicode and bytes when one byte is 128 or
> > over. This happens in ''.join(list) and formatted output like "%s %s" %
> > (ustr, bstr). The behaviour of these is identical with CPython: they
> > raise UnicodeDecodeError because the bytes are promoted to characters
> > with a strict ascii interpretation. This happens a lot in doctest.py and
> > traceback.py, for example, where file paths, and stack dumps that include
> > them, are now frequently unicode, while other inputs are byte data
> > containing file paths presented in the console encoding.
> >
> > I can beat this into submission with enough customisation of the stdlib
> > modules, but that always makes me uncomfortable. I usually see that as a
> > hint that user code might also need to change. This may be unfounded. I
> > can probably ensure no impact to users of only ascii paths, and the
> > others seem unable to run Jython at all (in the scope of this issue).
> > However, I'm seriously wondering if I should pursue the approach where
> > file names from Java are re-encoded to bytes (maybe as utf-8
> > everywhere), but that's grim.
> >
> > Thoughts?
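
For anyone following the thread, the mixing problem described above can be sketched roughly as below. This is only an illustration, not code from the jython-utf8 branch. Python 2 promotes byte strings to unicode implicitly with the strict 'ascii' codec; the sketch makes that step explicit so that it also runs on Python 3. The helper names (promote_like_py2, format_pair, decode_console_bytes), the example path, and the choice of 'latin-1' as the console encoding are all assumptions made for the example.

```python
# -*- coding: utf-8 -*-
# Sketch only: spells out the implicit bytes -> unicode promotion that
# Python 2 applies in u"%s %s" % (ustr, bstr). All names are hypothetical.

def promote_like_py2(value):
    """Return text, decoding byte strings with the strict 'ascii' codec."""
    if isinstance(value, bytes):
        # Raises UnicodeDecodeError if any byte is 128 or over.
        return value.decode('ascii')
    return value

def format_pair(first, second):
    """Mimic u"%s %s" % (first, second) as Python 2 would evaluate it."""
    return u"%s %s" % (promote_like_py2(first), promote_like_py2(second))

def decode_console_bytes(bstr, encoding):
    """One possible workaround: decode byte paths explicitly before mixing."""
    return bstr.decode(encoding)

# A unicode path, as Java would supply it (example value, assumed).
path_u = u"C:/Users/\u00c9preuve"
# The same path as bytes in an assumed console encoding.
path_b = path_u.encode('latin-1')

# Pure-ASCII bytes promote cleanly, so ASCII-only users see no change.
ascii_only = format_pair(path_u, b"test.py")

# Bytes containing 0xC9 cannot be promoted with strict ascii.
try:
    format_pair(path_u, path_b)
    failed = False
except UnicodeDecodeError:
    failed = True

# Decoding with the right encoding first avoids the error.
fixed = format_pair(path_u, decode_console_bytes(path_b, 'latin-1'))
```

The workaround in the last step only helps when the encoding of the byte data is actually known, which is the difficulty raised in the quoted message.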