From: Adam T. <aat...@ou...> - 2022-06-15 22:57:54
> Parts of the patch-set that (IMO) do not require further discussion are now
> committed to master.

Thank you.

Unify naming of the "utf-8" codec
---------------------------------

> I'd prefer 'utf-8' (lowercase, in quotes) also in documentation, if it
> refers to the Python codec and UTF-8 for the abstract encoding
> algorithm.

This makes sense, although for specific references to the stdlib
implementation of UTF-8, as in the ``encodings.utf_8`` module, we could
be explicit. I couldn't find anywhere in my patch set that I would
change, but I may have missed something. Were there any specific
instances you were thinking of?

Add encoding arguments
----------------------

Changes:

> Don't add encoding when the locale encoding is OK.
> (We may switch to "locale" after implementing it in `docutils.io`.)

Apart from ``FileInput``, where would you want to use 'locale' for the
encoding? This diverges from custom and practice in the general Python
ecosystem (and, as far as I can tell, in text encodings in general). I
would strongly suggest using UTF-8, as it eliminates an entire class of
locale- and encoding-related bugs.

> Document changes that may affect users.
> Use 'ascii' in "tools/dev/unicode2rstsubs.py".

Makes sense, thanks.

> Break too long lines.

Sorry, I thought I'd done a formatting pass, but seemingly not.

Ensure locale_encoding is lower case
------------------------------------

> We can use locale.getpreferredencoding() after dropping Python versions
> where this was problematic.

Great, thanks.

Handle encoding='locale' for docutils.io.Output
-----------------------------------------------

> Is uppercase ``encoding='LOCALE'`` supported in the standard
> function open() in Python >= 3.10?

Good question. I tested, and only the exact literal ``locale`` is
accepted, so we can drop the ``.lower()`` call.

> IMO, we need ``encoding='locale'`` support in both, input and output.
> Should ``encoding='locale'`` be supported in all Input/Output classes or
> only in FileInput/FileOutput?
The patch set I sent last time does, via the default-encoding helper
method I added. I don't mind putting support for ``encoding='locale'``
on just FileInput/FileOutput. What would your preference be here?

Deprecations
------------

> Why do you want to deprecate ``io.locale_encoding``?

Because after introducing ``encoding='locale'`` there is no longer any
use for ``io.locale_encoding`` in Docutils, and deprecating it reduces
the API surface.

> Why do you want to deprecate auto-detection of the input encoding?
> * ``encoding='locale'`` does not help if my input files are a mix of
>   UTF-8 and latin-1.

"auto-guessing" is a poor term. Basically, I meant deprecating the use
of the locale encoding as the default (as the default will change to
UTF-8). I'm not sure I understand the example you gave, as Docutils
works on a single-file basis. Could you add more context, please?

> Using Python 3.10's ``-X warn_default_encoding`` argument to Python,
> we can see a large number of places where the default encoding is
> used. On posix systems this is now UTF-8 following PEP 538 [1], but on
> Windows a non-unicode codepage can be used.
> Also on POSIX, the locale encoding is kept unless the locale is "C".

Yes, sorry, I wasn't precise enough.

Thanks,
Adam
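For reference, the "exact literal" behaviour discussed under "Handle encoding='locale' for docutils.io.Output" can be sketched roughly as follows. This is a minimal, hypothetical sketch: ``resolve_encoding()`` and ``write_text()`` are illustrative names, not part of ``docutils.io``. It assumes that only the lowercase literal 'locale' is special-cased (as with open() on Python >= 3.10) and approximates the locale encoding with a lower-cased ``locale.getpreferredencoding(False)``, in line with the "Ensure locale_encoding is lower case" change.

```python
import codecs
import locale


def resolve_encoding(encoding):
    """Hypothetical helper (not part of docutils.io).

    Map the special value 'locale' to the lower-cased locale encoding.
    Only the exact literal 'locale' is special-cased, as in open() on
    Python >= 3.10; any other spelling, e.g. 'LOCALE', is returned
    unchanged and is then rejected by the codec lookup like any unknown
    encoding name.
    """
    if encoding == 'locale':
        return locale.getpreferredencoding(False).lower()
    return encoding


def write_text(path, text, encoding='utf-8'):
    """Hypothetical helper: write *text* to *path*.

    Resolving 'locale' before calling open() lets the same call work on
    Python versions older than 3.10, where open() does not know 'locale'.
    """
    with open(path, 'w', encoding=resolve_encoding(encoding)) as f:
        f.write(text)


if __name__ == '__main__':
    # Prints the platform's locale encoding, e.g. 'utf-8' or 'cp1252'.
    print(resolve_encoding('locale'))
    try:
        codecs.lookup(resolve_encoding('LOCALE'))
    except LookupError as exc:
        print('rejected:', exc)
```

Defaulting ``write_text()`` to 'utf-8' reflects the suggestion above to prefer UTF-8 over the locale encoding; 'locale' remains available as an explicit opt-in.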