From: Guenter M. <mi...@us...> - 2022-06-17 12:28:34
Dear Adam,

On 2022-06-15, Adam Turner wrote:

> Unify naming of the "utf-8" codec
> ---------------------------------

>> I'd prefer 'utf-8' (lowercase, in quotes) also in documentation, if it
>> refers to the Python codec and UTF-8 for the abstract encoding
>> algorithm.

> [...] I couldn't find anywhere in my patch set that I would change [...]

Sorry, this was replying to an earlier statement ("UTF-8 in documentation").

Patch
https://github.com/AA-Turner/docutils/pull/15/commits/f7f45addbd8cc728ef03c28d62b6ea981d0fc8ac
states it very well:

- Use UTF-8 in prose text, error messages, and documentation
- Use utf-8 in code or when referring to code
- Use utf8 for LaTeX

I did not apply the changes in the sample SVG images (generated with
Inkscape), though.

> Add encoding arguments
> ----------------------

>> Don't add encoding when the locale encoding is OK.
>> (We may switch to "locale" after implementing it in `docutils.io`.)

> Outwith ``FileInput``, where would you want to use 'locale' for the encoding?

"quicktest.py" is an old developer diagnostics tool without an option to
select the input/output encodings. I suggest keeping the encoding
unspecified here, so Python's default is used and the user can change the
encoding via either a locale setting or starting Python in UTF-8 mode.

...

> Handle encoding='locale' for docutils.io.Output
> -----------------------------------------------

Which encoding is used with ``open('foo', encoding='locale')`` if Python is
in UTF-8 mode?

> I don't mind about putting support for ``encoding='locale'`` on just
> FileInput/FileOutput -- what would your preference be here?

We want to drop our 'locale' support when dropping support for Py<3.10.
Does Python support 'locale' also with str.encode()?
Maybe we don't even need backporting "locale" (see below).

> Deprecations
> ------------

>> Why do you want to deprecate ``io.locale_encoding``?
> Because after introducing ``encoding='locale'`` there's no use for
> ``io.locale_encoding`` in Docutils anymore, and to reduce API surface.

OK. We do not need special deprecation, as `io.locale_encoding` is new in
Docutils 0.19.dev (moved from `utils.error_reporting`).

>> Why do you want to deprecate auto-detection of the input encoding?
>> * ``encoding='locale'`` does not help if my input files are a mix of
>>   UTF-8 and latin-1.

> "auto-guessing" is a poor term -- basically I meant deprecating using
> the locale encoding as default (as it will change to UTF-8).

> I'm not sure I understand the example you gave as Docutils works on a
> single file basis. Could you add more context please?

What I want to keep/restore is the "auto-detect" default behaviour for
reading/decoding input on Python 2 (when opening files under Python 3, this
only kicks in when the first try raises a UnicodeError):

With unspecified `input_encoding` setting, `io.Input.decode` does:

a) Check the BOM and the top 2 lines of data for an encoding
   specification and use it, else
b) try UTF-8.
c) If this fails, try the locale encoding (if valid).
d) Try latin-1.
e) Give up, report the error.

This allows decoding most input without the need to configure an encoding.

Whether the future default "input-encoding" should be "auto-detect" or
"utf-8" may be decided later. In any case I would keep "auto-detect" as an
option.

Future (incompatible) changes:

* use `locale.getpreferredencoding()` in c): If a user starts Python in
  UTF-8 mode, we should report decoding errors instead of trying a locale
  encoding.
* maybe drop d)
* warn/info when input encoding is not UTF-8.

Günter
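
For illustration, the fallback order a) to e) described above could be
sketched roughly as follows. This is a minimal sketch, not the actual
`io.Input.decode` code: the names `declared_encoding`, `candidate_encodings`
and `decode_auto`, the simplified `coding[:=]` pattern, and the short BOM
table are all assumptions made for the example. The locale encoding is
passed in by the caller, so the sketch does not depend on how it is
determined.

```python
import codecs
import re

# Hypothetical, simplified pattern for an encoding declaration
# such as "-*- coding: latin-1 -*-" (an assumption for this sketch).
CODING_RE = re.compile(rb"coding[:=]\s*([-\w.]+)")

# Byte order marks and a codec that consumes them (illustrative subset).
BOMS = (
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def declared_encoding(data):
    """Step a): encoding named by a BOM or in the first two lines, else None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    for line in data.splitlines()[:2]:
        match = CODING_RE.search(line)
        if match:
            return match.group(1).decode("ascii")
    return None


def candidate_encodings(data, locale_encoding=None):
    """Return the encodings to try, in order."""
    declared = declared_encoding(data)
    if declared:
        return [declared]            # a) a declaration wins, no fallbacks
    candidates = ["utf-8"]           # b)
    if locale_encoding:
        try:
            codecs.lookup(locale_encoding)
            candidates.append(locale_encoding)  # c) only if it is a valid codec
        except LookupError:
            pass
    candidates.append("latin-1")     # d)
    return candidates


def decode_auto(data, locale_encoding=None):
    """Decode bytes, returning (text, encoding_used)."""
    errors = []
    for encoding in candidate_encodings(data, locale_encoding):
        try:
            return data.decode(encoding), encoding
        except (UnicodeError, LookupError) as err:
            errors.append(f"{encoding}: {err}")
    # e) give up and report what was tried
    raise UnicodeError("unable to decode input; tried " + "; ".join(errors))
```

Because latin-1 maps every byte to a character, step d) never fails in this
sketch, so e) is only reached when a declared encoding turns out to be wrong.
Dropping d), as suggested under the future changes, would make decoding
errors for undeclared non-UTF-8 input visible instead.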