From: Guenter M. <mi...@us...> - 2022-06-21 12:34:46
On 2022-06-17, Guenter Milde via Docutils-develop wrote:
> On 2022-06-15, Adam Turner wrote:

...

>>> Why do you want to deprecate auto-detection of the input encoding?

>>> * ``encoding='locale'`` does not help if my input files are a mix of
>>>   UTF-8 and latin-1.

...

>> I'm not sure I understand the example you gave as Docutils works on a
>> single file basis. Could you add more context please?

The idea is to try another encoding when UTF-8 fails or when an encoding is
specified in the document itself. Use cases would be:

* a lazy user with many rST source files in different encodings, compiling
  them on different occasions but not wanting to think about the encoding.

* files with different encodings compiled in one run via a Makefile or
  script (e.g. buildhtml.py or similar).

> What I want to keep/restore is the "auto-detect" default behaviour for
> reading/decoding input on Python2 (when opening files under Python 3,
> this only kicks in when the first try raises a UnicodeError):

The attached patches restore the Python2 behaviour (auto-detection if
FileInput.encoding is None) on Python3 by (internally) reading the file in
binary mode and doing the decoding with `io.Input.decode()`.
This allows decoding most input without the need to configure an encoding.

Günter
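
For illustration only, here is a minimal sketch of the kind of fallback described above: take the raw bytes, honour a BOM or an in-document coding declaration if there is one, then try UTF-8, then latin-1. This is not the code from the patches and not the Docutils implementation. The helper name `decode_with_fallback`, the `CODING_SLUG` pattern and the fallback order are assumptions made for this example.

```python
import codecs
import re

# Hypothetical pattern for an in-document declaration such as
# ".. -*- coding: latin-1 -*-"; an assumption, not copied from Docutils.
CODING_SLUG = re.compile(rb"coding[:=]\s*([-\w.]+)")

# Byte order marks and the codec that consumes them.
BOMS = (
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def decode_with_fallback(data, fallbacks=("utf-8", "latin-1")):
    """Decode bytes `data`; return (text, name of the encoding used)."""
    candidates = []
    for bom, name in BOMS:
        if data.startswith(bom):
            candidates.append(name)
            break
    else:
        # No BOM: look for a coding declaration in the first two lines.
        for line in data.splitlines()[:2]:
            match = CODING_SLUG.search(line)
            if match:
                candidates.append(match.group(1).decode("ascii"))
                break
    candidates.extend(fallbacks)
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except (UnicodeError, LookupError):
            # Wrong or unknown encoding: try the next candidate.
            continue
    raise UnicodeError("unable to decode input with any of: "
                       + ", ".join(candidates))
```

A caller would open the source in binary mode (`open(path, "rb")`) and pass the result of `read()` to the helper. Because latin-1 accepts any byte sequence, it only makes sense as the last fallback.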