Re: [Docutils-develop] I/O uses default encoding argument


On 2022-06-09, Adam Turner wrote:

>> This means for every instance of open() without explicit encoding, we have
>> to decide whether to use "ascii", "utf-8", or `io.locale_encoding`
>> (the latter is equivalent to the value "locale" introduced in Py 3.10).

> My strong suggestion would be that Docutils moves towards defaulting to
> UTF-8 for all encodings (of course keeping the option to supply
> explicit other encodings) -- it is compatible with US-ASCII and is the
> safest sane default. (PEP 686's motivation section [1]_ has some colour
> on this).

However, in cases of user-supplied input, this is an API change.
We can fix the cases in the tests now but need due process for cases where
changes may lead to different behaviour for users.

Suggestion:

* backport Python 3.11 behaviour to docutils.io:
  “use locale encoding when encoding="locale" is passed”.

* announce change of default encoding to UTF-8

* keep encoding attribute unspecified for now when reading input
  specified by users or 3rd-party code.
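
A minimal sketch of what the first point could look like, assuming a
helper named ``resolve_encoding`` (a hypothetical name, not an existing
function in docutils.io). It only maps the special value "locale" to the
current locale encoding and passes everything else through, so an
unspecified encoding stays unspecified:

```python
import locale


def resolve_encoding(encoding=None):
    """Hypothetical helper, not part of docutils.io.

    Map the special value "locale" to the locale's preferred encoding;
    return any other value (including None) unchanged.
    """
    if encoding == "locale":
        # False: do not call setlocale(), just query the current setting.
        return locale.getpreferredencoding(False)
    return encoding


# An explicit encoding is passed through untouched.
explicit = resolve_encoding("utf-8")
# None stays None, i.e. the caller's existing default handling applies.
unspecified = resolve_encoding(None)
# "locale" becomes a concrete, platform-dependent codec name.
from_locale = resolve_encoding("locale")
```

The result for "locale" depends on the platform and environment, so no
particular value is shown here.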

>> Unfortunately, the patch mixes added "encoding" arguments with the
>> change of "utf8" to "utf-8" in many cases.

> An updated patch attached (The only 'utf8' -> 'utf-8' were in the
> LaTeX2e writer, but you're right it is better to keep the changes
> distinct.)

Consistent naming in Docutils code (not only latex2e.py) and
documentation is good.

What is the motivation for 'utf-8'?

* Python's codecs module uses "utf_8"
  (with aliases U8, UTF, utf8, cp65001 and normalizing case and "-/_").

* In LaTeX, it's named "utf8"

* `locale` reports "UTF-8"

* PEP 8 uses uppercase:
  "Code in the core Python distribution should always use UTF-8".

* The `codecs documentation`__ uses ``encoding='utf-8'`` when documenting
  default arguments for encode() and decode().

__ https://docs.python.org/3/library/codecs.html
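
For Python itself the spelling makes no functional difference, as a quick
check with ``codecs.lookup()`` illustrates. The list of spellings is just
a sample chosen for this example:

```python
import codecs

# Sample spellings of the same codec, differing in case, "-" vs "_",
# and alias names.
spellings = ["utf_8", "utf8", "utf-8", "UTF-8", "U8", "UTF"]

# codecs.lookup() normalizes the name and returns a CodecInfo object
# whose .name attribute is the codec's canonical name.
canonical = {s: codecs.lookup(s).name for s in spellings}

for spelling, name in canonical.items():
    print(f"{spelling!r:10} -> {name!r}")
```

All of these resolve to the same codec, so the choice between them in
Docutils code and documentation is purely a matter of consistency.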

Thanks,

Günter