From: Adam T. <aa-...@us...> - 2024-08-01 20:24:30

> In my opinion, the project should stop honoring the "preferred encoding" and instead expect UTF-8 unless otherwise specified, as that's going to become the default behavior in Python 3.14 for most IO operations.

I agree, however...

> I'm unsure of compatibility implications.

There was fairly extensive discussion of this in April last year. The core issue is that Docutils serialises to formats that have internal charset/encoding declarations (e.g. TeX, HTML, XML). If everything is UTF-8 then all is fine and simple, but if the user wants e.g. latin1 encoding then Docutils 'should' encode that in the relevant places in the output documents. Docutils also chooses whether to embed a Unicode character directly or to use an escape or macro (e.g. for the dagger † footnote symbol) based on the chosen encoding.

I am of the opinion that Docutils should remove support for encodings other than Unicode (UTF-8) in text mode for both input and output. UTF-8 is so ubiquitous that anyone running a modern enough Python to use this version of Docutils will either support UTF-8 or know how to work around any problems.

The only writer that makes runtime use of the output encoding setting is LaTeX. LuaTeX and XeTeX have always supported UTF-8 in source files, and LaTeX has [since 2018](https://tug.org/TUGboat/tb39-1/tb121ltnews28.pdf).

----

To your original point, running with ``PYTHONWARNDEFAULTENCODING=1`` should now produce no warnings. If you still get warnings please let us know, as it would mean we are missing test coverage (Docutils' tests [pass with -Werror and -Xwarn_default_encoding](https://github.com/docutils/docutils/actions/runs/10204589804/job/28233427232)).

A

---

**[bugs:#490] EncodingWarnings in io module**

**Status:** open-fixed
**Created:** Fri Jun 28, 2024 03:34 PM UTC by Jason R. Coombs
**Last Updated:** Thu Aug 01, 2024 04:14 PM UTC
**Owner:** nobody

When running the [distutils](https://github.com/pypa/distutils) tests with `PYTHONWARNDEFAULTENCODING=1`, two warnings are emitted:

```
distutils/tests/test_check.py::TestCheck::test_check_restructuredtext
  /Users/jaraco/code/pypa/distutils/.tox/py/lib/python3.12/site-packages/docutils/io.py:381: EncodingWarning: 'encoding' argument not specified
    self.source = open(source_path, mode,

distutils/tests/test_check.py::TestCheck::test_check_restructuredtext
  /Users/jaraco/code/pypa/distutils/.tox/py/lib/python3.12/site-packages/docutils/io.py:151: EncodingWarning: UTF-8 Mode affects locale.getpreferredencoding(). Consider locale.getencoding() instead.
    fallback = locale.getpreferredencoding(do_setlocale=False)
```

Docutils should honor [PEP 597](https://peps.python.org/pep-0597/) and address these warnings (and possibly others).

In my experience, adding `encoding='utf-8'` to any io operation is the best approach - it's straight-up compatible with the default on non-Windows systems, and honoring the Unix convention is usually suitable, if not preferable, on Windows. Not only that, but that behavior will become the default in Python 3.15 or so.
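
For illustration only, here is a minimal sketch of the `encoding='utf-8'` suggestion from the bug report. It is not Docutils code: `read_text_utf8`, `write_text_utf8` and `roundtrip_dagger` are hypothetical helpers made up for this example. The assumption is simply that every text-mode `open()` call names its encoding, so Python never falls back to the locale default and the PEP 597 `EncodingWarning` has nothing to report.

```python
import os
import tempfile


def read_text_utf8(path):
    """Read a text file, always decoding as UTF-8 (hypothetical helper)."""
    # An explicit encoding means open() does not consult the locale,
    # so no EncodingWarning is raised under PYTHONWARNDEFAULTENCODING=1.
    with open(path, encoding="utf-8") as f:
        return f.read()


def write_text_utf8(path, text):
    """Write a text file, always encoding as UTF-8 (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


def roundtrip_dagger():
    """Write and re-read a non-ASCII sample in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.rst")
        # U+2020 is the dagger footnote symbol mentioned in the thread.
        write_text_utf8(path, "footnote symbol: \u2020\n")
        return read_text_utf8(path)
```

Calling `roundtrip_dagger()` should give back the same string regardless of the platform's locale encoding, which is the portability argument made in the report.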