From: Jason R. C. <ja...@us...> - 2024-06-28 15:35:03
|
--- **[bugs:#490] EncodingWarnings in io module** **Status:** open **Created:** Fri Jun 28, 2024 03:34 PM UTC by Jason R. Coombs **Last Updated:** Fri Jun 28, 2024 03:34 PM UTC **Owner:** nobody When running the [distutils](https://github.com/pypa/distutils) tests with `PYTHONWARNDEFAULTENCODING=1`, two warnings are emitted: ``` distutils/tests/test_check.py::TestCheck::test_check_restructuredtext /Users/jaraco/code/pypa/distutils/.tox/py/lib/python3.12/site-packages/docutils/io.py:381: EncodingWarning: 'encoding' argument not specified self.source = open(source_path, mode, distutils/tests/test_check.py::TestCheck::test_check_restructuredtext /Users/jaraco/code/pypa/distutils/.tox/py/lib/python3.12/site-packages/docutils/io.py:151: EncodingWarning: UTF-8 Mode affects locale.getpreferredencoding(). Consider locale.getencoding() instead. fallback = locale.getpreferredencoding(do_setlocale=False) ``` Docutils should honor [PEP 597](https://peps.python.org/pep-0597/) and address these warnings (and possibly others). In my experience, adding `encoding='utf-8'` to any io operation is the best approach - it's straight-up compatible with the default on non-Windows systems and usually honoring the Unix convention is suitable if not preferable on Windows. Not only that, but that behavior will become the default in Python 3.15 or so. --- Sent from sourceforge.net because doc...@li... is subscribed to https://sourceforge.net/p/docutils/bugs/ To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/docutils/admin/bugs/options. Or, if this is a mailing list, you can unsubscribe from the mailing list. |
From: Günter M. <mi...@us...> - 2024-06-29 21:09:20
|
Thank you for the feedback. The problem is worked on: * In the repository version, the default encoding is "utf-8". * In Docutils 1.0, the input encoding detection will be removed. This removal includes the fallback "getpreferredencoding()" in line 151. --- **[bugs:#490] EncodingWarnings in io module** **Status:** open **Created:** Fri Jun 28, 2024 03:34 PM UTC by Jason R. Coombs **Last Updated:** Fri Jun 28, 2024 03:34 PM UTC **Owner:** nobody When running the [distutils](https://github.com/pypa/distutils) tests with `PYTHONWARNDEFAULTENCODING=1`, two warnings are emitted: ``` distutils/tests/test_check.py::TestCheck::test_check_restructuredtext /Users/jaraco/code/pypa/distutils/.tox/py/lib/python3.12/site-packages/docutils/io.py:381: EncodingWarning: 'encoding' argument not specified self.source = open(source_path, mode, distutils/tests/test_check.py::TestCheck::test_check_restructuredtext /Users/jaraco/code/pypa/distutils/.tox/py/lib/python3.12/site-packages/docutils/io.py:151: EncodingWarning: UTF-8 Mode affects locale.getpreferredencoding(). Consider locale.getencoding() instead. fallback = locale.getpreferredencoding(do_setlocale=False) ``` Docutils should honor [PEP 597](https://peps.python.org/pep-0597/) and address these warnings (and possibly others). In my experience, adding `encoding='utf-8'` to any io operation is the best approach - it's straight-up compatible with the default on non-Windows systems and usually honoring the Unix convention is suitable if not preferable on Windows. Not only that, but that behavior will become the default in Python 3.15 or so. --- Sent from sourceforge.net because doc...@li... is subscribed to https://sourceforge.net/p/docutils/bugs/ To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/docutils/admin/bugs/options. Or, if this is a mailing list, you can unsubscribe from the mailing list. |
From: Günter M. <mi...@us...> - 2024-07-22 11:55:06
|
- **status**: open --> open-fixed - **Comment**: [r9772] changes the default encoding from ``None`` (auto-detect) to "utf-8" in `docutils.io.Input` and `docutils.io.FileInput`. --- **[bugs:#490] EncodingWarnings in io module** **Status:** open-fixed **Created:** Fri Jun 28, 2024 03:34 PM UTC by Jason R. Coombs **Last Updated:** Sat Jun 29, 2024 09:09 PM UTC **Owner:** nobody When running the [distutils](https://github.com/pypa/distutils) tests with `PYTHONWARNDEFAULTENCODING=1`, two warnings are emitted: ``` distutils/tests/test_check.py::TestCheck::test_check_restructuredtext /Users/jaraco/code/pypa/distutils/.tox/py/lib/python3.12/site-packages/docutils/io.py:381: EncodingWarning: 'encoding' argument not specified self.source = open(source_path, mode, distutils/tests/test_check.py::TestCheck::test_check_restructuredtext /Users/jaraco/code/pypa/distutils/.tox/py/lib/python3.12/site-packages/docutils/io.py:151: EncodingWarning: UTF-8 Mode affects locale.getpreferredencoding(). Consider locale.getencoding() instead. fallback = locale.getpreferredencoding(do_setlocale=False) ``` Docutils should honor [PEP 597](https://peps.python.org/pep-0597/) and address these warnings (and possibly others). In my experience, adding `encoding='utf-8'` to any io operation is the best approach - it's straight-up compatible with the default on non-Windows systems and usually honoring the Unix convention is suitable if not preferable on Windows. Not only that, but that behavior will become the default in Python 3.15 or so. --- Sent from sourceforge.net because doc...@li... is subscribed to https://sourceforge.net/p/docutils/bugs/ To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/docutils/admin/bugs/options. Or, if this is a mailing list, you can unsubscribe from the mailing list. |
From: Adam T. <aa-...@us...> - 2024-08-01 14:55:34
|
[r9772] breaks tests for non-UTF locales on both Linux and Windows (e.g. ISO 88591), when not using Python's UTF-8 mode. See the following failures (from [GitHub Actions](https://github.com/AA-Turner/docutils/actions/runs/10200475399/job/28220043798?pr=16), scroll up to the first section `Run test suite (pytest ./test)`): ``` _____________________ FileInputTests.test_fallback_no_utf8 _____________________ self = <test.test_io.FileInputTests testMethod=test_fallback_no_utf8> @unittest.skipIf(preferredencoding in (None, 'ascii', 'utf-8'), 'locale encoding not set or UTF-8') def test_fallback_no_utf8(self): # If no encoding is given and decoding with 'utf-8' fails, # use the locale's preferred encoding (if not None). # Provisional: the default will become 'utf-8' # (without auto-detection and fallback) in Docutils 0.22. source = du_io.FileInput( source_path=os.path.join(DATA_ROOT, 'latin1.txt')) > data = source.read() test/test_io.py:321: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ docutils/io.py:412: in read data = self.source.read() _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ self = <encodings.utf_8.IncrementalDecoder object at 0x7f023f9906f0> input = b'Gr\xfc\xdfe\n', final = True > ??? E UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 2: invalid start byte <frozen codecs>:322: UnicodeDecodeError ``` I'm not sure what the right behaviour here should be. ------ There's also a problem on the same non-UTF-8 locales when not in UTF-8 mode: ``` ====================================================================== ERROR: test_publish_cmdline (test_publisher.ConvenienceFunctionTests.test_publish_cmdline) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/runner/work/docutils/docutils/docutils/docutils/io.py", line 525, in write self.destination.write(data) File "/home/runner/work/docutils/docutils/docutils/test/alltests.py", line 63, in write self.stream.write(string) TypeError: write() argument must be str, not bytes During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/runner/work/docutils/docutils/docutils/docutils/io.py", line 529, in write self.destination.buffer.write(data) ^^^^^^^^^^^^^^^^^^^^^^^ AttributeError: 'Tee' object has no attribute 'buffer' During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/runner/work/docutils/docutils/docutils/test/test_publisher.py", line 160, in test_publish_cmdline core.publish_cmdline(writer_name='null', File "/home/runner/work/docutils/docutils/docutils/docutils/core.py", line 431, in publish_cmdline output = publisher.publish( ^^^^^^^^^^^^^^^^^^ File "/home/runner/work/docutils/docutils/docutils/docutils/core.py", line 261, in publish output = self.writer.write(self.document, self.destination) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/runner/work/docutils/docutils/docutils/docutils/writers/__init__.py", line 81, in write return self.destination.write(self.output) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/runner/work/docutils/docutils/docutils/docutils/io.py", line 533, in write raise ValueError( ValueError: Encoding of <file> (iso8859-1) differs from specified encoding (utf-8) ---------------------------------------------------------------------- Ran 1872 tests in 4.888s FAILED (errors=1, skipped=2) Elapsed time: 5.079 seconds ``` (On Windows it says ``ValueError: Encoding of <file> (cp1252) differs `` instead). This failure only happens with ``alltests.py``. Now that both ``pytest`` and ``unittest`` work with our test suite, we could consider removing ``alltests.py``. A --- **[bugs:#490] EncodingWarnings in io module** **Status:** open-fixed **Created:** Fri Jun 28, 2024 03:34 PM UTC by Jason R. Coombs **Last Updated:** Mon Jul 22, 2024 11:55 AM UTC **Owner:** nobody When running the [distutils](https://github.com/pypa/distutils) tests with `PYTHONWARNDEFAULTENCODING=1`, two warnings are emitted: ``` distutils/tests/test_check.py::TestCheck::test_check_restructuredtext /Users/jaraco/code/pypa/distutils/.tox/py/lib/python3.12/site-packages/docutils/io.py:381: EncodingWarning: 'encoding' argument not specified self.source = open(source_path, mode, distutils/tests/test_check.py::TestCheck::test_check_restructuredtext /Users/jaraco/code/pypa/distutils/.tox/py/lib/python3.12/site-packages/docutils/io.py:151: EncodingWarning: UTF-8 Mode affects locale.getpreferredencoding(). Consider locale.getencoding() instead. fallback = locale.getpreferredencoding(do_setlocale=False) ``` Docutils should honor [PEP 597](https://peps.python.org/pep-0597/) and address these warnings (and possibly others). In my experience, adding `encoding='utf-8'` to any io operation is the best approach - it's straight-up compatible with the default on non-Windows systems and usually honoring the Unix convention is suitable if not preferable on Windows. Not only that, but that behavior will become the default in Python 3.15 or so. --- Sent from sourceforge.net because doc...@li... is subscribed to https://sourceforge.net/p/docutils/bugs/ To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/docutils/admin/bugs/options. Or, if this is a mailing list, you can unsubscribe from the mailing list. |
From: Adam T. <aa-...@us...> - 2024-08-01 14:57:57
|
I tested reverting and re-applying [r9772] in [this PR](https://github.com/AA-Turner/docutils/pull/16) -- note that the 'alltest' failure occurs in the before and after, but pytest and unittest fail when [r9772] is reapplied. A --- **[bugs:#490] EncodingWarnings in io module** **Status:** open-fixed **Created:** Fri Jun 28, 2024 03:34 PM UTC by Jason R. Coombs **Last Updated:** Thu Aug 01, 2024 02:55 PM UTC **Owner:** nobody When running the [distutils](https://github.com/pypa/distutils) tests with `PYTHONWARNDEFAULTENCODING=1`, two warnings are emitted: ``` distutils/tests/test_check.py::TestCheck::test_check_restructuredtext /Users/jaraco/code/pypa/distutils/.tox/py/lib/python3.12/site-packages/docutils/io.py:381: EncodingWarning: 'encoding' argument not specified self.source = open(source_path, mode, distutils/tests/test_check.py::TestCheck::test_check_restructuredtext /Users/jaraco/code/pypa/distutils/.tox/py/lib/python3.12/site-packages/docutils/io.py:151: EncodingWarning: UTF-8 Mode affects locale.getpreferredencoding(). Consider locale.getencoding() instead. fallback = locale.getpreferredencoding(do_setlocale=False) ``` Docutils should honor [PEP 597](https://peps.python.org/pep-0597/) and address these warnings (and possibly others). In my experience, adding `encoding='utf-8'` to any io operation is the best approach - it's straight-up compatible with the default on non-Windows systems and usually honoring the Unix convention is suitable if not preferable on Windows. Not only that, but that behavior will become the default in Python 3.15 or so. --- Sent from sourceforge.net because doc...@li... is subscribed to https://sourceforge.net/p/docutils/bugs/ To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/docutils/admin/bugs/options. Or, if this is a mailing list, you can unsubscribe from the mailing list. |
From: Jason R. C. <ja...@us...> - 2024-08-01 16:14:34
|
> I'm not sure what the right behaviour here should be. In my opinion, the project should stop honoring the "preferred encoding" and instead expect UTF-8 unless otherwise specified, as that's going to become the default behavior in Python 3.14 for most IO operations. I'm unsure of compatibility implications. It does appear as if this test (`test_fallback_no_utf8`) would no longer be relevant in that regime, so I'd just delete it. Regarding the non-UTF8 mode, that does sound more complicated, although maybe that functionality too should be deprecated/removed. That is, IMHO, the user should be offered UTF-8 mode by default and an option to specify an encoding, maybe with "locale" as one option, but otherwise remove the implied "locale" behavior. I have a very weak understanding of docutils, however, so take my advice with a grain of salt. --- **[bugs:#490] EncodingWarnings in io module** **Status:** open-fixed **Created:** Fri Jun 28, 2024 03:34 PM UTC by Jason R. Coombs **Last Updated:** Thu Aug 01, 2024 02:57 PM UTC **Owner:** nobody When running the [distutils](https://github.com/pypa/distutils) tests with `PYTHONWARNDEFAULTENCODING=1`, two warnings are emitted: ``` distutils/tests/test_check.py::TestCheck::test_check_restructuredtext /Users/jaraco/code/pypa/distutils/.tox/py/lib/python3.12/site-packages/docutils/io.py:381: EncodingWarning: 'encoding' argument not specified self.source = open(source_path, mode, distutils/tests/test_check.py::TestCheck::test_check_restructuredtext /Users/jaraco/code/pypa/distutils/.tox/py/lib/python3.12/site-packages/docutils/io.py:151: EncodingWarning: UTF-8 Mode affects locale.getpreferredencoding(). Consider locale.getencoding() instead. fallback = locale.getpreferredencoding(do_setlocale=False) ``` Docutils should honor [PEP 597](https://peps.python.org/pep-0597/) and address these warnings (and possibly others). In my experience, adding `encoding='utf-8'` to any io operation is the best approach - it's straight-up compatible with the default on non-Windows systems and usually honoring the Unix convention is suitable if not preferable on Windows. Not only that, but that behavior will become the default in Python 3.15 or so. --- Sent from sourceforge.net because doc...@li... is subscribed to https://sourceforge.net/p/docutils/bugs/ To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/docutils/admin/bugs/options. Or, if this is a mailing list, you can unsubscribe from the mailing list. |
From: Adam T. <aa-...@us...> - 2024-08-01 20:24:30
|
> In my opinion, the project should stop honoring the "preferred encoding" and instead expect UTF-8 unless otherwise specified, as that's going to become the default behavior in Python 3.14 for most IO operations. I agree, however... > I'm unsure of compatibility implications. There was fairly extensive discussion of this in April last year. The core issue is that Docutils serialises to formats that have internal charset/encoding declarations (e.g. TeX, HTML, XML). If everything is UTF-8 then all is fine and simple, but if the user wants e.g. latin1 encoding then Docutils 'should' encode that in the relevant places in the output documents. Docutils also chooses whether to embed a Unicode character directly vs using an escape or macro (e.g. the dagger † footnote symbol) based on the chosen encoding. I am of the opinion that Docutils should remove support for encodings other than Unicode (UTF-8) in text mode for both input and output. UTF-8 is so ubiquitous that anyone running a modern enough Python to use this version of Docutils will either support UTF-8 or know how to work-around any problems. The only writer that makes runtime use of the output encoding setting is LaTeX. LuaTex and XeTeX have always supported UTF-8 in source files, and LaTeX has [since 2018](https://tug.org/TUGboat/tb39-1/tb121ltnews28.pdf). ---- To your original point, running with ``PYTHONWARNDEFAULTENCODING=1`` should now produce no warnings. If you still get warnings please let us know as it would mean we are missing test coverage (Docutils' tests [pass with -Werror and -Xwarn_default_encoding](https://github.com/docutils/docutils/actions/runs/10204589804/job/28233427232)). A --- **[bugs:#490] EncodingWarnings in io module** **Status:** open-fixed **Created:** Fri Jun 28, 2024 03:34 PM UTC by Jason R. Coombs **Last Updated:** Thu Aug 01, 2024 04:14 PM UTC **Owner:** nobody When running the [distutils](https://github.com/pypa/distutils) tests with `PYTHONWARNDEFAULTENCODING=1`, two warnings are emitted: ``` distutils/tests/test_check.py::TestCheck::test_check_restructuredtext /Users/jaraco/code/pypa/distutils/.tox/py/lib/python3.12/site-packages/docutils/io.py:381: EncodingWarning: 'encoding' argument not specified self.source = open(source_path, mode, distutils/tests/test_check.py::TestCheck::test_check_restructuredtext /Users/jaraco/code/pypa/distutils/.tox/py/lib/python3.12/site-packages/docutils/io.py:151: EncodingWarning: UTF-8 Mode affects locale.getpreferredencoding(). Consider locale.getencoding() instead. fallback = locale.getpreferredencoding(do_setlocale=False) ``` Docutils should honor [PEP 597](https://peps.python.org/pep-0597/) and address these warnings (and possibly others). In my experience, adding `encoding='utf-8'` to any io operation is the best approach - it's straight-up compatible with the default on non-Windows systems and usually honoring the Unix convention is suitable if not preferable on Windows. Not only that, but that behavior will become the default in Python 3.15 or so. --- Sent from sourceforge.net because doc...@li... is subscribed to https://sourceforge.net/p/docutils/bugs/ To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/docutils/admin/bugs/options. Or, if this is a mailing list, you can unsubscribe from the mailing list. |
From: Günter M. <mi...@us...> - 2024-08-07 15:57:44
|
> The only writer that makes runtime use of the output encoding setting is LaTeX. **Output** encoding defaults to "utf-8" for all writers since several years. Most writers honour the "output-encoding" setting and encode the output file accordingly. So, you may use "rst2html5 --output-encoding=ASCII:xmlcharrefreplace" to have a pure ASCII file. The HTML, XML, and LaTeX writers also specify the used encoding in the file, the LaTeX writer also provides replacements for Unicode characters that are not encodable if a legacy output encoding is selected. There should be no cases of `output_encoding == None`. **Input** encoding was "auto-select" with fallback to utf-8 and "locale encoding" until 0.21. After the discussion last year the transition to utf-8 started: 0.22 uses "utf-8" as input encoding default, we will remove the input encoding auto-detection code in Docutils 1.0. The offending test case, "test_fallback_no_utf8()" is more trouble than help and removed in [r9864]. --- **[bugs:#490] EncodingWarnings in io module** **Status:** open-fixed **Created:** Fri Jun 28, 2024 03:34 PM UTC by Jason R. Coombs **Last Updated:** Thu Aug 01, 2024 08:24 PM UTC **Owner:** nobody When running the [distutils](https://github.com/pypa/distutils) tests with `PYTHONWARNDEFAULTENCODING=1`, two warnings are emitted: ``` distutils/tests/test_check.py::TestCheck::test_check_restructuredtext /Users/jaraco/code/pypa/distutils/.tox/py/lib/python3.12/site-packages/docutils/io.py:381: EncodingWarning: 'encoding' argument not specified self.source = open(source_path, mode, distutils/tests/test_check.py::TestCheck::test_check_restructuredtext /Users/jaraco/code/pypa/distutils/.tox/py/lib/python3.12/site-packages/docutils/io.py:151: EncodingWarning: UTF-8 Mode affects locale.getpreferredencoding(). Consider locale.getencoding() instead. fallback = locale.getpreferredencoding(do_setlocale=False) ``` Docutils should honor [PEP 597](https://peps.python.org/pep-0597/) and address these warnings (and possibly others). In my experience, adding `encoding='utf-8'` to any io operation is the best approach - it's straight-up compatible with the default on non-Windows systems and usually honoring the Unix convention is suitable if not preferable on Windows. Not only that, but that behavior will become the default in Python 3.15 or so. --- Sent from sourceforge.net because doc...@li... is subscribed to https://sourceforge.net/p/docutils/bugs/ To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/docutils/admin/bugs/options. Or, if this is a mailing list, you can unsubscribe from the mailing list. |
From: Günter M. <mi...@us...> - 2025-07-30 09:04:43
|
- **status**: open-fixed --> closed-fixed - **Comment**: The bug is fixed in Docutils 0.22 released 2025-07-29. Thanks agaoin for reporting. --- **[bugs:#490] EncodingWarnings in io module** **Status:** closed-fixed **Created:** Fri Jun 28, 2024 03:34 PM UTC by Jason R. Coombs **Last Updated:** Wed Aug 07, 2024 03:57 PM UTC **Owner:** nobody When running the [distutils](https://github.com/pypa/distutils) tests with `PYTHONWARNDEFAULTENCODING=1`, two warnings are emitted: ``` distutils/tests/test_check.py::TestCheck::test_check_restructuredtext /Users/jaraco/code/pypa/distutils/.tox/py/lib/python3.12/site-packages/docutils/io.py:381: EncodingWarning: 'encoding' argument not specified self.source = open(source_path, mode, distutils/tests/test_check.py::TestCheck::test_check_restructuredtext /Users/jaraco/code/pypa/distutils/.tox/py/lib/python3.12/site-packages/docutils/io.py:151: EncodingWarning: UTF-8 Mode affects locale.getpreferredencoding(). Consider locale.getencoding() instead. fallback = locale.getpreferredencoding(do_setlocale=False) ``` Docutils should honor [PEP 597](https://peps.python.org/pep-0597/) and address these warnings (and possibly others). In my experience, adding `encoding='utf-8'` to any io operation is the best approach - it's straight-up compatible with the default on non-Windows systems and usually honoring the Unix convention is suitable if not preferable on Windows. Not only that, but that behavior will become the default in Python 3.15 or so. --- Sent from sourceforge.net because doc...@li... is subscribed to https://sourceforge.net/p/docutils/bugs/ To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/docutils/admin/bugs/options. Or, if this is a mailing list, you can unsubscribe from the mailing list. |