From: Adam T. <aa-...@us...> - 2024-08-01 14:55:34
[r9772] breaks tests for non-UTF-8 locales (e.g. ISO 8859-1) on both Linux and Windows when not using Python's UTF-8 mode. See the following failure (from [GitHub Actions](https://github.com/AA-Turner/docutils/actions/runs/10200475399/job/28220043798?pr=16), scroll up to the first section `Run test suite (pytest ./test)`):

```
_____________________ FileInputTests.test_fallback_no_utf8 _____________________

self = <test.test_io.FileInputTests testMethod=test_fallback_no_utf8>

    @unittest.skipIf(preferredencoding in (None, 'ascii', 'utf-8'),
                     'locale encoding not set or UTF-8')
    def test_fallback_no_utf8(self):
        # If no encoding is given and decoding with 'utf-8' fails,
        # use the locale's preferred encoding (if not None).
        # Provisional: the default will become 'utf-8'
        # (without auto-detection and fallback) in Docutils 0.22.
        source = du_io.FileInput(
                     source_path=os.path.join(DATA_ROOT, 'latin1.txt'))
>       data = source.read()

test/test_io.py:321:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
docutils/io.py:412: in read
    data = self.source.read()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <encodings.utf_8.IncrementalDecoder object at 0x7f023f9906f0>
input = b'Gr\xfc\xdfe\n', final = True

>   ???
E   UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 2: invalid start byte

<frozen codecs>:322: UnicodeDecodeError
```

I'm not sure what the right behaviour here should be.

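
For reference, here is a minimal sketch of the fallback that the test's comment describes: try UTF-8 first, then fall back to another encoding. It is not the code in `docutils/io.py`. `decode_with_fallback` is a hypothetical helper, and the fallback is passed in explicitly (here `iso8859-1`) instead of being read from the locale, so the result does not depend on the machine it runs on.

```python
from typing import Optional

# Bytes of the failing input, copied from the traceback above.
RAW = b'Gr\xfc\xdfe\n'


def decode_with_fallback(data: bytes, fallback: Optional[str]) -> str:
    """Hypothetical helper: decode as UTF-8, else use `fallback`."""
    try:
        return data.decode('utf-8')
    except UnicodeDecodeError:
        if fallback is None:
            # No fallback available: surface the original error.
            raise
        return data.decode(fallback)


# 0xfc is not a valid UTF-8 start byte, so the fallback is used here.
print(repr(decode_with_fallback(RAW, 'iso8859-1')))
```

With `fallback=None` the same `UnicodeDecodeError` as in the CI log is raised, which is roughly the behaviour planned once the default becomes plain `'utf-8'`.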
------

There's also a problem on the same non-UTF-8 locales when not in UTF-8 mode:

```
======================================================================
ERROR: test_publish_cmdline (test_publisher.ConvenienceFunctionTests.test_publish_cmdline)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/docutils/docutils/docutils/docutils/io.py", line 525, in write
    self.destination.write(data)
  File "/home/runner/work/docutils/docutils/docutils/test/alltests.py", line 63, in write
    self.stream.write(string)
TypeError: write() argument must be str, not bytes

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/runner/work/docutils/docutils/docutils/docutils/io.py", line 529, in write
    self.destination.buffer.write(data)
    ^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Tee' object has no attribute 'buffer'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/runner/work/docutils/docutils/docutils/test/test_publisher.py", line 160, in test_publish_cmdline
    core.publish_cmdline(writer_name='null',
  File "/home/runner/work/docutils/docutils/docutils/docutils/core.py", line 431, in publish_cmdline
    output = publisher.publish(
             ^^^^^^^^^^^^^^^^^^
  File "/home/runner/work/docutils/docutils/docutils/docutils/core.py", line 261, in publish
    output = self.writer.write(self.document, self.destination)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/work/docutils/docutils/docutils/docutils/writers/__init__.py", line 81, in write
    return self.destination.write(self.output)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/work/docutils/docutils/docutils/docutils/io.py", line 533, in write
    raise ValueError(
ValueError: Encoding of <file> (iso8859-1) differs from specified encoding (utf-8)

----------------------------------------------------------------------
Ran 1872 tests in 4.888s

FAILED (errors=1, skipped=2)
Elapsed time: 5.079 seconds
```

(On Windows it says ``ValueError: Encoding of <file> (cp1252) differs`` instead.)

This failure only happens with ``alltests.py``. Now that both ``pytest`` and ``unittest`` work with our test suite, we could consider removing ``alltests.py``.

A

---

**[bugs:#490] EncodingWarnings in io module**

**Status:** open-fixed
**Created:** Fri Jun 28, 2024 03:34 PM UTC by Jason R. Coombs
**Last Updated:** Mon Jul 22, 2024 11:55 AM UTC
**Owner:** nobody

When running the [distutils](https://github.com/pypa/distutils) tests with `PYTHONWARNDEFAULTENCODING=1`, two warnings are emitted:

```
distutils/tests/test_check.py::TestCheck::test_check_restructuredtext
  /Users/jaraco/code/pypa/distutils/.tox/py/lib/python3.12/site-packages/docutils/io.py:381: EncodingWarning: 'encoding' argument not specified
    self.source = open(source_path, mode,

distutils/tests/test_check.py::TestCheck::test_check_restructuredtext
  /Users/jaraco/code/pypa/distutils/.tox/py/lib/python3.12/site-packages/docutils/io.py:151: EncodingWarning: UTF-8 Mode affects locale.getpreferredencoding(). Consider locale.getencoding() instead.
    fallback = locale.getpreferredencoding(do_setlocale=False)
```

Docutils should honor [PEP 597](https://peps.python.org/pep-0597/) and address these warnings (and possibly others). In my experience, adding `encoding='utf-8'` to any io operation is the best approach - it's straight-up compatible with the default on non-Windows systems and usually honoring the Unix convention is suitable if not preferable on Windows. Not only that, but that behavior will become the default in Python 3.15 or so.
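
As an illustration of the `encoding='utf-8'` suggestion in the ticket, the sketch below reads and writes a file with an explicit encoding, so the result no longer depends on the locale and `open()` does not trigger the PEP 597 `EncodingWarning`. It assumes UTF-8 is the encoding wanted for the file. `read_text_utf8` is a hypothetical helper, not a Docutils function.

```python
import os
import tempfile


def read_text_utf8(path: str) -> str:
    """Hypothetical helper: read a text file with an explicit encoding."""
    # Passing `encoding` avoids the locale-dependent default that
    # PYTHONWARNDEFAULTENCODING=1 warns about.
    with open(path, encoding='utf-8') as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sample.txt')
    # Write with the same explicit encoding so the round trip is lossless.
    with open(path, 'w', encoding='utf-8') as f:
        f.write('Grüße\n')
    text = read_text_utf8(path)

print(repr(text))
```

Whether Docutils should keep a locale fallback for input files is a separate question. This only shows the explicit-encoding pattern the ticket recommends.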