From: Adam T. <aat...@ou...> - 2023-04-23 18:48:23
Dear Günter, all,

> I finished my work on the preparations towards Docutils 0.20.
> Please check and test.

I have tested on Ubuntu 22.04 LTS and Windows 10, and against Sphinx.
All tests are passing, though I would strongly recommend applying the
attached patch to avoid a false-positive warning during testing.

----

> Engelbert, can you prepare a release for next week?

Please may we release a 0.20b1 first so that I might ask downstream
projects to test, as with the 0.19 release? I am happy to help with
this, though I don't have the ability to upload to Docutils on PyPI at
the moment.

----

>>> I left the decision about the end state of this transition open...

>> A decision to make later, and one that doesn't block the 0.20
>> release!

> Yes and no: if we want to give users advice on a stable recipe to
> avoid being hit by the default-change, we would need agreement of
> what will (most likely) be kept stable.

> The API documentation "publisher.txt" now has the example
>   output = bytes(publish_string(...))
> (which depends on `OutputString` features).

Can we mark this feature as provisional? Personally, I don't think that
we should support this form of ``bytes`` conversion long-term, and I see
``OutputString`` as a transitional class, again not one that will be
around for a long time. For me, the point of this exercise and
deprecation process is to reach an end-state where ``publish_string``
always returns ``str``.

Perhaps we should return to discussing ``publish_str`` and
``publish_bytes`` functions?

To summarise the problem as I understand it:

* Some output formats may contain information about the encoding of the
  document:

  - SGML based markup languages (XML, HTML) may contain an internal
    encoding declaration.
  - TeX based languages (LaTeX, XeLaTeX, etc) may contain an internal
    encoding macro.

* All of these formats have default encodings:

  - XML defaults to a UTF-8 encoding if the encoding attribute is not
    specified, since XML 1.0 (2008)
    https://www.w3.org/TR/xml/#charencoding
  - HTML 5 requires a UTF-8 charset
    https://html.spec.whatwg.org/#charset
  - LaTeX's default encoding is UTF-8, since 2018
    https://tug.org/TUGboat/tb39-1/tb121ltnews28.pdf
  - XeTeX has, I believe, always defaulted to UTF-8.

* If a user asks for output as a Unicode ``str``, I believe it is
  reasonable to assume these defaults (UTF-8 encoding).

* If a user asks for output as a Unicode ``str``, but overrides the
  ``output_encoding`` setting, I believe it is reasonable to assume that
  the user is now responsible for converting the ``str`` to ``bytes``
  for serialisation to disk, and we should not support an output format
  that does this by 'magic'. As an alternative, we could declare this
  unsupported behaviour and just issue an error.

* If a user asks for binary output (a ``bytes`` instance), I think it is
  reasonable to use ``output_encoding`` to encode the ``str`` instance
  we use internally to a ``bytes`` instance.

* We therefore need to decide the following end-state positions:

  a) Do we want to support (long-term) outputting ``bytes`` from the
     core publish API?
  b) Do we want to support (long-term) encodings other than UTF-8?

* If (a) is true, we should decide whether it is through a dedicated
  function, or through an overloaded signature (the current status).
  You have previously argued for keeping the "core" interface as small
  as possible, and I would strongly advocate against overloaded return
  types, perhaps leading to us not supporting returning ``bytes`` from
  the core publish API. This may be a reasonable position: if a user
  knows that they want bytes output, they should set the output
  encoding explicitly anyway, and therefore they have control over the
  encoding from ``str`` to ``bytes``, as they can e.g. do:

  .. code:: python

     from docutils.core import publish_string

     encoding = 'latin1'
     out_str = publish_string(
         source, settings_overrides={'output_encoding': encoding})
     assert isinstance(out_str, str)
     out_bytes = out_str.encode(encoding)

  This is in a hypothetical future where ``publish_string`` always
  returns ``str`` instances.

* If (b) is false, we could simplify the I/O code a great deal. I think
  it may be reasonable to expect the user to be responsible for encoding
  conversions, or to move Docutils' code to handle that away from the
  core and into the command-line interface, for example.

Sorry for the rather long message appended to a release thread, but as
you note, perhaps the decision cannot be delayed, as the documentation
contains a recipe that we may later regret declaring support for.

----

Thanks,
Adam

----------

From 5031c0ff9923057a5a12a80551b67992dcb2b4df Mon Sep 17 00:00:00 2001
From: Adam Turner <908...@us...>
Date: Sun, 23 Apr 2023 16:50:15 +0100
Subject: [PATCH] Ignore ``CSVTable.HeaderDialect`` deprecation warning

---
 docutils/docutils/parsers/rst/directives/tables.py | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/docutils/docutils/parsers/rst/directives/tables.py b/docutils/docutils/parsers/rst/directives/tables.py
index 446034828..8212f2cfc 100644
--- a/docutils/docutils/parsers/rst/directives/tables.py
+++ b/docutils/docutils/parsers/rst/directives/tables.py
@@ -64,8 +64,11 @@ def process_header_option(self):
         table_head = []
         max_header_cols = 0
         if 'header' in self.options:  # separate table header in option
+            with warnings.catch_warnings():
+                warnings.simplefilter('ignore')
+                header_dialect = self.HeaderDialect()
             rows, max_header_cols = self.parse_csv_data_into_rows(
-                self.options['header'].split('\n'), self.HeaderDialect(),
+                self.options['header'].split('\n'), header_dialect,
                 source)
             table_head.extend(rows)
         return table_head, max_header_cols
-- 
2.40.0.windows.1
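
For anyone reading the patch without a Docutils checkout to hand, the pattern it relies on can be sketched in isolation. This is only a minimal illustration and not Docutils code: ``LegacyDialect`` and ``make_dialect_quietly`` are hypothetical names, and the sketch assumes a class that emits a ``DeprecationWarning`` when it is instantiated.

```python
import warnings


class LegacyDialect:
    """Hypothetical stand-in for a class that warns when instantiated."""

    def __init__(self):
        warnings.warn('LegacyDialect is deprecated.',
                      DeprecationWarning, stacklevel=2)


def make_dialect_quietly():
    """Instantiate LegacyDialect without letting its warning escape."""
    with warnings.catch_warnings():
        # The filter change is local: the previous filters are
        # restored when the ``with`` block exits.
        warnings.simplefilter('ignore')
        return LegacyDialect()


# Record every warning that escapes, to compare the two call styles.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    quiet = make_dialect_quietly()          # warning is suppressed
    quiet_count = len(caught)
    loud = LegacyDialect()                  # warning is recorded
    loud_count = len(caught) - quiet_count
```

Note that ``warnings.simplefilter('ignore')`` silences every warning category inside the block; passing ``category=DeprecationWarning`` would narrow the filter if other warnings raised there should remain visible.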