From: engelbert g. <eng...@gm...> - 2023-01-07 20:51:38

develop list would be the correct destination ....

On Sat, 7 Jan 2023 at 12:21, Matthias Geier <mat...@gm...> wrote:

> Dear docutils maintainers.
>
> A few months ago, I've reported a problem with sphinxcontrib-bibtex
> (https://github.com/mcmtroffaes/sphinxcontrib-bibtex/issues/309),
> which turned out to actually be a problem with docutils.
> There is also a related Sphinx issue
> (https://github.com/sphinx-doc/sphinx/issues/10784) which shows that
> the problem can also appear without using sphinxcontrib-bibtex.
>
> The maintainer of sphinxcontrib-bibtex has kindly provided a patch
> (https://sourceforge.net/p/docutils/patches/195/), which has already
> been merged in the meantime.
>
> I don't know the release procedure nor the roadmap of docutils, but
> would it be possible to create a new docutils release that contains
> this fix (and maybe other improvements)?
>
> Thanks in advance!
>
> In case you are wondering how the problem looks in practice, here is
> an example:
> https://nbsphinx.readthedocs.io/en/0.8.11/a-normal-rst-file.html#citations
>
> cheers,
> Matthias
>
>
> _______________________________________________
> Docutils-users mailing list
> Doc...@li...
> https://lists.sourceforge.net/lists/listinfo/docutils-users
>
> Please use "Reply All" to reply to the list.
>
From: Guenter M. <mi...@us...> - 2023-01-10 12:24:25

Dear Docutils developers,

a happy new year to everyone!

On 2023-01-07, engelbert gruber wrote:
> On 7 Jan 2023 Matthias Geier <mat...@gm...> wrote:

>> A few months ago, I've reported a problem with sphinxcontrib-bibtex
...
>> The maintainer of sphinxcontrib-bibtex has kindly provided a patch
>> (https://sourceforge.net/p/docutils/patches/195/), which has already
>> been merged in the meantime.

>> I don't know the release procedure nor the roadmap of docutils, but
>> would it be possible to create a new docutils release that contains
>> this fix (and maybe other improvements)?
...

All in all, we have fixed 4 bugs and merged 2 patches since the last release
(look for the tags "open-fixed" or "open-accepted" in
https://sourceforge.net/p/docutils/_list/tickets), so YES, it is time for a
new release.

The long list of changes and improvements (see HISTORY.txt) indicates that
this is not a pure bug-fix release, so (following our policies) the version
number will be 0.20.

Before the actual release, we should decide on the way ahead and update the
announcements of future changes
https://docutils.sourceforge.io/RELEASE-NOTES.html#future-changes

front end tools:
  - Which Docutils version will drop the ``.py`` extension?

  - Keep/drop less often required ``rst2*`` tools? Which?

  - announce switch of packaging framework

  see also https://sourceforge.net/p/docutils/patches/186/

  Proposal [GM]:
  - implement in 0.21
  - keep [rst2html, rst2html4, rst2html5, rst2latex, rst2man,
    rst2odt, rst2pseudoxml, rst2s5, rst2xetex, rst2xml]

input encoding:
  - Announce new default "utf-8"?
    For which Docutils version?

  - Which Docutils version shall implement the already announced changes?
    (They are rather a bugfix, make the code simpler, a patch is ready.)

  see also https://sourceforge.net/p/docutils/patches/194/

  Proposal [GM]:
    implement already announced changes in 0.21
    announce "utf-8" as default for 1.0
    announce removal of encoding detection for 2.0

install.py
  - Remove in which Docutils version?
    (The removal is already announced but without affected version.)

  Proposal [GM]:
    Remove in 0.21, announce this now.

Further decisions:

* Can we agree on the Docutils command-line usage pattern change? ::

    - <toolname> [options] [<source> [<destination>]]
    + <toolname> [options] source [source2 [source3 [...]]]

  For the rationale, see https://clig.dev/#arguments-and-flags
  and https://sourceforge.net/p/docutils/feature-requests/36/

  If yes, announce now (documentation patch is ready).

  Proposal [GM]: Yes

* Drop support for older Python versions? (Which?, When?)

  Typing hints become a lot simpler (e.g. with the "|" operator) in 3.10.

  Proposal [GM]
    Raise requirement to >=3.9 in 0.21, announce now.
    (This is the default Python3 version in current Debian/stable,
    the most "conservative" major Linux release.)

* Future of ``core.publish_string()`` API function:

  a) Keep current behaviour ::

       def publish_string(source: Union[bytes, str],
                          [...]
                          enable_exit_status=False) -> Union[bytes, str]

     as "wart", just improve documentation?

  b) Deprecate ``publish_string()`` and provide new ``publish_str()``
     and ``publish_bytes()`` functions?

  c) Return a sub-class of ``str`` with ``__bytes__()`` method that
     encodes with ``encoding`` and ``encoding_errors`` set to the
     "output_encoding" and "output_encoding_errors" setting values?

     - as "subtly" changed behaviour, or
     - with a new function replacing ``publish_string()``
       (find a good name!)?

  Proposal [GM]
  - explore c)
  - remove ``core.publish_bytes()`` before releasing 0.20.

Are there other open issues that should be addressed before the next release?

Thanks,
Günter
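
The proposed usage pattern (every positional argument is a source) can be made concrete with a minimal sketch. It uses Python's standard `argparse` purely for illustration; the parser, the `parse()` helper and the `-o/--output` flag are assumptions made for this example, not Docutils' actual front-end code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for: <toolname> [options] source [source2 [...]]."""
    parser = argparse.ArgumentParser(prog="toolname")
    # One or more input files; argparse reports an error if none is given.
    parser.add_argument("sources", nargs="+", metavar="source")
    # Hypothetical option: the output path moves from a positional
    # argument to an explicit flag.
    parser.add_argument("-o", "--output", default=None)
    return parser


def parse(argv):
    """Return (list of sources, output path or None) for `argv`."""
    args = build_parser().parse_args(argv)
    return args.sources, args.output
```

With the old pattern, a second positional argument would have been read as the destination; in this sketch `parse(["a.rst", "b.rst"])` treats both names as sources and leaves the output unset.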
From: Guenter M. <mi...@us...> - 2023-01-22 00:01:18

Dear Docutils developers,

an update and request for comments...

On 2023-01-10, Guenter Milde via Docutils-develop wrote:

...
> it is time for a new release.
...

> * Future of ``core.publish_string()`` API function:

>   a) Keep current behaviour ::

>        def publish_string(source: Union[bytes, str],
>                           [...]
>                           enable_exit_status=False) -> Union[bytes, str]

>      as "wart", just improve documentation?

>   b) Deprecate ``publish_string()`` and provide new ``publish_str()``
>      and ``publish_bytes()`` functions?

>   c) Return a sub-class of ``str`` with ``__bytes__()`` method that
>      encodes with ``encoding`` and ``encoding_errors`` set to the
>      "output_encoding" and "output_encoding_errors" setting values?

>      - as "subtly" changed behaviour, or
>      - with a new function replacing ``publish_string()``
>        (find a good name!)?

>   Proposal [GM]
>   - explore c)
>   - remove ``core.publish_bytes()`` before releasing 0.20.

I prepared a patch for option c) with a new function attribute 'auto_encode'
for `core.publish_string()`. This would allow to keep the name and switch to
a behaviour matching it (returning a string, not bytes) gradually (by
switching the default value and eventually removing the option later).
See below.

> Are there other open issues that should be addressed before the next release?

* Implement the patch for a configurable include_root?
  https://sourceforge.net/p/docutils/feature-requests/91/

Thanks,
Günter


Subject: [PATCH] Define and use new `str` sub-class for string output.

New class `io.OutString` adds "encoding" and "errors" attributes to `str`.
Use it for `io.StringOutput`.
Allows storing the "output_encoding" and "output_encoding_error_handler"
settings in a transparent and easy to process way.

Add "auto_encode" argument to publish_string() and publish_programmatically()
to give the user an option to select the output type (`bytes` or `str`)
in a way that does not interfere with the intended encoding of the output.
---
 docutils/docutils/core.py       | 34 ++++++-------
 docutils/docutils/io.py         | 84 +++++++++++++++++++++++++++++++--
 docutils/test/test_io.py        | 59 +++++++++++++++++++++++
 docutils/test/test_publisher.py | 21 ++++++++-
 4 files changed, 177 insertions(+), 21 deletions(-)

diff --git a/docutils/docutils/core.py b/docutils/docutils/core.py
index 03c60279f..5d4793f5d 100644
--- a/docutils/docutils/core.py
+++ b/docutils/docutils/core.py
@@ -427,26 +427,21 @@ def publish_string(source, source_path=None, destination_path=None,
                    writer=None, writer_name='pseudoxml',
                    settings=None, settings_spec=None,
                    settings_overrides=None, config_section=None,
-                   enable_exit_status=False):
+                   enable_exit_status=False,
+                   auto_encode=True):
     """
     Set up & run a `Publisher` for programmatic use with string I/O.
 
     Accepts a `bytes` or `str` instance as `source`.
-    The output is encoded according to the "output_encoding" setting;
-    the return value is a `bytes` instance (unless `output_encoding`_
-    is "unicode", see below).
 
-    To get Docutils output as `str` instance, use `publish_parts()`::
+    If `auto_encode` is True, the output is encoded according to the
+    `output_encoding`_ setting; the return value is a `bytes` instance
+    (unless `output_encoding`_ is "unicode",
+    cf. `docutils.io.StringOutput.write()`).
 
-        output = publish_parts(...)['whole']
-
-    or set `output_encoding`_ to the pseudo encoding name "unicode", e.g.::
-
-        publish_string(..., settings_overrides={'output_encoding': 'unicode'})
-
-    Beware that the `output_encoding`_ setting may affect the content
-    of the output (e.g. an encoding declaration in HTML or XML or the
-    representation of characters as LaTeX macro vs. literal character).
+    If `auto_encode` is False, the output is an instance of a `str`
+    sub-class with "output_encoding" and "output_encoding_error_handler"
+    settings stored as `encoding` and `errors` attributes.
 
     Parameters: see `publish_programmatically()`.
 
@@ -463,7 +458,8 @@ def publish_string(source, source_path=None, destination_path=None,
         settings=settings, settings_spec=settings_spec,
         settings_overrides=settings_overrides,
         config_section=config_section,
-        enable_exit_status=enable_exit_status)
+        enable_exit_status=enable_exit_status,
+        auto_encode=auto_encode)
     return output
 
 
@@ -617,7 +613,8 @@ def publish_programmatically(source_class, source, source_path,
                              writer, writer_name,
                              settings, settings_spec,
                              settings_overrides, config_section,
-                             enable_exit_status):
+                             enable_exit_status,
+                             auto_encode=True):
     """
     Set up & run a `Publisher` for custom programmatic use.
 
@@ -709,6 +706,9 @@ def publish_programmatically(source_class, source, source_path,
       defined by `settings_spec`.  Used only if no `settings` specified.
 
     * `enable_exit_status`: Boolean; enable exit status at end of processing?
+
+    * `auto_encode`: Boolean; encode string output and return `bytes`?
+      Ignored with `io.FileOutput`.
     """
     publisher = Publisher(reader, parser, writer, settings=settings,
                           source_class=source_class,
@@ -718,5 +718,7 @@ def publish_programmatically(source_class, source, source_path,
         settings_spec, settings_overrides, config_section)
     publisher.set_source(source, source_path)
     publisher.set_destination(destination, destination_path)
+    if not auto_encode and isinstance(publisher.destination, io.StringOutput):
+        publisher.destination.auto_encode = auto_encode
     output = publisher.publish(enable_exit_status=enable_exit_status)
     return output, publisher
diff --git a/docutils/docutils/io.py b/docutils/docutils/io.py
index 2007a5cef..2162db2b3 100644
--- a/docutils/docutils/io.py
+++ b/docutils/docutils/io.py
@@ -74,6 +74,57 @@ def error_string(err):
     return f'{err.__class__.__name__}: {err}'
 
 
+class OutString(str):
+    """Return a string representation of `object` with known encoding.
+
+    Differences to `str()`:
+
+    If the `encoding` is given, both `str` instances and byte-like objects
+    are stored as text string, the latter decoded with `encoding` and
+    `errors` (defaulting to 'strict').
+
+    The encoding is never guessed. If `encoding` is None (the default),
+    an informal string representation is used, also if `errors` are given.
+
+    The original or intended encoding and error handler are stored in the
+    attributes `encoding` and `errors`.
+    Typecasting to `bytes` uses the stored values.
+    """
+
+    def __new__(cls, object, encoding=None, errors='strict'):
+        """Return a new OutString object.
+
+        Provisional.
+        """
+        try:
+            # decode bytes-like objects if encoding is known
+            return super().__new__(cls, object, encoding, errors)
+        except TypeError:
+            return super().__new__(cls, object)
+
+    def __init__(self, object, encoding=None, errors='strict'):
+        """Set "encoding" and "errors" attributes."""
+        self.encoding = encoding
+        self.errors = errors
+
+    def __bytes__(self):
+        try:
+            return super().encode(self.encoding, self.errors)
+        except TypeError:
+            raise TypeError('OutString instance without known encoding')
+
+    def __repr__(self):
+        if self.errors != 'strict':
+            errors_arg = f', errors={self.errors!r}'
+        else:
+            errors_arg = ''
+        return (f'{self.__class__.__name__}({super().__repr__()}, '
+                f'encoding={self.encoding!r}{errors_arg})')
+
+    def encode(self, encoding=None, errors=None):
+        return super().encode(encoding or self.encoding, errors or self.errors)
+
+
 class Input(TransformSpec):
     """
     Abstract base class for input wrappers.
@@ -264,14 +315,14 @@ class Output(TransformSpec):
         raise NotImplementedError
 
     def encode(self, data):
-        """Encode and return `data`.
+        """
+        Encode and return `data`.
 
         If `data` is a `bytes` instance, it is returned unchanged.
         Otherwise it is encoded with `self.encoding`.
 
         If `self.encoding` is set to the pseudo encoding name "unicode",
         `data` must be a `str` instance and is returned unchanged.
-
         """
         if self.encoding and self.encoding.lower() == 'unicode':
             assert isinstance(data, str), ('output encoding is "unicode" '
@@ -596,14 +647,39 @@ class StringOutput(Output):
 
     default_destination_path = '<string>'
 
+    def __init__(self, destination=None, destination_path=None,
+                 encoding=None, error_handler='strict', auto_encode=True):
+        self.auto_encode = auto_encode
+        """Let `write()` encode the output document and return `bytes`."""
+        super().__init__(destination, destination_path,
+                         encoding, error_handler)
+
     def write(self, data):
-        """Encode `data`, store it in `self.destination`, and return it.
+        """Store `data` in `self.destination`, and return it.
 
+        If `self.auto_encode` is False, store and return a `str`
+        sub-class instance with "encoding" and "errors" attributes
+        set to `self.encoding` and `self.error_handler`.
+
+        If `self.auto_encode` is True, encode `data` with `self.encoding`
+        and `self.error_handler` and store/return a `bytes` instance.
+        Exception:
         If `self.encoding` is set to the pseudo encoding name "unicode",
         `data` must be a `str` instance and is returned unchanged
         (cf. `Output.encode`).
+        Beware that the `output_encoding`_ setting may affect the content
+        of the output (e.g. an encoding declaration in HTML or XML or the
+        representation of characters as LaTeX macro vs. literal character).
         """
-        self.destination = self.encode(data)
+        if self.auto_encode:
+            self.destination = self.encode(data)
+            return self.destination
+
+        if not self.encoding or self.encoding.lower() == 'unicode':
+            encoding = None
+        else:
+            encoding = self.encoding
+        self.destination = OutString(data, encoding, self.error_handler)
         return self.destination
 
 
diff --git a/docutils/test/test_io.py b/docutils/test/test_io.py
index 17b77eaa1..a1485ce0a 100755
--- a/docutils/test/test_io.py
+++ b/docutils/test/test_io.py
@@ -189,6 +189,19 @@ class OutputTests(unittest.TestCase):
         fo.write(self.udata)
         self.assertEqual(self.udrain.getvalue(), self.udata)
 
+    def test_write_auto_encode_false(self):
+        so = io.StringOutput(encoding='latin1', error_handler='replace',
+                             auto_encode=False)
+        output = so.write(self.udata)
+        # store output in self.destination and also return it
+        self.assertEqual(output, self.udata)
+        self.assertEqual(so.destination, self.udata)
+        # store also encoding and encoding error handler ...
+        self.assertEqual(output.encoding, 'latin1')
+        self.assertEqual(output.errors, 'replace')
+        # ... to allow easy conversion to `bytes`:
+        self.assertEqual(bytes(output), self.bdata)
+
     def test_FileOutput_hande_io_errors_deprection_warning(self):
         with self.assertWarnsRegex(DeprecationWarning,
                                    '"handle_io_errors" is ignored'):
@@ -224,6 +237,52 @@ class OutputTests(unittest.TestCase):
         self.assertRaises(ValueError, fo.write, self.udata)
 
 
+class OutStringTests(unittest.TestCase):
+
+    def test__init__defaults(self):
+        """Test `__new__()` and `__init__()` with default values."""
+
+        os = io.OutString('Grüße')
+        self.assertEqual(str(os), 'Grüße')
+        self.assertEqual(os.encoding, None)
+        self.assertEqual(os.errors, 'strict')
+        # converting to `bytes` fails if the encoding is not known:
+        with self.assertRaises(TypeError):
+            self.assertEqual(bytes(os), 'Grüße')
+        # without known encoding, `bytes` and other incompatible types
+        # are converted to their string representation ...
+        bos = io.OutString(b'gut')
+        self.assertEqual(str(bos), "b'gut'")
+        bos_e = io.OutString('Grüße'.encode('latin1'), errors='ignore')
+        self.assertEqual(str(bos_e), r"b'Gr\xfc\xdfe'")
+        bos = io.OutString(b'gut', encoding=None)
+        self.assertEqual(str(bos), "b'gut'")
+
+    def test__init__custom_attributes(self):
+        """Test `__new__()` and `__init__()` with custom encoding."""
+        os8 = io.OutString('Grüße', encoding='utf-8')
+        self.assertEqual(str(os8), 'Grüße')
+        self.assertEqual(bytes(os8), b'Gr\xc3\xbc\xc3\x9fe')
+        self.assertEqual(repr(os8), "OutString('Grüße', encoding='utf-8')")
+        # With known encoding, "bytes-like" objects are decoded
+        bos1 = io.OutString(b'Gr\xfc\xdfe', encoding='latin1')
+        self.assertEqual(str(bos1), 'Grüße')
+        self.assertEqual(bytes(bos1), b'Gr\xfc\xdfe')
+        # Invalid encodings (including the empty string) raise an error
+        with self.assertRaises(LookupError):
+            io.OutString(b'Gr\xfc\xdfe', encoding='')
+
+    def test__init__custom_errors(self):
+        """Test `__new__()` and `__init__()` with custom `errors`."""
+        ts8_r = io.OutString('Grüße', encoding='utf-8', errors='replace')
+        # Encoding uses the stored error handler:
+        self.assertEqual(ts8_r.encode('ascii'), b'Gr??e')
+        # Initialization with a `bytes` object uses the error handler, too:
+        bts8_r = io.OutString(b'Gr\xfc\xdfe', encoding='utf-8',
+                              errors='replace')
+        self.assertEqual(str(bts8_r), 'Gr��e')
+
+
 class ErrorOutputTests(unittest.TestCase):
     def test_defaults(self):
         e = io.ErrorOutput()
diff --git a/docutils/test/test_publisher.py b/docutils/test/test_publisher.py
index 6177ad6d2..a731d2434 100755
--- a/docutils/test/test_publisher.py
+++ b/docutils/test/test_publisher.py
@@ -80,7 +80,8 @@ class PublisherTests(unittest.TestCase):
                                   'nonexisting/path'],
                             settings_overrides={'traceback': True})
 
-    def test_publish_string(self):
+    def test_publish_string_input_encoding(self):
+        """Test handling of encoded input."""
         # Transparently decode `bytes` source (with "input_encoding" setting)
         # default: auto-detect, fallback utf-8
         # Output is encoded according to "output_encoding" setting.
@@ -102,6 +103,24 @@ class PublisherTests(unittest.TestCase):
                                      settings_overrides=settings)
         self.assertTrue(output.endswith('Grüße\n'))
 
+    def test_publish_string_output_encoding(self):
+        settings = {'_disable_config': True,
+                    'datestamp': False,
+                    'output_encoding': 'latin1',
+                    'output_encoding_error_handler': 'replace'}
+        source = 'Grüß → dich'
+        expected = ('<document source="<string>">\n'
+                    '    <paragraph>\n'
+                    '        Grüß → dich\n')
+        # current default: encode output, return `bytes`
+        output = core.publish_string(source, settings_overrides=settings)
+        self.assertEqual(output, expected.encode('latin1', 'replace'))
+        # no encoding if `auto_encode` is False:
+        output = core.publish_string(source, settings_overrides=settings,
+                                     auto_encode=False)
+        self.assertEqual(output, expected)
+        # self.assertEqual(output.encoding, 'latin1')
+
 
 class PublishDoctreeTestCase(unittest.TestCase, docutils.SettingsSpec):
 
-- 
2.30.2
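
How a caller might consume such a return value can be sketched without Docutils installed. The class below is a stripped-down stand-in for the `OutString` in the patch; the name `OutStringSketch` is hypothetical and chosen to make clear that this is not the Docutils class, and it omits the decoding of `bytes` input and the custom `repr`.

```python
class OutStringSketch(str):
    """Stand-in for the patch's `OutString`: a `str` that knows its encoding."""

    def __new__(cls, text, encoding=None, errors='strict'):
        # Only the text itself goes into the `str` value.
        return super().__new__(cls, text)

    def __init__(self, text, encoding=None, errors='strict'):
        # Keep the intended output encoding as metadata on the instance.
        self.encoding = encoding
        self.errors = errors

    def __bytes__(self):
        if self.encoding is None:
            raise TypeError('no known encoding')
        return self.encode(self.encoding, self.errors)


result = OutStringSketch('Grüße', encoding='latin1', errors='replace')
as_text = result.upper()   # ordinary `str` methods still work
as_bytes = bytes(result)   # encoded with the stored encoding/error handler
```

Note that `str` methods such as `upper()` return a plain `str`, so the stored encoding is only available on the object originally returned.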
From: Adam T. <aat...@ou...> - 2023-03-24 18:26:31

Dear Günter, Docutils developers,

Sorry for such a long delay in responding!

[snip]
> YES, it is time for a new release.

Great!

[snip]
> Before the actual release, we should decide on the way ahead and update the
> announcements of future changes
> https://docutils.sourceforge.io/RELEASE-NOTES.html#future-changes

> front end tools:
>   - Which Docutils version will drop the ``.py`` extension?
>
>   - Keep/drop less often required ``rst2*`` tools? Which?
>
>   - announce switch of packaging framework
>
>   see also https://sourceforge.net/p/docutils/patches/186/
>
>   Proposal [GM]:
>   - implement in 0.21
>   - keep [rst2html, rst2html4, rst2html5, rst2latex, rst2man,
>     rst2odt, rst2pseudoxml, rst2s5, rst2xetex, rst2xml]

I agree with your proposal: drop in 0.21. I would suggest only keeping
rst2html, rst2html5, rst2latex, rst2man, rst2odt, rst2pseudoxml, and rst2xml,
but this can always be discussed later and should not further delay the 0.20
release.

> input encoding:
>   - Announce new default "utf-8"?
>     For which Docutils version?
>
>   - Which Docutils version shall implement the already announced changes?
>     (They are rather a bugfix, make the code simpler, a patch is ready.)
>
>   see also https://sourceforge.net/p/docutils/patches/194/
>
>   Proposal [GM]:
>     implement already announced changes in 0.21
>     announce "utf-8" as default for 1.0
>     announce removal of encoding detection for 2.0

I agree with all three of your proposals here.

> install.py
>   - Remove in which Docutils version?
>     (The removal is already announced but without affected version.)
>
>   Proposal [GM]:
>     Remove in 0.21, announce this now.

Agree to remove in 0.21, announce in 0.20.

> * Drop support for older Python versions? (Which?, When?)
>
>   Typing hints become a lot simpler (e.g. with the "|" operator) in 3.10.
>
>   Proposal [GM]
>     Raise requirement to >=3.9 in 0.21, announce now.
>     (This is the default Python3 version in current Debian/stable,
>     the most "conservative" major Linux release.)

Agree to require 3.9 in 0.21, announce in 0.20.

For interest, Sphinx has a policy__ to support:
"all minor versions of Python released in the past 42 months from the ...
release date with a minimum of 3 minor versions of Python"

__ https://www.sphinx-doc.org/en/master/internals/release-process.html#python-version-support-policy

>> * Future of ``core.publish_string()`` API function:
>>
>>   a) Keep current behaviour ::
>>
>>        def publish_string(source: Union[bytes, str],
>>                           [...]
>>                           enable_exit_status=False) -> Union[bytes, str]
>>
>>      as "wart", just improve documentation?
>>
>>   b) Deprecate ``publish_string()`` and provide new ``publish_str()``
>>      and ``publish_bytes()`` functions?
>>
>>   c) Return a sub-class of ``str`` with ``__bytes__()`` method that
>>      encodes with ``encoding`` and ``encoding_errors`` set to the
>>      "output_encoding" and "output_encoding_errors" setting values?
>>
>>      - as "subtly" changed behaviour, or
>>      - with a new function replacing ``publish_string()``
>>        (find a good name!)?
>>
>>   Proposal [GM]
>>   - explore c)
>>   - remove ``core.publish_bytes()`` before releasing 0.20.

> I prepared a patch for option c) with a new function attribute 'auto_encode'
> for `core.publish_string()`. This would allow to keep the name and switch to
> a behaviour matching it (returning a string, not bytes) gradually (by
> switching the default value and eventually removing the option later).
> See below.

I am content to go with option (c), but I would want to simultaneously
announce the version where the default would change to ``auto_encode=False``
and the version where ``publish_string()`` would only support returning
``str`` instances.

I think that it might be possible to implement the new ``auto_encode``
parameter without a custom ``str`` subclass, but unfortunately I won't have
time until May to properly work on such an implementation. Otherwise I would
support option (b) to add a new ``publish_str()``.

I think we should keep the ``publish_bytes()`` function in either case.

>> Are there other open issues that should be addressed before the next release?

> * Implement the patch for a configurable include_root?
>   https://sourceforge.net/p/docutils/feature-requests/91/

I think the ``include_root`` change (and others) can be delayed; they are not
requirements for 0.20.

Thanks,
Adam
From: Guenter M. <mi...@us...> - 2023-04-06 13:21:14

Dear Adam,

I am glad to hear from you again.

On 2023-03-24, Adam Turner wrote:

...
> For interest, Sphinx has a policy__ to support:
> "all minor versions of Python released in the past 42 months from the ...
> release date with a minimum of 3 minor versions of Python"

> __ https://www.sphinx-doc.org/en/master/internals/release-process.html#python-version-support-policy

Thank you for the pointer.
I'd like to keep support for the default Python version of Debian/stable
(even if this is longer than 42 months old).

>>> * Future of ``core.publish_string()`` API function:

>>>   a) Keep current behaviour ::

>>>        def publish_string(source: Union[bytes, str],
>>>                           [...]
>>>                           enable_exit_status=False) -> Union[bytes, str]

>>>      as "wart", just improve documentation?

>>>   b) Deprecate ``publish_string()`` and provide new ``publish_str()``
>>>      and ``publish_bytes()`` functions?

>>>   c) Return a sub-class of ``str`` with ``__bytes__()`` method that
>>>      encodes with ``encoding`` and ``encoding_errors`` set to the
>>>      "output_encoding" and "output_encoding_errors" setting values?

>>>      - as "subtly" changed behaviour, or
>>>      - with a new function replacing ``publish_string()``
>>>        (find a good name!)?

>>>   Proposal [GM]
>>>   - explore c)
>>>   - remove ``core.publish_bytes()`` before releasing 0.20.

>> I prepared a patch for option c) with a new function attribute
>> 'auto_encode' for `core.publish_string()`. This would allow to keep the
>> name and switch to a behaviour matching it (returning a string, not
>> bytes) gradually (by switching the default value and eventually
>> removing the option later). See below.

> I am content to go with option (c), but I would want to simultaneously announce
> the version where the default would change to ``auto_encode=False`` and the
> version where ``publish_string()`` would only support returning ``str``
> instances.

I suggest to do the default switch 2 versions after announcement, i.e. 0.22
or 1.0 (announce as 0.22 or later).

I would not announce removal of the option now (should not be earlier than
3.0, maybe never).

> I think that it might be possible to implement the new ``auto_encode``
> parameter without a custom ``str`` subclass,

The problem with exporting an output document as `str` instance
(auto_encode=False) is that

* The output document by default contains an "encoding indicator" (at least
  in HTML and LaTeX) which is determined by the "output_encoding" setting
  (depending on a set of configuration files or command line input which may
  be "programmatically overwritten").

* A `str` instance has no meta-data storing the "intended encoding".

The application calling `publish_string()` would have to re-enact the
configuration parsing or to grep in the string to find out the
right encoding.

The old approach is to use publish_parts() and from the returned dictionary
use the "document" and "encoding" items.

I am open for other suggestions to solve this problem.

> I think we should keep the ``publish_bytes()`` function in either case.

I don't see a convincing use case for ``publish_bytes()`` and would
prefer to keep the "core" interface as small as sensible.

Thanks,
Günter
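
The "grep in the string" fallback mentioned above can be illustrated with a short sketch. Everything in it is an assumption made for illustration: the sample document, the regular expression and the helper name `declared_encoding()` are not part of Docutils, and the approach is shown precisely because it is fragile.

```python
import re
from typing import Optional

# Matches e.g. charset="latin1" or encoding='utf-8' inside a document.
_DECLARATION = re.compile(r'(?:charset|encoding)\s*=\s*["\']?([\w.-]+)',
                          re.IGNORECASE)


def declared_encoding(document: str) -> Optional[str]:
    """Return the first encoding name declared in `document`, if any."""
    match = _DECLARATION.search(document)
    return match.group(1) if match else None


# A plain `str` carries no encoding metadata, so the caller has to search
# the text for the declaration before encoding it.
html = '<meta charset="latin1" />\n<p>Grüße</p>'
encoding = declared_encoding(html)
data = html.encode(encoding or 'utf-8')
```

A pattern like this can easily pick up an unrelated `encoding=` in the body text or miss a format-specific declaration, which is the kind of guesswork a return value with stored `encoding` and `errors` attributes avoids.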
From: Adam T. <aat...@ou...> - 2023-04-06 14:46:10

Dear Günter,

> I'd like to keep support for the default Python version of Debian/stable
> (even if this is longer than 42 months old).

Fair enough!

---------------------------------------------------------------------

>> I am content to go with option (c), but I would want to simultaneously announce
>> the version where the default would change to ``auto_encode=False`` and the
>> version where ``publish_string()`` would only support returning ``str``
>> instances.

> I suggest to do the default switch 2 versions after announcement, i.e. 0.22
> or 1.0 (announce as 0.22 or later).

> I would not announce removal of the option now (should not be earlier than
> 3.0, maybe never).

Ok, this sounds good -- introduce the ``auto_encode`` argument now,
defaulting to ``True``, and announce that it will switch to ``False`` in
Docutils 0.22 or later.

---------------------------------------------------------------------

>> I think that it might be possible to implement the new ``auto_encode``
>> parameter without a custom ``str`` subclass,

> The problem with exporting an output document as `str` instance
> (auto_encode=False) is that

> * The output document by default contains an "encoding indicator" (at least
>   in HTML and LaTeX) which is determined by the "output_encoding" setting
>   (depending on a set of configuration files or command line input which may
>   be "programmatically overwritten").

> * A `str` instance has no meta-data storing the "intended encoding".
>
> The application calling `publish_string()` would have to re-enact the
> configuration parsing or to grep in the string to find out the
> right encoding.

> The old approach is to use publish_parts() and from the returned dictionary
> use the "document" and "encoding" items.

> I am open for other suggestions to solve this problem.

I think here we should say "practicality beats purity" and go with the
subclass, viewing it as a transitional measure. You make good arguments!

---------------------------------------------------------------------

>> I think we should keep the ``publish_bytes()`` function in either case.

> I don't see a convincing use case for ``publish_bytes()`` and would
> prefer to keep the "core" interface as small as sensible.

The main use(s) here would be for publishing binary formats (e.g. ODT) to a
``bytes`` object in memory rather than writing to disk, or for when
call-sites use a non-unicode ``output_encoding`` setting.

If it is to be removed, perhaps we could provide a recipe in the
documentation for how to manage publishing to an in-memory byte sequence.

Thanks,
Adam
From: Guenter M. <mi...@us...> - 2023-04-06 16:06:36

Dear Adam,

On 2023-04-06, Adam Turner wrote:

...
> ---------------------------------------------------------------------

>>> I think we should keep the ``publish_bytes()`` function in either case.

>> I don't see a convincing use case for ``publish_bytes()`` and would
>> prefer to keep the "core" interface as small as sensible.

Rationale: ``publish_bytes()`` makes sense alongside ``publish_str()``.
However, ``publish_str()`` and the existing ``publish_string()`` are so
close that confusion is to be expected.

> The main use(s) here would be for publishing binary formats (e.g. ODT)
> to a ``bytes`` object in memory rather than writing to disk, or for
> when call-sites use a non-unicode ``output_encoding`` setting.

IMV, the extended ``publish_string()`` providing

  publish_string(..., auto_encode=False) --> OutString
  publish_string(..., auto_encode=True)  --> bytes

with `OutString` being 100% compatible with `str` and easily convertible
to bytes via ``bytes(result)`` can cater for such needs.
(Also, it should not be too surprising that `publish_string` returns a
`bytes` instance if the user tells it to "auto_encode".)

> If it is to be removed, perhaps we could provide a recipe in the
> documentation for how to manage publishing to an in-memory byte
> sequence.

This is part of the `publish_string()` docstring in my patch::

    If `auto_encode` is True, the output is encoded according to the
    `output_encoding`_ setting; the return value is a `bytes` instance
    (unless `output_encoding`_ is "unicode",
    cf. `docutils.io.StringOutput.write()`).

Thanks,
Günter
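
The mapping from `auto_encode` to return type described above could also be stated in type annotations. The sketch below uses `typing.overload` on a toy function; `publish()` and its parameters are hypothetical stand-ins for this example and do not reproduce the real `publish_string()` signature.

```python
from typing import Literal, Union, overload


@overload
def publish(source: str, *, encoding: str = ...,
            auto_encode: Literal[True] = ...) -> bytes: ...


@overload
def publish(source: str, *, encoding: str = ...,
            auto_encode: Literal[False]) -> str: ...


def publish(source: str, *, encoding: str = 'utf-8',
            auto_encode: bool = True) -> Union[bytes, str]:
    """Toy stand-in: 'publishing' simply passes the text through."""
    # Encoded output by default; plain text if the caller opts out.
    return source.encode(encoding) if auto_encode else source
```

With overloads like these, a type checker can tell from the call site whether `bytes` or `str` comes back, which is one way the "overloaded return type" could be documented while the transition is under way.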
From: Adam T. <aat...@ou...> - 2023-04-06 16:52:45

Dear Günter,

>> The main use(s) here would be for publishing binary formats (e.g. ODT)
>> to a ``bytes`` object in memory rather than writing to disk, or for
>> when call-sites use a non-unicode ``output_encoding`` setting.

> IMV, the extended ``publish_string()`` providing

>   publish_string(..., auto_encode=False) --> OutString
>   publish_string(..., auto_encode=True)  --> bytes

> with `OutString` being 100% compatible with `str` and easily convertible
> to bytes via ``bytes(result)`` can cater for such needs.
> (Also, it should not be too surprising that `publish_string` returns a
> `bytes` instance if the user tells it to "auto_encode".)

Fair enough, though I suppose I had seen ``auto_encode`` as part of moving to
a position where (eventually) ``publish_string`` always returns an instance
of Python's ``str`` class, rather than the current overloaded return type.

In the scenario I was imagining, users of Docutils who know that they want
``bytes`` output could change to using ``publish_bytes`` in Docutils 0.20,
and not need to worry about the future of ``publish_string`` or the
``auto_encode`` argument. (There is also an argument for self-documenting
code, in that if you know that you want ``bytes``, using the function with
``bytes`` in the name helps a non-expert reader to understand what is going
on.)

>> If it is to be removed, perhaps we could provide a recipe in the
>> documentation for how to manage publishing to an in-memory byte
>> sequence.

> This is part of the `publish_string()` docstring in my patch::

>     If `auto_encode` is True, the output is encoded according to the
>     `output_encoding`_ setting; the return value is a `bytes` instance
>     (unless `output_encoding`_ is "unicode",
>     cf. `docutils.io.StringOutput.write()`).

Sorry, I had forgotten about this part of your patch.

-----------

If you have time, perhaps you could commit your patch (or the latest version
thereof), and we could make progress from there? As far as I can tell, the
only unresolved point at the moment ahead of releasing Docutils 0.20 is the
future of ``publish_bytes``.

Thanks,
Adam
From: Guenter M. <mi...@us...> - 2023-04-06 18:51:48
|
Dear Adam,

On 2023-04-06, Adam Turner wrote:

>>> The main use(s) here would be for publishing binary formats (e.g. ODT)
>>> to a ``bytes`` object in memory rather than writing to disk, or for
>>> when call-sites use a non-unicode ``output_encoding`` setting.

>> IMV, the extended ``publish_string()`` providing
>>   publish_string(..., auto_encode=False) --> OutString
>>   publish_string(..., auto_encode=True) --> bytes
>
>> with `OutString` being 100% compatible with `str` and easily convertible
>> to bytes via ``bytes(result)`` can cater for such needs.
>> (Also, it should not be too surprising that `publish_string` returns a
>> `bytes` instance if the user tells it to "auto_encode".)

> Fair enough, though I suppose I had seen ``auto_encode`` as part of
> moving to a position where (eventually) ``publish_string`` always
> returns an instance of Python's ``str`` class, rather than the current
> overloaded return type.

While I favour keeping "auto_encode" over a separate `publish_bytes()` function, I left the decision about the end state of this transition open...

> In the scenario I was imagining, users of
> Docutils who know that they want ``bytes`` output could change to using
> ``publish_bytes`` in Docutils 0.20, and not need to worry about the
> future of ``publish_string`` or the ``auto_encode`` argument. (There
> is also an argument for self-documenting code, in that if you know that
> you want ``bytes``, using the function with ``bytes`` in the name helps
> a non-expert reader to understand what is going on.)

I imagine that in the absence of `publish_bytes`, users knowing they need an "encoded string" (i.e. `bytes`) will find `publish_string` and either just use ``bytes(publish_string(...))`` or read the help string or docs and use ``publish_string(..., auto_encode=True)``.

This will become easier with type annotations in the source. (How about starting with annotating "core.py"? I only postponed this because I don't know whether a partially type-hinted module will interfere with the present 3rd-party type hint stubs.)

> If you have time, perhaps you could commit your patch (or the latest
> version thereof), and we could make progress from there? As far as I
> can tell, the only unresolved point at the moment ahead of releasing
> Docutils 0.20 is the future of ``publish_bytes``.

This is now [r9336].
A patch removing `publish_bytes` waits in a branch in my local Git repo.

Thanks,
Günter
|
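The idea of a string type that is fully compatible with `str` and convertible via ``bytes(result)`` can be sketched in a few lines. This is only an illustration under stated assumptions: the class name ``EncodedStr`` and its constructor arguments are hypothetical and are not the ``OutString`` implementation from the patch under discussion.

```python
class EncodedStr(str):
    """Hypothetical `str` subclass that remembers how it should be encoded."""

    def __new__(cls, text, encoding="utf-8", errors="strict"):
        self = super().__new__(cls, text)
        # Plain attributes; `str` subclasses have an instance dict.
        self.encoding = encoding
        self.errors = errors
        return self

    def __bytes__(self):
        # ``bytes(obj)`` looks up ``__bytes__`` before rejecting strings
        # that come without an explicit encoding argument.
        return self.encode(self.encoding, self.errors)


if __name__ == "__main__":
    result = EncodedStr("Gr\u00fc\u00dfe", encoding="latin1")
    print(isinstance(result, str))   # usable wherever a `str` is expected
    print(bytes(result))             # encoded with the stored settings
```

The trade-off raised in the thread still applies: callers get one return type that behaves like `str`, but the conversion to `bytes` relies on behaviour that a plain `str` does not have.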
From: Adam T. <aat...@ou...> - 2023-04-08 09:50:08
|
Dear Günter,

> While I favour keeping "auto_encode" over a separate `publish_bytes()`
> function, I left the decision about the end state of this transition open...

A decision to make later, and one that doesn't block the 0.20 release!

> This will become easier with type annotations in the source.
> (How about starting with annotating "core.py". I only postponed this because
> I don't know whether a partially type-hinted module will interfere with the
> present 3rd-party type hint stubs.)

I have a patch for adding type hints to Docutils, but wanted to wait until releasing Docutils 0.20 so as not to add major new changes to the repository.

>> If you have time, perhaps you could commit your patch (or the latest
>> version thereof), and we could make progress from there? As far as I
>> can tell, the only unresolved point at the moment ahead of releasing
>> Docutils 0.20 is the future of ``publish_bytes``.

> This is now [r9336].
> A patch removing `publish_bytes` waits in a branch in my local Git repo.

I think to un-block the release we should apply the patch removing ``publish_bytes``.

Thanks,
Adam
|
From: Guenter M. <mi...@us...> - 2023-04-21 17:46:46
|
Dear Docutils developers,

I finished my work on the preparations towards Docutils 0.20. Please check and test.

I propose a freeze starting tomorrow: don't commit stuff without asking here on the list in advance.

Engelbert, can you prepare a release for next week?

Have a nice weekend,
Günter
|
From: Adam T. <aat...@ou...> - 2023-04-23 18:48:23
|
Dear Günter, all,

> I finished my work on the preparations towards Docutils 0.20.
> Please check and test.

I have tested on Ubuntu 22.04 LTS and Windows 10, and against Sphinx. All tests are passing, though I would strongly recommend applying the attached patch to avoid a false-positive warning during testing.

----

> Engelbert, can you prepare a release for next week?

Please may we release a 0.20b1 first so that I might ask downstream projects to test, as with the 0.19 release? I am happy to help with this, though I don't have the ability to upload to Docutils on PyPI at the moment.

----

>>> I left the decision about the end state of this transition open...

>> A decision to make later, and one that doesn't block the 0.20
>> release!

> Yes and no: if we want to give users advice on a stable recipe to
> avoid being hit by the default-change, we would need agreement of
> what will (most likely) be kept stable.
> The API documentation "publisher.txt" now has the example
>   output = bytes(publish_string(...))
> (which depends on `OutputString` features).

Can we mark this feature as provisional? Personally, I don't think that we should support this form of ``bytes`` conversion long-term, and I see the ``OutputString`` as a transitional class, again not one that will be around for a long time.

For me, the point of this exercise and deprecation process is to reach an end-state where ``publish_string`` always returns ``str``. Perhaps we should return to discussing ``publish_str`` and ``publish_bytes`` functions?

To summarise the problem as I understand it:

* Some output formats may contain information about the encoding of the document

  - SGML-based markup languages (XML, HTML) may contain an internal encoding declaration.
  - TeX-based languages (LaTeX, XeLaTeX, etc) may contain an internal encoding macro.

* All of these formats have default encodings

  - XML defaults to a UTF-8 encoding if the encoding attribute is not specified, since XML 1.0 (2008)
    https://www.w3.org/TR/xml/#charencoding
  - HTML 5 requires a UTF-8 charset
    https://html.spec.whatwg.org/#charset
  - LaTeX's default encoding is UTF-8, since 2018
    https://tug.org/TUGboat/tb39-1/tb121ltnews28.pdf
  - XeTeX I believe has always defaulted to UTF-8.

* If a user asks for output as a Unicode ``str``, I believe it is reasonable to assume these defaults (UTF-8 encoding).

* If a user asks for output as a Unicode ``str``, but overrides the ``output_encoding`` setting, I believe it is reasonable to assume that the user is now responsible for conversion of the ``str`` to ``bytes`` for serialisation to disk, and we should not support an output format that does this by 'magic'. We could declare this as unsupported behaviour as an alternative, and just issue an error.

* If a user asks for binary output (a ``bytes`` instance), I think it is reasonable to use ``output_encoding`` to encode the ``str`` instance we use internally to a ``bytes`` instance.

* We therefore need to decide the following end-state positions:

  a) Do we want to support (long-term) outputting ``bytes`` from the core publish API?
  b) Do we want to support (long-term) encodings other than UTF-8?

* If (a) is true, we should decide if it is through a dedicated function, or through an overloaded signature (the current status).

  You have previously argued for keeping the "core" interface as small as possible, and I would strongly advocate against overloaded return types, perhaps leading to us not supporting returning ``bytes`` from the core publish API.

  This may be a reasonable position, as if a user knows that he wants bytes output, he should set the output encoding explicitly anyway, and therefore he has control over the encoding from ``str`` to ``bytes`` as he can e.g. do:

  .. code:: python

     encoding = 'latin1'
     out_str = publish_string(source,
         settings_overrides={'output_encoding': encoding}
     )
     assert isinstance(out_str, str)
     out_bytes = out_str.encode(encoding)

  in a hypothetical future where ``publish_string`` always returns ``str`` instances.

* If (b) is false, we could simplify the I/O code a great deal. I think it may be reasonable to expect the user to be responsible for encoding conversions, or to move Docutils' code to handle that away from the core and into the command-line interface, for example.

Sorry for the rather long message appended to a release thread, but as you note, perhaps the decision cannot be delayed, as the documentation contains a recipe that we may later regret declaring support for.

----

Thanks,
Adam

----------

From 5031c0ff9923057a5a12a80551b67992dcb2b4df Mon Sep 17 00:00:00 2001
From: Adam Turner <908...@us...>
Date: Sun, 23 Apr 2023 16:50:15 +0100
Subject: [PATCH] Ignore ``CSVTable.HeaderDialect`` deprecation warning

---
 docutils/docutils/parsers/rst/directives/tables.py | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/docutils/docutils/parsers/rst/directives/tables.py b/docutils/docutils/parsers/rst/directives/tables.py
index 446034828..8212f2cfc 100644
--- a/docutils/docutils/parsers/rst/directives/tables.py
+++ b/docutils/docutils/parsers/rst/directives/tables.py
@@ -64,8 +64,11 @@ def process_header_option(self):
         table_head = []
         max_header_cols = 0
         if 'header' in self.options:   # separate table header in option
+            with warnings.catch_warnings():
+                warnings.simplefilter('ignore')
+                header_dialect = self.HeaderDialect()
             rows, max_header_cols = self.parse_csv_data_into_rows(
-                self.options['header'].split('\n'), self.HeaderDialect(),
+                self.options['header'].split('\n'), header_dialect,
                 source)
             table_head.extend(rows)
         return table_head, max_header_cols
--
2.40.0.windows.1
|
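The patch above wraps one instantiation in ``warnings.catch_warnings()`` so that its deprecation warning does not surface during testing. A stand-alone sketch of that pattern follows, assuming a hypothetical ``LegacyDialect`` class in place of ``CSVTable.HeaderDialect``; the helper name ``make_dialect_quietly`` is likewise made up for illustration.

```python
import warnings


class LegacyDialect:
    """Hypothetical stand-in for a class that warns when instantiated."""

    def __init__(self):
        warnings.warn("LegacyDialect is deprecated", DeprecationWarning,
                      stacklevel=2)


def make_dialect_quietly():
    """Instantiate `LegacyDialect` without letting its warning escape."""
    with warnings.catch_warnings():
        # Filters changed inside the block are restored when it exits.
        warnings.simplefilter("ignore")
        return LegacyDialect()


if __name__ == "__main__":
    with warnings.catch_warnings():
        # Turn warnings into exceptions, roughly what ``-W error`` does.
        warnings.simplefilter("error")
        dialect = make_dialect_quietly()
        print(type(dialect).__name__)
```

Note that ``simplefilter("ignore")`` without a category silences every warning raised inside the block, so the block is best kept as small as in the patch.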
From: Guenter M. <mi...@us...> - 2023-04-24 11:57:57
|
Dear Docutils developers,

On 2023-04-23, Adam Turner wrote:

> I have tested on Ubuntu 22.04 LTS and Windows 10, and against Sphinx.
> All tests are passing,

Good news. What are the tested Python versions?
Could someone test with Python 3.11?

> though I would strongly recommend applying the
> attached patch to avoid a false-positive warning during testing.

I agree with the patch.

...

> Please may we release a 0.20b1 first so that I might ask downstream
> projects to test, as with the 0.19 release?

My experience from the latest pre-releases was minimal to no feedback. However, if you have other experiences or expectations, I agree with a pre-release.
I suggest "rc1" instead of "beta" (as I really like 0.20 coming out soon and rather see a not too distant 0.21...).

...

>> The API documentation "publisher.txt" now has the example
>>   output = bytes(publish_string(...))
>> (which depends on `OutputString` features).

> Can we mark this feature as provisional? Personally, I don't think
> that we should support this form of ``bytes`` conversion long-term,
> and I see the ``OutputString`` as a transitional class, again not
> one that will be around for a long time.

OK.

...

> Sorry for the rather long message appended to a release thread, but
> as you note, perhaps the decision cannot be delayed, as the
> documentation contains a recipe that we may later regret declaring
> support for.

My suggestion for the next steps:

* "Thaw" the repository.
* Push "deprecation warning" patch (Adam).
* Reach a consensus about the parts of `publish_string()` that need to be settled in release 0.20 (see separate reply). Implement/revert changes. (GM)
* Release 0.20rc1 (Engelbert?)

@engelbert: Can we thaw for "last minute fixes"?

Günter
|
From: Adam T. <aat...@ou...> - 2023-04-25 10:41:02
|
Dear Günter, all,

>> I have tested on Ubuntu 22.04 LTS and Windows 10, and against Sphinx.
>> All tests are passing,

> Good news. What are the tested Python versions?
> Could someone test with Python 3.11?

I have tested on Python 3.7, 3.8, 3.9, 3.10, 3.11, and 3.12.0a7 -- all supported Python versions. Following [r9363], all tests pass with no warnings and with ``-W error`` enabled.

...

>> Please may we release a 0.20b1 first so that I might ask downstream
>> projects to test, as with the 0.19 release?

> My experience from the latest pre-releases was minimal to no feedback.
> However, if you have other experiences or expectations, I agree with a
> pre-release.
> I suggest "rc1" instead of "beta" (as I really like
> 0.20 coming out soon and rather see a not too distant 0.21...).

Ok -- I have no strong feelings either way -- if Engelbert is happy to release 0.20 with no 'rc1' stage then that works with me! I agree we didn't get much feedback.

...

>>> The API documentation "publisher.txt" now has the example
>>>   output = bytes(publish_string(...))
>>> (which depends on `OutputString` features).

>> Can we mark this feature as provisional? Personally, I don't think
>> that we should support this form of ``bytes`` conversion long-term,
>> and I see the ``OutputString`` as a transitional class, again not
>> one that will be around for a long time.

> OK.

I'll reply to this point in my reply to your other note.

...

>> Sorry for the rather long message appended to a release thread, but
>> as you note, perhaps the decision cannot be delayed, as the
>> documentation contains a recipe that we may later regret declaring
>> support for.

> My suggestion for the next steps:
> ...
> * Push "deprecation warning" patch (Adam).

Done! [r9363]

Thanks,
Adam
|
From: Guenter M. <mi...@us...> - 2023-04-24 16:10:14
|
Dear Docutils developers,

On 2023-04-23, Adam Turner wrote:

...

> For me, the point of this exercise and deprecation process is to
> reach an end-state where ``publish_string`` always returns ``str``.

This would be a clean and simple end state. It is problematic for ``output_encoding != 'utf-8'`` and/or ``output_encoding_error_handler != 'strict'``.

> To summarise the problem as I understand it:

> * Some output formats may contain information about the encoding of
>   the document
>   - SGML based markup languages (XML, HTML) may contain an internal
>     encoding declaration.
>   - TeX based languages (LaTeX, XeLaTeX, etc) may contain an internal
>     encoding macro.

> * All of these formats have default encodings
>   - XML defaults to a UTF-8 encoding if the encoding attribute is not
>     specified, since XML 1.0 (2008)
>     https://www.w3.org/TR/xml/#charencoding
>   - HTML 5 requires a UTF-8 charset
>     https://html.spec.whatwg.org/#charset
>   - LaTeX's default encoding is UTF-8, since 2018
>     https://tug.org/TUGboat/tb39-1/tb121ltnews28.pdf
>   - XeTeX I believe has always defaulted to UTF-8.

* (g|n|t)roff (used for man pages) has no default encoding and (AFAIK) no universal syntax for an encoding declaration in the source.
  groff has no built-in support for UTF-8.
  https://www.gnu.org/software/groff/manual/groff.html#Input-Encodings
  There is a pre-processor for UTF-8 encoded sources.
  https://stackoverflow.com/questions/23138930/text-codepage-in-groff
  https://stackoverflow.com/questions/52732988/nroff-groff-does-not-properly-convert-utf-8-encoded-file

* ODT and epub are binary formats without a universal "natural" representation as `str` (the output may include bitmap graphics).

* The "output_encoding" setting also decides over the use of literal characters vs. a macro representation for several non-ASCII characters in LaTeX, e.g. ``\dag{}`` for the footnote mark † (U+2020).

* For XML and HTML, the "output_encoding_error_handler" setting may decide over "make or break" in case of non-encodable characters. (With "xmlcharrefreplace", unencodable characters can be used with XML/HTML output. The unencodable characters are still present in the `str` representation of the output document.)

> * If a user asks for output as a Unicode ``str``, I believe it is
>   reasonable to assume these defaults (UTF-8 encoding).
>
> * If a user asks for output as a Unicode ``str``, but overrides the
>   ``output_encoding`` setting, I believe it is reasonable to assume
>   that the user is now responsible for conversion of the ``str`` to
>   ``bytes`` for serialisation to disk, and we should not support an
>   output format that does this by 'magic'. We could declare this as
>   unsupported behaviour as an alternative, and just issue an error.

Users of applications that utilise the Docutils publisher API may be unaware of the internals (whether this application calls ``publish_string()`` or uses another part of the API). Currently an end user of such applications can customise the output encoding and error handling in a "docutils.conf" config file (unless explicitly forbidden by the application).

Changing the behaviour of the "string I/O" interface should not silently start ignoring the configuration settings. Application developers should be made aware of this change before it bites their downstream users (e.g. in the docstring of the new function, the "future changes" announcement and the API docs).

> * If a user asks for binary output (a ``bytes`` instance), I think it
>   is reasonable to use ``output_encoding`` and ``output_encoding_error_handler``
>   to encode the ``str``
>   instance we use internally to a ``bytes`` instance.

> * We therefore need to decide the following end-state positions:
>   a) Do we want to support (long-term) outputting ``bytes`` from
>      the core publish API?

I agree to not returning ``bytes`` from a "String I/O" interface. (The core publish API also provides two functions with "File I/O" and publish_parts() and publish_doctree() with alternative interfaces.)

>   b) Do we want to support (long-term) encodings other than UTF-8?

At least for a medium-term time-frame, I'd keep support for other encodings (not necessarily for the "String I/O" interface).

> * If (a) is true, we should decide if it is through a dedicated
>   function, or through an overloaded signature (the current status)

... or through publish_parts() (see below).

>   You have previously argued for keeping the "core" interface as
>   small as possible, and I would strongly advocate against overloaded
>   return types, perhaps leading to us not supporting returning
>   ``bytes`` from the core publish API.

>   This may be a reasonable position, as if a user knows that he wants
>   bytes output, he should set the output encoding explicitly anyway,
>   and therefore he has control over the encoding from ``str`` to
>   ``bytes`` as he can e.g. do:

>   .. code:: python

>      encoding = 'latin1'
>      out_str = publish_string(source,
>          settings_overrides={'output_encoding': encoding}
>      )
>      assert isinstance(out_str, str)
>      out_bytes = out_str.encode(encoding)
>
>   in a hypothetical future where ``publish_string`` always returns
>   ``str`` instances.

There are problems with this approach:

The "settings_overrides" dictionary only overrides the "Docutils defaults" with "programmatic defaults". A different value in a configuration file would still override this programmatic default. Applications can disable configuration file parsing, but not for individual settings. To keep configurability, the application would need to parse configuration settings on its own and call publish_string() with a ``settings`` object: ``publish_string(source, settings=settings, …)``.

An application developer does not need to be the end user (and hence may not know the desired output encoding), e.g., a 3rd party Docutils extension application may want to provide a file I/O interface but do some post-processing on the document returned from the writer.

However, see the alternative "bytes-output recipe" below.

> * If (b) is false, we could simplify the I/O code a great deal. I
>   think it may be reasonable to expect the user to be responsible
>   for encoding conversions, or to move Docutils' code to handle that
>   away from the core and into the command-line interface, for example.

At least the "File I/O" interface (which is part of the core API) should, IMO, support a configurable output encoding for the next couple of versions/years. The command line interface (`core.publish_cmdline()`) is part of the core API, too.

Proposal
========

Keep it simple:

* replace `publish_string()` with a new function `publish_str()` that returns a `str` instance and raises an error

  - for binary writer output (e.g. ODT writer)
  - if 'output_encoding' is not in ("utf-8", "")

* Accordingly replace `io.StringOutput` with a new `io.StrOutput` class.

* Implement `publish_str()` and `StrOutput` in Docutils 0.21 to give them proper testing and time for implementation details to settle while getting the bugfixes out now.

* Think about the future of `publish_from_doctree()`.

Rationale:

* The different behaviour of the new string I/O interface merits a new function name. Application developers using the string I/O API will have to change their code anyway. Applications will at some stage break with the old function name (hopefully tested by their developers), not only with certain configuration values (which may be easily overlooked by developers).

* The confusing co-existence of `publish_str()` vs. `publish_string()` is temporary and moderated by the deprecation warning that comes with the use of `publish_string()`.

Steps towards 0.20

* Revert the introduction of the "OutString" class.

* Revert the addition of the "auto_encode" attribute.

* Add ['errors'] to the `parts provided by all writers`__.

  __ https://docutils.sourceforge.io/docs/api/publisher.html#parts-provided-by-all-writers

* Mark `core.publish_string()` and `io.StringOutput` as deprecated. (This includes deprecation of the special pseudo-encoding value "unicode".)

* Document upcoming changes

  - There will be a new "string I/O interface" in 0.21.

  - The already working and future-proof way to get `str` output is ::

      out_str = publish_parts(...)['whole']
      assert isinstance(out_str, str)  # ODT writer returns `bytes`

    This approach ignores the "output_encoding" and "output_encoding_error_handler" settings.

  - The future-proof and configuration-proof way to get `bytes` output is ::

      parts = publish_parts(...)
      out_bytes = parts['whole']
      if isinstance(out_bytes, str):
          out_bytes = out_bytes.encode(parts['encoding'], parts['errors'])

    Alternatively, the return value of `publish_file()` (with a "dummy" file object) can be used.

Would this be a sensible way forward?

Günter
|
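The `bytes` recipe from the proposal can be tried without Docutils installed by passing a plain dict in place of the mapping returned by ``publish_parts()``. This is a sketch under assumptions: the helper name ``parts_to_bytes`` and the sample data are hypothetical, and the ``'errors'`` key is the part proposed in the message above.

```python
def parts_to_bytes(parts):
    """Return the whole document as `bytes`.

    `parts` is assumed to look like the mapping returned by
    ``publish_parts()``: 'whole' holds the document, 'encoding' and
    'errors' hold the configured output encoding and error handler.
    """
    out = parts["whole"]
    if isinstance(out, str):
        return out.encode(parts["encoding"], parts["errors"])
    return out  # binary writers (e.g. ODT) already provide bytes


if __name__ == "__main__":
    # Hypothetical sample data, not real writer output.
    html_parts = {
        "whole": "<p>note \u2020</p>",
        "encoding": "latin1",
        "errors": "xmlcharrefreplace",
    }
    print(parts_to_bytes(html_parts))
```

With the "xmlcharrefreplace" handler the dagger, which Latin-1 cannot represent, is written as a numeric character reference. With "strict" the same call raises ``UnicodeEncodeError``, which is the "make or break" effect described earlier in the message.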
From: Guenter M. <mi...@us...> - 2023-05-03 13:15:22
|
Dear Docutils developers,

after the next round of "last minute fixes" I propose to freeze the repository and release the current state as the next Docutils version.

On 2023-04-25, Adam Turner wrote:

> I have tested on Python 3.7, 3.8, 3.9, 3.10, 3.11, and 3.12.0a7 -- all
> supported Python versions. Following [r9363], all tests pass with no
> warnings and with ``-W error`` enabled.

Excellent. Thank you.

...

> Ok -- I have no strong feelings either way -- if Engelbert is happy to
> release 0.20 with no 'rc1' stage then that works with me! I agree we
> didn't get much feedback.

I suggest either 0.20rc1 before the weekend or 0.20 after the weekend. Engelbert, could you decide (and tell us) which you prefer and can manage?

> ...

>>>> The API documentation "publisher.txt" now has the example
>>>>   output = bytes(publish_string(...))
>>>> (which depends on `OutputString` features).

>>> Can we mark this feature as provisional? Personally, I don't think
>>> that we should support this form of ``bytes`` conversion long-term,
>>> and I see the ``OutputString`` as a transitional class, again not
>>> one that will be around for a long time.

>> OK.

> I'll reply to this point in my reply to your other note

[r9369] reverts the addition of `io.OutString` and the "auto_encode" argument for core.publish_string() and core.publish_from_doctree().

I hope this can give us a clean start for discussing a consensus on the desired end state of the "String I/O interface" and the best way to reach it.

Thanks,
Günter
|