Re: [Docutils-develop] Recent commit activity

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

Dear Günter,

>> The overall aim with the test changes is to be able to use the standard
>> library's ``unittest`` runner, which will alleviate some of the points
>> you raise.

> I understand and support the overall aim to use standard tools for the tests.
> I would have preferred a "feature branch" (as mandated by the "policies")
> for this change set, though. (I agree that with git this would have been
> easier and have to apologise for the delay of my respective reviews.)

I'll bear in mind for future projects, thank you.

> Generally, if a file is created or maintained by an active member of the
> Docutils developer team, I'd prefer an "ask before change" (except for
> emergencies and small typos).

Sounds good!

>> r9140, r9141: Import ``DocutilsTestSupport`` from ``test``

> When editing a test script, I regularly run it from my editor (it is
> just one click). This is a feature I don't want to miss, the actual
> implementation is less important ...

OK, I will look to ensure this is still possible.

>>> r9167
> Introduces a backwards incompatible API change! (see below)

Partially reverted in [r9202] (see below)

>>>   IMV, an encoded string ("bytes" object) is no "binary" output.

>> Python 3 bytes objects are sequences of 8-bit bytes, so I would say
>> that it is binary output, but perhaps we could improve here.

> The Python 3.10 documentation for `bytes` calls it a "binary sequence
> type", so I may have to adapt.

> The naming of the `core.publish_string()` API function pre-dates Python3
> and uses the term "string" as a superordinate for "Unicode strings"
> (`str`/`unicode`) and "encoded strings" (`bytes`/`str`).
> The documentation is explicit about the return value
> (defaulting to 'utf-8' encoded `bytes` instances).

> The function is part of the "first level API": intended for use by
> 3rd-party code. Changing the return value has the potential of many
> breaks. 
> Please revert the changes to core.py and io.py.

> Unicode `str` output can be achieved passing
> ``settings_overwrites={'output_encoding': 'unicode'}`` to
> publish_string().

> (In the long run, I'd like to replace the "homegrown" docutils.io classes
> with standard objects, but this is a separate topic.)

I have reverted the change (thank you) but introduced a deprecation for returning non-(unicode) strings from "publish_string". I feel this is a reasonable step towards having a more intuitive API.

Naming of "publish_bytes" and "BytesOutput" I am not that set on, we can change to "BinaryXXX" if preferred.

>>> r9168 Use ``.rstrip()`` instead

> This depends on the context: over-specified tests contribute to noise due to
> false positives. Test results should not depend on the "native" line endings
> of the system running the tests.

Python converts all line-endings to "\n" when opening a file in text-mode (or in string literals), so this should not be an issue. [1] [2]

[1]: https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files
[2]: https://peps.python.org/pep-0278/

>>> r9178
> 8 lines of doctest code seems overly verbose in the context of an
> "overview of the Docutils architecture".

I tend to agree, "core.publish_string" would be nicer, but there is no way in Docutils to disable applying the default transforms.

> It can still be useful for development: testing the reST parser (including
> the option to generate input/output pairs for parser test scripts).
> I'd move it to tools/dev and remove the EasyDialogs part.

Done in [r9201]

Thanks,
Adam