From: Guenter M. <mi...@us...> - 2025-04-06 15:26:57
Dear Arne,
thanks for your feedback.
On 2025-04-05, Arne Skjærholt wrote:
> While spelunking in the code, I came across the following regex
> (docutils.parsers.rst.states.Body.patterns["field_marker"]) defining
> the syntax of field list names and their surrounding colons:
> :(?![: ])([^:\\]|\\.|:(?!([ `]|$)))*(?<! ):( +|$)
> I wonder if the final lookbehind assertion might not be slightly
> incorrect? The field name is not allowed to end with a space, which is
> fine, but given the allowance for backslash escapes one would normally
> expect ":foo\ :" to be a permitted (if odd) marker.
The rST escaping rules are a bit more complex:
  * “Escaping” backslash characters are represented by NULL characters in
    the Document Tree and removed from the output document by the
    Docutils writers.
  * Escaped non-white characters are prevented from playing a role in any
    markup interpretation. The escaped character represents the character
    itself. (A literal backslash can be specified by two backslashes in a
    row – the first backslash escapes the second.)
  * Escaped whitespace characters are removed from the output document
    together with the escaping backslash. This allows for character-level
    inline markup.
    In "URI context", backslash-escaped whitespace represents a single space.
  --- https://docutils.sourceforge.io/docs/ref/rst/restructuredtext.html#escaping-mechanism
The third rule provides for character level inline markup, e.g.,
"re\ *Structured*\ text" or hyper\ `links`_ to a part of a word.
.. _links: example.org/word
The escape character and the space are only removed by the writer, i.e. the
space character is still present, and interpreted as a space, when parsing.
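To make the two stages concrete, here is a minimal stdlib-only sketch of the rules quoted above. The helpers ``escape_to_null`` and ``strip_escapes`` are hypothetical stand-ins written for this mail, not the Docutils implementation; they only model "backslash becomes NULL at parse time" and "NULL (plus escaped whitespace) is dropped at write time".

```python
def escape_to_null(text: str) -> str:
    """Parse-time step (hypothetical): replace each escaping backslash with NUL.

    The escaped character itself is kept, so an escaped space is still a space.
    """
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\":
            # NUL marks "the next character was escaped"; keep that character.
            out.append("\x00" + text[i + 1:i + 2])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def strip_escapes(text: str) -> str:
    """Write-time step (hypothetical): drop NUL + whitespace, then leftover NULs."""
    for sep in ("\x00 ", "\x00\n", "\x00"):
        text = text.replace(sep, "")
    return text


parsed = escape_to_null(r"re\ *Structured*\ text")
written = strip_escapes(parsed)

# A field name candidate ending in an escaped space:
name_at_parse_time = escape_to_null(r"foo\ ")
name_at_write_time = strip_escapes(name_at_parse_time)
```

In this model ``name_at_parse_time`` still ends with a space character, and only ``name_at_write_time`` has lost it. That is why a check for a trailing space at parse time sees the space regardless of the backslash.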
On the downside, there is no way to specify a field name with trailing
whitespace. (The same holds for inline markup: you cannot specify an
emphasised text with trailing whitespace either::
*this\ * results in a warning,
while both ``this\*`` and ``this\ *`` result in ``this*``.)
> However, given that (?<! ) unconditionally looks for a single space
> before the terminating colon it is in fact rejected, and backslashes
> have a somewhat puzzling behaviour in field names.
> I couldn't find any indications in docs or tests either way for what
> is intended, but amending the assertion to be (?<![^\\] ) would yield
> a more intuitive syntax I think.
I agree that the special case for ``\ `` may come as a surprise
(especially with the further special handling of ``\ `` in URIs).
However, changing this would break long-standing documented behaviour.
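For reference, a small sketch comparing the pattern as quoted with the amended lookbehind from your mail. Both pattern strings are copied from this thread rather than imported from Docutils, and ``is_field_marker`` and the sample lines are made up for illustration.

```python
import re

# Pattern as quoted from docutils.parsers.rst.states.Body.patterns["field_marker"].
CURRENT = r":(?![: ])([^:\\]|\\.|:(?!([ `]|$)))*(?<! ):( +|$)"
# Same pattern with the final lookbehind amended as proposed in the quoted mail.
AMENDED = r":(?![: ])([^:\\]|\\.|:(?!([ `]|$)))*(?<![^\\] ):( +|$)"


def is_field_marker(pattern: str, line: str) -> bool:
    """Return True if `line` starts with a field marker according to `pattern`."""
    return re.match(pattern, line) is not None


CASES = [
    r":foo: value",      # plain field name
    r":foo : value",     # unescaped trailing space
    r":foo\ : value",    # backslash-escaped trailing space
    r":foo\\ : value",   # escaped backslash followed by an unescaped space
]

results = {
    case: (is_field_marker(CURRENT, case), is_field_marker(AMENDED, case))
    for case in CASES
}

for case, (current, amended) in results.items():
    print(f"{case!r:20} current={current!s:5} amended={amended}")
```

The two patterns agree on the first two lines and differ on the last two. The last line suggests that a two-character lookbehind cannot tell an escaping backslash from an escaped one, so the amended pattern would also admit a name whose trailing space is not actually escaped.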
Are there more "surprises" with backslashes in field names that differ
from backslash handling in inline markup or normal text?
As a side note: the following example works ::
  What is `this\ `_?
  .. _this: http://example.org
while I would expect the same warnings as for the variant without backslash:
  What is `this `_?
  .. _this: http://example.org
Günter