From: Arne S. <arn...@gm...> - 2025-04-05 10:20:06

While spelunking in the code, I came across the following regex
(docutils.parsers.rst.states.Body.patterns["field_marker"]) defining
the syntax of field list names and their surrounding colons:
    :(?![: ])([^:\\]|\\.|:(?!([ `]|$)))*(?<! ):( +|$)
I wonder if the final lookbehind assertion might not be slightly
incorrect? The field name is not allowed to end with a space, which is
fine, but given the allowance for backslash escapes one would normally
expect ":foo\ :" to be a permitted (if odd) marker. However, given
that (?<! ) unconditionally looks for a single space before the
terminating colon it is in fact rejected, and backslashes have a
somewhat puzzling behaviour in field names.
I couldn't find any indications in docs or tests either way for what
is intended, but amending the assertion to be (?<![^\\] ) would yield
a more intuitive syntax I think.
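
A minimal sketch for comparing the two variants. The patterns are copied from the text above rather than imported from Docutils, and the helper name ``is_field_marker`` is made up for illustration; Docutils itself may apply the pattern differently.

```python
import re

# Pattern as quoted above (copied by hand, not imported from Docutils).
CURRENT = r":(?![: ])([^:\\]|\\.|:(?!([ `]|$)))*(?<! ):( +|$)"
# Same pattern with the proposed lookbehind (?<![^\\] ).
PROPOSED = r":(?![: ])([^:\\]|\\.|:(?!([ `]|$)))*(?<![^\\] ):( +|$)"


def is_field_marker(pattern, line):
    """Return True if `line` starts with something the pattern accepts."""
    return re.match(pattern, line) is not None


samples = [
    r":foo: value",     # ordinary field marker
    r":foo : value",    # unescaped trailing space
    r":foo\ : value",   # backslash-escaped trailing space
    r":foo\\ : value",  # escaped backslash, then an unescaped space
]

for line in samples:
    print(line, is_field_marker(CURRENT, line), is_field_marker(PROPOSED, line))
```

With these samples, both patterns accept ``:foo:`` and reject ``:foo :``, and only the amended one accepts ``:foo\ :``. As written, the amended lookbehind also accepts ``:foo\\ :``, where the backslash is itself escaped and the space is not, so a stricter check might be needed.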

From: Guenter M. <mi...@us...> - 2025-04-06 15:26:57

Dear Arne,
thanks for your feedback.
On 2025-04-05, Arne Skjærholt wrote:
> While spelunking in the code, I came across the following regex
> (docutils.parsers.rst.states.Body.patterns["field_marker"]) defining
> the syntax of field list names and their surrounding colons:
>     :(?![: ])([^:\\]|\\.|:(?!([ `]|$)))*(?<! ):( +|$)
> I wonder if the final lookbehind assertion might not be slightly
> incorrect? The field name is not allowed to end with a space, which is
> fine, but given the allowance for backslash escapes one would normally
> expect ":foo\ :" to be a permitted (if odd) marker. 
The rST escaping rules are a bit more complex:
  * “Escaping” backslash characters are represented by NULL characters in
    the Document Tree and removed from the output document by the
    Docutils writers.
  * Escaped non-white characters are prevented from playing a role in any
    markup interpretation. The escaped character represents the character
    itself. (A literal backslash can be specified by two backslashes in a
    row – the first backslash escapes the second.)
  
  * Escaped whitespace characters are removed from the output document
    together with the escaping backslash. This allows for character-level
    inline markup.
  
    In "URI context", backslash-escaped whitespace represents a single space.
  --- https://docutils.sourceforge.io/docs/ref/rst/restructuredtext.html#escaping-mechanism
  
The third rule provides for character level inline markup, e.g.,
"re\ *Structured*\ text" or hyper\ `links`_ to a part of a word.
.. _links: example.org/word
The escape character and the space are removed only by the writer, i.e. the
space char is still present, and interpreted as a space char, when parsing.
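
A simplified model of the net effect of the three rules on plain text. This is only an illustration under stated assumptions, not the Docutils implementation: the function name ``resolve_escapes`` is hypothetical, inline markup is not interpreted, and the "URI context" special case is ignored.

```python
import re


def resolve_escapes(text):
    """Model the text that remains once backslash escapes are resolved.

    Backslash + whitespace: both characters are dropped.
    Backslash + any other character: only the character is kept.
    """
    def replace(match):
        char = match.group(1)
        return "" if char.isspace() else char

    # DOTALL lets an escaped newline be treated like other whitespace.
    return re.sub(r"\\(.)", replace, text, flags=re.DOTALL)


print(resolve_escapes(r"re\ Structured\ text"))
print(resolve_escapes(r"\*no emphasis\*"))
print(resolve_escapes(r"one \\ backslash"))
```

The point of the model is that the escaped space only disappears at the very end; a pattern applied to the source line still sees it.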
On the downside, there is no way to specify a field name with trailing
whitespace. (The same holds for inline markup: you cannot specify an
emphasised text with trailing whitespace either::
   *this\ * results in a warning,
   
while both ``this\*`` and ``this\ *`` result in ``this*``.)
> However, given that (?<! ) unconditionally looks for a single space
> before the terminating colon it is in fact rejected, and backslashes
> have a somewhat puzzling behaviour in field names.
> I couldn't find any indications in docs or tests either way for what
> is intended, but amending the assertion to be (?<![^\\] ) would yield
> a more intuitive syntax I think.
I agree that the special case for ``\ `` may come as a surprise
(especially with the further special handling of ``\ `` in URIs).
However, changing this would break long-standing documented behaviour.
Are there more "surprises" with backslashes in field names that differ
from backslash handling in inline markup or normal text?
As a side note: the following example works ::
  What is `this\ `_?
  .. _this: http://example.org
while I would expect the same warnings as for the variant without backslash:
  What is `this `_?
  .. _this: http://example.org
Günter

From: Karl O. P. <ko...@ka...> - 2025-04-06 19:04:45

On Sun, 6 Apr 2025 15:26:24 -0000 (UTC)
Guenter Milde via Docutils-develop <doc...@li...> wrote:

> The rST escaping rules are a bit more complex:
>
>   * “Escaping” backslash characters are represented by NULL
>     characters in the Document Tree and removed from the output document
>     by the Docutils writers.
>
>   * Escaped non-white characters are prevented from playing a role in
>     any markup interpretation. The escaped character represents the
>     character itself. (A literal backslash can be specified by two
>     backslashes in a row – the first backslash escapes the second.)
>
>   * Escaped whitespace characters are removed from the output document
>     together with the escaping backslash. This allows for
>     character-level inline markup.
>
>     In "URI context", backslash-escaped whitespace represents a
>     single space.
>
>   ---
>   https://docutils.sourceforge.io/docs/ref/rst/restructuredtext.html#escaping-mechanism

> The third rule provides for character level inline markup, e.g.,
> "re\ *Structured*\ text" or hyper\ `links`_ to a part of a word.
>
> .. _links: example.org/word
>
> The escape-character and space are removed by the writer, i.e. the
> space char is present and interpreted as space char when parsing.

FYI, a more universal example
is adding links to footnotes at the end of a sentence,
without an extra space between the period and the footnote
number/link.

A footnoted sentence.\ [#f1]_

.. rubric:: Footnotes

.. [#f1] Text of the footnote.

Then you get:

A footnoted sentence.1

Where the "1" is superscripted and hyperlinked.

(If you're looking for an example to include in the docs.)

Regards,

Karl <ko...@ka...>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

From: Guenter M. <mi...@us...> - 2025-04-06 15:33:05

On 2025-04-06, Guenter Milde via Docutils-develop wrote:
> On the downside, there is no way to specify a field name with trailing
> whitespace.
I found a way to circumvent this restriction: Inside field names, inline
syntax is recognized, so you can use a substitution::
  .. |space| unicode:: 32
     :trim:
  :strange field name |space|: and field
Günter

From: Guenter M. <mi...@us...> - 2025-04-07 20:39:01

On 2025-04-06, Karl O. Pinc wrote:

...
>> * Escaped whitespace characters are removed from the output document
>>   together with the escaping backslash. This allows for
>>   character-level inline markup.
...
> FYI, a more universal example
> is adding links to footnotes at the end of a sentence,
> without an extra space between the period and the footnote
> number/link.

> A footnoted sentence.\ [#f1]_

> .. rubric:: Footnotes

> .. [#f1] Text of the footnote.

Yes, this is another use case. Thank you for the reminder.

Personally, I rely on the `trim_footnote_reference_space`__ configuration
setting. Its default depends on the `footnote_references`__ style: the
footnote space is trimmed if the reference style is "superscript", and it
is left if the reference style is "brackets".

The advantage is that this way, there is no need to change the source when
switching the footnote reference style between "superscript" and
"brackets".

Regards,
Günter

__ https://docutils.sourceforge.io/docs/user/config.html#trim-footnote-reference-space
__ https://docutils.sourceforge.io/docs/user/config.html#footnote-references