#75 Long text RFC role with section not parsed correctly

Default
closed-works-for-me
nobody
rfc (1) roles (1)
5
2021-04-03
2020-11-24
No

Hello. I've found a problem parsing the rfc role text "RFC 2045 section 6.8 <2045#section-6.8>". The implementation in docutils.parsers.rst.roles.rfc_reference_role seems incomplete: it does not handle long role text (an explicit title followed by the RFC number in angle brackets) when the RFC reference includes a section. This RFC reference appears in the xmlrpc.client module of the official Python documentation.

Possible workaround

I solved it locally by putting this code inside the try statement:

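if "#" in text:
    # Split "title <number#section>" or "number#section" at the first "#".
    rfcnum, section = utils.unescape(text).split("#", 1)
    if "<" in rfcnum:
        # Long form: drop the title and keep what follows "<".
        rfcnum = rfcnum.split("<")[1]
    if ">" in section:
        # Long form: drop the closing ">".
        section = section.strip(">")
else:
    rfcnum, section = utils.unescape(text), None
rfcnum = int(rfcnum)
if rfcnum < 1:
    raise ValueError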

I would upload the patch myself, but I don't understand how to use subversion or post to sourceforge.

Minimal reproducible example

import docutils.parsers.rst
import docutils.frontend
from docutils.utils import new_document


components = (docutils.parsers.rst.Parser,)

parser = docutils.parsers.rst.Parser()
settings = docutils.frontend.OptionParser(
    components=components
).get_default_values()
document = new_document("<rst-doc>", settings=settings)
parser.parse(":rfc:`RFC 2045 section 6.8 <2045#section-6.8>`, :rfc:`2045#section-6.8`",
             document)

Discussion

  • Günter Milde

    Günter Milde - 2020-11-25

    Thank you for reporting a problem with Docutils.
    Checking the source of xmlrpc.client module documentation at https://docs.python.org/3.0/_sources/library/xmlrpc.client.txt,
    I found the rst::

      The encoded data will have newlines every 76 characters as per
      `RFC 2045 section 6.8 <http://tools.ietf.org/html/rfc2045#section-6.8>`_,
       which was the de facto standard base64 specification when the
       XML-RPC spec was written.
    

    which does not use the RFC role and is parsed correctly by Docutils.
    The :rfc-reference: role accepts a section part after the RFC number (which is added to the URL but not displayed in HTML), see https://docutils.sourceforge.io/docs/ref/rst/roles.html#rfc-reference. It does not, however, implement the embedded URIs and Aliases__ syntax for standard hyperlink references (as used in the official Python documentation source for xmlrpc.client).
    __ https://docutils.sourceforge.io/docs/ref/rst/restructuredtext.html#embedded-uris-and-aliases

    In the example cases given, I propose to keep using the hyperlink reference syntax or something like

    ... as per :rfc:`2045#section-6.8`, which was ...
    

    or

    ... as per :rfc:`2045#section-6.8` section 6.8, which was ...
    

    for the given example (and all more complex references to RFC documents). The :rfc-reference: role is intended as a short alternative for simple use cases and should be kept simple.

     

    Last edit: Günter Milde 2020-12-07
  • Álvaro Mondéjar Rubio

    The Python documentation uses the rfc role as of version 3.7, as you can see at https://docs.python.org/3.7/_sources/library/xmlrpc.client.rst.txt. It was changed in pull request 7103 and you can see the change here

    So maybe you are using an old source. Please check it.

     

    Last edit: Álvaro Mondéjar Rubio 2020-11-26
    • Günter Milde

      Günter Milde - 2020-12-07

      Thank you for the link to the change set. (The link in the original description does not lead to the source, and the relevant HTML part generated from the version I found is identical to the one in the linked HTML doc, as the :rfc: role is essentially syntactic sugar providing a shortcut in the rST source for simple cases.)
      The change set introduced a bug in the xmlrpc.client documentation by changing a valid link with embedded URI into unsupported rST syntax. This problem can easily be fixed by either reverting to the previous link syntax or by the alternatives given in my first response.
      It does not, IMV, give compelling evidence for the complication of the rST specification and Docutils implementation that comes with this proposal.

       
      • Álvaro Mondéjar Rubio

        I'm not sure of what this means. Contains :rfc:`2045#section-6.8` an invalid role link or not?

         

        Last edit: Álvaro Mondéjar Rubio 2020-12-07
        • Günter Milde

          Günter Milde - 2020-12-17

          Contains :rfc:`2045#section-6.8` an invalid role link or not?

          Since the implementation of https://sourceforge.net/p/docutils/feature-requests/63/ in r8254, this is valid. It results in exactly the same HTML as writing

          `RFC 2045 <http://tools.ietf.org/html/rfc2045#section-6.8>`_
          

          However, the < > syntax for Embedded URIs and Aliases
          (https://docutils.sourceforge.io/docs/ref/rst/restructuredtext.html#embedded-uris-and-aliases) is still invalid in "RFC" roles.

          The recommendation is to use the "RFC" role only for simple cases.
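To illustrate the equivalence described above, here is a minimal sketch that expands the text of a simple "RFC" role into the corresponding hyperlink reference. The function name `rfc_role_to_hyperlink` and the `base_url` parameter are hypothetical, and the URL layout is copied from the example in this comment; the actual Docutils role builds its URL from its own settings, so treat this as an approximation rather than the real implementation.

```python
def rfc_role_to_hyperlink(role_text, base_url="http://tools.ietf.org/html/"):
    """Expand simple RFC role text such as "2045#section-6.8" into the
    equivalent rST hyperlink reference (illustrative helper only)."""
    # Everything before the first "#" is the RFC number; the rest, if any,
    # is a fragment that goes into the URL but not into the link text.
    number, _, fragment = role_text.partition("#")
    rfcnum = int(number)  # raises ValueError for non-numeric input
    if rfcnum < 1:
        raise ValueError("RFC number must be 1 or greater")
    url = f"{base_url}rfc{rfcnum}"
    if fragment:
        url += "#" + fragment
    return f"`RFC {rfcnum} <{url}>`_"


print(rfc_role_to_hyperlink("2045#section-6.8"))
```

Note that the section only ends up in the URL; if it should also show in the running text, it has to be written out after the role, as in the alternatives suggested earlier in this thread.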

           
  • Emmanuel Arias

    Emmanuel Arias - 2020-11-29

    Hi Alvaro, I think you can simplify the added logic by using a regex. I can see that regexes are used a lot in the project.

    It would be great if the proposed change could be applied.
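A regex-based version of the workaround might look like the sketch below. It is standalone and does not touch Docutils: the pattern `_RFC_ROLE_TEXT` and the helper `split_rfc_role_text` are hypothetical names made up for illustration, and the accepted syntax (an optional title followed by the number in angle brackets, plus an optional "#fragment") is an assumption based on the two role texts in the report.

```python
import re

# Hypothetical pattern, not taken from Docutils.
_RFC_ROLE_TEXT = re.compile(
    r"""
    ^\s*
    (?:(?P<title>.*?)\s*<)?        # optional explicit title, then "<"
    (?P<num>\d+)                   # RFC number
    (?:\#(?P<section>[^>\s]+))?    # optional fragment after the number
    (?(title)>)                    # require ">" only if "<" was seen
    \s*$
    """,
    re.VERBOSE,
)


def split_rfc_role_text(text):
    """Return (title, rfcnum, section); title and section may be None."""
    match = _RFC_ROLE_TEXT.match(text)
    if match is None:
        raise ValueError(f"not a valid RFC role text: {text!r}")
    rfcnum = int(match.group("num"))
    if rfcnum < 1:
        raise ValueError("RFC number must be 1 or greater")
    return match.group("title") or None, rfcnum, match.group("section")


for sample in ("RFC 2045 section 6.8 <2045#section-6.8>", "2045#section-6.8"):
    print(split_rfc_role_text(sample))
```

Compared with chained `split` calls, a single pattern also rejects unbalanced input such as `2045>`, which the string-splitting version would only catch indirectly through `int()`.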

     
  • Günter Milde

    Günter Milde - 2020-12-07

    Ticket moved from /p/docutils/bugs/409/

    Can't be converted:

    • _milestone:
     
  • Günter Milde

    Günter Milde - 2021-04-03
    • status: open --> closed-works-for-me
    • Group: --> Default
     
