#511 Problems with nested parsing and sections.

Status: pending-remind
Owner: nobody
Labels: None
Priority: 5
Updated: 2025-09-13
Created: 2025-08-26
Private: No

In Docutils <= 0.22, parsing a nested content block with docutils.parsers.rst.states.RSTState.nested_parse() uses the document-wide "title style hierarchy".

With Docutils <= 0.21, this method can lose complete sections without warning when the match_titles argument is True and the nested content block contains sections with a title style that matches a lower section level than the current section level (try the attached test with a Docutils version below 0.22).
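
To illustrate the situation, here is a toy model of a document-wide title style hierarchy. It is not Docutils code: the function name section_levels() and the list-based lookup are assumptions made for this sketch, and the real parser is stricter about when a new style may be introduced. The point is only that a title style already known to the document resolves to its established level, even inside a nested block.

```python
def section_levels(titles, title_styles):
    """Return (title, level) pairs for `titles`.

    Toy model: the level of a title style is its 1-based position in the
    shared list `title_styles`; an unknown style is appended as a new,
    deeper level.  (Simplification, not the Docutils algorithm.)
    """
    result = []
    for title, style in titles:
        if style not in title_styles:
            title_styles.append(style)
        result.append((title, title_styles.index(style) + 1))
    return result


# The main document establishes the document-wide hierarchy.
document_styles = []
main = section_levels([("Chapter", "="), ("Topic", "-")], document_styles)
current_level = main[-1][1]

# A nested block is parsed while "Topic" is the current section.
# It reuses the "=" style, which the document already maps to level 1.
nested = section_levels([("Nested", "=")], document_styles)
```

In this model the nested title resolves to a level that is numerically lower than the current one, which appears to be the case described above.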

Docutils itself does not use nested_parse() with match_titles=True and did not test it.
However, the feature is used by Sphinx and several Sphinx extensions (cf. [bugs:#508], [bugs:#509], and https://github.com/sphinx-doc/sphinx/issues/13845).

The new section parsing algorithm introduced in Docutils 0.22 fixes the data loss, but sections with a title style that matches a lower section level than the current section level are attached in the wrong order: after the nested parsing is complete, the calling parser continues where it left off, messing up the order of elements in the doctree (try the attached test with [r10204]).

This was fixed in [r10206] at the expense of dropping support for document-wide title styles in nested parsing.
As a result, the Sphinx "only" directive (which uses a new section style hierarchy with later re-attachment of the first section) now fails: https://github.com/sphinx-doc/sphinx/issues/13861

1 Attachments

Related

Bugs: #508
Bugs: #509
Commit: [r10204]
Commit: [r10206]

Discussion

  • Günter Milde

    Günter Milde - 2025-08-26

    Element after a section from nested parsing may be invalid.

    parsers.rst.RSTState.nested_parse() with match_titles=True (i.e. support for sections) leads to an invalid document tree if the nested block contains a section but the element following the nested block is not a section.

    The structure model allows only a <section> as sibling after a <section>.
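
A rough sketch of that rule as a check over child element names. The helper first_misplaced_child() is hypothetical and deliberately simplified: it ignores transitions and every other detail of the Docutils content model.

```python
def first_misplaced_child(children):
    """Return the index of the first non-section child after a section.

    `children` is a list of element names in document order.
    Returns None if every sibling after the first section is a section.
    """
    seen_section = False
    for index, name in enumerate(children):
        if name == "section":
            seen_section = True
        elif seen_section:
            return index
    return None


valid = ["title", "paragraph", "section", "section"]
# A paragraph that follows a nested block ending in a section.
invalid = ["title", "section", "paragraph"]
```

Here first_misplaced_child(invalid) points at the trailing paragraph, which is the element that would have to move into the last nested section.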

    An invalid doctree can be prevented if the following content is appended to the last nested section instead of its parent. The "nested" directive attempts this but fails if it is called from another nested state machine: In a nested state_machine we cannot access/change the node attribute ("insertion point") of the calling state_machine (cf. [r10222]).
    A fix using a new attribute to store the parent state machine of nested state machines is attached.
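
The idea can be sketched with a toy state machine. This is not the attached patch: the class ToyStateMachine, the attribute name parent_state_machine and the helper set_insertion_point() are assumptions chosen to mirror the description above.

```python
class ToyStateMachine:
    """Toy stand-in for a (nested) state machine."""

    def __init__(self, node, parent_state_machine=None):
        self.node = node  # current insertion point
        # Hypothetical link to the state machine that started this one.
        self.parent_state_machine = parent_state_machine


def set_insertion_point(state_machine, new_node):
    """Set the insertion point of `state_machine` and of all its callers."""
    while state_machine is not None:
        state_machine.node = new_node
        state_machine = state_machine.parent_state_machine


outer = ToyStateMachine(node="section 1")
inner = ToyStateMachine(node="container", parent_state_machine=outer)
innermost = ToyStateMachine(node="container 2", parent_state_machine=inner)

# The innermost machine created a section; content that follows the
# nested block should go into that section at every calling level.
set_insertion_point(innermost, "nested section")
```

Without the link to the caller, only innermost.node could be changed from inside the nested parse.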

     

    Related

    Commit: [r10222]


    Last edit: Günter Milde 2025-09-13
    • Günter Milde

      Günter Milde - 2025-09-08

      The fix is implemented in [r10223].

       

      Related

      Commit: [r10223]

  • Günter Milde

    Günter Milde - 2025-08-26

    An idea how to fix Sphinx's "only" directive with a new internal attribute "title_style" for the nodes.section element.

     
  • Günter Milde

    Günter Milde - 2025-08-29

    An alternative idea: nested parse uses the document-wide title style hierarchy if the "node" argument is left at its default value. The result of the nested parsing is directly added to the document (at the "current" node). Sections are appended according to their level.

     
    • engelbert gruber

      Assuming the included document is complete and has a consistent title
      hierarchy, i.e. the first title style is the top level, the next is the
      2nd, and so forth:

      The standard use case is to include the document at a position where
      its top level is one below the current level in the including document.

      e.g.

        section l1
        ==========
      
        section l2
        ----------
      
        .. included doc
      
          section l3
          ==========
      
          section l4
          ----------
      
        .. including doc
      
        section l2
        ----------
      

      Is there a use case for including and setting a different level, absolute
      or relative?
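
The example above can be modelled as follows. This is a toy sketch, not Docutils code: nested_levels() and its base_level parameter are hypothetical names. The included block gets its own title style list, and its levels are counted from the level of the including section.

```python
def nested_levels(titles, base_level):
    """Return (title, level) pairs using a separate title style hierarchy.

    The first style seen in the block becomes `base_level + 1`, the next
    new style `base_level + 2`, and so on.
    """
    styles = []  # hierarchy local to the included block
    levels = []
    for title, style in titles:
        if style not in styles:
            styles.append(style)
        levels.append((title, base_level + styles.index(style) + 1))
    return levels


# Included document from the example, inserted below "section l2".
included = [("section l3", "="), ("section l4", "-")]
levels = nested_levels(included, base_level=2)
```

The "=" style means level 1 in the including document but level 3 inside the included block, because the two style lists are independent.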

       

      Last edit: Günter Milde 2025-09-02
      • Günter Milde

        Günter Milde - 2025-09-02

        Consider the use cases:

        a) A main document includes rST blocks from various different sources (other projects documentation, docstrings, ...).

        We cannot guarantee a consistent title style hierarchy across all inclusions and want to use a separate title style hierarchy in the included blocks.

        b) A main document includes chapters from other source files of the same project after an introductory section. The project uses consistent title styles in all files.

        We want a document-wide title style hierarchy, so that the included files start a new top level section.

        c) A directive expects an rST content block (similar to admonition directives but with support for section titles).

        We want a document-wide title style hierarchy to prevent surprising results.

         
  • Günter Milde

    Günter Milde - 2025-09-08

    Commit [r10226] fixes the regression in Sphinx.
    In order to correctly support sections in nested parsing, it reverts to using memo.section_level to keep track of the current section level.
    This is cumbersome and error-prone because it needs to be updated with every switch
    of the current node.

    The attached patch implements an alternative:
    Store the difference between the intended start level of nested parsing and the
    number of parents of the base node in the new attribute section_level_offset.
    Use it to correct the section level determined via node.section_hierarchy().
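
A minimal sketch of the offset arithmetic, assuming that "number of parents" means the number of enclosing sections. The Node class below is a toy stand-in, not docutils.nodes, and the two helper functions are hypothetical.

```python
class Node:
    """Toy tree node with a parent link."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    def section_hierarchy(self):
        """Return the enclosing sections (self included), outermost first."""
        sections = []
        node = self
        while node is not None:
            if node.name == "section":
                sections.append(node)
            node = node.parent
        sections.reverse()
        return sections


def section_level_offset(base_node, start_level):
    """Difference between the intended start level and the tree depth."""
    return start_level - len(base_node.section_hierarchy())


def current_level(node, offset):
    """Section level of `node`, corrected by the stored offset."""
    return len(node.section_hierarchy()) + offset


# Nested parsing into a detached container while the document is at level 2.
container = Node("container")
offset = section_level_offset(container, start_level=2)
nested_section = Node("section", parent=container)
nested_subsection = Node("section", parent=nested_section)
```

The offset is computed once when nested parsing starts, so nothing has to be updated when the current node changes.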

     

    Related

    Commit: [r10226]

    • Günter Milde

      Günter Milde - 2025-09-13

      Applied in [r10229].

       

      Related

      Commit: [r10229]

  • Günter Milde

    Günter Milde - 2025-09-13

    The remaining issue is a way to tell RSTState.nested_parse() that it should use a new, separate title style hierarchy for section headings (similar to Sphinx's nested_parse_to_nodes()).

     
