fantasai wrote:
> Why are the 'blank' and 'indented' transitions called implicitly,
> instead of listed in the transition list?
"blank" and "indented" are implicit transitions in "StateMachineWS",
which is a whitespace-specialized subclass of "StateMachine". The
"WS" stands for "whitespace". The whitespace transitions are very
common, so they are implemented implicitly as a convenience: the default
behavior is built into the "StateMachineWS" and "StateWS" classes, so it
doesn't have to be reinvented every time these classes are used for certain
types of parsing. The statemachine.py module is intended for general
use; I use it at work in many small parsing projects, and have reused
the "WS" subclasses.
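A minimal sketch of the idea follows. It is a toy model, not the real statemachine.py API: the `State`, `StateWS`, and `Text` classes, and the `ws_patterns` and `ws_initial_transitions` attributes, are simplified for illustration. The whitespace-aware base class contributes the "blank" and "indented" transitions and their default handlers, and tries them ahead of whatever explicit transitions a subclass lists. A concrete state therefore names only the transitions it actually cares about.

```python
import re


class State:
    """Toy state: tries its explicit transitions in order (hypothetical model)."""

    patterns = {}             # transition name -> regex
    initial_transitions = ()  # transition names, tried in this order

    def transition_order(self):
        return list(self.initial_transitions)

    def all_patterns(self):
        return dict(self.patterns)

    def match(self, line):
        # Call the method named by the first transition whose regex matches.
        patterns = self.all_patterns()
        for name in self.transition_order():
            m = re.match(patterns[name], line)
            if m:
                return getattr(self, name)(m, line)
        return ('no_match', line)


class StateWS(State):
    """Adds 'blank' and 'indented' transitions implicitly, tried first."""

    ws_patterns = {'blank': r' *$', 'indented': r' +'}
    ws_initial_transitions = ('blank', 'indented')

    def transition_order(self):
        return list(self.ws_initial_transitions) + list(self.initial_transitions)

    def all_patterns(self):
        return {**self.ws_patterns, **self.patterns}

    # Default whitespace handlers, inherited so subclasses needn't rewrite them.
    def blank(self, match, line):
        return ('blank', line)

    def indented(self, match, line):
        return ('indented', line)


class Text(StateWS):
    # Only the transition of interest is listed; whitespace handling is free.
    patterns = {'text': r'\S'}
    initial_transitions = ('text',)

    def text(self, match, line):
        return ('text', line)


state = Text()
for line in ['hello', '', '   indented']:
    print(state.match(line))
```

`Text` lists only `text`, yet blank and indented lines are still recognized because the subclass inherits those transitions from `StateWS`.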
> Why does SpecializedBody "pass" the definitions of transition
> methods instead of creating a specialized transition list?
First, re-read the module docstring of docutils/parsers/rst/states.py
to get an overview of how the parser works.
SpecializedBody and its subclasses need to recognize all the constructs
recognized by Body, but in SpecializedBody the transition methods are all
redefined as "invalid_input". In subclasses, only the methods for the
specific transitions of interest are re-enabled. This allows nested parse
sessions to terminate when the compound element (list or list-like
construct) is exhausted. The reStructuredText parser is recursive,
paralleling the document tree produced; when a nested parse finishes,
the outer state machine takes over parsing.
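The recursive shape described above can be shown without any state-machine machinery. The following deliberately simplified sketch uses plain functions with hypothetical names; it is not docutils code. The outer parser hands a bullet list to a nested parser. The nested parser stops at the first line it does not recognize and returns that position, and the outer parser resumes from there. The nesting of calls parallels the nesting of the resulting tree.

```python
def parse_body(lines, i=0):
    """Outer parse (hypothetical): build nodes, recursing for compound elements."""
    nodes = []
    while i < len(lines):
        line = lines[i]
        if line.startswith('- '):
            # Start a nested parse; it returns the finished container and the
            # index of the first line it could not handle.
            container, i = parse_bullet_list(lines, i)
            nodes.append(container)
        else:
            nodes.append(('paragraph', line))
            i += 1
    return nodes


def parse_bullet_list(lines, i):
    """Nested parse (hypothetical): accept only bullet items, stop at anything else."""
    items = []
    while i < len(lines) and lines[i].startswith('- '):
        items.append(('list_item', lines[i][2:]))
        i += 1
    # The first non-bullet line ends this nested session; the outer parser
    # takes over again from index i.
    return ('bullet_list', items), i


doc = parse_body(['intro', '- a', '- b', 'outro'])
print(doc)
```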
SpecializedBody is a "Superclass for second and subsequent compound
element members." For example, once an initial bullet list item is
recognized, the `BulletList` subclass takes over, with a
"bullet_list" node as its container. Upon encountering the initial
bullet list item, `Body.bullet` calls its ``self.nested_list_parse``
(`RSTState.nested_list_parse`), which starts up a nested parsing
session with `BulletList` as the initial state. Only the ``bullet``
transition method is enabled in `BulletList`; as long as only bullet
list items are encountered, they are parsed and inserted into the
container. The first construct which is *not* a bullet list item
triggers the `invalid_input` method, which ends the nested parse and
closes the container. `BulletList` needs to recognize input that is
invalid in the context of a bullet list, which means everything *other
than* bullet list items, so it inherits the transition list created in
`Body`.
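Here is how the class structure described above might be expressed. This is a simplified, hypothetical model: the `Machine` class, the regexes, and the list-based "nodes" are illustrative assumptions, not the actual docutils API. `SpecializedBody` aliases every transition inherited from `Body` to `invalid_input`, and `BulletList` re-enables only `bullet`. In this sketch, `invalid_input` backs up one line so that the outer machine re-examines it. It then raises `EOFError` to end the nested session.

```python
import re


class Machine:
    """Minimal line-driven state machine (hypothetical, simplified)."""

    def __init__(self, lines, pos=0):
        self.lines, self.pos = lines, pos

    def run(self, state_cls, container):
        state = state_cls(self, container)
        try:
            while self.pos < len(self.lines):
                line = self.lines[self.pos]
                self.pos += 1
                state.dispatch(line)
        except EOFError:
            pass  # nested session ended; the container is complete
        return self.pos


class Body:
    # Transition list: (method name, regex), tried in order.
    transitions = [('bullet', r'- '), ('enumerator', r'\d+\. '), ('text', r'')]

    def __init__(self, machine, container):
        self.machine, self.container = machine, container

    def dispatch(self, line):
        for name, pattern in self.transitions:
            if re.match(pattern, line):
                return getattr(self, name)(line)

    def bullet(self, line):
        # Create the container with the first item, then parse the rest of
        # the list in a nested session whose initial state is BulletList.
        bullet_list = ['bullet_list', [line[2:]]]
        self.container.append(bullet_list)
        nested = Machine(self.machine.lines, self.machine.pos)
        self.machine.pos = nested.run(BulletList, bullet_list[1])

    def enumerator(self, line):
        self.container.append(['enumerated_item', line])

    def text(self, line):
        self.container.append(['paragraph', line])


class SpecializedBody(Body):
    def invalid_input(self, line):
        # Not a member of this compound element: back up one line so the
        # outer machine sees it again, then end the nested parse.
        self.machine.pos -= 1
        raise EOFError

    # Every transition inherited from Body is disabled by default.
    bullet = enumerator = text = invalid_input


class BulletList(SpecializedBody):
    def bullet(self, line):
        # The one transition re-enabled here: another bullet list item.
        self.container.append(line[2:])


doc = []
Machine(['intro', '- a', '- b', '1. one', 'outro']).run(Body, doc)
print(doc)
```

The enumerated item ends the bullet list, because in `BulletList` it falls through to `invalid_input`. The outer `Body` state then parses that item normally.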
--
David Goodger <go...@us...> Open-source projects:
- Python Docutils: http://docutils.sourceforge.net/
(includes reStructuredText: http://docutils.sf.net/rst.html)
- The Go Tools Project: http://gotools.sourceforge.net/