From: David G. <go...@us...> - 2002-07-18 03:28:15
fantasai wrote:
> Why are the 'blank' and 'indented' transitions called implicitly,
> instead of listed in the transition list?

"blank" and "indented" are implicit transitions in "StateMachineWS", which is a whitespace-specialized subclass of "StateMachine". The "WS" stands for "whitespace". The whitespace transitions are very common and implemented implicitly as a convenience; the default behavior is built into the "StateMachineWS" and "StateWS" classes so it doesn't have to be reinvented every time they're used for certain types of parsing. The statemachine.py module is intended for general use; I use it at work in many small parsing projects, and have reused the "WS" subclasses.

> Why does SpecializedBody "pass" the definitions of transition
> methods instead of creating a specialized transition list?

First, re-read the module docstring of docutils/parsers/rst/states.py to get an overview of how the parser works.

SpecializedBody subclasses need to recognize all the constructs recognized by Body, but their transition methods are all redefined as "invalid_input". In subclasses, only methods for the specific transitions of interest are enabled. This allows nested parse sessions to terminate when the compound element (list or list-like construct) is exhausted. The reStructuredText parser is recursive, paralleling the document tree produced; when a nested parse finishes, the outer state machine takes over parsing.

SpecializedBody is a "Superclass for second and subsequent compound element members." For example, once an initial bullet list item, say, is recognized, the `BulletList` subclass takes over, with a "bullet_list" node as its container. Upon encountering the initial bullet list item, `Body.bullet` calls its ``self.nested_list_parse`` (`RSTState.nested_list_parse`), which starts up a nested parsing session with `BulletList` as the initial state.
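
As a rough illustration of that pattern, here is a toy sketch. It is not the actual docutils code: `EndOfNestedParse`, `dispatch`, the regular expressions, and the simplified method signatures are all made up for this example, and the free function `nested_list_parse` only borrows the name of the real method. What it shows is one transition list inherited by every subclass, all methods disabled in `SpecializedBody`, and a single method re-enabled in `BulletList`.

```python
import re


class EndOfNestedParse(Exception):
    """Hypothetical signal: hand control back to the outer parser."""


class Body:
    # One transition list, inherited unchanged by every subclass.
    transitions = [
        ("bullet", re.compile(r"[-*+] +")),
        ("enumerator", re.compile(r"\d+\. +")),
        ("text", re.compile(r"\S")),
    ]

    def dispatch(self, line, container):
        """Call the method of the first transition whose pattern matches."""
        for name, pattern in self.transitions:
            match = pattern.match(line)
            if match:
                return getattr(self, name)(match, line, container)
        raise EndOfNestedParse  # nothing matched, e.g. a blank line

    def bullet(self, match, line, container):
        container.append(("bullet", line[match.end():]))

    def enumerator(self, match, line, container):
        container.append(("enumerated", line[match.end():]))

    def text(self, match, line, container):
        container.append(("text", line))


class SpecializedBody(Body):
    def invalid_input(self, match, line, container):
        raise EndOfNestedParse

    # Every inherited transition is still recognized, but rejected.
    bullet = invalid_input
    enumerator = invalid_input
    text = invalid_input


class BulletList(SpecializedBody):
    # Re-enable only the transition of interest.
    bullet = Body.bullet


def nested_list_parse(lines, start, state):
    """Feed lines to `state` until it rejects one; return items and position."""
    container = []
    index = start
    while index < len(lines):
        try:
            state.dispatch(lines[index], container)
        except EndOfNestedParse:
            break  # the outer parser resumes at lines[index]
        index += 1
    return container, index


lines = ["- one", "- two", "1. not a bullet", "- three"]
items, next_index = nested_list_parse(lines, 0, BulletList())
print(items, next_index)
```

In this toy, the enumerated line is matched by the inherited "enumerator" pattern but routed to `invalid_input`, so the nested parse stops there and the caller picks up again at that same line.
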
Only the ``bullet`` transition method is enabled in `BulletList`; as long as only bullet list items are encountered, they are parsed and inserted into the container. The first construct which is *not* a bullet list item triggers the `invalid_input` method, which ends the nested parse and closes the container. `BulletList` needs to recognize input that is invalid in the context of a bullet list, which means everything *other than* bullet list items, so it inherits the transition list created in `Body`.

-- 
David Goodger <go...@us...>
Open-source projects:
- Python Docutils: http://docutils.sourceforge.net/
  (includes reStructuredText: http://docutils.sf.net/rst.html)
- The Go Tools Project: http://gotools.sourceforge.net/