On Thu, Apr 22, 2010 at 6:38 AM, Přikryl Petr <prikryl@...> wrote:
> I am very new to docutils internals. I am searching
> for some tutorial, analyses, hints, etc. on using
> docutils.statemachine for building my own parser.
The only documentation for the statemachine module is in the module
itself, in the docstrings (which comprise a large chunk of the file).
Also there's a parsing overview in docutils/parsers/rst/states.py.
> I would like to rewrite our tool for generating
> documentation for our product. The tool was written
> earlier, in Python. The document target is HTML now.
> The document sources are text files with some
> HTML-like tags, with some XML-looking instructions
> (e.g. <?makeDoc img flowRT (622_mnu_mapy 75%
> 'Description of the image') ?>), with some non-HTML
> tags that are converted to span tags with classes
> (e.g. ...menu <mnu>Maps</mnu> or radio-button
> <rbtn>your choice</rbtn>) plus some other extras.
Docutils' statemachine.py may not be the best way to parse your input.
A more traditional serial parser with a tokenizer, etc., may be much
better for your application. I first wrote the statemachine module to
help with complex stateful line-based parsing jobs, like "show me all
lines that start with X that follow within 5 lines of instances of Y,
but not after lines matching Z". Later I expanded it to allow for
two-dimensional parsing, where indentation was significant.
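
To make the "X within 5 lines of Y, but not after Z" kind of job
concrete, here is a minimal sketch in plain Python. It does not use
docutils.statemachine at all; the function name, the prefix matching,
and the reading of the rule (a Y line opens a window of 5 lines, a Z
line closes it) are my assumptions for illustration only.

```python
def select_lines(lines, x="X", y="Y", z="Z", window=5):
    """Return (line_number, line) pairs for lines starting with `x`
    that fall within `window` lines after a line starting with `y`,
    unless a line starting with `z` has appeared since that `y` line.

    Assumed reading of the rule; not the docutils.statemachine API.
    """
    remaining = 0  # lines left in the open window; 0 means idle
    hits = []
    for number, line in enumerate(lines, start=1):
        if line.startswith(y):
            remaining = window      # a Y line (re)opens the window
        elif line.startswith(z):
            remaining = 0           # a Z line closes the window
        elif remaining > 0:
            if line.startswith(x):
                hits.append((number, line))
            remaining -= 1          # every other line uses up the window
    return hits


sample = [
    "Y start", "a", "X one", "Z stop", "X two",
    "Y again", "b", "c", "d", "e", "X late", "X too late",
]
for number, line in select_lines(sample):
    print(number, line)
```

The whole state here is one counter. Once the rules start depending on
each other, or on indentation, that stops scaling, which is the
situation the statemachine module was written for.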
Your input text, being XML-like, may be much more amenable to
traditional parsing methods, about which there are whole textbooks
written. See, for example, "Parsing Techniques: A Practical Guide" by
Dick Grune & Ceriel Jacobs.
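
As a rough illustration of the tokenizer approach, here is a small
sketch using only Python's re module. The names (TOKEN_RE,
CUSTOM_TAGS, tokenize, to_html) are hypothetical, the tag set is taken
from the examples in your message, and the handling of the makeDoc
instruction is only a placeholder, since I don't know what your tool
does with it.

```python
import re

# Hypothetical tag set, taken from the examples in the original message.
CUSTOM_TAGS = {"mnu", "rbtn"}

TOKEN_RE = re.compile(r"""
    (?P<pi>    <\?makeDoc\s+(?P<pi_body>.*?)\?> )   # <?makeDoc ... ?>
  | (?P<close> </(?P<close_name>[A-Za-z]\w*)> )     # </name>
  | (?P<open>  <(?P<open_name>[A-Za-z]\w*)> )       # <name>, no attributes
  | (?P<text>  [^<]+ | < )                          # anything else
""", re.VERBOSE | re.DOTALL)


def tokenize(source):
    """Yield (kind, value) pairs; kind is 'pi', 'open', 'close' or 'text'."""
    for m in TOKEN_RE.finditer(source):
        if m.group("pi") is not None:
            yield "pi", m.group("pi_body").strip()
        elif m.group("close") is not None:
            yield "close", m.group("close_name")
        elif m.group("open") is not None:
            yield "open", m.group("open_name")
        else:
            yield "text", m.group("text")


def to_html(source):
    """Turn the custom tags into span tags with classes; pass the rest through."""
    out = []
    for kind, value in tokenize(source):
        if kind == "open":
            if value in CUSTOM_TAGS:
                out.append('<span class="%s">' % value)
            else:
                out.append("<%s>" % value)
        elif kind == "close":
            out.append("</span>" if value in CUSTOM_TAGS else "</%s>" % value)
        elif kind == "pi":
            # Placeholder only: the real tool would interpret the instruction.
            out.append("<!-- makeDoc: %s -->" % value)
        else:
            out.append(value)
    return "".join(out)


print(to_html("the menu <mnu>Maps</mnu> or <rbtn>your choice</rbtn>"))
```

Tags with attributes are not recognized by this sketch and simply pass
through as text. A real tool would also want to check that tags are
properly nested, which is where a parser on top of the tokenizer comes
in.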
> Is this mailing list a good place to ask related
Sure, but again, judging by your description, I don't think using the
Docutils statemachine.py is the right approach.
David Goodger <http://python.net/~goodger>