From: David G. <dgo...@gm...> - 2010-04-22 13:42:56
On Thu, Apr 22, 2010 at 6:38 AM, Přikryl Petr <pr...@at...> wrote:
> I am very new to docutils internals. I am searching
> for some tutorial, analyses, hints, etc. on using
> docutils.statemachine for building my own parser.

The only documentation for the statemachine module is in the module
itself, in the docstrings (which comprise a large chunk of the file).
There is also a parsing overview in docutils/parsers/rst/states.py.

> I would like to rewrite our tool for generating
> documentation for our product. The tool was written
> earlier, in Python. The document target are HTML now.
> The document sources are text files with some
> HTML-like tags, with some XML-looking instructions
> (e.g. <?makeDoc img flowRT (622_mnu_mapy 75%
> 'Description of the image') ?>), with some non-HTML
> tags that are converted to span tags with classes
> (e.g. ...menu <mnu>Maps</mnu> or radio-button
> <rbtn>your choice</rbtn> plus some other extras.

Docutils' statemachine.py may not be the best way to parse your input.
A more traditional serial parser with a tokenizer, etc., may be much
better for your application.

I first wrote the statemachine module to help with complex stateful
line-based parsing jobs, like "show me all lines that start with X
that follow within 5 lines of instances of Y, but not after lines
matching Z". Later I expanded it to allow for 2-dimensional parsing,
where indentation is significant.

Your input text, being XML-like, may be much more amenable to
traditional parsing methods, about which whole textbooks have been
written. See, for example, "Parsing Techniques: A Practical Guide" by
Dick Grune & Ceriel Jacobs.

> Is this mailing list a good place to ask related
> questions?

Sure, but again, judging by your description, I don't think using the
Docutils statemachine.py is the right approach.

-- 
David Goodger <http://python.net/~goodger>
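
To make the "serial parser with a tokenizer" suggestion concrete, here is a minimal sketch in Python. It is only an illustration, not part of Docutils and not the existing tool: the token grammar is guessed from the three examples quoted in the question, and the names TOKEN_RE, SPAN_TAGS, tokenize and render are hypothetical. The handling of <?makeDoc ...?> instructions is a placeholder, since their real semantics are not described in this thread.

```python
import re

# Hypothetical token grammar, inferred only from the examples in the question.
# Each alternative has exactly one named group, so match.lastgroup names the
# kind of token that matched.
TOKEN_RE = re.compile(
    r"<\?(?P<instr>.*?)\?>"     # instruction such as <?makeDoc ... ?>
    r"|</(?P<close>\w+)>"       # closing tag such as </mnu>
    r"|<(?P<open>\w+)>"         # opening tag such as <mnu>
    r"|(?P<text>[^<]+|<)",      # plain text, or a stray '<'
    re.DOTALL,
)

# Assumption: these non-HTML tags become <span class="..."> elements.
SPAN_TAGS = {"mnu", "rbtn"}


def tokenize(source):
    """Yield (kind, value) pairs in document order."""
    for match in TOKEN_RE.finditer(source):
        kind = match.lastgroup
        yield kind, match.group(kind)


def render(source):
    """Serially convert the token stream to HTML-like output."""
    out = []
    for kind, value in tokenize(source):
        if kind == "open":
            if value in SPAN_TAGS:
                out.append('<span class="%s">' % value)
            else:
                out.append("<%s>" % value)      # ordinary tag, passed through
        elif kind == "close":
            if value in SPAN_TAGS:
                out.append("</span>")
            else:
                out.append("</%s>" % value)
        elif kind == "instr":
            # Placeholder: a real tool would dispatch on the instruction name.
            out.append("<!-- instruction: %s -->" % value.strip())
        else:
            out.append(value)                   # text is copied unchanged
    return "".join(out)


if __name__ == "__main__":
    print(render("Open the menu <mnu>Maps</mnu> and pick <rbtn>your choice</rbtn>."))
```

A single regular expression with one alternative per token type keeps the scanner in one place; if the input turns out to need nesting checks or tag attributes, the loop in render is where a stack or a richer grammar would go.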