Re: [Docutils-develop] C reStructuredText parser


On Fri, May 18, 2012 at 6:30 PM, Marvin Humphrey <ma...@re...> wrote:
> But one of the questions I'd have here is whether it makes more sense to
> approach parsing RST from the top down (LL, recursive descent) or from the
> bottom up (LR, LALR).  If the reference parser for RST uses something like an
> extremely complex regular expression, maybe it makes sense to hand-code a
> recursive descent parser?
>
>    http://en.wikipedia.org/wiki/Recursive_descent_parser#C_implementation

I've forgotten most of what I ever knew about formal parsing, and I
didn't approach writing the reST parser in a formal way. The reST
parser grew from a finite state machine that I wrote to filter log
files in a complex way (e.g. "give me lines that begin with X within
10 lines of lines that contain Y but not after lines that contain Z").
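
To make that kind of filter concrete, here is a minimal sketch of a
line-at-a-time state machine. It is not the original log filter; the
names (State, filter_lines, window) are hypothetical, and it assumes
one reading of the rule: emit lines that begin with X when a line
containing Y appeared within the previous 10 lines, and stop for good
once a line containing Z is seen.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()     # no recent line containing Y
    ARMED = auto()    # a Y line was seen within the window
    BLOCKED = auto()  # a Z line was seen; nothing more is emitted


def filter_lines(lines, x, y, z, window=10):
    """Yield lines starting with x that follow a y line within `window`
    lines, until a line containing z is seen (assumed interpretation)."""
    state = State.IDLE
    remaining = 0
    for line in lines:
        if z in line:
            state = State.BLOCKED
        if state is State.BLOCKED:
            break
        if y in line:
            # (Re)arm the window; the Y line itself is not emitted.
            state = State.ARMED
            remaining = window
            continue
        if state is State.ARMED:
            if line.startswith(x):
                yield line
            remaining -= 1
            if remaining == 0:
                state = State.IDLE


log = [
    "ERROR early",
    "retry scheduled",
    "ERROR one",
    "noise",
    "ERROR two",
    "shutdown",
    "ERROR late",
]
matches = list(filter_lines(log, x="ERROR", y="retry", z="shutdown"))
```

Each input line causes at most one transition, so the filter needs no
lookahead and can run over a stream.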

The module docstring of docutils.parsers.rst.states contains a "Parser Overview"
(http://docutils.sourceforge.net/docutils/parsers/rst/states.py). It
begins, "The reStructuredText parser is implemented as a recursive
state machine, examining its input one line at a time."
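
As a rough illustration of what "recursive state machine" can mean,
here is a toy sketch. It is not the Docutils code and does not use its
classes: the function names (run, body, paragraph, indented), the
4-space INDENT, and the tuple-based tree are all assumptions made for
the example. Each state looks at one line and returns the next state;
an indented block is handled by running a fresh machine on the
dedented lines.

```python
INDENT = "    "  # assumed fixed indent; real reST measures indentation


def run(lines):
    """Drive the state machine over `lines`, one line at a time."""
    nodes = []
    state, pos = body, 0
    while pos < len(lines):
        state, pos = state(lines, pos, nodes)
    return nodes


def body(lines, pos, nodes):
    """Between blocks: decide what kind of block starts here."""
    line = lines[pos]
    if not line.strip():
        return body, pos + 1
    if line.startswith(INDENT):
        return indented, pos
    nodes.append(("paragraph", line.strip()))
    return paragraph, pos + 1


def paragraph(lines, pos, nodes):
    """Inside a paragraph: a blank line ends it, anything else extends it."""
    line = lines[pos]
    if not line.strip():
        return body, pos + 1
    kind, text = nodes[-1]
    nodes[-1] = (kind, text + " " + line.strip())
    return paragraph, pos + 1


def indented(lines, pos, nodes):
    """Collect an indented block, then parse it with a nested machine."""
    block = []
    while pos < len(lines) and (
        lines[pos].startswith(INDENT) or not lines[pos].strip()
    ):
        block.append(lines[pos][len(INDENT):])
        pos += 1
    nodes.append(("block_quote", run(block)))  # the recursive step
    return body, pos


tree = run([
    "Intro line one",
    "line two",
    "",
    "    quoted text",
    "",
    "        deeper",
    "",
    "Outro",
])
```

The nesting of the input becomes nesting of run() calls, which is why
no separate grammar or parse table is needed in this style.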

> It might be worthwhile to collect some opinions on stackoverflow.com.  And
> maybe it's time I bought the Dragon book and read the chapter on parsing. :)

You can't go wrong reading the Dragon book. And if you do post on
stackoverflow, please provide a link here.

-- 
David Goodger <http://python.net/~goodger>