For LSL Plus I’m using the Parsec parser combinator library, which is an LL(k) parser. It has facilities for error reporting (when a parse failure occurs, it reports the position and what tokens were expected) but not error recovery. I believe I’ve found a low-tech way to get simple error recovery implemented, but I’m curious about more general techniques. A parser combinator library which supports error recovery is desribed in this paper, which I’m on the verge of reading… but before I educate myself on formal techniques for error recovery, I’m curious about how ‘error recovery’ can be defined. I think you perhaps could define it in terms of ‘edit distance’.
The ‘edit distance’ between two strings is the number of insertions/deletions/substitutions that it would take to transform one string into another. The following is a Haskell function that computes the edit distance between two strings.
> ed :: Eq a => [a] -> [a] -> Int
> ed s [] = length s
> ed [] t = length t
> ed (s:ss) (t:ts) =
> minimum [(if s == t then 0 else 1)
> + ed ss ts,
> 1 + ed ss (t:ts),
> 1 + ed (s:ss) ts]
A parser that does error recovery would try to find a correct string (one that parses) that is the least distance from the actual string being parsed. There will be a bound on the search path, since the grammar will have shortest string that parses correctly; the edit distance must be less than or equal to the length of the input string plus the length of the shortest possible correct string. (I.e. you can always delete the entire string and insert a correct string, which is the worst case for error recovery). Searching all strings from edit distance 1 to the maximum edit distance is of course not feasible (except for very tiny strings), but perhaps there are approaches much better than brute force, which is something like O(tn), where t is the number of possible tokens.
Nevertheless, I’ll probably go with an error recovery approach using Parsec that is heavily tuned to my grammar and existing parser.