From: john s. <sk...@us...> - 2012-08-23 00:33:46
I have succeeded in de-coupling the first part of the parser engine Felix uses into a universal, stand-alone, dynamically loading parser.

The model is basically like this: Dypgen's library and Ocs Scheme are combined with a fixed bootstrap grammar, which allows a suitable lexbuf to generate grammar specifications from a user EBNF grammar. By preserving the result, the parser can be called again to translate a program written in this grammar. Alternatively, the new grammar can be enabled on the fly and translation can occur immediately.

A special data structure called "sex" is provided as well, with some functions to translate from Ocs-scheme s-expressions into Sex s-expressions, which are easier to pattern match. At this point, functions to translate sex format to text and to parse it back are provided. An XML driver could be written as well (though it isn't available at present).

With this library linked into your OCaml program you do not need to invoke the Dypgen tool. The grammar processing is done dynamically.

The bootstrap contains facilities to load and store both the parser automaton and the result of parsing a target file on disk. In practice this means the automaton is only built once, and a target file only needs to be re-parsed when it is changed. Facilities to automatically manage this caching and drive the parsing library from files are available in Felix and have not yet been de-coupled.

The easiest way to get hold of this technology is to download and build Felix, then just copy the relevant files out of the source and build.

http://felix-lang.org/download.html

The primary constraints on the core parsing system are:

(a) You must use the provided source reference data type. It provides filename, start and end line, and start and end column. Of course you can translate this as desired.

(b) You are currently stuck with the hard-coded bootstrap language, which is an EBNF-like language with two statements:

    syntax name { grammar here }
    ..
    open syntax name;

At present C and C++ style comments can be used. C comments can be nested. There is no facility for defining comment style at the moment (I plan to fix that).

(c) Your language must support top level statements and expressions. The statements are needed to parse and embed syntax extensions. Any other non-terminals can be introduced when defining new productions for one of these.

(d) C style #include is not supported. You must pre-process files which include other files in a way that will change parsing. This is a difficulty with Dypgen itself: there is no way to maintain a stack of lexbufs, and it is impossible in practice to recursively call a Dypgen parser from within Dypgen itself.

In any *sane* language -- and this clearly excludes C and C++ -- parsing should be invariant up to inline grammar modifications or packaged grammars. In C and C++ maintenance of a symbol table is required to recognise type names. This makes the parsing of a file dependent on foreign files (#include files). [Dypgen and the Felix parser of course allow maintenance of such symbol tables, although some care is needed in case a modification is made by a proposed production that is then given up.]

--
john skaller
sk...@us...
http://felix-lang.org
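
Illustrative sketch only, not taken from the Felix sources: the post mentions an s-expression form that is easy to pattern match and can be written to text and parsed back. A minimal OCaml version of that idea might look like the following. The type `sexp` and the functions `to_text`, `of_text` and `eval_add` are hypothetical names chosen for this example; the actual Sex type and its functions may differ.

```ocaml
(* Hypothetical stand-in for an easily matched s-expression type.
   This is NOT the actual Sex definition from Felix. *)
type sexp =
  | Int of int
  | Sym of string
  | Lst of sexp list

exception Parse_error of string

(* Render a tree as text. *)
let rec to_text = function
  | Int n -> string_of_int n
  | Sym s -> s
  | Lst xs -> "(" ^ String.concat " " (List.map to_text xs) ^ ")"

(* Split the input into "(", ")" and atom tokens. *)
let tokenize (s : string) : string list =
  let buf = Buffer.create 16 in
  let toks = ref [] in
  let flush () =
    if Buffer.length buf > 0 then begin
      toks := Buffer.contents buf :: !toks;
      Buffer.clear buf
    end in
  String.iter (fun c ->
    match c with
    | '(' | ')' -> flush (); toks := String.make 1 c :: !toks
    | ' ' | '\t' | '\n' | '\r' -> flush ()
    | _ -> Buffer.add_char buf c) s;
  flush ();
  List.rev !toks

(* An atom that reads as an integer becomes Int, anything else Sym. *)
let atom (t : string) : sexp =
  match int_of_string_opt t with
  | Some n -> Int n
  | None -> Sym t

(* Parse one expression; return it together with the unused tokens. *)
let rec parse_one (toks : string list) : sexp * string list =
  match toks with
  | [] -> raise (Parse_error "unexpected end of input")
  | "(" :: rest -> parse_list [] rest
  | ")" :: _ -> raise (Parse_error "unexpected )")
  | t :: rest -> (atom t, rest)

and parse_list (acc : sexp list) (toks : string list) : sexp * string list =
  match toks with
  | [] -> raise (Parse_error "missing )")
  | ")" :: rest -> (Lst (List.rev acc), rest)
  | _ ->
    let (x, rest) = parse_one toks in
    parse_list (x :: acc) rest

(* Parse a complete text back into a tree. *)
let of_text (s : string) : sexp =
  match parse_one (tokenize s) with
  | (x, []) -> x
  | (_, _ :: _) -> raise (Parse_error "trailing tokens")

(* Pattern matching on the tree: recognise (add <int> <int>). *)
let eval_add = function
  | Lst [Sym "add"; Int a; Int b] -> Some (a + b)
  | _ -> None
```

In this sketch, `of_text (to_text t)` gives back `t` as long as symbols contain no whitespace or parentheses and do not themselves read as integers; a real implementation would need quoting rules for strings.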