On Sat, 2004-09-11 at 20:10, Nicolas Cannasse wrote:
> > (* extend the regexp type -- polymorpic variants are wonderful! *)
> That's true :)
> I got live demo from Jacques Garrigue showin variant reusability, that =
> great and this time I asked him if nobody thought of writing an extensi=
> parser based on polymorphic variants extensibility. Looks like you're d=
> this with regexp/lexer,
Trying to :)
> are you planing to do this with parser as well ?
> This could lead to an extensible grammar =C3=A0 la Camlp4.
Yes, however I thought I'd try the simpler case first.
The engine for a parser is more complex, and worse
there are many kinds of parser -- LALR, LL(1), LL(k),
Earley -- ....
> However since my ocaml bootstrapped language will not get polymorphic
> variants I can't use your implementation of regexp...
Well, at this stage you can regard it as an experiment.
Once a nice structure is discovered you can probably
'freeze' part of the typing, and use ordinary variants
instead. The remaining part you can either
(a) cheat somehow
(b) generate code dynamically
(c) use a different encoding -- for example you could
use strings, use a mini-parser and a kind of interpreter
to build a clunky but still flexible data structure
[EG XML is very clunky but you can still do lots with it]
I mention this because Felix generates C++ so I have
a far worse bootstrapping problem .. I have to implement
everything twice, once in Ocaml and then again in C++.
Clearly I'll use Felix code to generate the C++,
and it looks 'similar' to Ocaml in some ways, but still
it will be hard to maintain two compiler implementations.
One way I am trying to get around that is to make
the language so powerful, I can parameterise enough
things such as the grammar, that I only need to implement
the same parser engine for both languages -- then
grammar changes are reflected into both compilers.
For regexps it is easy -- if I can generate an interpretable
transition matrix for Ocaml .. it is easy enough to=20
print the matrix as C code and write C to do the equivalent.
The actual compiler engine (dfa.ml) is simple enough
to recode in C/C++ or Felix.
Obviously it get harder for a parser .. the term rewriting
in the optimiser will be laborious .. and the lookup=20
algorithm scares the **** out of me :) [That's my
most complex module]
Anyhow part of this process requires discovering the
maximally simple and expressive abstractions:
that's more a math problem with an Ocaml test model.
If I get nice terse powerful Ocaml representation,
it seems like the hard work -- using ordinary
variants for you, or a Felix or even C encoding
for me -- is more likely to actually succeed.
A side effect will be a nice set of Ocaml libraries.
It actually seems more sensible to work in the
abstract and ignore my specific needs -- quite apart
from that attracting other people. And also I=20
can't use either Str or PCRE (I refuse to use Str
because it isn't reentrant, and I can't use PCRE
because it depends on a non-standard C library)
Anyhow I think Ocaml could really use a nice set
of lexer/parser libs. I have James Woodyatts combinator
library to examine as well.
John Skaller, mailto:skaller@...
snail: PO BOX 401 Glebe NSW 2037 Australia
Checkout the Felix programming language http://felix.sf.net