From: skaller <sk...@us...> - 2004-09-11 08:16:21
|
Consider this: type ('x, 'a) regexp_t' = [ | `REGEXP_char of int | `REGEXP_seq of 'x * 'x (** concatenation *) | `REGEXP_alt of 'x * 'x (** alternation *) | `REGEXP_aster of 'x (** Kleene closure *) | `REGEXP_epsilon (** epsilon: null string *) | `REGEXP_code of 'a (** associated code *) ] type 'a regexp_t = ('x, 'a) regexp_t' as 'x (* extend the regexp type -- polymorpic variants are wonderful! *) type 'a gre_t = [ | ('b,'a) regexp_t' | `REGEXP_group of 'b ] as 'b ;; type 'b gcode = [`User of 'b | `Left of int | `Right of int ] ;; let groupify (re: 'a gre_t) : 'a gcode regexp_t = let counter = ref 0 in let rec aux (re: 'a gre_t) : 'a gcode regexp_t = match re with | `REGEXP_group a -> let n = !counter in incr counter; let a = aux a in seq (seq (code (`Left n)) a) (code (`Right n)) | `REGEXP_code a -> `REGEXP_code (`User a) | `REGEXP_alt (a,b) -> let a = aux a in let b = aux b in `REGEXP_alt (a,b) | `REGEXP_seq (a,b) -> let a = aux a in let b = aux b in `REGEXP_alt (a,b) | `REGEXP_aster a -> `REGEXP_aster (aux a) | `REGEXP_epsilon -> `REGEXP_epsilon | `REGEXP_char c -> `REGEXP_char c in aux re What this does is extend the regular expression type to allow a group combinator, but then reduce it by using left/right bracket codes. The intent of this demo is to show how the REGEXP_code thing can be used to *implement* groups without having to make them part of the regular expression compiler engine -- the client can use the group enhanced combinators, reduce it to the standard form with 'groupify', ensuring all brackets are balanced and systematically numbered, and then execute the group enhanced matching automaton with the compiled regexp. I would actually propose not to use this particular form except for compatibility. Named groups are MUCH nicer! In addition, groups could be used to build lists, instead of just the last substring matched: even better, since substring lists can nest in others, we could even build a crude parse tree. BTW: Left n just stores the current position in the input into the 'left' component of the n'th slot of an array. Right n just stores into the 'right' slot. At the end, the string in the range 'left' upto 'right' of array index position k is the k'th group. This only works matching the whole string, for searching and lexing we need save the array for each terminal state so it can be restored on a backtrack (matching never backtracks). I think a similar trick can allow context sensitive searching, and start/end of data detection (and perhaps some other interesting things :) -- John Skaller, mailto:sk...@us... voice: 061-2-9660-0850, snail: PO BOX 401 Glebe NSW 2037 Australia Checkout the Felix programming language http://felix.sf.net |
From: Nicolas C. <war...@fr...> - 2004-09-11 10:09:24
|
> Consider this: > > type ('x, 'a) regexp_t' =3D > [ > | `REGEXP_char of int > | `REGEXP_seq of 'x * 'x (** concatenation *) > | `REGEXP_alt of 'x * 'x (** alternation *) > | `REGEXP_aster of 'x (** Kleene closure *) > | `REGEXP_epsilon (** epsilon: null string *) > | `REGEXP_code of 'a (** associated code *) > ] > > type 'a regexp_t =3D ('x, 'a) regexp_t' as 'x > > > (* extend the regexp type -- polymorpic variants are wonderful! *) That's true :) I got live demo from Jacques Garrigue showin variant reusability, that wa= s great and this time I asked him if nobody thought of writing an extensibl= e parser based on polymorphic variants extensibility. Looks like you're doi= ng this with regexp/lexer, are you planing to do this with parser as well ? This could lead to an extensible grammar =E0 la Camlp4. However since my ocaml bootstrapped language will not get polymorphic variants I can't use your implementation of regexp... Nicolas |
From: skaller <sk...@us...> - 2004-09-11 11:10:15
|
On Sat, 2004-09-11 at 20:10, Nicolas Cannasse wrote: > > (* extend the regexp type -- polymorpic variants are wonderful! *) >=20 > That's true :) > I got live demo from Jacques Garrigue showin variant reusability, that = was > great and this time I asked him if nobody thought of writing an extensi= ble > parser based on polymorphic variants extensibility. Looks like you're d= oing > this with regexp/lexer, Trying to :) > are you planing to do this with parser as well ? > This could lead to an extensible grammar =C3=A0 la Camlp4. Yes, however I thought I'd try the simpler case first. The engine for a parser is more complex, and worse there are many kinds of parser -- LALR, LL(1), LL(k), Earley -- .... >=20 > However since my ocaml bootstrapped language will not get polymorphic > variants I can't use your implementation of regexp... Well, at this stage you can regard it as an experiment. Once a nice structure is discovered you can probably 'freeze' part of the typing, and use ordinary variants instead. The remaining part you can either (a) cheat somehow (b) generate code dynamically (c) use a different encoding -- for example you could use strings, use a mini-parser and a kind of interpreter to build a clunky but still flexible data structure [EG XML is very clunky but you can still do lots with it] I mention this because Felix generates C++ so I have a far worse bootstrapping problem .. I have to implement everything twice, once in Ocaml and then again in C++. Clearly I'll use Felix code to generate the C++, and it looks 'similar' to Ocaml in some ways, but still it will be hard to maintain two compiler implementations. One way I am trying to get around that is to make the language so powerful, I can parameterise enough things such as the grammar, that I only need to implement the same parser engine for both languages -- then grammar changes are reflected into both compilers. For regexps it is easy -- if I can generate an interpretable transition matrix for Ocaml .. it is easy enough to=20 print the matrix as C code and write C to do the equivalent. The actual compiler engine (dfa.ml) is simple enough to recode in C/C++ or Felix. Obviously it get harder for a parser .. the term rewriting in the optimiser will be laborious .. and the lookup=20 algorithm scares the **** out of me :) [That's my most complex module] Anyhow part of this process requires discovering the maximally simple and expressive abstractions: that's more a math problem with an Ocaml test model. If I get nice terse powerful Ocaml representation, it seems like the hard work -- using ordinary variants for you, or a Felix or even C encoding for me -- is more likely to actually succeed. A side effect will be a nice set of Ocaml libraries. It actually seems more sensible to work in the abstract and ignore my specific needs -- quite apart from that attracting other people. And also I=20 can't use either Str or PCRE (I refuse to use Str because it isn't reentrant, and I can't use PCRE because it depends on a non-standard C library) Anyhow I think Ocaml could really use a nice set of lexer/parser libs. I have James Woodyatts combinator library to examine as well. --=20 John Skaller, mailto:sk...@us... voice: 061-2-9660-0850,=20 snail: PO BOX 401 Glebe NSW 2037 Australia Checkout the Felix programming language http://felix.sf.net |
From: Bardur A. <oca...@sc...> - 2004-09-11 11:30:53
|
On Sat, Sep 11, 2004 at 09:10:05PM +1000, skaller wrote: [--snip--] > Anyhow I think Ocaml could really use a nice set > of lexer/parser libs. Definitely. I've been looking towards implementing a GLR parser (generator) myself, but unfortunately just haven't had the time to take a serious stab at it. It just seems that lots of small utilities I've written require simple (but "non-standard" parsers), almost all of them handwritten up until now because anything but GLR parsing just seems like premature -- and for my purposes unnecessary -- optimization, and almost always ends up being more work when implemented using ocamlyacc (or similar) because of the various hacks/warkarounds you have to apply to the grammars to make them LALR(1). Of course, if anyone knows of such a beast for OCaml, then any pointers to it would be appreciated. :) -- Bardur Arantsson <ba...@im...> <ba...@sc...> - I didn't go to medical school for six friggin' years to be called *Mr.* Evil! Dr. Evil | Austin Powers: Intl. Man of Mystery |
From: Nicolas C. <war...@fr...> - 2004-09-11 12:33:00
|
> It just seems that lots of small utilities I've written > require simple (but "non-standard" parsers), almost all of > them handwritten up until now because anything but GLR > parsing just seems like premature -- and for my purposes > unnecessary -- optimization, and almost always ends up > being more work when implemented using ocamlyacc (or > similar) because of the various hacks/warkarounds you have > to apply to the grammars to make them LALR(1). > > Of course, if anyone knows of such a beast for OCaml, then > any pointers to it would be appreciated. :) I stop using ocamlyacc. It's too much difficult to get the grammar right, and the approach is to me counterintuitive. I'm using LL(1) stream parsers of CamlP4, they're less more powerful but so much easy to understand, maintain, and extend ! Nicolas |
From: Nicolas C. <war...@fr...> - 2004-09-11 12:37:37
|
> > However since my ocaml bootstrapped language will not get polymorphic > > variants I can't use your implementation of regexp... > > Well, at this stage you can regard it as an experiment. > Once a nice structure is discovered you can probably > 'freeze' part of the typing, and use ordinary variants > instead. The remaining part you can either Yes that might be possible... > I mention this because Felix generates C++ so I have > a far worse bootstrapping problem .. I have to implement > everything twice, once in Ocaml and then again in C++. > Clearly I'll use Felix code to generate the C++, > and it looks 'similar' to Ocaml in some ways, but still > it will be hard to maintain two compiler implementations. That's hard :) I choose what I think is a nice bootstrapping way : i'm writing in OCaml a parser for my language that can produce an ocaml AST (and works with -pp ocamlc/ocamlopt flag) so if I keep the compiler into core language (an intersection between ocaml features and my language new features) then I can compile the compiler using only ocaml. However for full boostrapping I need of course a lexer/parser library written in my language, that can be first written in Ocaml then translated, but since I want to experiment the syntax changes I should write them directly in new language. Nicolas |
From: Alain F. <Ala...@en...> - 2004-09-11 12:46:01
|
On Sat, 11 Sep 2004, Nicolas Cannasse wrote: > ocamlc/ocamlopt flag) so if I keep the compiler into core language (an > intersection between ocaml features and my language new features) Just out of curiosity, would it be possible to have some information about your new language (feel free to answer off-list if you want) ? -- Alain |
From: skaller <sk...@us...> - 2004-09-11 13:33:19
|
On Sat, 2004-09-11 at 22:45, Alain Frisch wrote: > On Sat, 11 Sep 2004, Nicolas Cannasse wrote: > > > ocamlc/ocamlopt flag) so if I keep the compiler into core language (an > > intersection between ocaml features and my language new features) > > Just out of curiosity, would it be possible to have some information about > your new language (feel free to answer off-list if you want) ? Aw, answer on list please. Who know what cooperation can be found. Many people are using Ocaml for developing new languages. Talking about them does sometimes annoy Ocaml users who aren't doing new languages .. but it *is* what Ocaml is good for. And would be even better at if it has some more lexer/parser libs. I actually have two parsers -- the Felix one and also I'm using Frontc/Cil to parse C for my wrapper generator. I've tried to extend the latter to handle C++ but got stuck trying to parse constructor declarations. Both systems are currently ocamllex/ocamlyacc based. -- John Skaller, mailto:sk...@us... voice: 061-2-9660-0850, snail: PO BOX 401 Glebe NSW 2037 Australia Checkout the Felix programming language http://felix.sf.net |
From: Nicolas C. <war...@fr...> - 2004-09-11 14:16:47
|
> > > ocamlc/ocamlopt flag) so if I keep the compiler into core language (an > > > intersection between ocaml features and my language new features) > > > > Just out of curiosity, would it be possible to have some information about > > your new language (feel free to answer off-list if you want) ? > > Aw, answer on list please. Who know what cooperation > can be found. Many people are using Ocaml for developing > new languages. Talking about them does sometimes annoy > Ocaml users who aren't doing new languages .. but it > *is* what Ocaml is good for. Well I'm sorry I had first some talks privatly with Alain. Speaking french is more easy for exposing my idea and getting feedback :) Here's a translation of what I wrote : I'm thinking of keeping "core-ML" and adding several functionnalities in order to get introspection and dynamic loading. From a language point of view, it will looks more like CamlLight than OCaml, but a lot of work is needed for runtime and librairies. In particular, it will be possible to generate, compile, and execute code at runtime , then examine results using reflexion, so the language will be more dynamic. The goal is to have a language that helps writing other languages but also enable them to use the same runtime and interact together, a little like .net but more functionnal-oriented. Nicolas |
From: skaller <sk...@us...> - 2004-09-11 18:52:24
|
On Sun, 2004-09-12 at 00:17, Nicolas Cannasse wrote: > Well I'm sorry I had first some talks privatly with Alain. > Speaking french is more easy for exposing my idea and getting feedback :) I understand .. although I'm not sure my 'ramblings' in English are any more comprehenisble to an English speaker than your French :00 > I'm thinking of keeping "core-ML" and adding several functionnalities in > order to get introspection and dynamic loading. From a language point of > view, it will looks more like CamlLight than OCaml, but a lot of work is > needed for runtime and librairies. In particular, it will be possible to > generate, compile, and execute code at runtime , then examine results using > reflexion, so the language will be more dynamic. The goal is to have a > language that helps writing other languages but also enable them to use the > same runtime and interact together, a little like .net but more > functionnal-oriented. Ouch! Felix has a related goal, although our technical approach is different. It's intended that the client be able to extend it to implement Domain Specific Sub Languages. Regular expressions are an example of that, except this DSSL is currently built in since it is needed for bootstrapping (I need a parser and term rewriting system too) My approach here is to use existing C libraries to allow the client to use the domain specific functionality easily, and Felix generated wrappers which translate some DSSL layer to function calls on that library. Both the C libraries and DSSL library use traditional dlopen() style dynamic linkage. For me, actually obtaining this layer isn't so hard, at least once it is bootstrapped (which is still some way off). The big problem is probably the same as yours -- making it typesafe. BTW: why don't you use the same sublanguage of Ocaml that Ocaml is itself written in, and use the same bootstrapping mechanism? Or is that what you're doing? -- John Skaller, mailto:sk...@us... voice: 061-2-9660-0850, snail: PO BOX 401 Glebe NSW 2037 Australia Checkout the Felix programming language http://felix.sf.net |