From: Moritz H. <ant...@gm...> - 2010-04-26 06:38:56
|
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi all, thanks to all of those who provided some information on an ooRexx grammar. Here, I want to share a central finding and my conclusion from it. Parsing (oo)Rexx is hard. The reason for this is that a symbol can be a symbol or a keyword, depending on the context. Have a look at the following example: do while + 0 ... end What does that mean? From a programmer's perspective it's clear: Repeat the loop while (+ 0) is true. + 0 evaluates to 0, as + is a prefix instruction, and 0 represents false, so the loop will never be executed. The important aspect is that the symbol "while" was recognized as a keyword. Now, let's look at a different example: i = 0 do i + 0 ... end This means to the programmer that the loop is to be executed i+0=0 times, so it's semantically the same as the first example. Here, "i" is not a keyword, but a variable symbol, which resolves to 0, and 0+0 resolves to 0, meaning that the loop shall be executed 0 times. - From a parser's point of view this is somewhat different. I experimented to write LR(*) grammar, which is context free top down parser grammar. Note that this means, there must not be any left recursion. Now, we define symbols to match something like: terminal SYMBOL: ('a'..'z'|'A'..'Z'|'_'|'!'|'?') ('a'..'z'|'A'..'Z'|'_'|'!'|'?'|'0'..'9'|'.')* and DoInstruction: 'do' repetitor=DoRep cond=DoCond ...; DoRep: DoRepTimes | DoRepOver | ...; DoRepTimes: expr=Expression; DoRepOver: symbol=Symbol 'over' expr=Expression; DoCond: ('while'|'until') expr=Expression; Expression: ... SYMBOL ...; Here, we can now see what the problem is: 'while' is a keyword and also a symbol. Due to this, we are not allowed to match the instruction "do while + 0" as (Do (DoRepTimes (Expression .. while .. + .. 0))) But, at other places we can write: while = 5 i = while +3 -- i is 8 The expression "while + 3" is also matched using the same expression rule. Hence, to implement a non ambiguous grammar for Rexx, one has to handle these cases using specialized expression rules. The expression rule for DoRepTimes must make sure that it does not match anything that may follow the expression, like 'while' and 'until'. It definitely is possible to write a grammar to match Rexx code, but as I have demonstrated there are some problems that need to be solved. If the grammar would provide semantic predicates, it would be much easier, but it hasn't got that... So, for the workbench project I now chose to only filter and parse directives, like ::class and ::method. This is also what is most important for a file's structure. Later, we can still add support for all rexx expressions. If you have any idea on how to solve this problem, feel free to tell me. I have written some grammars, but it's a very challenging field of computer science, so there are big chances I missed something. Kind regards, Moritz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkvVNPQACgkQl56sB+DIUZQ7CACdHLHmLUkcPQVYsNoX/kJKkaro yYQAn0/Va4gdvBlkYlGi3R293ZRzWMl8 =9ec+ -----END PGP SIGNATURE----- |