From: Thiago A. <thi...@gm...> - 2005-11-25 22:25:26
Leif,

2005/11/25, Leif Frenzel <hi...@le...>:
> > So, we need some way to guess when the expression ends. I think we can
> > mess with the equals sign, but there is also the problem that they can
> > happen inside the expressions.
> Should not layout help with this? I know it's tricky to take into
> account in the antlr grammar, though.

It would, in most cases. But we can't really rely on code layout, because people can choose not to lay out the program and use explicit curly braces and semicolons instead. By 'rely on code layout' I mean using the tokens' line and column information as input for the parser. This would be something like 'If the last top declaration started at column 4, then I can presume the next thing that starts at column <= 4 is a top declaration'. Did I misunderstand you here?

> When I was experimenting in that direction, my idea was to run the
> parser not on the sequence from the lexer, but on a filtered token
> stream, i.e. the sequence would be lexer > filter > parser. The filter
> would apply the layout rules and add in some helpful markers that are
> equivalent to the curly braces (it could even insert curly braces :-).

That is exactly the approach currently taken. The lexer extracts tokens from a character stream. Those tokens are passed to a filter called formatter, which is responsible for inserting the layout tokens (the curly braces and semicolons) when needed. Finally, the parser reads from the formatter stream and doesn't have to deal with layout rules. This is also the approach taken by the Language.Haskell parser (except that their lexer and formatter are actually merged).

Maybe the formatter can insert some braces before and after each function declaration and the parser would accept a modified version of Haskell. But this is really awkward. Because people can actually write a program _with_ the braces, those artificially inserted braces could trick us.
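To make the lexer > formatter > parser pipeline concrete, here is a minimal sketch in Python. It is not the actual code: `Token`, `lex` and `formatter` are hypothetical names, the lexer only splits on whitespace, and the formatter handles nothing but the top-level block (no nested blocks, no `let`/`where`/`of`). It only shows where the layout tokens come from and why a program written with explicit braces must pass through unchanged.

```python
from typing import Iterable, Iterator, NamedTuple


class Token(NamedTuple):
    text: str
    line: int  # 1-based line number
    col: int   # 1-based column number


def lex(source: str) -> Iterator[Token]:
    """Toy lexer (assumption): whitespace-separated words with positions."""
    for line_no, line in enumerate(source.splitlines(), start=1):
        pos = 0
        for word in line.split():
            pos = line.index(word, pos)
            yield Token(word, line_no, pos + 1)
            pos += len(word)


def formatter(tokens: Iterable[Token]) -> Iterator[str]:
    """Insert '{', ';' and '}' for the top-level block only.

    If the program already opens with an explicit '{', the tokens are
    passed through untouched, so the parser sees the same kind of stream
    in both cases.
    """
    toks = list(tokens)
    if not toks:
        return
    if toks[0].text == "{":
        # Explicit braces: layout rules do not apply.
        for tok in toks:
            yield tok.text
        return
    indent = toks[0].col  # column of the first top declaration
    yield "{"
    prev_line = toks[0].line
    for i, tok in enumerate(toks):
        # A token that starts a new line at the block's column
        # begins a new declaration, so a ';' goes in front of it.
        if i > 0 and tok.line != prev_line and tok.col == indent:
            yield ";"
        yield tok.text
        prev_line = tok.line
    yield "}"
```

With this sketch, `list(formatter(lex("f x = x\n  + 1\ng y = y\n")))` gives the tokens of `{ f x = x + 1 ; g y = y }`, while a source that already starts with `{` comes out exactly as it went in.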
I am not very sure here; maybe I will experiment with this approach.

I was taking a look at the report and it seems to me that semicolons can only occur inside a brace block. Maybe the function definition parser could look ahead and search for a semicolon or a closing brace without actually consuming it. If this is true, and if antlr supports that kind of cheat, I think we can get it done. But I really need to check the report more carefully.

Cheers,

Thiago Arrais
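A rough sketch of the lookahead idea, again in Python rather than antlr: `TokenStream` and `declaration_length` are hypothetical names, and whether antlr offers an equivalent is left open, as noted above. It assumes the stream has already been through the formatter, so every declaration ends at a ';' or at the '}' that closes the enclosing block.

```python
from typing import List, Optional


class TokenStream:
    """Token buffer that lets a parser look ahead without consuming."""

    def __init__(self, tokens: List[str]) -> None:
        self._tokens = tokens
        self._pos = 0

    def peek(self, k: int = 0) -> Optional[str]:
        """Return the token k positions ahead, or None past the end."""
        i = self._pos + k
        return self._tokens[i] if i < len(self._tokens) else None

    def consume(self) -> str:
        """Return the current token and advance past it."""
        tok = self._tokens[self._pos]
        self._pos += 1
        return tok


def declaration_length(stream: TokenStream) -> int:
    """Count the tokens before the ';' or '}' that ends the declaration.

    Nested brace blocks are skipped by tracking depth, so a ';' inside
    a nested block does not count. Nothing is consumed.
    """
    depth = 0
    k = 0
    while (tok := stream.peek(k)) is not None:
        if depth == 0 and tok in (";", "}"):
            break
        if tok == "{":
            depth += 1
        elif tok == "}":
            depth -= 1
        k += 1
    return k
```

The function definition parser could call `declaration_length` first and then parse exactly that many tokens as one declaration, since the stream position is the same before and after the call.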