Re: [eclipsefp-develop] JParser - Updates and call for help


Leif,

2005/11/25, Leif Frenzel <hi...@le...>:
> > So, we need some way to guess when the expression ends. I think we can
> > mess with the equals sign, but there is also the problem that they can
> > happen inside the expressions.
> Should not layout help with this? I know it's tricky to take into
> account in the antlr grammar, though.

It would, in most cases. But we can't really rely on code layout,
because people can choose not to lay out the program and instead use
explicit curly braces and semicolons. By 'rely on code layout' I mean
using the tokens' line and column information as input for the parser.
That would be something like 'If the last top declaration started at
column 4, then I can presume the next thing that starts at column <= 4
is a top declaration'.

Did I misunderstand you here?

> When I was experimenting in that direction, my idea was to run the
> parser not on the sequence from the lexer, but on a filtered token
> stream, i.e. the sequence would be lexer > filter > parser. The filter
> would apply the layout rules and add in some helpful markers that are
> equivalent to the curly braces (it could even insert curly braces :-).

That is exactly the approach currently taken. The lexer extracts
tokens from a character stream. Those tokens are passed to a filter
called formatter, which is responsible for inserting the layout tokens
(the curly braces and semicolons) where needed. Finally, the parser
reads from the formatter stream and doesn't have to deal with layout
rules. This is the same approach taken by the Language.Haskell
parser (except for the fact that their lexer and formatter are
actually merged).
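
To make the lexer > formatter > parser idea concrete, here is a rough
sketch of such a filter. It is not the JParser code: the names Token,
tokenize and apply_layout are made up for illustration, the tokenizer
just splits on whitespace, and the layout handling is a simplified
subset of the rule in the report (it ignores corner cases such as
empty blocks and the parse-error rule).

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    text: str
    line: int  # 1-based line number
    col: int   # 1-based column number


# Keywords that open a block according to the layout rule.
LAYOUT_KEYWORDS = {"where", "let", "do", "of"}


def tokenize(source: str) -> List[Token]:
    """Toy lexer: whitespace-separated words with their positions."""
    tokens = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        pos = 0
        for word in line.split():
            pos = line.index(word, pos)
            tokens.append(Token(word, line_no, pos + 1))
            pos += len(word)
    return tokens


def apply_layout(tokens: List[Token]) -> List[str]:
    """Formatter: return token texts with implicit braces and semicolons
    inserted, so the parser never has to look at columns."""
    out: List[str] = []
    stack: List[int] = []  # column of each open block; 0 = explicit block
    expect_block = False
    prev_line = None
    for tok in tokens:
        first_on_line = tok.line != prev_line
        prev_line = tok.line
        if expect_block:
            expect_block = False
            if tok.text != "{":
                # No explicit brace after the keyword: open an implicit block.
                out.append("{")
                stack.append(tok.col)
                first_on_line = False  # first item needs no separator
        if first_on_line:
            # Dedent closes implicit blocks; same column separates items.
            while stack and stack[-1] > 0 and tok.col < stack[-1]:
                out.append("}")
                stack.pop()
            if stack and stack[-1] > 0 and tok.col == stack[-1]:
                out.append(";")
        if tok.text == "{":
            stack.append(0)  # explicit block: layout is switched off inside
        elif tok.text == "}" and stack and stack[-1] == 0:
            stack.pop()
        out.append(tok.text)
        if tok.text in LAYOUT_KEYWORDS:
            expect_block = True
    # Close the implicit blocks that are still open at end of input.
    out.extend("}" for col in reversed(stack) if col > 0)
    return out
```

Fed a small module written with layout, the filter produces the same
token sequence a programmer would get by writing all the braces and
semicolons by hand, and a program that already uses explicit braces
passes through unchanged.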

Maybe the formatter could insert some braces before and after each
function declaration, and the parser would accept a modified version of
Haskell. But this is really awkward. Because people can actually write
a program _with_ the braces, those artificially inserted braces could
trick us. I am not sure about this; maybe I will experiment with this
approach.

I was taking a look at the report and it seems to me that semicolons
can only appear inside a brace block. Maybe the function definition
parser could look ahead and search for a semicolon or a closing brace
without actually consuming it. If this is true, and if antlr supports
that kind of cheat, I think we can get it done. But I really need to
check the report more carefully.
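
The lookahead I have in mind would be roughly the following. This is
only a sketch on a plain list of token texts, independent of antlr;
find_declaration_end is a made-up name, and it assumes the formatter
has already made every brace and semicolon explicit.

```python
from typing import List

OPENERS = {"{", "(", "["}
CLOSERS = {"}", ")", "]"}


def find_declaration_end(tokens: List[str], start: int) -> int:
    """Return the index of the token that ends the declaration starting
    at `start`, without consuming anything.

    The declaration ends at the first ';' seen at nesting depth 0, or at
    the closer that ends the enclosing block. If neither is found, the
    length of the token list is returned.
    """
    depth = 0
    for i in range(start, len(tokens)):
        text = tokens[i]
        if text in OPENERS:
            depth += 1
        elif text in CLOSERS:
            if depth == 0:
                return i  # this closes the block we started in
            depth -= 1
        elif text == ";" and depth == 0:
            return i
    return len(tokens)
```

Because nested blocks are skipped as a whole, an equals sign or a
semicolon inside a nested 'where' or 'let' block does not end the
search early.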

Cheers,

Thiago Arrais