Re: [Tcl-gsoc] GSOC2011: Tcl Plugin for Netbeans


Colin McCormack wrote on 2011-05-08 02:56:
> I haven't been following closely, but I noticed 'tcl parsing' and wanted
> to point out http://wiki.tcl.tk/9620 and also and especially
> http://wiki.tcl.tk/9649 (which I use and find very good.)

Then I should point out that http://abel.math.umu.se/~lars/tcl/parsetcl.pdf
contains more documentation than the wiki page, and has also been updated
to support the {*} feature of Tcl 8.5.

Regarding Arnulf's worry about whitespace: Since character indices are 
kept track of, it is straightforward to record whitespace in a 
post-processing phase. parsetcl::reinsert_indentation shows how to do that 
for indentation, and the same technique can be applied to interword 
whitespace. Parsing is tricky in itself, so there is no need to further 
complicate it with whitespace when that is not needed. KISS.
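
As an illustration of that post-processing idea, here is a minimal sketch
in Python (not Tcl, and not taken from parsetcl). It assumes the parser
has already produced, for each word of a command, a (start, end) pair of
character indices with the end index inclusive; the function name and the
span format are hypothetical.

```python
def interword_whitespace(source, spans):
    """Return the text lying between consecutive word spans.

    spans is a list of (start, end) index pairs into source, sorted by
    start, with end inclusive. This layout is an assumption made for the
    sketch, not a description of parsetcl's actual parse tree.
    """
    gaps = []
    for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
        # Everything after one word and before the next is whitespace
        # that the parser itself never had to look at.
        gaps.append(source[prev_end + 1:next_start])
    return gaps


source = "set  x\t[expr {1 + 2}]"
spans = [(0, 2), (5, 5), (7, 20)]   # "set", "x", "[expr {1 + 2}]"
gaps = interword_whitespace(source, spans)
```

Interleaving the words with the recovered gaps reproduces the original
text, so nothing is lost by keeping whitespace out of the parser proper.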

However, I suspect these Tcl-oriented approaches may be suboptimal for the
Netbeans project; if a parser for context-free languages is available more
natively, then using that is probably easier than driving a Tcl parser
remotely. My reasoning is basically the following.

1. First distinguish the phases of lexing and parsing, for this 
discussion. (It is generally possible to unify them, but the resulting 
grammars don't tend to be something for human consumption.)

2. "Most" languages (well, C, Java, Pascal, and the like) tend to be 
regular at the lexing phase -- you could write a regexp for "the next 
token" -- but roughly context-free at the parsing phase. The latter is why 
people write BNFs when describing their syntax.

3. Tcl, on the other hand, is non-regular context-free at the lexing 
phase, and roughly regular[*] at the parsing phase. In fact, I think Tcl 
might be LR(0) at the lexing phase (which is probably why it was feasible 
to write parsetcl as an ad-hoc parser in the first place). Most of the 
Dodekalogue (Tcl(n) manpage) is about the lexing grammar, whereas the 
parsing grammar is presented on a per-command basis.

[*] Since the set of "tokens" is infinite, some care is needed when 
defining what it means to be "regular" in this case. I think one could 
still have a requirement that there are only finitely many "token classes" 
for the grammar to distinguish. Of course, some of those token classes are 
things like "Tcl script" and "Tcl [expr]ession", so there is a recursion 
which makes things complicated. Whether it is a problem depends on what 
you want to do.

Anyway, I think the basic point of "context-free lexer, regular parser" 
might provide some insight into the peculiarities of parsing Tcl.
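
The contrast between points 2 and 3 can be sketched in a few lines of
Python. This is only an illustration under simplifying assumptions: the
token regexp covers a toy C-like subset, and the brace scanner handles
only nesting and backslash-quoted characters, not the full Tcl rules. All
names are hypothetical.

```python
import re

# Point 2: for a C-like language, a single regexp can describe "the next
# token" (identifier, integer, or one-character operator in this toy).
C_TOKEN = re.compile(r"\s*([A-Za-z_]\w*|\d+|[-+*/=;(){}])")


def c_tokens(text):
    """Split text into tokens by applying C_TOKEN repeatedly."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = C_TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at index {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


# Point 3: a braced Tcl word ends at the *matching* close brace, so the
# lexer has to count nesting depth. A finite automaton cannot do that
# for arbitrary depth, which is why this step is not regular.
def braced_word_end(text, start):
    """Return the index of the brace that closes the one at text[start]."""
    if text[start] != "{":
        raise ValueError("word does not start with an open brace")
    depth, i = 0, start
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2          # a backslash-quoted character is not counted
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    raise ValueError("missing close-brace")


script = "if {$a} {puts {hi}}"
first = braced_word_end(script, 3)    # closes the condition word
second = braced_word_end(script, 8)   # closes the body word
```

Once the words have been found this way, deciding what "if" does with
them is a matter of looking at a flat sequence of words, which is the
"regular parser" half of the picture.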

Lars Hellström
