From: Arnulf W. <ar...@wi...> - 2011-05-08 18:43:18
I see that there have been a lot of changes since the version of parsetcl
I looked at, so my worries may no longer apply.

@Michal: I think you should read Lars's paper and think about the points
Lars has written down here, to decide which way to go. You know better
what is needed on the Netbeans side, or if not, you should find out :).
The decision about what to use is yours. I and others will comment if you
have trouble reaching a decision, so please ask if necessary.

Arnulf

On 08.05.2011 12:34, Lars Hellström wrote:
> Colin McCormack wrote 2011-05-08 02.56:
>> I haven't been following closely, but I noticed 'tcl parsing' and wanted
>> to point out http://wiki.tcl.tk/9620 and also and especially
>> http://wiki.tcl.tk/9649 (which I use and find very good.)
>
> Then I should point out http://abel.math.umu.se/~lars/tcl/parsetcl.pdf
> contains more documentation than the wiki page, and also has been updated
> to support the {*} feature of Tcl 8.5.
>
> Regarding Arnulf's worry about whitespace: Since character indices are
> kept track of, it is straightforward to record whitespace in a
> post-processing phase. parsetcl::reinsert_indentation shows how to do that
> for indentation, and the same technique can be applied to interword
> whitespace. Parsing is tricky in itself, so there is no need to further
> complicate it with whitespace when that is not needed. KISS.
>
> However, I suspect these Tcl-oriented approaches may be suboptimal for the
> Netbeans project; if a parser for context-free languages is available more
> natively, then using that is probably easier than operating Tcl parsing by
> remote. My reasoning is basically the following.
>
> 1. First distinguish the phases of lexing and parsing, for this
> discussion. (It is generally possible to unify them, but the resulting
> grammars don't tend to be something for human consumption.)
>
> 2. "Most" languages (well, C, Java, Pascal, and the like) tend to be
> regular at the lexing phase -- you could write a regexp for "the next
> token" -- but roughly context-free at the parsing phase. The latter is why
> people write BNFs when describing their syntax.
>
> 3. Tcl, on the other hand, is non-regular context-free at the lexing
> phase, and roughly regular[*] at the parsing phase. In fact, I think Tcl
> might be LR(0) at the lexing phase (which is probably why it was feasible
> to write parsetcl as an ad-hoc parser in the first place). Most of the
> Dodekalogue (Tcl(n) manpage) is about the lexing grammar, whereas the
> parsing grammar is presented on a per-command basis.
>
> [*] Since the set of "tokens" is infinite, some care is needed when
> defining what it means to be "regular" in this case. I think one could
> still have a requirement that there are only finitely many "token classes"
> for the grammar to distinguish. Of course, some of those token classes are
> things like "Tcl script" and "Tcl [expr]ession", so there is a recursion
> which makes things complicated. Whether it is a problem depends on what
> you want to do.
>
> Anyway, I think the basic point of "context-free lexer, regular parser"
> might provide some insight into the peculiarities of parsing Tcl.
>
> Lars Hellström
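
To illustrate Lars's point 3, here is a rough sketch of why finding "the next token" in Tcl is not a job for a regexp: a brace-quoted word ends only where the nesting depth returns to zero, and that takes a counter. The code is Python rather than Tcl, the function name scan_braced_word is made up for this example, and it is not how parsetcl does it. It assumes that only braces and backslashes matter inside a braced word.

```python
def scan_braced_word(script: str, start: int) -> int:
    """Return the index just past the brace-quoted word that begins at `start`.

    Hypothetical helper for illustration only; not part of parsetcl.
    """
    if start >= len(script) or script[start] != "{":
        raise ValueError("expected '{' at start of braced word")
    depth = 0
    i = start
    while i < len(script):
        ch = script[i]
        if ch == "\\":
            # A backslash-escaped character never opens or closes a brace.
            i += 2
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i + 1  # matching close brace found
        i += 1
    raise ValueError("unbalanced braces")


script = "if {$x > 0} {puts {positive}} else {puts no}"
end_cond = scan_braced_word(script, 3)
end_body = scan_braced_word(script, end_cond + 1)
print(script[3:end_cond])             # {$x > 0}
print(script[end_cond + 1:end_body])  # {puts {positive}}
```

Once the words are split off this way, the "parsing" side is simple by comparison: a command such as `if` only has to classify each word as an expression, a script, or a keyword, and then recurse into those words that are scripts or expressions.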