Looks like if I set semantic-lex-depth to -1, PAREN_BLOCK magically parses the
way I expect... Not sure it's the way to go, but yay, it works!
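
A possible explanation, as far as I can tell from reading semantic-lex: the
block analyzer only emits a single block token when the current depth is not
below the maximum depth, and a nil maximum means "always descend". The helper
below is a hypothetical model of that test, not Semantic code; its name and
arguments are made up for illustration.

```elisp
;; Hypothetical helper, not part of Semantic: models the depth test the
;; block analyzer appears to apply when it meets an opening paren.
(defun my/lexer-emits-block-p (current-depth max-depth)
  "Return non-nil if a paren at CURRENT-DEPTH would become one block token.
MAX-DEPTH stands in for `semantic-lex-depth'; nil means no depth limit."
  (not (or (null max-depth)
           (< current-depth max-depth))))

(my/lexer-emits-block-p 0 nil) ; nil: lexer descends and emits LPAREN ... RPAREN
(my/lexer-emits-block-p 0 -1)  ; t: the whole list becomes one PAREN_BLOCK
(my/lexer-emits-block-p 0 0)   ; t as well

;; The corresponding change in the setup function quoted below would be:
;;   semantic-lex-depth -1   ; was nil
```

If that reading is right, 0 (which I believe is the default) should behave the
same as -1 here.
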
/me is rushing into writing that parser
-M.
2011/3/25 Mihai Călin Bazon <mihai.bazon@...>
> Just noticed that this wasn't sent to the list... Gmail's reply button is
> really bugging me. :-\
>
>
> ---------- Forwarded message ----------
> From: Mihai Călin Bazon <mihai.bazon@...>
> Date: 2011/3/24
> Subject: Re: [cedet-semantic] JavaScript support
> To: "Eric M. Ludlam" <eric@...>
>
>
> Hi Eric,
>
> Thanks for your reply, and sorry for my late reaction -- quite busy during
> the week. I'd like to resume this; could you help me figure out what's
> broken in the following code? I tried to reduce it to a minimal test case
> for getting PAREN_BLOCK to work. I start with a fresh Emacs, load the file
> below, press C-c C-c, run M-x eval-buffer (to evaluate the generated
> grammar), then M-x semantic-load-enable-semantic-debugging-helpers, and
> load a JS file that contains (foo, bar, baz). I expected it to parse, with
> semantic-fetch-tags returning a function tag, but it returns nil instead
> and everything except whitespace is underlined in red, which suggests a
> parsing error.
>
> Cheers,
> -Mihai
>
> ;;;; WY
>
> %package wisent-ecmascript
>
> %languagemode javascript-mode js-mode
>
> %start program
> %start statement
> %start namelist
>
> ;;; --- punctuation
>
> %type <punctuation>
>
> %token <punctuation> SEMICOLON ";"
> %token <punctuation> COMMA ","
>
> ;;; --- blocks
>
> %type <block> ;;syntax "\\s(\\|\\s)" matchdatatype block
>
> %token <block> PAREN_BLOCK "(LPAREN RPAREN)"
> %token <block> BRACE_BLOCK "(LBRACE RBRACE)"
> %token <block> BRACK_BLOCK "(LBRACK RBRACK)"
>
> %token <open-paren> LPAREN "("
> %token <close-paren> RPAREN ")"
> %token <open-paren> LBRACE "{"
> %token <close-paren> RBRACE "}"
> %token <open-paren> LBRACK "["
> %token <close-paren> RBRACK "]"
>
> ;;; --- symbols
>
> %type <symbol>
> %token <symbol> NAME
>
> %%
>
> program
>   : statement
>   ;
>
> statement
>   : PAREN_BLOCK
>     (FUNCTION-TAG "test" nil (EXPANDFULL $1 namelist))
>   ;
>
> namelist
>   : LPAREN
>     ()
>   | RPAREN
>     ()
>   | NAME
>     (VARIABLE-TAG $1 nil nil)
>   | COMMA
>     ()
>   ;
>
> %%
>
> (require 'semantic-java)
> (require 'semantic-wisent)
>
> (define-lex ecmascript-lexer
>   ""
>   semantic-lex-ignore-whitespace
>   semantic-lex-ignore-newline
>   semantic-lex-ignore-comments
>
>   wisent-ecmascript--<symbol>-regexp-analyzer
>   wisent-ecmascript--<punctuation>-string-analyzer
>   wisent-ecmascript--<block>-block-analyzer
>
>   semantic-lex-default-action)
>
> (defun wisent-ecmascript-setup-parser ()
>   (wisent-ecmascript--install-parser)
>   (setq semantic-lex-analyzer 'ecmascript-lexer
>         semantic-lex-number-expression semantic-java-number-regexp
>         semantic-lex-depth nil
>         semantic-command-separation-character ";"))
>
> (add-hook 'js-mode-hook 'wisent-ecmascript-setup-parser)
> (add-hook 'javascript-mode-hook 'wisent-ecmascript-setup-parser)
> (add-hook 'ecmascript-mode-hook 'wisent-ecmascript-setup-parser)
>
> 2011/3/21 Eric M. Ludlam <eric@...>
>
> On 03/21/2011 04:29 AM, Mihai Călin Bazon wrote:
>>
>>> Hi folks,
>>>
>>> I've spent my weekend with CEDET and must say it's amazing; if only I
>>> understood it better. :-) My goal was to add proper support for JavaScript
>>> (sorry, but the existing parser doesn't cut it for real-world code). I
>>> started from scratch, to better understand how to write parsers, but I
>>> didn't get far.
>>>
>>
>> It's always nice to have some new folks trying things out.
>>
>>
>>> The Semantic/Wisent manuals are quite good, yet I've had trouble getting
>>> started and doing simple things. I think a step-by-step HOWTO on adding
>>> support for a simple language (with nested structures) would be very
>>> welcome!
>>>
>>
>> That's a good idea. There are some skeleton files around, but I don't
>> think they go too deep into anything like that.
>>
>>
>>> So anyway, I'm attaching my (highly incomplete) work so far and hope for
>>> some advice on how to continue. Questions:
>>>
>>
>> I will attempt to answer given the brief amount of time I have this AM.
>>
>>
>>> - I don't seem to be able to parse more than one statement. Presumably
>>> because the return value of the `statement' rule is wrong. Generally I
>>> couldn't figure out how to return the proper values.
>>>
>>
>> Each nonterminal you define with a %start pragma should return one
>> production. The entire grammar you create is called iteratively, and the
>> automatic value passing of the wisent parser generator framework is set up
>> for this. The iterative nature makes error recovery very simple. The grammar
>> just "fails", and the upper level iterative parser skips over the bad
>> semantics and moves on.
>>
>> Thus, it returns only one thing because that is all it can do. If you
>> change statlist to only return statement, then after it finds the first
>> statement, it should get called a second time, and return the second
>> statement, and the parser framework will keep track of it all for you.
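
A rough sketch of that iterative loop, in plain Emacs Lisp. This is not
Semantic's actual driver; `my/parse-one', `my/parse-all' and the token shapes
are all made up for illustration. The point is only that the start rule returns
one thing per call, and the outer loop collects results and skips what fails.

```elisp
;; Hypothetical illustration only; not Semantic's real parser driver.
(defun my/parse-one (tokens)
  "Try to parse one statement from the front of TOKENS.
Return (TAG . REST) on success, or (nil . REST) after skipping one bad token."
  (let ((tok (car tokens)))
    (if (eq (car-safe tok) 'statement)
        (cons (list 'tag (cdr tok)) (cdr tokens))
      (cons nil (cdr tokens)))))

(defun my/parse-all (tokens)
  "Call `my/parse-one' until TOKENS is empty, collecting the tags found."
  (let (tags)
    (while tokens
      (let ((result (my/parse-one tokens)))
        (when (car result)
          (push (car result) tags))
        (setq tokens (cdr result))))
    (nreverse tags)))

(my/parse-all '((statement . "a") (junk . "?") (statement . "b")))
;; => ((tag "a") (tag "b"))
```
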
>>
>>
>>> - I tried to use PAREN_BLOCK and the `iterative style' to parse variable
>>> declarations, did that exactly as in other existing parsers and as
>>> documented, but it wouldn't work... I know "it doesn't work" is not good
>>> information, but that's all I can say, the blocks simply didn't parse. So
>>> I switched to the recursive style and collecting (EXPANDTAG (VARIABLE-TAG
>>> ...)). (btw, perhaps something similar should go in `statlist'?).
>>>
>>
>> EXPANDFULL will use the same iterative nature as I describe above inside a
>> parent block. The nonterminal symbol passed to EXPANDFULL should have rules
>> about (, ), and some variable declaration.
>>
>> If you use EXPANDTAG, you need to create your own rule that parses (
>> varlist ) and the varlist will need to cons all the found variables together
>> itself. It is much easier to use EXPANDFULL, as it handles bad syntax
>> easily.
>>
>>
>>> - That seems to parse: a function declaration with an argument list:
>>>
>>> function foo(a, b) {
>>> }
>>>
>>> (semantic-fetch-tags) returns the function tag and the variables are
>>> there. However, if I complicate that a bit:
>>>
>>> function foo(a, b) {
>>>     function bar() {
>>>     }
>>> }
>>>
>>> only the outer function is returned. Inner functions are ubiquitous in
>>> JS and they need to be parsed correctly to provide useful functionality
>>> (BTW, the existing JS parser distributed with Wisent fails here too).
>>>
>>
>> The semantic lexer skips over { } and ( ) blocks and does not go into them
>> unless a rule action explicitly calls EXPANDTAG or EXPANDFULL on the value
>> returned from the PAREN_BLOCK.
>>
>> In your nonterminal for a function, the BRACE_BLOCK part of the rule will
>> need to be passed into EXPANDFULL which will iteratively parse your
>> function body looking for more functions. Code will show up as bad syntax
>> unless you write rules for all that too.
>>
>>
>>> - what is EXPANDTAG? and is it related to the value of
>>> semantic-tag-expand-function? (I just copied the expander from the
>>> existing JS mode for now, but I'd like to understand why it is useful,
>>> what argument it receives and what it should return). Didn't put too much
>>> time into this yet, but from the docs I'm not clear.
>>>
>>
>> EXPANDTAG and EXPANDFULL let you look inside some _BLOCK with a new
>> nonterminal start. For each rule you pass to EXPAND* you need to add a
>> %start pragma. The output of EXPAND* will be (presumably) some tag or tag
>> list. EXPANDFULL will return a tag list and handle expanding and the data
>> needed for "cooking" the tags so they are bound into the buffer with
>> overlays.
>>
>> In your support file, you will need to write an overload of
>> semantic-tag-components if you do anything besides function arguments or
>> type members. For function args and type members, you just need to put the
>> tag lists into the correct tag attributes.
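
A minimal sketch of what such an overload might look like, assuming the
grammar stores nested functions under a `:members' attribute on function tags
(that attribute choice is an assumption, not something from this thread) and
that the buffer's major mode is js-mode. Untested against a real grammar.

```elisp
(require 'semantic)
(require 'mode-local)

;; Sketch only: the :members attribute on function tags is an assumption
;; about how the grammar's FUNCTION-TAG action stores nested functions.
(define-mode-local-override semantic-tag-components js-mode (tag)
  "Return the child tags of TAG, including functions nested in a function."
  (if (semantic-tag-of-class-p tag 'function)
      (append (semantic-tag-function-arguments tag)
              (semantic-tag-get-attribute tag :members))
    (semantic-tag-components-default tag)))
```
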
>>
>>
>>> - generally, how do you debug a grammar?
>>>
>>
>> You can debug a rule in wisent, but not how the wisent grammar parses. The
>> grammar debugger was never ported to wisent. :(
>>
>>
>>> * * *
>>>
>>> I have in mind a few things for now:
>>>
>>> - be able to detect the local variables around the cursor. For example, if
>>> I place the cursor on a variable, it should highlight the occurrences of
>>> that name in the enclosing scope. I already did something like this for
>>> js2-mode [1], but I'd like to get rid of that setup.
>>>
>>
>> If you visit http://cedet.sourceforge.net/addlang.shtml step 4 is about
>> context parsing.
>>
>>
>>> - having done the above, it should be easy to provide some keybindings to
>>> quickly move through such occurrences, and a keybinding to rename the
>>> variable (again, my js2-mode setup supports that).
>>>
>>
>> semantic-symref output (using idutils, gnu global, or other) has features
>> like that that may "just work" for you.
>>
>>
>>> - use the knowledge from the parser to indent var properly:
>>>
>>> var foo = 1,
>>>     bar = 2;
>>>
>>
>> Many folks have wanted to do this, but as far as I know, no one has built
>> a framework for it.
>>
>>
>>> Then of course I know that much functionality would come for free from
>>> existing Semantic applications.
>>>
>>> JavaScript is quickly taking over the world (it's the most popular language
>>> on GitHub right now) and it's a pity not to support it properly. I have
>>> some previous knowledge on parsing JavaScript [2] and I have used Emacs for
>>> 12 years now; though I'm not very skilled at Elisp, I do Common Lisp at my
>>> day job and have good knowledge of it. I'm willing to invest the time to
>>> write this parser for Semantic, just need some help! :-)
>>>
>>> Thanks in advance!
>>> -Mihai
>>>
>>> [1] http://mihai.bazon.net/projects/editing-javascript-with-emacs-js2-mode/js2-highlight-vars-mode
>>> [2] https://github.com/mishoo/UglifyJS
>>>
>>> PS: By the way, don't you guys consider switching to GitHub? SourceForge
>>> is... uhm... better not say it.
>>>
>>
>> I've been too lazy to look into reasons to move anything. Right now we're
>> just trying to get a good method for keeping synchronized with Emacs. The
>> tactic is good, but it takes a lot of time.
>>
>> Eric
>>
>
>
>
> --
> Mihai Bazon,
> http://mihai.bazon.net/blog
>
--
Mihai Bazon,
http://mihai.bazon.net/blog