[cedet-semantic] Fwd: JavaScript support
From: Mihai C. B. <mih...@gm...> - 2011-03-25 07:02:26
Just noticed that this wasn't sent to the list... Gmail's reply button is
really bugging me. :-\
---------- Forwarded message ----------
From: Mihai Călin Bazon <mih...@gm...>
Date: 2011/3/24
Subject: Re: [cedet-semantic] JavaScript support
To: "Eric M. Ludlam" <er...@si...>
Hi Eric,
Thanks for your reply, and sorry for my late reaction -- quite busy during
the week. I'd like to pick this up again; could you help me figure out what's
broken in the following code? I tried to reduce it to a minimal test case
for getting PAREN_BLOCK to work. I start with a fresh Emacs, load
the file below, press C-c C-c, run M-x eval-buffer (to evaluate the generated
grammar), then M-x semantic-load-enable-semantic-debugging-helpers, and load a
JS file that contains (foo, bar, baz). I would expect it to parse, with
semantic-fetch-tags returning a function tag, but it returns nil
instead and everything except whitespace is underlined in red, which
suggests a parsing error.
Cheers,
-Mihai
;;;; WY
%package wisent-ecmascript
%languagemode javascript-mode js-mode
%start program
%start statement
%start namelist
;;; --- punctuation
%type <punctuation>
%token <punctuation> SEMICOLON ";"
%token <punctuation> COMMA ","
;;; --- blocks
%type <block> ;;syntax "\\s(\\|\\s)" matchdatatype block
%token <block> PAREN_BLOCK "(LPAREN RPAREN)"
%token <block> BRACE_BLOCK "(LBRACE RBRACE)"
%token <block> BRACK_BLOCK "(LBRACK RBRACK)"
%token <open-paren> LPAREN "("
%token <close-paren> RPAREN ")"
%token <open-paren> LBRACE "{"
%token <close-paren> RBRACE "}"
%token <open-paren> LBRACK "["
%token <close-paren> RBRACK "]"
;;; --- symbols
%type <symbol>
%token <symbol> NAME
%%
program
  : statement
  ;
statement
  : PAREN_BLOCK
    (FUNCTION-TAG "test" nil (EXPANDFULL $1 namelist))
  ;
namelist
  : LPAREN
    ()
  | RPAREN
    ()
  | NAME
    (VARIABLE-TAG $1 nil nil)
  | COMMA
    ()
  ;
%%
(require 'semantic-java)
(require 'semantic-wisent)
(define-lex ecmascript-lexer
  ""
  semantic-lex-ignore-whitespace
  semantic-lex-ignore-newline
  semantic-lex-ignore-comments
  wisent-ecmascript--<symbol>-regexp-analyzer
  wisent-ecmascript--<punctuation>-string-analyzer
  wisent-ecmascript--<block>-block-analyzer
  semantic-lex-default-action)
(defun wisent-ecmascript-setup-parser ()
  (wisent-ecmascript--install-parser)
  (setq semantic-lex-analyzer 'ecmascript-lexer
        semantic-lex-number-expression semantic-java-number-regexp
        semantic-lex-depth nil
        semantic-command-separation-character ";"))
(add-hook 'js-mode-hook 'wisent-ecmascript-setup-parser)
(add-hook 'javascript-mode-hook 'wisent-ecmascript-setup-parser)
(add-hook 'ecmascript-mode-hook 'wisent-ecmascript-setup-parser)
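
For anyone trying to reproduce this, one way to separate lexer problems from
grammar problems is to look at the raw token stream before asking for tags.
The forms below are only a sketch and not part of the test case above; they
assume the setup function has already run in the JS buffer, and they use
nothing beyond semantic-lex, semantic-clear-toplevel-cache and
semantic-fetch-tags.

```elisp
;; Evaluate each form with M-: while visiting the JS buffer that contains
;; (foo, bar, baz).  Assumes `wisent-ecmascript-setup-parser' has already
;; run in this buffer.

;; 1. Inspect the token stream.  Each token has the form (CLASS START . END).
;;    The `statement' rule can only match if the parenthesized list arrives
;;    as a single PAREN_BLOCK token.  If separate LPAREN, NAME, COMMA and
;;    RPAREN tokens show up instead, the block analyzer is descending into
;;    the list; `semantic-lex-depth' is then worth checking, since a nil
;;    value is understood to mean "no depth limit" (an assumption to verify,
;;    not a confirmed diagnosis).
(message "tokens: %S" (semantic-lex (point-min) (point-max)))

;; 2. Throw away any cached parse and reparse the buffer from scratch.
(semantic-clear-toplevel-cache)
(message "tags: %S" (semantic-fetch-tags))
```

The output lands in the *Messages* buffer, which makes it easy to compare
the token classes against the terminals the grammar expects.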
2011/3/21 Eric M. Ludlam <er...@si...>
> On 03/21/2011 04:29 AM, Mihai Călin Bazon wrote:
>
>> Hi folks,
>>
>> I've spent my weekend with CEDET and must say it's amazing; if only I
>> understood it better. :-) My goal was to add proper support for JavaScript
>> (sorry but the existing parser doesn't cut it for real world code). I've
>> started it from scratch, to better understand how to write parsers, but I
>> didn't get far.
>>
>
> It's always nice to have some new folks trying things out.
>
>
>> The Semantic/Wisent manuals are quite good, yet I've had trouble getting
>> started and doing simple things. I think a step-by-step HOWTO on adding
>> support for a simple language (with nested structures) would be very
>> welcome!
>>
>
> That's a good idea. There are some skeleton files around, but I don't
> think they go too deep into anything like that.
>
>
>> So anyway, I'm attaching my (highly incomplete) work so far and hope for
>> some advice on how to continue. Questions:
>>
>
> I will attempt to answer given the brief amount of time I have this AM.
>
>
>> - I don't seem to be able to parse more than one statement. Presumably
>> because the return value of the `statement' rule is wrong. Generally I
>> couldn't figure out how to return the proper values.
>>
>
> Each nonterminal you define with a %start pragma should return 1
> production. The entire grammar you create is called iteratively, and the
> automatic value passing of the wisent parser generator framework is set up
> for this. The iterative nature makes error recovery very simple. The grammar
> just "fails", and the upper level iterative parser skips over the bad
> semantics and moves on.
>
> Thus, it returns only one thing because that is all it can do. If you
> change statlist to only return statement, then after it finds the first
> statement, it should get called a second time, and return the second
> statement, and the parser framework will keep track of it all for you.
>
>
>> - I tried to use PAREN_BLOCK and the `iterative style' to parse variable
>> declarations, did that exactly as in other existing parsers and as
>> documented, but it wouldn't work... I know "it doesn't work" is not good
>> information, but that's all I can say, the blocks simply didn't parse. So
>> I switched to the recursive style and collecting (EXPANDTAG (VARIABLE-TAG
>> ...)). (btw, perhaps something similar should go in `statlist'?).
>>
>>
>
> EXPANDFULL will use the same iterative nature as I describe above inside a
> parent block. The nonterminal symbol passed to EXPANDFULL should have rules
> about (, ), and some variable declaration.
>
> If you use EXPANDTAG, you need to create your own rule that parses (
> varlist ) and the varlist will need to cons all the found variables together
> itself. It is much easier to use EXPANDFULL, as it handles bad syntax
> easily.
>
>
>> - That seems to parse: a function declaration with an argument list:
>>
>>     function foo(a, b) {
>>     }
>>
>> (semantic-fetch-tags) returns the function tag and the variables are
>> there. However, if I complicate that a bit:
>>
>>     function foo(a, b) {
>>         function bar() {
>>         }
>>     }
>>
>> only the outer function is returned. Inner functions are ubiquitous in
>> JS and they need to be parsed correctly to provide useful functionality
>> (BTW, the existing JS parser distributed with Wisent fails here too).
>>
>
> The semantic lexer skips over { } and ( ) blocks and does not go into them
> unless a rule action explicitly calls EXPANDTAG or EXPANDFULL on the value
> returned from the PAREN_BLOCK.
>
> In your nonterminal for a function, the BRACE_BLOCK part of the rule will
> need to be passed into EXPANDFULL which will iteratively parse your
> function body looking for more functions. Code will show up as bad syntax
> unless you write rules for all that too.
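
A rough sketch of what that could look like in the grammar, to make the
suggestion concrete. Everything new here is hypothetical and untested: the
function_declaration and function_body_item rule names, the FUNCTION keyword,
and the choice of a :members attribute for nested functions are my own
illustration, not something from the existing parsers. The lexer would also
need the generated keyword analyzer listed before the symbol analyzer.

```wisent
;; Hypothetical additions to the declarations section.
%start function_body_item

%type  <keyword>
%keyword FUNCTION "function"

%%

;; Expand both blocks: the argument list with the existing `namelist'
;; nonterminal, the body with a new nonterminal that looks for functions.
function_declaration
  : FUNCTION NAME PAREN_BLOCK BRACE_BLOCK
    (FUNCTION-TAG $2 nil
                  (EXPANDFULL $3 namelist)
                  :members (EXPANDFULL $4 function_body_item))
  ;

;; Called iteratively inside the { } block, one item per call.
;; Anything in the body that matches none of these alternatives is
;; skipped and reported as unmatched syntax.
function_body_item
  : LBRACE
    ()
  | RBRACE
    ()
  | function_declaration
  ;
```

Because function_declaration appears again inside function_body_item, each
nested body is expanded the same way, so the nesting depth is not limited
by the grammar.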
>
>
>> - what is EXPANDTAG? and is it related to the value of
>> semantic-tag-expand-function? (I just copied the expander from the
>> existing JS mode for now, but I'd like to understand why it is useful,
>> what argument it receives and what it should return). Didn't put too much
>> time into this yet, but from the docs I'm not clear.
>>
>
> EXPANDTAG and EXPANDFULL let you look inside some _BLOCK with a new
> nonterminal start. For each rule you pass to EXPAND* you need to add a
> %start pragma. The output of EXPAND* will be (presumably) some tag or tag
> list. EXPANDFULL will return a tag list and handle expanding and the data
> needed for "cooking" the tags so they are bound into the buffer with
> overlays.
>
> In your support file, you will need to write an overload of
> semantic-tag-components if you do anything besides function arguments or
> type members. For function args and type members, you just need to put the
> tag lists into the correct tag attributes.
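
As an illustration of the kind of overload meant here, the following is a
minimal sketch. It assumes the grammar stores nested functions under a
:members attribute on the enclosing function tag; that attribute choice is an
assumption made for the example, not an established convention for function
tags, and the code is untested against a real JS grammar.

```elisp
(require 'semantic)

;; Sketch only: report nested functions as children of a function tag.
;; Assumes the grammar put them in a :members attribute (hypothetical).
(define-mode-local-override semantic-tag-components js-mode (tag)
  "Return the child tags of TAG, including functions nested in a function."
  (if (semantic-tag-of-class-p tag 'function)
      ;; Arguments first, then whatever the grammar stored as members.
      (append (semantic-tag-function-arguments tag)
              (semantic-tag-get-attribute tag :members))
    ;; Every other tag class keeps the default behaviour.
    (semantic-tag-components-default tag)))
```

Tools that walk the tag tree go through semantic-tag-components, so an
override along these lines is what would make the inner functions visible
to them.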
>
>
>> - generally, how do you debug a grammar?
>>
>
> You can debug a rule in wisent, but not how the wisent grammar parses. The
> grammar debugger was never ported to wisent. :(
>
>
>> * * *
>>
>> I have in mind a few things for now:
>>
>> - be able to detect the local variables around the cursor. For example if I
>> place the cursor on a variable, it should highlight the occurrences of
>> that name in the enclosing scope. I already did something like this for
>> js2-mode [1], but I'd like to get rid of that setup.
>>
>
> If you visit http://cedet.sourceforge.net/addlang.shtml step 4 is about
> context parsing.
>
>
>> - having done the above, it should be easy to provide some keybindings to
>> quickly move through such occurrences, and a keybinding to rename the
>> variable (again, my js2-mode setup supports that).
>>
>
> semantic-symref output (using idutils, gnu global, or other) has features
> like that that may "just work" for you.
>
>
>> - use the knowledge from the parser to indent var properly:
>>
>>     var foo = 1,
>>         bar = 2;
>>
>
> Many folks have wanted to do this, but as far as I know, no one has built a
> framework for it.
>
>
>> Then of course I know that much functionality would come for free from
>> existing Semantic applications.
>>
>> JavaScript is quickly taking over the world (it's the most popular language
>> on GitHub right now) and it's a pity not to support it properly. I have
>> some previous knowledge on parsing JavaScript [2] and I have used Emacs for
>> 12 years now; though I'm not very skilled at Elisp, I do Common Lisp at my
>> day job and have good knowledge of it. I'm willing to invest the time to
>> write this parser for Semantic, just need some help! :-)
>>
>> Thanks in advance!
>> -Mihai
>>
>> [1] http://mihai.bazon.net/projects/editing-javascript-with-emacs-js2-mode/js2-highlight-vars-mode
>> [2] https://github.com/mishoo/UglifyJS
>>
>> PS: By the way, don't you guys consider switching to GitHub? SourceForge
>> is... uhm... better not say it.
>>
>
> I've been too lazy to look into reasons to move anything. Right now we're
> just trying to get a good method for keeping synchronized with Emacs. The
> tactic is good, but it takes a lot of time.
>
> Eric
>
--
Mihai Bazon,
http://mihai.bazon.net/blog