[cedet-semantic] Fwd: JavaScript support

Just noticed that this wasn't sent to the list...  Gmail's reply button is
really bugging me. :-\

---------- Forwarded message ----------
From: Mihai Călin Bazon <mih...@gm...>
Date: 2011/3/24
Subject: Re: [cedet-semantic] JavaScript support
To: "Eric M. Ludlam" <er...@si...>

Hi Eric,

Thanks for your reply, and sorry for my late reaction -- quite busy during
the week.  I'd like to pick this up again; could you help me figure out
what's broken in the following code?  I tried to reduce it to a minimal test
case for getting PAREN_BLOCK to work.  I start with a fresh Emacs, load the
file below, press C-c C-c, run M-x eval-buffer (to evaluate the generated
grammar) and then M-x semantic-load-enable-semantic-debugging-helpers, and
finally load a JS file that contains (foo, bar, baz).  I expected it to parse
and semantic-fetch-tags to return a function tag, but it returns nil instead
and everything except whitespace is underlined in red, which suggests a
parsing error.

Cheers,
-Mihai

;;;; WY

%package wisent-ecmascript

%languagemode javascript-mode js-mode

%start program
%start statement
%start namelist

;;; --- punctuation

%type <punctuation>

%token <punctuation> SEMICOLON ";"
%token <punctuation> COMMA ","

;;; --- blocks

%type <block> ;;syntax "\\s(\\|\\s)" matchdatatype block

%token <block> PAREN_BLOCK "(LPAREN RPAREN)"
%token <block> BRACE_BLOCK "(LBRACE RBRACE)"
%token <block> BRACK_BLOCK "(LBRACK RBRACK)"

%token <open-paren>  LPAREN "("
%token <close-paren> RPAREN ")"
%token <open-paren>  LBRACE "{"
%token <close-paren> RBRACE "}"
%token <open-paren>  LBRACK "["
%token <close-paren> RBRACK "]"

;;; --- symbols

%type <symbol>
%token <symbol> NAME

%%

program
  : statement
  ;

statement
  : PAREN_BLOCK
    (FUNCTION-TAG "test" nil (EXPANDFULL $1 namelist))
  ;

namelist
  : LPAREN
    ()
  | RPAREN
    ()
  | NAME
    (VARIABLE-TAG $1 nil nil)
  | COMMA
    ()
  ;

%%

(require 'semantic-java)
(require 'semantic-wisent)

(define-lex ecmascript-lexer
  ""
  semantic-lex-ignore-whitespace
  semantic-lex-ignore-newline
  semantic-lex-ignore-comments

  wisent-ecmascript--<symbol>-regexp-analyzer
  wisent-ecmascript--<punctuation>-string-analyzer
  wisent-ecmascript--<block>-block-analyzer

  semantic-lex-default-action)

(defun wisent-ecmascript-setup-parser ()
  (wisent-ecmascript--install-parser)
  (setq semantic-lex-analyzer 'ecmascript-lexer
        semantic-lex-number-expression semantic-java-number-regexp
        semantic-lex-depth nil
        semantic-command-separation-character ";"))

(add-hook 'js-mode-hook 'wisent-ecmascript-setup-parser)
(add-hook 'javascript-mode-hook 'wisent-ecmascript-setup-parser)
(add-hook 'ecmascript-mode-hook 'wisent-ecmascript-setup-parser)

2011/3/21 Eric M. Ludlam <er...@si...>

On 03/21/2011 04:29 AM, Mihai Călin Bazon wrote:
>
>> Hi folks,
>>
>> I've spent my weekend with CEDET and must say it's amazing; if only I'd
>> understand it better. :-) My goal was to add proper support for JavaScript
>> (sorry but the existing parser doesn't cut it for real world code).  I've
>> started it from scratch, to better understand how to write parsers, but I
>> didn't get far.
>>
>
> It's always nice to have some new folks trying things out.
>
>
>> The Semantic/Wisent manuals are quite good, yet I've had trouble getting
>> started and doing simple things.  I think a step-by-step HOWTO on adding
>> support for a simple language (with nested structures) would be very
>> welcome!
>>
>
> That's a good idea.  There are some skeleton files around, but I don't
> think they go too deep into anything like that.
>
>
>> So anyway, I'm attaching my (highly incomplete) work so far and hope for
>> some advice on how to continue.  Questions:
>>
>
> I will attempt to answer given the brief amount of time I have this AM.
>
>
>> - I don't seem to be able to parse more than one statement.  Presumably
>>   because the return value of the `statement' rule is wrong.  Generally I
>>   couldn't figure out how to return the proper values.
>>
>
> Each nonterminal you define with a %start pragma should return 1
> production.  The entire grammar you create is called iteratively, and the
> automatic value passing of the wisent parser generator framework is set up
> for this.  The iterative nature makes error recovery very simple.  The grammar
> just "fails", and the upper level iterative parser skips over the bad
> semantics and moves on.
>
> Thus, it returns only one thing because that is all it can do.  If you
> change statlist to only return statement, then after it finds the first
> statement, it should get called a second time, and return the second
> statement, and the parser framework will keep track of it all for you.
>
>
>> - I tried to use PAREN_BLOCK and the `iterative style' to parse variable
>>   declarations, did that exactly as in other existing parsers and as
>>   documented, but it wouldn't work...  I know "it doesn't work" is not good
>>   information, but that's all I can say, the blocks simply didn't parse.  So
>>   I switched to the recursive style and collecting (EXPANDTAG (VARIABLE-TAG
>>   ...)).  (btw, perhaps something similar should go in `statlist'?).
>>
>
> EXPANDFULL will use the same iterative nature as I describe above inside a
> parent block.  The nonterminal symbol passed to EXPANDFULL should have rules
> about (, ), and some variable declaration.
>
> If you use EXPANDTAG, you need to create your own rule that parses (
> varlist ) and the varlist will need to cons all the found variables together
> itself.  It is much easier to use EXPANDFULL, as it handles bad syntax
> easily.
>
>
>> - That seems to parse: a function declaration with an argument list:
>>
>>   function foo(a, b) {
>>   }
>>
>>   (semantic-fetch-tags) returns the function tag and the variables are
>>   there.  However, if I complicate that a bit:
>>
>>   function foo(a, b) {
>>     function bar() {
>>     }
>>   }
>>
>>   only the outer function is returned.  Inner functions are ubiquitous in
>>   JS and they need to be parsed correctly to provide useful functionality
>>   (BTW, the existing JS parser distributed with Wisent fails here too).
>>
>
> The semantic lexer skips over { } and ( ) blocks and does not go into them
> unless a rule action explicitly calls EXPANDTAG or EXPANDFULL on the value
> returned from the PAREN_BLOCK.
>
> In your nonterminal for a function, the BRACE_BLOCK part of the rule will
> need to be passed into EXPANDFULL which will iteratively parse your
> function body looking for more functions.  Code will show up as bad syntax
> unless you write rules for all that too.
>
>
>> - what is EXPANDTAG? and is it related to the value of
>>   semantic-tag-expand-function? (I just copied the expander from the
>>   existing JS mode for now, but I'd like to understand why it is useful,
>>   what argument it receives and what it should return).  Didn't put too much
>>   time into this yet, but from the docs I'm not clear.
>>
>
> EXPANDTAG and EXPANDFULL let you look inside some _BLOCK with a new
> nonterminal start.  For each rule you pass to EXPAND* you need to add a
> %start pragma.  The output of EXPAND* will be (presumably) some tag or tag
> list.  EXPANDFULL will return a tag list and handle expanding and the data
> needed for "cooking" the tags so they are bound into the buffer with
> overlays.
>
> In your support file, you will need to write an overload of
> semantic-tag-components if you do anything besides function arguments or
> type members.  For function args and type members, you just need to put the
> tag lists into the correct tag attributes.
>
>
>> - generally, how do you debug a grammar?
>>
>
> You can debug a rule in wisent, but not how the wisent grammar parses.  The
> grammar debugger was never ported to wisent. :(
>
>
>> * * *
>>
>> I have in mind a few things for now:
>>
>> - be able to detect the local variables around the cursor.  For example if I
>>   place the cursor on a variable, it should highlight the occurrences of
>>   that name in the enclosing scope.  I already did something like this for
>>   js2-mode [1], but I'd like to get rid of that setup.
>>
>
> If you visit http://cedet.sourceforge.net/addlang.shtml step 4 is about
> context parsing.
>
>
>> - having done the above, it should be easy to provide some keybindings to
>>   quickly move through such occurrences, and a keybinding to rename the
>>   variable (again, my js2-mode setup supports that).
>>
>
> semantic-symref output (using idutils, gnu global, or other) has features
> like that that may "just work" for you.
>
>
>> - use the knowledge from the parser to indent var properly:
>>
>>   var foo = 1,
>>       bar = 2;
>>
>
> Many folks have wanted to do this, but as far as I know, no one has built a
> framework for it.
>
>
>> Then of course I know that much functionality would come for free from
>> existing Semantic applications.
>>
>> JavaScript is quickly taking over the world (it's the most popular language
>> on GitHub right now) and it's a pity not to support it properly.  I have
>> some previous knowledge on parsing JavaScript [2] and I have used Emacs for
>> 12 years now; though I'm not very skilled at Elisp, I do Common Lisp at my
>> day job and have good knowledge of it.  I'm willing to invest the time to
>> write this parser for Semantic, just need some help! :-)
>>
>> Thanks in advance!
>> -Mihai
>>
>> [1] http://mihai.bazon.net/projects/editing-javascript-with-emacs-js2-mode/js2-highlight-vars-mode
>> [2] https://github.com/mishoo/UglifyJS
>>
>> PS: By the way, don't you guys consider switching to GitHub?  SourceForge
>> is... uhm... better not say it.
>>
>
> I've been too lazy to look into reasons to move anything.  Right now we're
> just trying to get a good method for keeping synchronized with Emacs.  The
> tactic is good, but it takes a lot of time.
>
> Eric
>

-- 
Mihai Bazon,
http://mihai.bazon.net/blog
