
#3974 syntax highlighter doesn't like division. confusion with RegExp //?

minor bug
open
None
5
2016-04-08
2016-04-07
No
mediaSizeB=(atold_(t.value,true,true,',',["b"],false)/8)+atold_(t.value,true,true,',',["B"],false);//column 1 is media size in bytes

Vista x32, Java 8.4, jEdit 5.3.
The syntax highlighter (.js) shows this line with incorrect highlighting from

/8)+atold_(t.value,true,true,',',["B"],false);//column 1 is media size in bytes

onward.

Discussion

  • Jim Michaels

    Jim Michaels - 2016-04-07

    It turns blue until it hits the first / in the comment. This is a common parsing problem with JS syntax highlighters. See the ECMAScript (fewer functions, no DOM) or JavaScript EBNF.

     
  • Dale Anson

    Dale Anson - 2016-04-07

    This looks like an easy fix. It's caused by this line in the JavaScript mode file:

    <SEQ_REGEXP TYPE="MARKUP" HASH_CHAR="/" AT_WORD_START="TRUE">/[^\p{Blank}]*?/</SEQ_REGEXP>
    

    Simply removing this line from the mode file appears to fix this particular problem. However, it's not clear to me exactly what this line is supposed to do.
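
    To illustrate why that rule swallows the division sign, here is a minimal sketch that applies an approximation of the pattern at the position of the first /. The mode file's regex uses Java syntax; the sketch assumes `[^ \t]` is a fair stand-in for `[^\p{Blank}]`, and the variable names are made up for illustration.

```javascript
// The line from the report, unchanged.
const line = `mediaSizeB=(atold_(t.value,true,true,',',["b"],false)/8)+atold_(t.value,true,true,',',["B"],false);//column 1 is media size in bytes`;

// Approximation of the mode file's /[^\p{Blank}]*?/ pattern
// (assumption: [^ \t] stands in for [^\p{Blank}]).
// The sticky flag (y) anchors the match at lastIndex, roughly the way a
// highlighter tries a rule at the current character.
const regexLiteralRule = /\/[^ \t]*?\//y;

// Try the rule at the division sign, which is the first "/" on the line.
regexLiteralRule.lastIndex = line.indexOf("/");
const match = regexLiteralRule.exec(line);
const swallowed = match ? match[0] : null;

console.log(swallowed);
// /8)+atold_(t.value,true,true,',',["B"],false);/
```

    Because there is no blank between the division sign and the comment, the lazy match runs up to the first / of the // comment, which matches the highlighting described in the report.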

     
  • Dale Anson

    Dale Anson - 2016-04-07
    • assigned_to: Matthieu Casanova
     
  • Dale Anson

    Dale Anson - 2016-04-07

    Matthieu, from the svn log, it looks like you added this line back in 2008 in revision 12059 and adjusted it in revision 12089 as part of work on this ticket:

    https://sourceforge.net/p/jedit/bugs/2253/

    I'm not up on JavaScript regular expression usage, but I'm wondering whether there is a better way to highlight the regex without consuming a division sign that is followed by an end-of-line comment on the same line.

     
  • Jim Michaels

    Jim Michaels - 2016-04-07

    There are three kinds of situations where / is used (comments, regular expression literals, and division):

    /* comment... */
    //comment
    RegExp(/regexp here, can contain \//)
    string.split(/regexp here, can contain \//);
    var n=z/364.25/24/60/60/1000;//comment or /*comment
    

    And on the var line, is that an even or an odd number of /'s?

     
  • Marc Häfner

    Marc Häfner - 2016-04-08

    Distinguishing JavaScript's regular expression literals from the division operator is notoriously difficult, as the interpretation of a single / is context-sensitive.

    The concrete problem is that AT_WORD_START matches here. That is almost always correct (e.g. after , or =), but not in this case, since after a closing parenthesis only a division operator makes syntactic sense. (Or, to put it in parser terms: regular expression literals, parenthesized expressions, and super/call expressions are all parsed as LeftHandSideExpression, and there is no rule that places two of those next to each other, as per ECMA-262 6.0.)

    As a lazy fix one could replace the <SEQ TYPE="OPERATOR">)</SEQ> (a few lines above the SEQ_REGEXP for regular expression literals) with this ugly thing:

    <SEQ_REGEXP TYPE="OPERATOR" HASH_CHAR=")">\)(\s*/(?![/*]))?</SEQ_REGEXP>
    

    This marks the closing parenthesis and a single slash that may follow it (but not a comment). Although this treats both characters and any whitespace between them as one token, I have not (so far) been able to produce any unwanted side effects while testing.
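
    As a quick check of what that pattern consumes, here is a small sketch using a JavaScript transliteration of the regex. The mode file itself uses Java regex syntax, and `consumeAt` is a hypothetical helper written only for this illustration.

```javascript
// JavaScript transliteration of \)(\s*/(?![/*]))? from the proposed rule.
// The sticky flag (y) anchors each attempt at lastIndex.
const parenRule = /\)(\s*\/(?![\/*]))?/y;

// Hypothetical helper: returns the text the rule would consume at `index`,
// or null if the rule does not match there.
function consumeAt(text, index) {
  parenRule.lastIndex = index;
  const m = parenRule.exec(text);
  return m ? m[0] : null;
}

// In every sample the closing parenthesis is at index 3.
console.log(consumeAt("f(x)/8", 3));       // ")/"   division is consumed with the parenthesis
console.log(consumeAt("f(x) / 8", 3));     // ") /"  whitespace in between is included
console.log(consumeAt("f(x);//note", 3));  // ")"    no slash follows directly
console.log(consumeAt("f(x)//note", 3));   // ")"    line comment is left alone
console.log(consumeAt("f(x)/*note*/", 3)); // ")"    block comment is left alone
```

    Since the slash after the parenthesis is already consumed as part of the operator token, the regex-literal rule never gets a chance to start at it.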

    Another route would be to emulate part of the actual parser logic with delegation, to distinguish between states where a division operator is expected and states where a regex literal is expected. This would be rather complex, but it would also enable more permissive matching of regex literals, specifically allowing whitespace (which is legal anywhere in a regex).
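
    The state-tracking idea can be sketched outside the mode file format. The following is not jEdit code, just a rough JavaScript illustration under simplifying assumptions: a / is treated as division when the previous token looks like the end of an expression (identifier, number, string, ")" or "]") and as the start of a regex literal otherwise. The function names are hypothetical. Keywords such as `return`, a regex following `if (x)`, and a / inside a regex character class are deliberately not handled.

```javascript
// Hypothetical sketch: classify every "/" that starts a token in `source`.
function classifySlashes(source) {
  const found = [];
  let prevEndsExpression = false; // true after an identifier, number, string, ")" or "]"
  let i = 0;
  while (i < source.length) {
    const ch = source[i];
    const next = source[i + 1];
    if (/\s/.test(ch)) {
      i++; // whitespace does not change the state
    } else if (ch === "/" && next === "/") {
      found.push("line comment");
      const end = source.indexOf("\n", i);
      i = end === -1 ? source.length : end; // skip to end of line
    } else if (ch === "/" && next === "*") {
      found.push("block comment");
      const end = source.indexOf("*/", i + 2);
      i = end === -1 ? source.length : end + 2;
    } else if (ch === "/" && prevEndsExpression) {
      found.push("division");
      prevEndsExpression = false;
      i++;
    } else if (ch === "/") {
      found.push("regex");
      i = skipDelimited(source, i, "/");
      prevEndsExpression = true;
    } else if (ch === '"' || ch === "'") {
      i = skipDelimited(source, i, ch);
      prevEndsExpression = true;
    } else if (/[\w$]/.test(ch)) {
      // Identifier, number, or dotted name such as t.value or 364.25.
      while (i < source.length && /[\w$.]/.test(source[i])) i++;
      prevEndsExpression = true;
    } else {
      prevEndsExpression = ch === ")" || ch === "]";
      i++;
    }
  }
  return found;
}

// Skips a run that starts at `start` and ends at the next unescaped `delimiter`.
function skipDelimited(source, start, delimiter) {
  let i = start + 1;
  while (i < source.length && source[i] !== delimiter) {
    i += source[i] === "\\" ? 2 : 1; // a backslash escapes the next character
  }
  return i + 1;
}

const reported = `mediaSizeB=(atold_(t.value,true,true,',',["b"],false)/8)+atold_(t.value,true,true,',',["B"],false);//column 1 is media size in bytes`;
console.log(classifySlashes(reported).join(", "));                  // division, line comment
console.log(classifySlashes(String.raw`s.split(/\//)`).join(", ")); // regex
```

    Even this toy version needs several states, which gives a feel for why the delegation-based approach in a mode file would be rather complex.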

     
