#1622 html lexer crash

Bug
closed-fixed
Neil Hodgson
5
2014-08-14
2014-07-11
John Ehresman
No

The html lexer is crashing given the attached mako file. The problems seems to be with the text: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

I think what's happening is that the < is colored as an operator, setting the startSeq to 23 (given the file) and then the block for a SGML DTD is hit and styler.ColourTo(i - 2, StateToPrint); is invoked. i - 2 is 21 and the assert in ColourTo fails. I'm also seeing a segfault in non-debug mode and can get a backtrace for it if you need it.

1 Attachments

Discussion

  • Neil Hodgson
    Neil Hodgson
    2014-07-11

    This is only happening with Mako turned on. When it sees the "##" it goes into Mako comment mode. Then, at the end of the line it colours the line in SCE_HP_COMMENTLINE and sets the state to Python default SCE_HP_DEFAULT. However, unlike the other Mako block starts, it doesn't set the scriptLanguage to eScriptPython so it is still eligible for the SGML branch.

     
    • John Ehresman
      John Ehresman
      2014-07-14

      Will setting scriptLanguage to eScriptPython after the ## line fix it?

       
  • Hmm, looks like people use ## both within mako script area and outside, in the HTML. I think the simplest solution is to remove the "state = SCE_HP_DEFAULT;" from the end-of-mako-comment block of code and add another block to make sure we really skip over the comment line just below, so that section of code looks something like this:

                ...
        // handle start of Mako comment line
        if (isMako && ch == '#' && chNext == '#') {
            makoComment = 1;
        }
    
        // handle end of Mako comment line
        else if (isMako && makoComment && (ch == '\r' || ch == '\n')) {
            makoComment = 0;
            styler.ColourTo(i, SCE_HP_COMMENTLINE);
        }
    
        // Allow falling through to mako handling code if newline is going to end a block
        if (((ch == '\r' && chNext != '\n') || (ch == '\n')) &&
            (!isMako || (0 != strcmp(makoBlockType, "%")))) {
        }
        // Ignore everything in mako comment until the line ends
        else if (isMako && makoComment) {
        }
    
        // generic end of script processing
        else if ((inScriptType == eNonHtmlScript) && (ch == '<') && (chNext == '/')) {
                ...
    

    Then state and inScriptType remain unaltered by the presence of a mako comment and should still be correct for whatever context we were in before the comment line.

     
    • Neil Hodgson
      Neil Hodgson
      2014-07-15

      That seems to work OK on the problem file as well as some random Mako examples. Just to be sure I copied this correctly here is a patch, with +2 lines -1 line.

       
      Attachments
  • Yes, those are the changes I made. I may still be overlooking something but this seems like an improvement because the old code could colorize things after ## and before the end of the line and could set the wrong state after the comment. Now processing of a comment line is pretty isolated.

     
  • Neil Hodgson
    Neil Hodgson
    2014-07-16

    • labels: --> scintilla, lexer, html, mako
    • status: open --> open-fixed
    • assigned_to: Neil Hodgson
     
  • Neil Hodgson
    Neil Hodgson
    2014-07-16

    Committed as [4cf1ec].

     

    Related

    Commit: [4cf1ec]

  • John Ehresman
    John Ehresman
    2014-07-25

    The fix causes crashes with ## comments in python sections because it would use styler.ColourTo(i, SCE_HP_COMMENTLINE); and then if state == SCE_HP_COMMENTLINE, the case SCE_HP_COMMENTLINE below would call styler.ColourTo(i - 1, SCE_HP_COMMENTLINE); The comment-in-pyscript.mako illustrates this bug.

    My current attempt to fix this is to set the state to SCE_HP_COMMENTLINE at ## and then set it back to SCE_HP_DEFAULT or SCE_H_DEFAULT base on whether a python script section is active or not; the diff is attached in set-to-h-default.diff The original bug was caused by the state set to SCE_HP_DEFAULT outside of a python script section,

     
    • Neil Hodgson
      Neil Hodgson
      2014-07-26

      That appears to work well and not cause any assertions on the examples I have tried.

      One aspect I like (but which may not be intentional) is that the ## comment is displayed in server-side Python (SCE_HPA_COMMENTLINE) instead of client-side Python (SCE_HP_COMMENTLINE). The transformation from SCE_HP_COMMENTLINE to SCE_HPA_COMMENTLINE is due to the call of statePrintForState early in the loop body.

      Committed as [a93614] despite the above comment sounding a little uncertain as it stops an assertion failure.

       

      Related

      Commit: [a93614]

    • Neil Hodgson
      Neil Hodgson
      2014-07-27

      While the patch appears to work, it has a type problem where inScriptType (a script_mode) is compared to eScriptPython (a script_type: value 3). The effect is that inScriptType is compared to eNonHtmlScriptPreProc which also has value 3. If the intent is to check if the current language is Python, then scriptLanguage should be used.

       
      • John Ehresman
        John Ehresman
        2014-07-28

        I think scriptLanguage == eScriptPython should be used. The use of SCE_HPA_COMMENTLINE wasn't planned, but I think it's a good change. Thanks.

         
        • Neil Hodgson
          Neil Hodgson
          2014-07-28

          Committed as [e69d01].

           

          Related

          Commit: [e69d01]

  • Neil Hodgson
    Neil Hodgson
    2014-08-14

    • status: open-fixed --> closed-fixed