#1622 html lexer crash


The html lexer is crashing given the attached mako file. The problems seems to be with the text: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

I think what's happening is that the < is colored as an operator, setting the startSeq to 23 (given the file) and then the block for a SGML DTD is hit and styler.ColourTo(i - 2, StateToPrint); is invoked. i - 2 is 21 and the assert in ColourTo fails. I'm also seeing a segfault in non-debug mode and can get a backtrace for it if you need it.

1 Attachments


  • Neil Hodgson

    Neil Hodgson - 2014-07-11

    This is only happening with Mako turned on. When it sees the "##" it goes into Mako comment mode. Then, at the end of the line it colours the line in SCE_HP_COMMENTLINE and sets the state to Python default SCE_HP_DEFAULT. However, unlike the other Mako block starts, it doesn't set the scriptLanguage to eScriptPython so it is still eligible for the SGML branch.

    • John Ehresman

      John Ehresman - 2014-07-14

      Will setting scriptLanguage to eScriptPython after the ## line fix it?

  • Stephan R.A. Deibel

    Hmm, looks like people use ## both within mako script area and outside, in the HTML. I think the simplest solution is to remove the "state = SCE_HP_DEFAULT;" from the end-of-mako-comment block of code and add another block to make sure we really skip over the comment line just below, so that section of code looks something like this:

        // handle start of Mako comment line
        if (isMako && ch == '#' && chNext == '#') {
            makoComment = 1;
        // handle end of Mako comment line
        else if (isMako && makoComment && (ch == '\r' || ch == '\n')) {
            makoComment = 0;
            styler.ColourTo(i, SCE_HP_COMMENTLINE);
        // Allow falling through to mako handling code if newline is going to end a block
        if (((ch == '\r' && chNext != '\n') || (ch == '\n')) &&
            (!isMako || (0 != strcmp(makoBlockType, "%")))) {
        // Ignore everything in mako comment until the line ends
        else if (isMako && makoComment) {
        // generic end of script processing
        else if ((inScriptType == eNonHtmlScript) && (ch == '<') && (chNext == '/')) {

    Then state and inScriptType remain unaltered by the presence of a mako comment and should still be correct for whatever context we were in before the comment line.

    • Neil Hodgson

      Neil Hodgson - 2014-07-15

      That seems to work OK on the problem file as well as some random Mako examples. Just to be sure I copied this correctly here is a patch, with +2 lines -1 line.

  • Stephan R.A. Deibel

    Yes, those are the changes I made. I may still be overlooking something but this seems like an improvement because the old code could colorize things after ## and before the end of the line and could set the wrong state after the comment. Now processing of a comment line is pretty isolated.

  • Neil Hodgson

    Neil Hodgson - 2014-07-16
    • labels: --> scintilla, lexer, html, mako
    • status: open --> open-fixed
    • assigned_to: Neil Hodgson
  • John Ehresman

    John Ehresman - 2014-07-25

    The fix causes crashes with ## comments in python sections because it would use styler.ColourTo(i, SCE_HP_COMMENTLINE); and then if state == SCE_HP_COMMENTLINE, the case SCE_HP_COMMENTLINE below would call styler.ColourTo(i - 1, SCE_HP_COMMENTLINE); The comment-in-pyscript.mako illustrates this bug.

    My current attempt to fix this is to set the state to SCE_HP_COMMENTLINE at ## and then set it back to SCE_HP_DEFAULT or SCE_H_DEFAULT base on whether a python script section is active or not; the diff is attached in set-to-h-default.diff The original bug was caused by the state set to SCE_HP_DEFAULT outside of a python script section,

    • Neil Hodgson

      Neil Hodgson - 2014-07-26

      That appears to work well and not cause any assertions on the examples I have tried.

      One aspect I like (but which may not be intentional) is that the ## comment is displayed in server-side Python (SCE_HPA_COMMENTLINE) instead of client-side Python (SCE_HP_COMMENTLINE). The transformation from SCE_HP_COMMENTLINE to SCE_HPA_COMMENTLINE is due to the call of statePrintForState early in the loop body.

      Committed as [a93614] despite the above comment sounding a little uncertain as it stops an assertion failure.



      Commit: [a93614]

    • Neil Hodgson

      Neil Hodgson - 2014-07-27

      While the patch appears to work, it has a type problem where inScriptType (a script_mode) is compared to eScriptPython (a script_type: value 3). The effect is that inScriptType is compared to eNonHtmlScriptPreProc which also has value 3. If the intent is to check if the current language is Python, then scriptLanguage should be used.

      • John Ehresman

        John Ehresman - 2014-07-28

        I think scriptLanguage == eScriptPython should be used. The use of SCE_HPA_COMMENTLINE wasn't planned, but I think it's a good change. Thanks.

  • Neil Hodgson

    Neil Hodgson - 2014-08-14
    • status: open-fixed --> closed-fixed

Log in to post a comment.