#214 Incorrect start-of-line recognition

Milestone: current_cvs
Status: closed-fixed
Owner: Andre-Littoz
Priority: 6
Updated: 2012-08-03
Created: 2012-07-20
Creator: Andre-Littoz
Private: No

Some languages or language constructs must start at the beginning of a line. Their identification pattern must then be anchored at the start of the line with ^ (caret). Unfortunately, the parser does not pass whole lines to the regexps but chunks that begin where the previous fragment ended. The caret then anchors at the start of the chunk, which is rarely the start of the line.

Consequently, anchored syntactic categories will rarely be recognised correctly. Worse, a false positive is generated every time the start pattern happens to occur at the beginning of a chunk, even when that chunk begins in the middle of a line.
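The false positive can be reproduced in a few lines. This sketch is in Python rather than the Perl of SimpleParse.pm, and everything in it is hypothetical: the `=begin` start pattern (modelled on Ruby's column-0 block comment), the sample line, and the helpers `line_match` and `chunk_match`. It only mimics a parser that hands the rest of a line to the regexp as a fresh chunk.

```python
import re

# Hypothetical start pattern for a construct valid only at the start of a line.
START = re.compile(r"^=begin")

line = "total =begin + 1\n"

def line_match(pattern, text):
    """Pattern applied to the whole line: ^ really means start of line."""
    return pattern.search(text) is not None

def chunk_match(pattern, text, resume_at):
    """Mimic a parser that passes the remainder of the line as a new chunk:
    ^ now anchors at the chunk start instead of the line start."""
    return pattern.search(text[resume_at:]) is not None

resume = line.index("=")  # the previous fragment ended just before "="

print(line_match(START, line))           # False: "=begin" is not at column 0
print(chunk_match(START, line, resume))  # True: false positive
```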

Suggested fix: when initialising the parser, modify every regexp that starts with a caret so that it must match an "impossible" character right after the caret, and prepend that same "impossible" character to each line when it is read. The extra character can be stripped during line numbering in the HTML generation process.
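A minimal sketch of the suggested fix, again in Python and assuming the hypothetical `=begin` pattern: the caret is kept and followed by the "impossible" character (here `\xff`, assumed never to appear in source text), and the same character is prepended to each line. A chunk that starts mid-line cannot begin with the sentinel, so the anchored pattern no longer fires there. The names `anchor_to_line` and `read_line` are illustrative, not LXR's.

```python
import re

SENTINEL = "\xff"  # assumed never to occur in source text

def anchor_to_line(regex: str) -> str:
    """Keep a leading caret but require the sentinel right after it."""
    if regex.startswith("^"):
        return "^" + re.escape(SENTINEL) + regex[1:]
    return regex

def read_line(raw: str) -> str:
    """Prepend the sentinel when a new source line is read."""
    return SENTINEL + raw

START = re.compile(anchor_to_line(r"^=begin"))

real = read_line("=begin\n")
fake = read_line("total =begin + 1\n")

print(bool(START.search(real)))                    # True: genuine start of line
print(bool(START.search(fake[fake.index("="):])))  # False: mid-line chunk lacks the sentinel
```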

Discussion

  • Andre-Littoz
    2012-08-03

    • status: open --> closed-fixed
     
  • Andre-Littoz
    2012-08-03

    The fix involves 3 patches:
    1 - SimpleParse.pm, sub init: in the regexps for start, end, lock and 'atom', any leading ^ is replaced by a test for byte \xFF;
    2 - SimpleParse.pm, sub nextfrag: when a new line is read, byte \xFF is prepended to it;
    3 - Markup.pm, sub htmlquote: every \xFF byte is removed (this sub is always invoked before HTML chunks are output).
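The three patches can be sketched end to end as below. This is a Python illustration, not the Perl code of SimpleParse.pm or Markup.pm: the category table, the sample source, and the function names `init`, `read_lines` and `htmlquote` are hypothetical stand-ins. Following patch 1, the leading caret is replaced outright by the sentinel; since `\xff` only ever appears at the start of a line, the sentinel itself acts as the anchor.

```python
import html
import re

SENTINEL = "\xff"  # assumed never to occur in source text

def init(categories):
    """Sketch of patch 1: compile the category regexps, replacing any
    leading ^ by a test for the sentinel character."""
    compiled = {}
    for name, regex in categories.items():
        if regex.startswith("^"):
            regex = re.escape(SENTINEL) + regex[1:]
        compiled[name] = re.compile(regex)
    return compiled

def read_lines(source):
    """Sketch of patch 2: prepend the sentinel to every line as it is read."""
    for raw in source.splitlines(keepends=True):
        yield SENTINEL + raw

def htmlquote(text):
    """Sketch of patch 3: erase the sentinel, then escape HTML specials."""
    return html.escape(text.replace(SENTINEL, ""), quote=False)

# Hypothetical category table and source text.
patterns = init({"start": r"^=begin", "end": r"^=end"})
src = "x =begin\n=begin\na < b\n=end\n"

lines = list(read_lines(src))
starts = [i for i, l in enumerate(lines) if patterns["start"].search(l)]
ends = [i for i, l in enumerate(lines) if patterns["end"].search(l)]
out = "".join(htmlquote(l) for l in lines)

print(starts, ends)  # [1] [3]
print(out == "x =begin\n=begin\na &lt; b\n=end\n")  # True
```

Because the sentinel is stripped in the single routine that every HTML chunk passes through, it never reaches the generated page.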