[Lxr-dev] [ lxr-Bugs-3546293 ] Incorrect start-of-line recognition

Bugs item #3546293, was opened at 2012-07-20 05:04
Message generated for change (Tracker Item Submitted) made by ajlittoz
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=390117&aid=3546293&group_id=27350

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Lang support
Group: current cvs
Status: Open
Resolution: None
Priority: 6
Private: No
Submitted By: Andre-Littoz (ajlittoz)
Assigned to: Andre-Littoz (ajlittoz)
Summary: Incorrect start-of-line recognition

Initial Comment:
Some languages or language constructs are strictly positioned at the start of lines. The identification pattern must then be anchored at the start of the line with ^ (caret). Unfortunately, the parser does not pass whole lines to the regexp but chunks beginning where the last fragment ended. The caret therefore anchors at the start of the chunk, which is rarely the start of the line.

Consequently, anchored syntactic categories are rarely recognised correctly. A false positive is generated every time the start pattern is detected at the beginning of a chunk that does not itself begin a line.
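
To illustrate the symptom outside LXR, here is a minimal sketch in Python (not the LXR Perl code). The rule `^#\w+` and the sample line are hypothetical, and the chunk is simply assumed to start where an earlier fragment ended.

```python
import re

# Hypothetical anchored rule: a directive that is only valid when '#'
# is the very first character of the line.
directive = re.compile(r"^#\w+")

line = "x = 1; #define is not a directive here"

# Assume the previous fragment ended just before the '#', so the parser
# hands over the remainder of the line as the next chunk.
pos = line.index("#")
chunk = line[pos:]

# On the full line the caret does its job: no match, '#' is mid-line.
matches_on_line = directive.search(line) is not None

# On the chunk the caret anchors at the chunk start: a false positive.
matches_on_chunk = directive.search(chunk) is not None

print(matches_on_line, matches_on_chunk)
```

The same pattern gives opposite answers depending only on where the chunk happens to begin, which is the behaviour reported here.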

Suggested fix: when initialising the parser, regexps starting with a caret are modified to match an "impossible" character immediately after the caret, and that same "impossible" character is prepended to each line when it is read. The extra character can be stripped during line numbering in the HTML generation process.
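
A rough sketch of this idea, again in Python rather than the LXR Perl code. The sentinel value `\x01`, the helper names and the sample rule are all assumptions made for illustration; the actual choice of "impossible" character is left open above.

```python
import re

# Assumed sentinel: a control character taken never to occur in source files.
SENTINEL = "\x01"

def anchor_with_sentinel(pattern):
    """Make a leading '^' additionally require the sentinel character."""
    if pattern.startswith("^"):
        return "^" + re.escape(SENTINEL) + pattern[1:]
    return pattern

def mark_line(line):
    """Prepend the sentinel when a line is read."""
    return SENTINEL + line

def strip_sentinel(text):
    """Remove the sentinel again, e.g. while numbering lines for HTML output."""
    return text.replace(SENTINEL, "")

# Hypothetical anchored rule, rewritten once at parser initialisation.
directive = re.compile(anchor_with_sentinel(r"^#\w+"))

real_start = mark_line("#define FOO 1")
marked = mark_line("x = 1; #define is not a directive here")
mid_chunk = marked[marked.index("#"):]  # chunk beginning mid-line

print(directive.search(real_start) is not None)  # true start of line
print(directive.search(mid_chunk) is not None)   # mid-line chunk
print(strip_sentinel(real_start))
```

With the sentinel in place, the rule can only match a chunk that begins at a true start of line, because no other chunk begins with the sentinel.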

----------------------------------------------------------------------
