[Tcl-bugs] [ tcl-Bugs-2826551 ] Line-sensitivity at -start, but also -all


Bugs item #2826551, was opened at 2009-07-24 14:53
Message generated for change (Comment added) made by lars_h
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=110894&aid=2826551&group_id=10894

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: 43. Regexp
Group: current: 8.6b1
Status: Open
Resolution: None
Priority: 9
Private: No
Submitted By: Lars Hellström (lars_h)
Assigned to: Pavel Goran (pvgoran)
Summary: Line-sensitivity at -start, but also -all

Initial Comment:
I was going to use [regexp -all -inline] to split a unified diff into chunks, 
but found that this would skip every second chunk. This seems to be 
related to ^ not matching the beginning of line if the newline is before 
the -start position.

SMALL EXAMPLE:

set data {
@1
 2
+3
@4
-5
+6
 7
@8
 9
}
regexp -all -inline {(?n)^@.*\n(?:[^@].*\n?)*} $data

RETURNS

{@1
 2
+3
} {@8
 9
}

RATHER THAN

{@1
 2
+3
} {@4
-5
+6
 7
} {@8
 9
}

EXPLANATION: A diff chunk consists of one header line beginning with @, followed by a body 
consisting of lines not beginning with @ (but rather space, plus, or minus). The header line
is thus in line-sensitive mode matched by ^@.* and a body line is matched by ^[^@].*, so 
a sensible combined regexp is (?n)^@.*\n(?:[^@].*\n?)*  Each chunk is indeed matched by
this, but when -all is used, only every second chunk is found.
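
For comparison, the chunk structure described above can also be walked line by line, without ^ or -start, so the result does not depend on how anchors behave. This is only a sketch and not part of the original report: the name splitChunks is hypothetical, and it assumes every emitted line should end in a newline (the regexp makes the final newline optional).

```tcl
# Hypothetical helper: split diff-like text into chunks, one per header
# line beginning with "@", without using any regexp anchors.
proc splitChunks {data} {
    set chunks {}
    set current ""
    set inChunk 0
    foreach line [split $data \n] {
        if {[string index $line 0] eq "@"} {
            # A header line closes the previous chunk and opens a new one.
            if {$inChunk} {lappend chunks $current}
            set current "$line\n"
            set inChunk 1
        } elseif {$inChunk} {
            if {$line eq ""} {
                # An empty line ends the body, as it would for [^@].*
                lappend chunks $current
                set inChunk 0
            } else {
                append current "$line\n"
            }
        }
    }
    if {$inChunk} {lappend chunks $current}
    return $chunks
}
```

Called as [splitChunks $data] on the data from the small example, this should give the three chunks listed under RATHER THAN.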

In this case, dropping the ^ from the regexp allows all chunks to be found:

% regexp -all -inline {(?n)@.*\n(?:[^@].*\n?)*} $data
{@1
 2
+3
} {@4
-5
+6
 7
} {@8
 9
}

This suggests the problem may be that ^ fails to match when at the -start position, as one obvious 
way to implement [regexp -all] is as the equivalent of a loop around [regexp -start]. Indeed,

% regexp -start 1 -indices {(?n)^a} "\nab" match
0
% regexp -start 0 -indices {(?n)^a} "\nab" match
1
% set match
1 1
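
The "loop around [regexp -start]" idea mentioned above can be sketched in script form. This illustrates the hypothesis only and is not a description of how the Tcl core actually implements -all; the name matchAll is hypothetical, it needs Tcl 8.5 for lassign, and it collects only the overall match, not submatches.

```tcl
# Hypothetical helper: emulate [regexp -all -inline] for the overall match
# by repeatedly calling [regexp -start] from just past the previous match.
proc matchAll {re string} {
    set result {}
    set start 0
    set len [string length $string]
    while {$start <= $len
           && [regexp -start $start -indices -- $re $string m]} {
        lassign $m first last
        lappend result [string range $string $first $last]
        if {$last >= $first} {
            # Non-empty match: resume right after it.
            set start [expr {$last + 1}]
        } else {
            # Empty match: step one character forward to avoid looping.
            set start [expr {$first + 1}]
        }
    }
    return $result
}
```

If ^ in line-sensitive mode cannot see a newline that lies before the -start position, a loop of this shape would skip any match that begins immediately after the previous one ended, which is the every-second-chunk pattern reported above.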

----------------------------------------------------------------------

>Comment By: Lars Hellström (lars_h)
Date: 2010-08-11 23:20

Message:
To answer dgp's question: the bug was originally encountered under 8.5
(possibly even 8.4), so a fix there would be nice too. Given how little
tends to change in the regexp engine, I would be surprised if fixing 8.5 is
different from fixing 8.6.

----------------------------------------------------------------------

Comment By: Don Porter (dgp)
Date: 2010-08-09 18:23

Message:
status?  This bug is/was 8.6 branch only?
Or is there still a need to fix a bug for new 8.5.* releases too?

----------------------------------------------------------------------

Comment By: Donal K. Fellows (dkf)
Date: 2010-02-11 12:15

Message:
That patch now applied.

----------------------------------------------------------------------

Comment By: Mo DeJong (mdejong)
Date: 2010-02-09 12:28

Message:
Added patch 2948425 to address this issue.

----------------------------------------------------------------------

Comment By: Joachim Kock (jkock)
Date: 2009-10-16 00:25

Message:
Here is another description of the bug, and a contrast:

In regexp with the -all option, it is clear that the matches
must not overlap.  But what is the rule if the expression
starts with a constraint (width zero), like for example ^
or \m ?

To my surprise you cannot find a linestart after a
matched \n:

% regexp -all -inline -indices {(?n)^a\n} "a\na\na\n"
{0 1} {4 5}

The behaviour I expected is the one seen with \m instead of ^:
you _can_ find a wordstart after a matched \n:

% regexp -all -inline -indices {(?n)\ma\n} "a\na\na\n"
{0 1} {2 3} {4 5}

So somehow the regexp engine is allowed to look back to
see that a \n preceded and hence constitutes a word boundary,
but it is not allowed to look back and detect that this same
\n also constitutes a line boundary...

----------------------------------------------------------------------
