#1344 non-greedy quantifiers break capturing parens

Milestone: obsolete: 8.4a1
Status: closed-duplicate
Owner: nobody
Priority: 5
Updated: 2000-11-15
Created: 2000-10-26
Creator: Anonymous
Private: No

OriginalBugID: 5996 Bug
Version: 8.4a1
SubmitDate: '2000-07-10'
LastModified: '2000-10-25'
Severity: LOW
Status: UnAssn
Submitter: techsupp
OS: Linux-Red Hat
OSVersion: Tcl 8.3, 8.4a1, Windows 98 and RedHat Linux 6.x
FixedDate: '2000-10-25'
ClosedDate: '2000-10-25'

Name:
Vadim Nasardinov

ReproducibleScript:
set str "start <s> continue <s> in both regexes, parens should capture up to here"

regexp {.*<s>(.*)} $str match parens

puts "
Greedy match (works ok):
match: >$match<
parens: >$parens<
"

regexp {.*?<s>(.*)} $str match parens

puts "
Non-greedy match (p2 does not get set):
match: >$match<
parens: >$parens<
"

ObservedBehavior:
Using a non-greedy quantifier breaks capturing parens: the parenthesized subexpression captures nothing.

DesiredBehavior:
The above code snippet should produce the following output:

Greedy match (works ok):
match: >start <s> continue <s> in both regexes, parens should capture up to here<
parens: > in both regexes, parens should capture up to here<

Non-greedy match (p2 does not get set):
match: >start <s><
parens: > continue <s> in both regexes, parens should capture up to here<

Discussion

  • Donal K. Fellows

    This is known behaviour of the regexp engine (the first quantifier in a regexp sets the greediness for the whole regexp) though I've yet to see a convincing explanation of why this is correct...

     
  • Donal K. Fellows

    • labels: 104246 --> 43. Regexp
    • status: open --> closed-duplicate
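
The behaviour described in the discussion above (the first quantifier setting the preference for the whole regexp) can be illustrated with a short sketch. The variable names below are made up for illustration, and the `$` anchor is a commonly suggested workaround, not something proposed in this report. It assumes Tcl's advanced regular expressions (8.1 or later) and the default, non-line-sensitive matching mode.

```tcl
set str "start <s> continue <s> in both regexes, parens should capture up to here"

# The first quantifier (.*?) is non-greedy, so the whole RE prefers the
# shortest overall match; the later (.*) is left with nothing to capture.
regexp {.*?<s>(.*)} $str shortMatch shortParens
puts "unanchored: match >$shortMatch< parens >$shortParens<"

# Possible workaround (an assumption, not part of the report): anchor the
# end with $ so the overall match has to reach the end of the string.
# The leading .*? still stops at the first <s>, and (.*) takes the rest.
regexp {.*?<s>(.*)$} $str fullMatch fullParens
puts "anchored:   match >$fullMatch< parens >$fullParens<"
```

With the anchor, only one overall match length is possible, so the shortest-match preference no longer truncates the capture. The overall match then covers the whole string instead of stopping at the first `<s>`.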