#1248 greedy vs. non-greedy confusion

Milestone: final: 8.2.3
Status: open
Priority: 5
Updated: 2001-09-07
Created: 2000-10-26
Creator: Anonymous
Private: No

OriginalBugID: 4001 Bug
Version: 8.2.3
SubmitDate: '2000-01-10'
LastModified: '2000-01-27'
Severity: MED
Status: Assigned
Submitter: techsupp
ChangedBy: hobbs
RelatedBugIDs: 2866
OS: BSD
OSVersion: NetBSD-1.4.1/i386
FixedDate: '2000-10-25'
ClosedDate: '2000-10-25'

Name: hume smith

ReproducibleScript:
> tclsh
% regexp {x.*?([a-z]+)} {1234x56789word101112} a b
1
% set a
x56789w
% set b
w
% set tcl_patchLevel
8.2.3
%

ObservedBehavior:
the + was matched in a nongreedy fashion; shouldn't it be greedy?

DesiredBehavior:
% set a
x56789word
% set b
word
%

Patch:

PatchFiles:

Henry Spencer has noted some problems with mixing greedy and
non-greedy quantifiers in the new regexp code. He's cc'ed on
this, but in the meantime, the work-around is:
regexp {x[^a-z]*([a-z]+)} {1234x56789word101112} a b
-- 01/10/2000 hobbs
From HS:

This is the same old problem: people accustomed to Perl are not grasping
the idea that the whole RE is greedy or non-greedy, but *not* some mixture
of the two. In this case, it is non-greedy since the first thing in it
which cares is non-greedy. The + is being as greedy as it can, within the
constraints set by the behavior of the whole RE.

In short, the behavior, while surprising, is as documented. It's not an
outright bug; it may, however, be a misfeature.

-- 01/10/2000 hobbs
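
For readers comparing the two patterns quoted in this report, here is a minimal side-by-side sketch. It uses only the original expression and the work-around given above. The helper name `match_parts` is made up for illustration, and the results noted in the comments are the ones reported in this ticket for 8.2.3, so other versions may differ.

```tcl
# Hypothetical helper (not part of Tcl): return the whole match and the
# first parenthesised submatch as a two-element list, or an empty list
# when the pattern does not match.
proc match_parts {re str} {
    if {[regexp -- $re $str whole sub]} {
        return [list $whole $sub]
    }
    return {}
}

set s {1234x56789word101112}

# The first quantifier that has a preference is the non-greedy .*?, so
# the RE as a whole prefers the shortest match.
# Reported in this ticket: x56789w w
puts [match_parts {x.*?([a-z]+)} $s]

# Work-around from this ticket: every quantifier is greedy, so the RE
# prefers the longest match.
# Reported in this ticket: x56789word word
puts [match_parts {x[^a-z]*([a-z]+)} $s]
```

The work-around helps because it contains no non-greedy quantifier at all: `[^a-z]*` cannot consume any letter, so it stops before `word` without needing `.*?`.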

Discussion

  • Donal K. Fellows

    Perhaps this requires a documentation change? Well, either that or a behaviour change. Would it be possible to have a flag to force greediness or non-greediness instead of guessing it from the first quantifier in the regular expression? (With the default being greedy unless all the "top-level" quantifiers were non-greedy, maybe?)

     
  • Andreas Kupries

    Andreas Kupries - 2001-09-07
    • assigned_to: nobody --> hobbs
     
  • Nobody/Anonymous

    Logged In: NO

    I couldn't find anything to confirm this behavior in 'man
    re_syntax(n)'. Where has it been documented into
    semi-legitimacy?

    Any correlation between familiarity with Perl and noticing
    this behavior is purely coincidental. I mean, come on...
    If greediness is meant to be a 'whole expression'
    behavior, why isn't it implemented as a switch, like
    the '-nocase' option? Calling it a misfeature is being
    too kind, especially considering the amount of grief it
    causes the people who are affected by it. ;')

     
  • Christian Segeth

    Logged In: YES
    user_id=1613941

    The problem with mixing greedy and non-greedy quantifiers
    still exists in Tcl v8.4. In my opinion, mixing the two
    should be possible; as it is, I have to split the RE into
    two, one non-greedy and one greedy. The re_syntax manual
    does not explain that you can't mix greedy and non-greedy
    quantifiers.

    I would like to know: is it a documentation bug or a bug in
    the Tcl interpreter?

     
