From: <no...@so...> - 2001-11-19 22:46:12
Bugs item #219219, was opened at 2000-10-25 22:03
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=110894&aid=219219&group_id=10894

Category: 41. Regexp
Group: : 8.2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Jeffrey Hobbs (hobbs)
Summary: greedy vs. non-greedy confusion

Initial Comment:
OriginalBugID: 4001 Bug
Version: 8.2.3
SubmitDate: '2000-01-10'
LastModified: '2000-01-27'
Severity: MED
Status: Assigned
Submitter: techsupp
ChangedBy: hobbs
RelatedBugIDs: 2866
OS: BSD
OSVersion: NetBSD-1.4.1/i386
FixedDate: '2000-10-25'
ClosedDate: '2000-10-25'

Name: hume smith

ReproducibleScript:
> tclsh
% regexp {x.*?([a-z]+)} {1234x56789word101112} a b
1
% set a
x56789w
% set b
w
% set tcl_patchLevel
8.2.3
%

ObservedBehavior:
the + was matched in a nongreedy fashion; shouldn't it be greedy?

DesiredBehavior:
% set a
x56789word
% set b
word
%

Patch:

PatchFiles:

Henry Spencer has noted some problems with mixing greedy and
non-greedy quantifiers in the new regexp code. He's cc'ed on this,
but in the meantime, the work-around is:

    regexp {x[^a-z]*([a-z]+)} {1234x56789word101112} a b

-- 01/10/2000 hobbs

From HS:
This is the same old problem: people accustomed to Perl are not
grasping the idea that the whole RE is greedy or non-greedy, but
*not* some mixture of the two. In this case, it is non-greedy since
the first thing in it which cares is non-greedy. The + is being as
greedy as it can, within the constraints set by the behavior of the
whole RE.

In short, the behavior, while surprising, is as documented. It's not
an outright bug; it may, however, be a misfeature.

-- 01/10/2000 hobbs

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2001-11-19 14:45

Message:
Logged In: NO

I couldn't find anything to confirm this behavior in
'man re_syntax(n)'. Where has it been documented into
semi-legitimacy?
Any correlation between familiarity with perl and noticing this
behavior is purely coincidental. I mean, come on... If greediness is
meant to be a 'whole expression' behavior, why isn't it implemented
as a switch, like with the '-nocase' option?

Calling it a misfeature is being too kind, especially considering the
amount of grief it causes the people who are affected by it. ;')

----------------------------------------------------------------------

Comment By: Donal K. Fellows (dkf)
Date: 2000-11-10 02:48

Message:
Perhaps this requires a documentation change? Well, either that or a
behaviour change. Would it be possible to have a flag to force
greediness or non-greediness instead of guessing it from the first
quantifier in the regular expression? (With the default being greedy
unless all the "top-level" quantifiers were non-greedy, maybe?)

----------------------------------------------------------------------
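
For anyone following the thread, the two patterns discussed above can be put side by side in a small script. This is only a sketch: the variable name s and the puts labels are made up for illustration, and the result noted for the first pattern is the one reported in this bug for 8.2.3, not something re-checked on other versions.

```tcl
# Test string from the original report.
set s {1234x56789word101112}

# Pattern from the report. Per HS, the first quantifier that "cares" (.*?)
# is non-greedy, so the whole RE is treated as non-greedy.
# Reported on 8.2.3: a = x56789w, b = w
regexp {x.*?([a-z]+)} $s a b
puts "mixed:       a=$a b=$b"

# Work-around from hobbs: replace .*? with a greedy negated class, so
# every quantifier in the RE is greedy and nothing is mixed.
regexp {x[^a-z]*([a-z]+)} $s a b
puts "work-around: a=$a b=$b"   ;# a=x56789word b=word
```

The work-around depends on the text between the x and the captured word containing no lowercase letters, which holds for this input; for other inputs the negated class may need adjusting.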