From: SourceForge.net <no...@so...> - 2006-03-20 09:21:42
Bugs item #1452969, was opened at 2006-03-17 20:51
Message generated for change (Comment added) made by nobody
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=110894&aid=1452969&group_id=10894

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: 42. Regexp
Group: development: 8.4.13
Status: Open
Resolution: None
Priority: 5
Submitted By: bstellar (bstellar)
Assigned to: Jeffrey Hobbs (hobbs)
Summary: performance: regexp 20sec delay from 8.0 to 8.3

Initial Comment:
Hi

I am having a considerable performance issue going from v8.0 to v8.3 with regexp, for the following statement in my program. I have attached the program, which responds within the same second on 8.0, as below:

Before if - The time is Fri Mar 17 9:17:12 PM US Mountain Standard Time 2006
Inside the if The time is Fri Mar 17 9:17:12 PM US Mountain Standard Time 2006
lsWzrdEcnRequestedByCompleteList_choices

and 20 seconds for 8.3:

Before if - The time is Fri Mar 17 9:18:53 PM US Mountain Standard Time 2006
Inside the if The time is Fri Mar 17 9:19:22 PM US Mountain Standard Time 2006
lsWzrdEcnRequestedByCompleteList_choices

Please help me if this is a known issue.

Thanks,
Balaji

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2006-03-20 01:21

Message:
Logged In: NO

Another solution is just to use the ? non-greedy quantifier:

  regexp {^(.*?)=(.*)$} ...

Nice and quick. Also probably what the user intended (in the case where there is a second '=').

----------------------------------------------------------------------

Comment By: Donal K. Fellows (dkf)
Date: 2006-03-19 15:04

Message:
Logged In: YES
user_id=79902

The RE engine is pretty close to a black box to me; I know how to fix a very small subset of the problems it has, and this is definitely not one of them!
Possible consideration: special-case in the outer [regexp] code to detect and handle the case where we have:

  ^(.*)<SOME_LITERAL>(.*)$

as that's replaceable by a simple substring search instead of the much more complex backtracking stuff that the RE engine does. But I've no time to implement this anyway.

----------------------------------------------------------------------

Comment By: Jeffrey Hobbs (hobbs)
Date: 2006-03-19 12:14

Message:
Logged In: YES
user_id=72656

The capturing appears to be the real killer. Remove that and the slowdown is small. The match is still accurate, so I don't know what extra bits the RE is doing when capturing.

----------------------------------------------------------------------

Comment By: miguel sofer (msofer)
Date: 2006-03-18 11:51

Message:
Logged In: YES
user_id=148712

bstellar's problem is solved, but the performance bug is still there ...

----------------------------------------------------------------------

Comment By: bstellar (bstellar)
Date: 2006-03-18 11:33

Message:
Logged In: YES
user_id=1479322

Excellent, this helps me very much...

----------------------------------------------------------------------

Comment By: miguel sofer (msofer)
Date: 2006-03-18 05:50

Message:
Logged In: YES
user_id=148712

Oops: [string last], not [string first]. Still much faster:

  % time {
      if {[set n [string last = $sEnvPair]] != -1} {
          set sEnvName [string range $sEnvPair 0 [expr {$n-1}]]
          set sEnvValue [string range $sEnvPair [incr n] end]
      }
  } 1000
  1040.019 microseconds per iteration

Just by looking at your snippet of code, I'm guessing that you want [string first], and that your regexp is wrong as it will produce the wrong results if anybody has an '=' in his name.

----------------------------------------------------------------------

Comment By: miguel sofer (msofer)
Date: 2006-03-18 05:35

Message:
Logged In: YES
user_id=148712

The bad performance is confirmed on modern Tcl (8.3 is already quite old; if you are migrating you should use 8.4).
Since 8.0 the regexp engine has been completely replaced; the new Unicode awareness does make it slower. However, for your particular case, the use of regexp is probably not necessary: a combination of [string last] and [string range] is much more efficient. Look:

  % time {
      if {[set n [string first = $sEnvPair]] != -1} {
          set sEnvName [string range $sEnvPair 0 [expr {$n-1}]]
          set sEnvValue [string range $sEnvPair [incr n] end]
      }
  } 1000
  168.87 microseconds per iteration

  % time {
      if {[regexp {^(.*)=(.*)$} $sEnvPair sDummy sEnvName sEnvValue] == 1} {
      }
  }
  22609833 microseconds per iteration

BTW: priority 9 is for release-blocking bugs; this is nowhere near qualifying.

----------------------------------------------------------------------

Comment By: bstellar (bstellar)
Date: 2006-03-17 20:55

Message:
Logged In: YES
user_id=1479322

This regexp creates the delay, please help:

  if { [ regexp {^(.*)=(.*)$} $sEnvPair sDummy sEnvName sEnvValue ] == 1 } {

----------------------------------------------------------------------
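
The alternatives discussed in this thread can be put side by side in a small sketch. The proc names (splitFirst, splitLast, splitRegexp) and the sample strings are hypothetical and exist only for illustration; they are not part of the submitted program. The sketch assumes the goal is to split a NAME=VALUE pair, and shows that [string first] and [string last] only differ when the input contains more than one '='.

```tcl
# Hypothetical helpers for illustration; none of these names come from
# the original report.

# Split at the FIRST '=' using plain string operations.
proc splitFirst {pair} {
    set n [string first = $pair]
    if {$n == -1} { return {} }
    return [list [string range $pair 0 [expr {$n - 1}]] \
                 [string range $pair [expr {$n + 1}] end]]
}

# Split at the LAST '=', the position the greedy {^(.*)=(.*)$} selects.
proc splitLast {pair} {
    set n [string last = $pair]
    if {$n == -1} { return {} }
    return [list [string range $pair 0 [expr {$n - 1}]] \
                 [string range $pair [expr {$n + 1}] end]]
}

# Non-greedy regexp variant suggested in the thread.
proc splitRegexp {pair} {
    if {[regexp {^(.*?)=(.*)$} $pair -> name value]} {
        return [list $name $value]
    }
    return {}
}

puts [splitFirst "A=b=c"]   ;# name "A",   value "b=c"
puts [splitLast  "A=b=c"]   ;# name "A=b", value "c"
puts [splitRegexp "PATH=/usr/bin"]
```

To compare speed on a given Tcl build, each proc can be wrapped in [time {...} 1000] as in the comments above; the figures quoted in the thread depend on the poster's input and machine and are not reproduced here.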