From: SourceForge.net <no...@so...> - 2010-01-30 06:46:25
|
Bugs item #2942697, was opened at 2010-01-30 01:46 Message generated for change (Tracker Item Submitted) made by tgl You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=110894&aid=2942697&group_id=10894 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: 43. Regexp Group: obsolete: 8.5.7 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Tom Lane (tgl) Assigned to: Pavel Goran (pvgoran) Summary: Pathologically slow regexp execution Initial Comment: The attached test case "slowregexp.tcl" demonstrates a case that results in O(N^2) slowdown in execution of a regexp match, as a result of nested searches in crevdissect: the outer invocation searches for a suitable midpoint, and for each possible midpoint the inner execution checks all possible locations for its midpoint. I tested this against 8.5.7 but it doesn't look like the relevant files have changed since then. There are a number of ways that this might be attacked, but I attach a fairly simple patch that seems to improve matters for me. The idea is that to verify a midpoint requires a longest() call as well as recursively cdissect()ing the left and right sides --- but we could do the longest() check first since that's the fastest. This reduces the runtime of the given test case about 500X on my machine. There are two possible flaws in this solution: 1. It won't work if longest() depends on cdissect() having set up some state --- for instance, establishing backref string boundaries. As far as I can see, that isn't the case, but I'm no expert on this code so I might've missed something. I can say that the Tcl regression tests still pass, though. 2. It's conceivable that there are cases where this way is slower rather than faster. I can't answer that at all. The regression test cases seem to take just about the same amount of time as before, though. FYI, this problem was originally submitted to the Postgres project at http://archives.postgresql.org/pgsql-hackers/2010-01/msg02864.php ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=110894&aid=2942697&group_id=10894 |