#4530 Pathologically slow regexp execution

obsolete: 8.5.7
closed-fixed
7
2010-02-01
2010-01-30
Tom Lane
No

The attached test case "slowregexp.tcl" demonstrates a case that results in O(N^2) slowdown in execution of a regexp match, as a result of nested searches in crevdissect: the outer invocation searches for a suitable midpoint, and for each possible midpoint the inner execution checks all possible locations for its midpoint.
I tested this against 8.5.7 but it doesn't look like the relevant files have changed since then.

There are a number of ways that this might be attacked, but I attach a fairly simple patch that seems to improve matters for me. The idea is that to verify a midpoint requires a longest() call as well as recursively cdissect()ing the left and right sides --- but we could do the longest() check first since that's the fastest. This reduces the runtime of the given test case about 500X on my machine. There are two possible flaws in this solution:

1. It won't work if longest() depends on cdissect() having set up some state --- for instance, establishing backref string boundaries. As far as I can see, that isn't the case, but I'm no expert on this code so I might've missed something. I can say that the Tcl regression tests still pass, though.

2. It's conceivable that there are cases where this way is slower rather than faster. I can't answer that at all. The regression test cases seem to take just about the same amount of time as before, though.

FYI, this problem was originally submitted to the Postgres project at http://archives.postgresql.org/pgsql-hackers/2010-01/msg02864.php

Discussion

  • Tom Lane
    Tom Lane
    2010-01-30

    slowregexp.tcl

     
    Attachments
  • Tom Lane
    Tom Lane
    2010-01-30

    regexp.patch

     
    Attachments
  • As far as I can see this works, and I can't see any problems with it. (The speedup is nearly 800 times on the sample case as it happens: 94.3s -> 120ms)

    Applied to HEAD. Backports still pending.

     
    • priority: 5 --> 7
    • assigned_to: pvgoran --> dkf
    • status: open --> open-accepted
     
  • Backported to 8.5

     
  • Backported to 8.4

     
    • status: open-accepted --> closed-fixed