#75 severe matching performance regression

closed-fixed
Jeffrey Hobbs
None
6
2010-08-27
2010-06-02
Per Cederqvist
No

The gate keeper glob pattern matching added on 2007-07-11 can be very slow. In a test suite I use it seems to be O(n*n) or worse in the length of the input buffer. The attached test file spawns an echo command that prints a fixed string, truncated to different lengths. It then uses a regular expression to match the first line of the input.

In expect 5.43.0, this was super-fast (the test prints 0 or 1 ms for each test). Using 5.44.1.15, I get the following output on the same computer:

Trying 80... done (86 ms)
Trying 90... done (194 ms)
Trying 95... done (397 ms)
Trying 100... done (789 ms)
Trying 101... done (1088 ms)
Trying 102... done (1134 ms)
Trying 103... done (1551 ms)
Trying 104... done (1573 ms)
Trying 105... done (1609 ms)
Trying 106... done (1627 ms)
Trying 107... done (2183 ms)
Trying 108... done (2209 ms)
Trying 109... done (2867 ms)
Trying 110... done (2917 ms)
Trying 111... done (2958 ms)
Trying 112... done (2996 ms)
Trying 113... done (3857 ms)
Trying 114... done (3898 ms)
Trying 115... done (5088 ms)

Yes, that is right. More than five seconds to perform a match that expect 5.43 did in 1 millisecond.

The test case is a simplified demonstration of a bug I found in a real-life test suite for the LysKOM server. See http://www.lysator.liu.se/lyskom/lyskom-server/ if you want to see the real-life test suite.

Expect version: 5.44.1.15 (compiled from source on Ubuntu 8.04) and 5.44.1.14 (as included in Ubuntu 10.04).

Discussion

1 2 > >> (Page 1 of 2)
  • Per Cederqvist
    Per Cederqvist
    2010-06-02

    Reproduce the slow matching

     
    Attachments
  • Per Cederqvist
    Per Cederqvist
    2010-06-02

    Running expect with -d indicates that it uses this glob gate:

    MRK:client1: :18 14 6 * * * * * * * * * 5 0 8 0 2 { 0 1 6 5 } 0 *\n

    Things might work better if you ensure that the glob gate never contains more than a single "*". I susepct that this would be a much better gate glob:

    MRK:client1: :18 14 6 *\n

     
  • Jeffrey Hobbs
    Jeffrey Hobbs
    2010-06-02

    • priority: 5 --> 6
    • assigned_to: nobody --> hobbs
     
  • Jeffrey Hobbs
    Jeffrey Hobbs
    2010-06-02

    I believe this is related to the slow performance of the Re2Glob glob matching for multiple * patterns.

     
  • Jeffrey Hobbs
    Jeffrey Hobbs
    2010-06-02

    BTW, while I do mean to address this in the code, the RE presented is quite suboptimal. It is:

    "^MRK:client1: :18 14 6 \[0-9\]* \[0-9\]* \[0-9\]* \[0-9\]* \[0-9\]* \[0-9\]* \[0-9\]* \[0-9\]* \[0-9\]* 5 0 8 0 2 { 0 1 6 5 } 0 \\\\*\r?\n"

    But what is the point of all the _zero_ or more matches one after the other? It really should be just one * (or RE .*).

     
  • Jeffrey Hobbs
    Jeffrey Hobbs
    2010-06-02

    Let me adjust that ... it is just allowing for separation between space elements. Still could be optimized, but that level of opt isn't being handled low-level in Re2Glob.

     
  • Andreas was kind enough to explain the basic mechanism of the 'glob gate' to me on the chat. Now I am scared, because this mechanism implies that even a carefully deterministic RE passed to Expect in -re, will get approximated by a dumbo glob pattern which is much less deterministic, hence will take ages to parse.

    Example: {^[0-9]+ [0-9]+$}, which is fully deterministic, becomes something like {* *}, which is not at all.

    [00:10] * jenglish observing from the sidelines, notes that "Q: is this regular expression deterministic" is approximately as expensive as "Q: please convert this regular expression to an NFA, kthxbye."
    [00:11] * ferrieux wonders why there's an RE2GLOB in the first place, for this very reason.

    Question: since (1) the regexp engine works by compiling the RE into a DFA, (2) DFA runtime is extremely fast, (3) DFA compilation time is not an issue for the -re flag of Expect, and (4) DFA size will be very small for typically hand-determinized regexps... why do we go through this dangerous glob step ?

     
  • Per Cederqvist
    Per Cederqvist
    2010-07-26

    Add test case for this issue.

     
    Attachments
  • Per Cederqvist
    Per Cederqvist
    2010-07-26

    Preferred solution: remove the glob gate functionality.

     
    Attachments
  • Per Cederqvist
    Per Cederqvist
    2010-07-26

    Alternate solution: disable glob gate if the generated glob is too complex.

     
    Attachments
1 2 > >> (Page 1 of 2)