Menu

#2413 Reset RESearch nfa when compiling new pattern

Bug
closed-fixed
5
2023-12-25
2023-11-10
Zufu Liu
No

failure steps:

  1. find text with \<.
  2. find text with \>, the pattern failed to compile, due to:
case '>':
    if (*sp == BOW)
        return badpat("Null pattern inside \\<\\>");
    *mp++ = EOW;
    break;
1 Attachments

Discussion

  • Zufu Liu

    Zufu Liu - 2023-11-10

    tagstk is moved local into RESearch::Compile().

     
  • Zufu Liu

    Zufu Liu - 2023-11-10

    bittab also needs to be cleared in case of earlier return inside /* match char class */ block.

     
  • Neil Hodgson

    Neil Hodgson - 2023-11-16

    This moves a 4K memset from one-time creation to each search. Have you checked the performance impact?

     
  • Neil Hodgson

    Neil Hodgson - 2023-11-17

    Its likely a more complex check for \<\> would avoid the false failure. It needs to check the previous op-code for BOW: something like mp[-1] == BOW with appropriate checking for start of buffer to avoid out-of-bounds.

     
  • Zufu Liu

    Zufu Liu - 2023-11-17

    I think the failure is due to read saved pointer (sp or lp) beyond max written pointer mp, wrap *sp with following helper also fixed the bug:

    constexpr char GetOpCode(const char *sp, const char *mp) noexcept {
        return (sp < mp) ? *sp : END;
    }
    
    if (GetOpCode(sp, mp) == BOW)
        return badpat("Null pattern inside \\<\\>");
    

    I think above change is better than using mp[-1] as distance between sp and mp may large than 1.

    RESearch-2413-memset-1117.diff contains same change, but changed std::fill(bittab to memset(bitta. I'm plan to add another change to replace following loop:

    for (n = 0; n < BITBLK; bittab[n++] = 0)
        *mp++ = static_cast<char>(mask ^ bittab[n]);
    

    with memcpy() and memset() (negative part is only needed for /* match char class */ block):

    if (negative) {
        size_t * const ptr = reinterpret_cast<size_t *>(bittab);
        for (unsigned n = 0; n < BITBLK / sizeof(size_t); n++) {
            ptr[n] = ~ptr[n];
        }
    }
    
    *mp++ = CCL;
    memcpy(mp, bittab, BITBLK);
    memset(bittab, 0, BITBLK);
    mp += BITBLK;
    
     
  • Neil Hodgson

    Neil Hodgson - 2023-11-17

    but changed std::fill(bittab to memset(bitta

    std::fill is more type-safe than memset.

     
  • Zufu Liu

    Zufu Liu - 2023-11-17

    Revert patch to RESearch-nfa-bittab-1111.diff, but with only nfa[0] = END; instead of clear whole 4K: reading from sp is valid after the loop been executed once, after sp = lp;, then sp is always lees than mp.

     
  • Neil Hodgson

    Neil Hodgson - 2023-11-21
    • status: open --> open-fixed
     
  • Neil Hodgson

    Neil Hodgson - 2023-11-21

    Committed with [03bbb3] including minor changes. Since std::fill may throw, noexcept removed from Compile. Its not unreasonable for something as complex as Compile to throw and it already reports failure with strings. Also changed unit test to point at this issue.

     

    Related

    Commit: [03bbb3]

  • Neil Hodgson

    Neil Hodgson - 2023-12-25
    • status: open-fixed --> closed-fixed
     

Log in to post a comment.

MongoDB Logo MongoDB