From: SourceForge.net <no...@so...> - 2013-03-03 17:34:42
|
Bugs item #3606683, was opened at 2013-03-03 09:34 Message generated for change (Tracker Item Submitted) made by tgl You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=110894&aid=3606683&group_id=10894 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: 43. Regexp Group: current: 8.6.0 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Tom Lane (tgl) Assigned to: Pavel Goran (pvgoran) Summary: fix for 3604074 does not work Initial Comment: I couldn't find any way to reopen or comment on the already-closed bug 3604074, so here is a new bug report to say that the fix for that one doesn't work: if you build with the asserts in Spencer's code enabled, you'll find that the Tcl regression tests draw an assertion failure, because fixempties() fails to eliminate all the EMPTY arcs. The proximate cause of that is a small oversight in the patch: the first sub-loop in fixempties supposes that it doesn't need to do the "nexta = a->outchain" dance, but that's wrong because unempty is still capable of deleting the arc immediately, if the arc is a "vacuous loop". However, even with that repaired I don't find the fix implemented for 3604074 to be very trustworthy. The patch will cope if expansion of a later out-arc for the current state tries to re-add an empty out-arc *to that state* that was already processed. However, it doesn't detect the case where we re-add an empty out-arc to a previously processed state. Nor is it obvious that a removed empty out-arc can't be regenerated in a later pass (which after all was the failure mode exhibited in 3604074). I couldn't find a test case exhibiting an infinite loop, but I don't have a good feeling about that. I am also suspicious that the new code is outright wrong, because it seems to leave out some arguably-necessary arcs when I compare its final NFA state to the rewritten version attached. I don't have a test case to prove that either, but I'm suspicious that the timing of elimination of empty arcs might not be quite right now. After some thought and experimentation I devised a new algorithm that clearly terminates, because it doesn't have any "repeat until done" flavor to it. As a bonus, it seems to be faster than the old one, and it's clearer that it creates all the required replacement arcs. A few miscellaneous notes: * It was alleged in the comments to 3604074 that Postgres had fixed this bug, but actually the reason it doesn't show up for us is only that in http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=173e29aa5deefd9e71c183583ba37805c8102a72 we had gotten rid of the expansion of "a*" to "a+|", and it's the empty alternative that's tickling this bug. If you try select 'a' ~ '((((((a+|)+|)+|)+|)+|)+|)'; it still fails in released Postgres versions. * The reason I found out that your patch is broken is that as Postgres has things set up, the assert()s in Spencer's code are enabled in standard development builds. This evidently isn't the case in Tcl, but I suggest you might want to rethink that. * While fooling with the fixempties algorithm, I found that it was very easy to make CVE-2007-4772 (the infinite loop in pullback() for "($|^)*") come back, if the output NFA was even slightly nonoptimal. So I fear that that bug is not really fixed either: it seems to be fairly easy to make pullback() create new states as fast as it can process them. pullback looks vaguely similar to what fixempties is doing, so I wonder whether it could be rewritten in a similar style to this patch. I've not tried it though. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=110894&aid=3606683&group_id=10894 |