Hi Tcl team,
Recently a colleague (Tavis Ormandy) and I uncovered some bugs in
Postgresql's regular expression engine. It turns out that they're using
code derived from Tcl. With Tom Lane's help, we have some tentative
patches, but first the bugs :-)
The out of band read in the UCS-4-enabled build -- [CVE-2007-4769]
(the numeric value must be > INT_MAX and <= UINT_MAX)
select text (E'\\' || '3161573148') ~ (E'\\' || '3161573148');
regexp {\3161573148} {\3161573148};
(The code location applies to postgres
Backreferences over INT_MAX slip through the check for a single-digit due
to
a signedness error in regcomp.c's lexescape() near line 948. v->numsubexp
is a signed integer and c is unsigned. The cast to a signed int avoids
warnings but means that any c > INT_MAX will pass the test against the
default nsubexp value of 10.
This mistake allows indexing into v->subs in regcomp.c's parsebranch()
near
line 948 to be controlled. However, this is just a NULL test. If the
value
given indexes out of valid memory, a segmentation fault will occur and the
child process will die.
An infinite loop -- [CVE-2007-4772]\
regexp {($|^)*} {x}
Tom Lane provided an analysis:
"""
The problem seems to occur when
an NFA state has a regular ^-constraint as an outgoing arc, plus both
^-constraint and $-constraint outgoing arcs that loop back to itself.
The latter two should be destroyed as useless, but the code didn't get
rid of the $-constraint soon enough and instead transferred it to the
new state that was generated as part of pushing the regular ^-constraint
back to this state's predecessor states. This then re-created a
situation that needed another iteration of pullback().
The proposed fix deals with this by changing the order of operations:
useless arcs are destroyed in a separate pass before we do the pull()
calls.
I made a similar change in pushfwd(), which has effectively identical
logic. This part of the patch is not needed to fix the particular
test cases we have, and I'm not certain it can be a problem once
pullback() has gotten rid of ^-constraints. But it seems like a good
idea to keep the logic in sync, and maybe there are cases where it's
necessary for pushfwd() to protect itself.
The patches pass the Postgres and Tcl regression tests, but I'd sure
like to have someone who understands this regex code look at them.
In particular I'm not entirely sure whether it's OK to flush *all*
non-PLAIN circular arcs.
"""
An expensive regular expression:
regexp {(x{200}){200}$y} {x}
Again with excellent analysis from Tom Lane:
"""
This is not as slow as your original (about 2m rather than 15m on
my machine), but it illustrates the basic issue: the doubly nested
iteration generates 200*200 = 40000 NFA states, all of which are
later found to be dead because the trailing "$y" cannot possibly
match anything, and then when we try to clean them up we spend
O(N^2) time at it because they're all on the same colorchain.
AFAICS the only use of the colorchain data structure is in okcolors(),
which could probably be rewritten to do one iteration over all the NFA's
states/arcs. So I'm tempted to propose just getting rid of the
colorchain lists. But I'd *really* want to talk to someone who knows
this code better, first. There might be cases where the colorchains
are essential for performance.
"""
It'd be great if we could chat about this more, but I wasn't sure the best
place to email. I know that Tom Lane would appreciate any additional
insight before patching postgresql. Tom currently has a few patches
explicitly for postgres, but they appear that they'll be pretty portable.
Here are the relevant addresses:
Will Drewry <wad@google.com>
Tavis Ormandy <taviso@google.com>
Tom Lane <tgl@sss.pgh.pa.us>
Postgresql Security <security@postgresql.org>
Thanks!
will
Donal K. Fellows
43. Regexp
current: 8.4.19
Public
|
Date: 2008-05-26 22:12
|
|
Date: 2008-05-26 20:53
|
|
Date: 2007-12-18 11:25
|
|
Date: 2007-12-18 10:54
|
|
Date: 2007-12-17 20:17
|
|
Date: 2007-12-01 20:49
|
|
Date: 2007-11-26 17:55
|
|
Date: 2007-11-15 22:05
|
|
Date: 2007-11-15 22:03
|
|
Date: 2007-11-15 22:01
|
|
Date: 2007-11-15 21:15
|
|
Date: 2007-11-15 21:07
|
|
Date: 2007-10-30 10:30
|
|
Date: 2007-10-29 22:20
|
|
Date: 2007-10-29 21:24
|
|
Date: 2007-10-27 14:00
|
| Filename | Description | Download |
|---|---|---|
| tcl8.5b3-state-limit.patch | Patch for TCL 8.5b3 which implements a maximum total number of states for a given NFA and its children. | Download |
| Field | Old Value | Date | By |
|---|---|---|---|
| close_date | - | 2008-05-26 22:12 | hobbs |
| status_id | Open | 2008-05-26 22:12 | hobbs |
| status_id | Closed | 2008-05-26 20:53 | hobbs |
| close_date | 2007-12-18 11:25 | 2008-05-26 20:53 | hobbs |
| artifact_group_id | obsolete: 8.5b1 | 2008-05-26 20:53 | hobbs |
| resolution_id | None | 2007-12-18 11:25 | dkf |
| close_date | - | 2007-12-18 11:25 | dkf |
| is_private | 1 | 2007-12-18 11:25 | dkf |
| status_id | Open | 2007-12-18 11:25 | dkf |
| priority | 5 | 2007-12-18 10:54 | dkf |
| assigned_to | pvgoran | 2007-12-17 21:17 | dgp |
| is_private | 0 | 2007-12-17 20:17 | gwad |
| File Added | 258928: tcl8.5b3-state-limit.patch | 2007-12-17 20:17 | gwad |
| is_private | 1 | 2007-12-01 20:49 | dkf |
| summary | Out of band read and expensive/looping operations | 2007-11-15 22:05 | dgp |
| priority | 9 | 2007-11-15 22:01 | dgp |
| assigned_to | dkf | 2007-10-30 10:30 | dkf |
| assigned_to | pvgoran | 2007-10-29 22:20 | dgp |
| priority | 5 | 2007-10-09 18:20 | hobbs |
Copyright © 2010 Geeknet, Inc. All rights reserved. Terms of Use