Ivar Zarrans and myself have both encountered (separately) an
interesting "bug" in PostgreSQL performance. Basically, the query
planner, if given a query like this:
UPDATE trigger SET frozen=False WHERE object_id=23;
where object_id is an indexed big-int field of trigger, will *always*
choose a sequential scan. On a large table, that is going to be
unacceptably slow, especially when one is writing the queries with the
assumption that you are using an indexed field. The problem here is
that, even when a PgInt8 is passed as the object_id value, it is
inserted into the query as an int4 literal (a simple number). The
server then sees a request to search for the field with a value of a
*different type* than the available index, so ignores the index.
Apparently PostgreSQL 7.5 has a fix checked in for it, however, in the
meantime, any 7.4 or below DB using bigints for specifying where's on
large tables will be ridiculously slow on UPDATE (think multiple whole
minutes to update a single row of a 88,000 row table). Apparently this
is a general problem, int2 indexed fields also won't use the index if an
int4 literal is specified as the search term, though I'd guess that int8
is the most common problem, as both people discovering it were using that.
This can be worked around by:
* adding quotes around the value '23', which will make it an unknown
type
* coercing value to target type; i.e. 23::bigint anywhere you are
using an integer as a query term against a field that is int8 and
for which you need indexing
* altering all queries to explicitly coerce to the target type
* Greg Stark, on the postgres performance list, suggested using "the
new binary protocol for parameters [within pyPgSQL] so that there
are no quoting issues at all", but that's not going to work with
older PostgreSQL versions, and still wouldn't help if someone had
passed a regular integer (it would help if a PgInt8 was passed,
however).
The first two need to be done by creating a subclass or wrapper of
PgInt8 which does the munged representation in it's __str__/__repr__
method (so that pyPgSQL won't complain about an improperly formatted
value). Here's a sample class:
class Wrapper:
def __init__( self, value ):
self.value = value
def __repr__( self ):
return '%s::int8'%(self.value, )
__str__ = __repr__
It doesn't address the underlying problem, but it should allow people to
move forward if they find themselves in this situation.
Enjoy,
Mike
_______________________________________
Mike C. Fletcher
Designer, VR Plumber, Coder
http://members.rogers.com/mcfletch/
|