Re: Indexes

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Indices are only useful for an exact search not a fuzzy search as is used
on most of the queries looking for a name or partnumber.

example:
dws=# create index parts_idx on parts (partnumber);
CREATE
dws=# explain select * from parts where partnumber = '123';
NOTICE:  QUERY PLAN:

Index Scan using parts_idx on parts  (cost=0.00..2.01 rows=1 width=120)

EXPLAIN
dws=# explain select * from parts where partnumber LIKE '%123';
NOTICE:  QUERY PLAN:

Seq Scan on parts  (cost=0.00..17.06 rows=1 width=120)

EXPLAIN

You see that the index is not used at all on a 'LIKE' search.

If you remove the % from the backend code the index is used

dws=# explain select * from parts where partnumber LIKE '123';
NOTICE:  QUERY PLAN:

Index Scan using parts_idx on parts  (cost=0.00..2.01 rows=1 width=120)

EXPLAIN

If you run a database with 10,000+ tuples in the parts table it certainly
pays off to change the backend from a fuzzy search to an exact search. You
lose a lot of flexibility but the performance increase makes it all worth
while.

With the exception of having an index for the ID, dates and tablepointers
it does not make much sense to create an index for a name, partnumber or
invnumber.

Dieter Simader    http://www.sql-ledger.org   (780) 472-8161
DWS Systems Inc.     Accounting Software       Fax: 478-5281
=========== On a clear disk you can seek forever ===========

On Mon, 7 Jan 2002, Ho-Sheng Hsiao wrote:

> On Sun, Jan 06, 2002 at 09:11:11AM -0800, Wes Warner wrote:
> > Could you please post these commands?
> 
> Sure. You can cut-and-paste or save it into a file:
> 
> BEGIN WORK;
> create index acc_trans_tid_idx ON acc_trans (trans_id);
> create index acc_trans_cid_idx ON acc_trans (chart_id);
> create index acc_trans_tdate_idx ON acc_trans (transdate);
> create unique index parts_id_idx ON parts (id);
> create index parts_lower_desc_idx ON parts (lower(description));
> create index parts_lower_pnumber_idx ON parts (lower(partnumber));
> create unique index ap_id_idx ON ap (id);
> create index ap_transdate_idx ON ap (transdate);
> create index ap_datepaid_idx ON ap (datepaid);
> create index ap_invnumber_idx ON ap (invnumber);
> create index ap_ordnumber_idx on ap (ordnumber);
> create index ap_vendor_id_idx ON ap (vendor);
> create unique index ar_id_idx ON ar (id);
> create index ar_transdate_idx ON ar (transdate);
> create index ar_datepaid_idx ON ar (datepaid);
> create index ar_invnumber_idx ON ar (invnumber);
> create index ar_customer_id_idx ON ar (customer);
> create unique index customer_id_idx ON customer (id);
> create index customer_macc_idx ON customer (macc);
> create index customer_lower_name_idx ON customer (lower(name));
> create unique index vendor_id_idx ON vendor (id);
> create index vendor_macc_idx ON vendor (macc);
> create index vendor_lower_name_idx ON customer (lower(name));
> create index customer_tax_idx ON customertax (customer_id);
> create index vendor_tax_idx ON vendortax (vendor_id);
> create index parts_tax_idx ON partstax (parts_id);
> create index chart_link_idx ON chart (link);
> create index tax_idx ON tax (chart_id);
> create unique index gl_idx ON gl (id);
> create index gl_transdate_idx ON gl (transdate);
> COMMIT WORK;
> 
> For 1.8.0, at the very least add:
> 
> BEGIN WORK;
> create unique index oe_id_idx ON oe (id);
> create index oe_ordnumber_idx ON oe (ordnumber);
> create index oe_transdate_idx ON oe (transdate);
> COMMIT WORK;
> 
> I have not tested this stuff against a 1.8.x platform so, Your Mileage
> May Vary.
> 
> For those who want to tinker, I've listed below the way I've chosen
> these indexes were based on several criteria:
> 
> - All primary keys MUST have UNIQUE INDEX. That would be the
> customer.id, vendor.id, parts.id, oe.id, ap.id, ar.id, etc. MySQL has
> a PRIMARY KEY syntax which might even be in standard ANSI SQL and
> therefore PostgreSQL -- declaring within a table that a field is a
> primary key automagically creates the unique indexes which in turn
> gives the optimizer something to work with. 
> 
> - Then, the secondary indexes are chosen on common JOIN points or
> WHERE clauses. Having messed with the guts of the code for 80 hours
> last week, I had a pretty good idea of what were frequently called
> stuff. For example, in searching for parts by description, the code
> uses lower()  -- which I understand has been taken out in 1.8.x -- so
> in the above I had a
> 
> CREATE INDEX customer_lower_name_idx ON customer (lower(name));
> 
> That's supposed to make the search faster. Another commonly searched
> criteria is based on dates. So I made sure all the transaction date
> fields has an index. I used guestimates. If I were in the "slow and
> careful" rather than the "fast and sloppy" mode, I would have used the
> EXPLAIN for each of the queries and have PG tell me precisely which
> indices were being used and which ones weren't. 
> 
> I might have gone overboard and added too many fields. That typically
> affects INSERTs and UPDATEs where for each index, the backend has to
> add another entry into the index. What's fun though, is that inserts
> into the indices are typically O(log n) rather than O(n) (e.g. 20
> instead of 1 million). So there's a tradeoff between what sort of
> indices are necessary and what arn't. Certainly there's room for tweaks. 
> 
> I timed the operation by calling the modules via commandline:
> 
> cd sql-ledger
> time ./ar.pl "login=myloginstuff&action=something& ... "
> 
> It works assuming you've created a login where no passwords are
> required. 
> 
>