From: Ho-Sheng H. <qa...@ne...> - 2002-01-06 09:31:49

Hi,

I noticed that my install of PostgreSQL didn't add indexes to the
tables. I had heard that those indexes were created automatically, but
when I listed them with \di in the psql tool there was only one index,
the one for the chart table.

I ran an informal test on an "end-of-day" module using data from this
client. There were 13 transactions. With the stock Red Hat PostgreSQL
7.1 install, it took approximately 8 to 9 seconds to load. I followed
the optimization hints on PHPBuilder and increased the shared buffer
and sort memory, which shaved it down to 7 seconds. (Shared memory went
from 1 MB to 50 MB. There were postings there saying that all this
really does is force the kernel to use less memory for disk cache; I
don't know.)

Then I added the indexes: unique indexes for ar, ap, parts, customer,
and vendor, plus indexes on various integer and text fields in those
tables and in acc_trans, the tax tables, etc., including one that
indexes the lower-cased version of the customer/vendor/partnumber
names.

That same app went from 7 seconds to 0.7 seconds the _first_ time.
Subsequent queries took 0.2 seconds.

This was all done on a PII 200 with only 64 MB of RAM, running Apache
and Samba, during off-business hours.

A difference of 10 seconds down to less than 1 second is pretty
significant, especially considering that the people up front doing
point-of-sale need to access the data quickly. The bookkeeper working
here gets frustrated at the "slowness" of the "new system", which
discouraged adoption. I'll see what the psychological impact is
tomorrow... I figure the extra time inserts take is less noticeable
than the improvement on select queries.

Another informal and non-rigorous test used about a year's worth of
back data in accounts receivable to run the Aging function. Before
indexes, it timed out the browser (the operation could not be
completed). Now it completes in about 20 seconds.

What I am wondering is whether anyone else has run into performance
problems, has fixed/optimized/tuned/tweaked, and is willing to share.
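For reference, the two settings live in postgresql.conf. The numbers
below are only a sketch of roughly what I did, not the exact values I
used (note that in 7.1, shared_buffers is counted in 8 KB disk pages,
not MB):

  # postgresql.conf -- illustrative values only
  shared_buffers = 6400    # 6400 pages x 8 KB = 50 MB
  sort_mem = 4096          # per-sort working memory, in KB

You have to restart the postmaster for shared_buffers to take effect,
and you may need to raise the kernel's SHMMAX limit before it will
accept a buffer pool that large.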
--
-Qaexl-

From: Roland S. <sql...@st...> - 2002-01-06 10:14:58

Ehhh guys????

Indexes take search queries from O(n) to O(log n) -- read O as
"order". That's about the biggest performance difference you can get
in a database!!!!!

Say you have a million records in a table: the search goes from 1
million compares down to 20 or fewer, depending on the index type the
db uses. That's some kick-ass speed difference..... I assumed they
were in place.
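To put a number on it (rough arithmetic, assuming a balanced btree):

  2^20 = 1,048,576 > 1,000,000
  so ceil(log2(1,000,000)) = 20

A lookup touches at most about 20 comparisons, where a sequential scan
may have to make a million.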
Roland

On Sunday 6 January 2002 10:31, you wrote:
> Hi
>
> I noticed that my install of PostgreSQL didn't add indexes to the
> tables. I had heard that those indexes were created automatically,
> but when I listed them with \di in the psql tool there was only one
> index, the one for the chart table.
> [...]

From: Ho-Sheng H. <qa...@ne...> - 2002-01-06 14:38:04

On Sun, Jan 06, 2002 at 11:26:33AM +0100, Roland Stoker wrote:
> Ehhh guys????
>
> Indexes take search queries from O(n) to O(log n) -- read O as
> "order". That's about the biggest performance difference you can get
> in a database!!!!!

Hey, I remember comp sci 101. I think. Maybe I should put that tidbit
into a database ;-)

> Say you have a million records in a table: the search goes from 1
> million compares down to 20 or fewer, depending on the index type
> the db uses.

I've read through the PostgreSQL docs, and PG has three index types;
btree seems to be the most common. It doesn't matter much which, as
you said, since these are orders of growth: 1 million compares on a
1.1 GHz server is still going to be slow next to 20 or fewer on, say,
a 200 MHz server.

> That's some kick-ass speed difference.....

Hell yeah!

> I assumed they were in place.

Well, whether they were or not, they are definitely here to stay for
me. We were using this sucker for updating inventory and adding in
barcodes. Updating a single item required search-partnumber,
click-to-edit, update, save, and a callback to search-partnumber, and
each of those operations was a hit on the server. Each one _used_ to
take about 2 or 3 seconds (13,000 part numbers), which adds up to at
least 12 to 18 seconds per part. Now it takes less than a second.

So before, the computer was taking longer than the humans, which led
to other issues, like employees slacking off (there's little incentive
to be more efficient when the computer will hang you up anyway) or
getting frustrated. I watched how four of the employees worked: they
would click and THEN wait until the browser finished drawing the
screen. Or they would click the submit button over and over, not
understanding that each click just reset the search and made it take
even longer!

BTW: there are lots of little things you can do for someone who has to
enter volumes of info per day. Something as simple as using JavaScript
to move the keyboard focus to the most commonly filled field helps
tremendously. I stole... um, imitated the code from google.com. We can
now just take the barcode scanner and shoot each item, boom boom boom,
and the retail customer can pay us faster :-)

I'm not saying all of this to sneer at the users or to invoke the
mythical "average dumb (frustrated) user". A business is built on OPT
-- Other People's Time -- so the owner can play golf and get richer.
(Well, at least I do. Heh.) It's easier to build and tweak interface
systems that automate as much of the stupid, silly stuff as possible
than to teach someone to learn, adapt, and think on their feet. The
latter quickly leads to politically incorrect ideas.

The old system this business was using had its own problems, but the
response time between keypress and screen update was probably well
under 0.5 s. Trained personnel can fly through that, knowing all the
shortcuts and keystrokes; it's one reason some of us use UNIX and CLIs
rather than GUIs. (The downside is that turnover here is too high -- a
new staff member comes in and gets overwhelmed by the old POS
interface, so the OPT isn't used effectively.) The owner, though,
LIKES SQL-Ledger, since it integrates all the data about cash coming
in and cash going out (and therefore my income), and the software
works on his existing Linux server. Cool, eh?
If anyone asks, I'll post the SQL commands to add the indexes. They'll
work on a 1.6.x system, with an "off the top of my head" variant for a
1.8.x system. My indices were really ad hoc (fast and sloppy); I
haven't taken the time to really analyze and profile each of the
queries to get the optimal mix.
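One gotcha worth noting, sketched here with one of the indexes I
created (the part number is made up): PostgreSQL will only consider a
functional index when the query applies the same function to the
column, so the WHERE clause has to wrap the field in lower() as well.

  create index parts_lower_pnumber_idx ON parts (lower(partnumber));

  -- can use the index:
  SELECT * FROM parts WHERE lower(partnumber) = lower('ABC-123');

  -- cannot use it, since the expression doesn't match the index:
  SELECT * FROM parts WHERE partnumber = 'ABC-123';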
For the SL developers here who are working on database apps for the
first time and want to find out more about how PostgreSQL uses
indexes, check out the interactive documentation at
http://www.postgresql.org or http://www3.us.postgresql.org (mirror).
Search under "index" or "indices" and look for the document titled
"Indices".

-Qaexl-

From: Wes W. <ufo...@ea...> - 2002-01-06 17:11:54

Could you please post these commands?

Thanks,
Wes

> If anyone asks, I'll post the SQL commands to add the indexes.
> They'll work on a 1.6.x system, with an "off the top of my head"
> variant for a 1.8.x system. My indices were really ad hoc (fast and
> sloppy); I haven't taken the time to really analyze and profile each
> of the queries to get the optimal mix.
From: Ho-Sheng H. <qa...@ne...> - 2002-01-07 08:29:22

On Sun, Jan 06, 2002 at 09:11:11AM -0800, Wes Warner wrote:
> Could you please post these commands?

Sure. You can cut-and-paste or save this into a file:

BEGIN WORK;
create index acc_trans_tid_idx ON acc_trans (trans_id);
create index acc_trans_cid_idx ON acc_trans (chart_id);
create index acc_trans_tdate_idx ON acc_trans (transdate);
create unique index parts_id_idx ON parts (id);
create index parts_lower_desc_idx ON parts (lower(description));
create index parts_lower_pnumber_idx ON parts (lower(partnumber));
create unique index ap_id_idx ON ap (id);
create index ap_transdate_idx ON ap (transdate);
create index ap_datepaid_idx ON ap (datepaid);
create index ap_invnumber_idx ON ap (invnumber);
create index ap_ordnumber_idx ON ap (ordnumber);
create index ap_vendor_id_idx ON ap (vendor);
create unique index ar_id_idx ON ar (id);
create index ar_transdate_idx ON ar (transdate);
create index ar_datepaid_idx ON ar (datepaid);
create index ar_invnumber_idx ON ar (invnumber);
create index ar_customer_id_idx ON ar (customer);
create unique index customer_id_idx ON customer (id);
create index customer_macc_idx ON customer (macc);
create index customer_lower_name_idx ON customer (lower(name));
create unique index vendor_id_idx ON vendor (id);
create index vendor_macc_idx ON vendor (macc);
create index vendor_lower_name_idx ON vendor (lower(name));
create index customer_tax_idx ON customertax (customer_id);
create index vendor_tax_idx ON vendortax (vendor_id);
create index parts_tax_idx ON partstax (parts_id);
create index chart_link_idx ON chart (link);
create index tax_idx ON tax (chart_id);
create unique index gl_idx ON gl (id);
create index gl_transdate_idx ON gl (transdate);
COMMIT WORK;

For 1.8.0, at the very least add:

BEGIN WORK;
create unique index oe_id_idx ON oe (id);
create index oe_ordnumber_idx ON oe (ordnumber);
create index oe_transdate_idx ON oe (transdate);
COMMIT WORK;

I have not tested this against a 1.8.x platform, so Your Mileage May
Vary.

For those who want to tinker, I chose these indexes based on a few
criteria:

- All primary keys MUST have a UNIQUE INDEX. That covers customer.id,
  vendor.id, parts.id, oe.id, ap.id, ar.id, etc. MySQL has a PRIMARY
  KEY syntax, which may well be standard ANSI SQL and therefore in
  PostgreSQL too -- declaring within the table that a field is a
  primary key automagically creates the unique index, which in turn
  gives the optimizer something to work with.

- Secondary indexes are chosen on common JOIN points or WHERE clauses.
  Having messed with the guts of the code for 80 hours last week, I
  had a pretty good idea of what gets called frequently. For example,
  in searching for parts by description, the code uses lower() --
  which I understand has been taken out in 1.8.x -- so in the list
  above I had

    CREATE INDEX customer_lower_name_idx ON customer (lower(name));

  That's supposed to make the search faster. Another common search
  criterion is dates, so I made sure all the transaction date fields
  have an index.

I used guesstimates. If I were in "slow and careful" rather than "fast
and sloppy" mode, I would have run EXPLAIN on each of the queries and
had PG tell me precisely which indices were being used and which ones
weren't.
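For instance (a sketch only -- the date is made up, and in 7.1 you
want a recent VACUUM ANALYZE so the planner has statistics to work
from):

  EXPLAIN SELECT * FROM ar WHERE transdate = '2002-01-07';

If the plan comes back as "Index Scan using ar_transdate_idx", the
index is earning its keep; a "Seq Scan" means the planner ignored it.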
I might have gone overboard and added too many indexes. That mainly
penalizes INSERTs and UPDATEs, since for each index the backend has to
add another entry. What's fun, though, is that inserts into the
indices are typically O(log n) rather than O(n) (e.g. 20 instead of 1
million). So there's a tradeoff between which indices are necessary
and which aren't. Certainly there's room for tweaks.

I timed the operation by calling the modules from the command line:

  cd sql-ledger
  time ./ar.pl "login=myloginstuff&action=something& ... "

This works assuming you've created a login where no password is
required.

--
-Qaexl-

From: Dieter S. <dsi...@sq...> - 2002-01-07 18:25:18

Indices are only useful for an exact search, not the fuzzy search that
most of the queries looking for a name or partnumber use.

Example:

dws=# create index parts_idx on parts (partnumber);
CREATE
dws=# explain select * from parts where partnumber = '123';
NOTICE: QUERY PLAN:

Index Scan using parts_idx on parts (cost=0.00..2.01 rows=1 width=120)

EXPLAIN
dws=# explain select * from parts where partnumber LIKE '%123';
NOTICE: QUERY PLAN:

Seq Scan on parts (cost=0.00..17.06 rows=1 width=120)

EXPLAIN

You can see that the index is not used at all on a 'LIKE' search. If
you remove the % from the backend code, the index is used:

dws=# explain select * from parts where partnumber LIKE '123';
NOTICE: QUERY PLAN:

Index Scan using parts_idx on parts (cost=0.00..2.01 rows=1 width=120)

EXPLAIN
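A halfway option would be an anchored search -- leave the wildcard
only at the end, so the pattern has a fixed prefix:

  select * from parts where partnumber LIKE '123%';

The planner can turn a fixed prefix into an index range scan, though
whether it actually does depends on the locale the database was
initialized with (it works under the default C locale); check with
EXPLAIN.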
If you run a database with 10,000+ tuples in the parts table, it
certainly pays off to change the backend from a fuzzy search to an
exact search. You lose a lot of flexibility, but the performance
increase makes it all worthwhile. With the exception of an index for
the id, the dates and the table pointers, it does not make much sense
to create an index for a name, partnumber or invnumber.

Dieter Simader        http://www.sql-ledger.org     (780) 472-8161
DWS Systems Inc.      Accounting Software           Fax: 478-5281
=========== On a clear disk you can seek forever ===========

On Mon, 7 Jan 2002, Ho-Sheng Hsiao wrote:
> On Sun, Jan 06, 2002 at 09:11:11AM -0800, Wes Warner wrote:
> > Could you please post these commands?
>
> Sure. You can cut-and-paste or save this into a file:
>
> BEGIN WORK;
> create index acc_trans_tid_idx ON acc_trans (trans_id);
> [...]
> COMMIT WORK;
>
> I have not tested this against a 1.8.x platform, so Your Mileage May
> Vary.
> [...]

From: <ma...@li...> - 2002-01-07 18:52:10

Dieter,

Inexact searching aside, primary key and foreign key indexes, along
with candidate key indexes, should be provided at a minimum for each
relation in the system. As well, you will find that RDBMS systems
nowadays are substantially more intelligent about partial keys and
about opportunistic use of available indexes to accelerate ad hoc
queries. If you provide no indexing, the RDBMS cannot use this
intelligence. There is no sound argument against doing this in the
general case. (For example, if most users have relatively small
installations, they will not pay a substantial reindexing penalty in
any case.)

Within limits, I think benchmarking will show that a range of indexes
should be provided. True, some queries can't or won't take advantage
of indexing. Over time, you simply minimize your reliance on those
sorts of queries.

Qaexl has provided a large number of indexes, more by far than I would
have assumed most users would need (or want). However, he based his
list on benchmarking results, so maybe he has found some useful
speedups worth looking into, yes?
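To make that concrete (a sketch only -- I'm borrowing the ar/customer
tables from Qaexl's list, and the REFERENCES clause is illustrative,
not SQL-Ledger's actual schema): PostgreSQL builds a unique index for
a PRIMARY KEY automatically, but it does not index the referencing
side of a FOREIGN KEY; that index you create yourself.

  CREATE TABLE customer (
    id integer PRIMARY KEY    -- unique index created automatically
  );

  CREATE TABLE ar (
    id integer PRIMARY KEY,
    customer integer REFERENCES customer (id)
  );

  -- the referencing column gets no index unless you add one:
  CREATE INDEX ar_customer_id_idx ON ar (customer);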
Matt

On Mon, 7 Jan 2002, Dieter Simader wrote:
> Indices are only useful for an exact search, not the fuzzy search
> that most of the queries looking for a name or partnumber use.
>
> Example:
> dws=# create index parts_idx on parts (partnumber);
> CREATE
> dws=# explain select * from parts where partnumber = '123';
> NOTICE: QUERY PLAN:
> [...]

--
Matt Benjamin           The Linux Box
206 South Fifth Ave. Suite 150
Ann Arbor, MI 48104

tel. 734-761-4689
fax. 734-769-8938
cel. 734-216-5309
pgr. 734-431-0118

From: Ho-Sheng H. <qa...@ne...> - 2002-01-08 00:25:05

On Mon, Jan 07, 2002 at 11:25:13AM -0700, Dieter Simader wrote:
> Indices are only useful for an exact search, not the fuzzy search
> that most of the queries looking for a name or partnumber use.

That is certainly true. I think that's where the other tweak,
increasing the memory available to PG from 1 MB to 50 MB, seems to
help the most. The figures I tossed around in the first email were for
ar/ap reports; the figures in the second email were for parts. The
actual time savings there weren't much, but to human psychology it was
different: smoother, closer to the responsiveness of a desktop GUI
app. Can I prove it? Probably not. I just know I no longer feel like
throwing a boot through the monitor while waiting.

I'm tempted to log the queries to a file for a day while the staff do
their normal work, and then run EXPLAINs/benchmarks against the log.

> If you run a database with 10,000+ tuples in the parts table, it
> certainly pays off to change the backend from a fuzzy search to an
> exact search. You lose a lot of flexibility, but the performance
> increase makes it all worthwhile.

Sure, and the business process here requires a fuzzy search, which
made those text indices a liability. It's the other operations, like
getting an AR aging report or an AR transaction list sliced by dates,
that saw the best improvements.

I've written another report module that summarizes "today's sales",
which requires joins against the customer/vendor tables. While those
queries didn't touch the parts table, they definitely saw some speed
improvement, probably because there were 1,000 rows in the vendor
table and 22,000 rows in the customer table.
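Roughly the shape of that query (a simplified sketch, not the module's
actual SQL -- the column names are approximations):

  SELECT c.name, sum(a.amount)
    FROM ar a
    JOIN customer c ON (a.customer = c.id)
   WHERE a.transdate = current_date
   GROUP BY c.name;

Here ar_transdate_idx narrows the WHERE clause and customer_id_idx
serves the join, which is presumably where the speedup comes from.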
-Qaexl-

From: <ma...@li...> - 2002-01-07 15:22:40

Ho-Sheng,

All SQL-based RDBMS systems use indexes, and use them in essentially
the same manner (from the DBA's point of view). If SQL-Ledger isn't
supplying indexing for tables and relations, that's just a bug.
(Dieter?) It does look like my 1.6.1 setup is missing a lot of obvious
foreign keys/indexes.

You might supply the SQL statements (CREATE INDEX) you developed to
the list, as it sounds like this would help people.

Matt

On Sun, 6 Jan 2002, Ho-Sheng Hsiao wrote:
> On Sun, Jan 06, 2002 at 11:26:33AM +0100, Roland Stoker wrote:
> > Ehhh guys????
> >
> > Indexes take search queries from O(n) to O(log n) -- read O as
> > "order". That's about the biggest performance difference you can
> > get in a database!!!!!
> [...]

--
Matt Benjamin           The Linux Box
206 South Fifth Ave. Suite 150
Ann Arbor, MI 48104

tel. 734-761-4689
fax. 734-769-8938
cel. 734-216-5309
pgr. 734-431-0118