[WDB-Development] wci.read on wdb2ts
From: Michael O. A. <mic...@me...> - 2008-11-06 12:12:43
I have been looking into the issues surrounding the low wci.read performance for grids that we have been seeing in wdb2ts queries. Part of the problem is the lack of indexes, a result of the lack of appropriate load we have had to test with. Adding some appropriate indexes improves performance considerably, taking it down from 400-600 ms/query to around 200 ms (100 ms to retrieve data + 100 ms query execution). At this tempo it is still using nested loops extensively, resulting in repeated index scans of, for instance, the oidvalue table. I believe we should be able to bring query execution down to about 20 ms (probably close to the optimal performance on PrologDev1) if we can get the query to hash-join properly. To achieve this, we will need to assist the Postgres query optimizer quite a bit. Following are a couple of optimizations we might implement on the wci.read side to make the optimizer's job easier:

1. Delayed geographic check

The query currently generated by wci.read checks that a requested point is within the geographic boundaries of the field being returned (using within and transform). This prevents us from returning 100+ rows to the point extraction function when perhaps only 1 row is of interest. The problem: the function is costly (although we need not be too concerned about 3 ms in the context of 100); more importantly, its cost and selectivity are impossible to estimate, and as a result it confuses the optimizer.

Proposed solution:
- We assume that the majority of the geographic searches we get will be on appropriate data sources (i.e., applications will not often search for Australian locations in Hirlam data).
- The weeding out of data by geography will then instead occur when we (try to) extract the grid points from the data: if the grid point we calculate falls outside the grid, we obviously don't return data for that point. The extra check at this point should not, under normal circumstances, be particularly costly.
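To make the proposal concrete, here is a rough sketch of the change; the table and column names are only illustrative, not the actual generated query, which joins several more tables:

```sql
-- Current shape (hypothetical names): the within/transform predicate has
-- no usable cost or selectivity estimate, so the planner guesses badly.
SELECT value, placegeometry, placesrid
FROM wci.oidvalue
WHERE dataprovidername = 'hirlam10'
  AND within(transform(geomfromtext('POINT(10.75 59.9)', 4030), placesrid),
             placegeometry);

-- Proposed shape: drop the geometry predicate entirely, and let the
-- point-extraction step discard any field whose grid does not contain
-- the computed grid point.
SELECT value, placegeometry, placesrid
FROM wci.oidvalue
WHERE dataprovidername = 'hirlam10';
```

With the predicate gone, the planner only has to estimate ordinary equality conditions, which it can do well from column statistics.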
For polygons the solution may need to be different; we'll just have to deal with that when it comes up.

2. Increased use of materialized views

The wci.oidvalue view still uses one non-materialized view (placename). Eliminating that non-materialized view would reduce the number of joins required by the wci.read query from 6 to 5.

3. Use of IN (x, .., x) instead of x OR x

99% of the time this will probably not be useful, as the optimizer tends to treat the two forms the same in complex queries, but the Postgres query optimizer is supposedly capable of doing a few smart tricks with IN. Making it possible for the optimizer to consider better options would probably be worth it.

Comments...? Particularly on #1, Vegard?

Regards,

Michael A.
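On #2: since Postgres in the 8.x series has no built-in materialized views, I assume our "materialized" views are ordinary tables refreshed from a defining query. A sketch of folding the placename lookup into such a table, with entirely hypothetical names, would be something like:

```sql
-- Hypothetical sketch: bake the placename join into a materialized table
-- so the query generated by wci.read needs one join fewer.
CREATE TABLE oidvalue_placename AS
SELECT ov.*, pn.placename
FROM wci.oidvalue ov
JOIN placename pn ON pn.placeid = ov.placeid;

-- Index the denormalized column so lookups by name stay cheap.
CREATE INDEX oidvalue_placename_name_idx
    ON oidvalue_placename (placename);
```

The usual cost applies: the table has to be refreshed (or trigger-maintained) whenever the underlying data changes.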
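On #3, the rewrite itself is mechanical; illustrating with a hypothetical referencetime condition:

```sql
-- Instead of generating chained ORs:
SELECT * FROM wci.oidvalue
WHERE referencetime = '2008-11-06 00:00:00'
   OR referencetime = '2008-11-06 06:00:00';

-- generate an IN list, which the planner can treat as a single
-- scalar-array condition:
SELECT * FROM wci.oidvalue
WHERE referencetime IN ('2008-11-06 00:00:00',
                        '2008-11-06 06:00:00');
```

Even where the plans come out identical, the IN form is easier to generate and read, so there is little downside.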