[WDB-Development] wci.read on wdb2ts
From: Michael O. A. <mic...@me...> - 2008-11-06 12:12:43
I have been looking into the issues surrounding the low wci.read performance for grids that we have been seeing in wdb2ts queries. Part of the problem is the lack of indexes, a result of the lack of appropriate load we have had to test with. Adding some appropriate indexes improves performance considerably, taking it down from 400-600 ms/query to around 200 ms (100 ms to retrieve data + 100 ms query execution). At this tempo it is still using nested loops extensively, resulting in repeated index scans of, for instance, the oidvalue table. I believe we should be able to bring query execution down to about 20 ms (probably close to the optimal performance on PrologDev1) if we can get the query to hash-join properly. To achieve this, we will need to assist the Postgres query optimizer quite a bit. Following are a couple of optimizations we might implement on the wci.read side to make the optimizer's job easier:

1. Delayed geographic check

The query currently generated by wci.read checks that a requested point is within the geographic boundaries of the field being returned (using within and transform). This prevents us from returning 100+ rows to the point extraction function when perhaps only 1 row is of interest. The problem: the function is costly (although we need not be too concerned about 3 ms in the context of 100); more importantly, its cost and selectivity are impossible to estimate, and as a result it confuses the optimizer.

Proposed solution:
- We assume that the majority of the geographic searches we get will be on appropriate data sources (i.e., applications will not often search for Australian locations in Hirlam data).
- The weeding out of data by geography will then instead occur when we (try to) extract the grid points from the data: if the grid point we calculate falls outside the grid, we obviously don't return data for that point. The extra check at this point should not, under normal circumstances, be particularly costly.
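To make the proposal concrete, here is a rough sketch of the change; the table and column names are only illustrative, not the actual generated query, which joins several more tables:

```sql
-- Current shape (hypothetical names): the within/transform predicate has
-- no usable cost or selectivity estimate, so the planner guesses badly.
SELECT value, placegeometry, placesrid
FROM wci.oidvalue
WHERE dataprovidername = 'hirlam10'
  AND within(transform(geomfromtext('POINT(10.75 59.9)', 4030), placesrid),
             placegeometry);

-- Proposed shape: drop the geometry predicate entirely, and let the
-- point-extraction step discard any field whose grid does not contain
-- the computed grid point.
SELECT value, placegeometry, placesrid
FROM wci.oidvalue
WHERE dataprovidername = 'hirlam10';
```

With the predicate gone, the planner only has to estimate ordinary equality conditions, which it can do well from column statistics.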
For polygons the solution may need to be different; we'll just have to deal with that when it comes up.

2. Increased use of materialized views

The wci.oidvalue view still uses one non-materialized view (placename). Eliminating that non-materialized view would reduce the number of joins required by the wci.read query from 6 to 5.

3. Use of IN (x, .., x) instead of x OR x

99% of the time this will probably not be useful, as the optimizer tends to treat the two forms the same in complex queries, but the Postgres query optimizer is supposedly capable of doing a few smart tricks with IN. Making it possible for the optimizer to consider better options would probably be worth it.

Comments...? Particularly on #1, Vegard?

Regards,

Michael A.
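On #2: since Postgres in the 8.x series has no built-in materialized views, I assume our "materialized" views are ordinary tables refreshed from a defining query. A sketch of folding the placename lookup into such a table, with entirely hypothetical names, would be something like:

```sql
-- Hypothetical sketch: bake the placename join into a materialized table
-- so the query generated by wci.read needs one join fewer.
CREATE TABLE oidvalue_placename AS
SELECT ov.*, pn.placename
FROM wci.oidvalue ov
JOIN placename pn ON pn.placeid = ov.placeid;

-- Index the denormalized column so lookups by name stay cheap.
CREATE INDEX oidvalue_placename_name_idx
    ON oidvalue_placename (placename);
```

The usual cost applies: the table has to be refreshed (or trigger-maintained) whenever the underlying data changes.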
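On #3, the rewrite itself is mechanical; illustrating with a hypothetical referencetime condition:

```sql
-- Instead of generating chained ORs:
SELECT * FROM wci.oidvalue
WHERE referencetime = '2008-11-06 00:00:00'
   OR referencetime = '2008-11-06 06:00:00';

-- generate an IN list, which the planner can treat as a single
-- scalar-array condition:
SELECT * FROM wci.oidvalue
WHERE referencetime IN ('2008-11-06 00:00:00',
                        '2008-11-06 06:00:00');
```

Even where the plans come out identical, the IN form is easier to generate and read, so there is little downside.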