From: Ashutosh B. <ash...@en...> - 2012-08-24 04:48:45
On Fri, Aug 24, 2012 at 2:08 AM, Nick Maludy <nm...@gm...> wrote:

> Thanks for the suggestions, based on all of your ideas I have come up with
> the following structure:
>
> CREATE TABLE parent (
>     __id bigserial PRIMARY KEY,  -- primary key
>     name text,
>     time bigint
> ) DISTRIBUTE BY HASH (__id);
> CREATE INDEX parent___id_index ON parent(__id);
> CREATE INDEX parent_time_index ON parent(time);
>
> CREATE TABLE list (
>     __root_id bigint REFERENCES parent(__id),
>     __id bigserial,  -- primary key
>     name text,
>     time bigint
> ) DISTRIBUTE BY HASH (__root_id);
> CREATE INDEX list__root_id_index ON list(__root_id);
>
> CREATE TABLE sub_list (
>     __root_id bigint REFERENCES parent(__id),
>     __list_id bigint,  -- foreign key to list.__id
>     __id bigserial,  -- primary key
>     element_name text,
>     element_value numeric,
>     time bigint
> ) DISTRIBUTE BY HASH (__root_id);
> CREATE INDEX sub_list__root_id_index ON sub_list(__root_id);
>
> ... etc sub_sub_list
>
> -- Query
> SELECT sub_list.*
> FROM sub_list
> JOIN parent AS parentquery
>     ON parentquery.__id = sub_list.__root_id
> WHERE parentquery.time > 20000 AND
>       parentquery.time < 30000;
>
> My queries now finish, however they are taking quite a bit of time (about
> 1 second apiece):
>
>                                  QUERY PLAN
> ------------------------------------------------------------------------------
>  Data Node Scan on "__REMOTE_FQS_QUERY__"  (cost=0.00..0.00 rows=0 width=0)
>    (actual time=183.641..965.710 rows=9998 loops=1)
>    Node/s: datanode_nick, datanode_lenovo, datanode_alien
>  Total runtime: 967.672 ms
> (3 rows)
> If I run this same query locally on regular Postgres I get the following:
>
>                                  QUERY PLAN
> ------------------------------------------------------------------------------
>  Nested Loop  (cost=0.00..72845.41 rows=37269 width=269)
>    (actual time=0.086..38.557 rows=39996 loops=1)
>    ->  Index Scan using parent_time_index on parent parentquery
>        (cost=0.00..408.38 rows=9951 width=8)
>        (actual time=0.065..2.149 rows=9999 loops=1)
>          Index Cond: ((time > 20000) AND (time < 30000))
>    ->  Index Scan using sub_list__root_index on sub_list
>        (cost=0.00..7.20 rows=6 width=269)
>        (actual time=0.002..0.002 rows=4 loops=9999)
>          Index Cond: (sub_list.__root_id = parentquery.__id)
>  Total runtime: 41.469 ms
> (6 rows)
>
> I'm guessing the extra time is simply network overhead?

The tag REMOTE_FQS_QUERY tells you that the query was sent directly to the
datanodes and the coordinator acted only as a proxy. That's a good sign. Yes,
for a single query the time taken might be due to XC-specific overheads, which
also include the network overhead. But hopefully you will get higher
throughput when you run more of those queries/DMLs simultaneously.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company
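
[Editor's note: the latency-vs-throughput point in the reply can be illustrated with a toy sketch. This is not Postgres-XC code; it just simulates N clients each paying a fixed per-query round-trip latency (a stand-in for the ~1 s observed above) and shows that total wall time stays near one query's latency, so aggregate throughput scales even though per-query latency does not improve.]

```python
import concurrent.futures
import time

QUERY_LATENCY_S = 0.05  # stand-in for the fixed per-query round-trip cost

def run_query(_):
    # Simulate one query: coordinator overhead + network + datanode work.
    time.sleep(QUERY_LATENCY_S)
    return 1

start = time.monotonic()
# Ten concurrent clients, like running the same query from ten sessions.
with concurrent.futures.ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(run_query, range(10)))
elapsed = time.monotonic() - start

# All ten queries complete in roughly ONE query's latency, not ten:
# per-query latency is unchanged, but throughput is ~10x a serial client's.
print(f"{len(results)} queries in {elapsed:.3f}s")
```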