|
From: Nick M. <nm...@gm...> - 2012-08-23 20:39:37
|
Thanks for the suggestions, based on all of your ideas i have come up with
the following structure:
CREATE TABLE parent (
__id bigserial PRIMARY KEY, // primary key
name text,
time bigint,
) DISTRIBUTE BY HASH (__id);
CREATE INDEX parent___id_index ON parent(__id);
CREATE INDEX parent_time_index ON parent(time);
CREATE TABLE list (
__root_id bigint REFERENCES parent(__id),
__id bigserial, // primary key
name text,
time bigint,
) DISTRIBUTE BY HASH (__root_id);
CREATE INDEX list__root_id_index ON list(__root_id);
CREATE TABLE sub_list (
__root_id bigint REFERENCES parent(__id),
__list_id bigint, // foreign key to list.__id
__id bigserial, // primary key
element_name text,
element_value numeric,
time bigint
) DISTRIBUTE BY HASH (__root_id);
CREATE INDEX sub_list__root_id_index ON sub_list(__root_id);
... etc ub_sub_list
//////////////////
// Query //
/////////////////
SELECT sub_list.*
FROM sub_list
JOIN parent AS parentquery
ON parentquery.__id = sub_list.__root_id
WHERE parentquery.time > 20000 AND
parentquery.time < 30000;
My queries now finish, however they are taking quite a bit of time (about 1
second a piece)
QUERY PLAN
------------------------------------------------------------------------------------------------------
------------------------
Data Node Scan on "__REMOTE_FQS_QUERY__" (cost=0.00..0.00 rows=0 width=0)
(actual time=183.641..965.710 rows=9998 loops=1)
Node/s: datanode_nick, datanode_lenovo, datanode_alien
Total runtime: 967.672 ms
(3 rows)
If i run this same query locally on regular Postgres i get the following:
QUERY PLAN
------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------
-------------------------
Nested Loop (cost=0.00..72845.41 rows=37269 width=269) (actual
time=0.086..38.557 rows=39996 loops=1
)
-> Index Scan using parent_time_index on parent parentquery
(cost=0.00..408.38 rows=9951 width=8) (actual time=0.065..2.149 rows=9999
loops=1)
Index Cond: ((time > 20000) AND (time < 30000))
-> Index Scan using sub_list__root_index on sub_list (cost=0.00..7.20
rows=6 width=269) (actual time=0.002..
0.002 rows=4 loops=9999)
Index Cond: (sub_list.__root_id = parentquery._
_id)
Total runtime: 41.469 ms
(6 rows)
I'm guessing the extra time is simply network overhead?
|