|
From: David E. W. <da...@ju...> - 2014-03-03 20:00:29
|
On Mar 3, 2014, at 10:36 AM, David E. Wheeler <da...@ju...> wrote:
> Even with JOIN clauses added to the query such that it *should* run in whole on each node, the plan still fetches an entire (very large) table from each node. I will keep fiddling with the query to see if I can figure out how to get it to run on the nodes.
Okay, here’s a simplified example:
=# EXPLAIN VERBOSE
SELECT tr.subscriber_id AS subscriber_id
, s.subscriber AS subscriber_name
, tr.txn_id AS tracking_number
FROM subscriber_:SUB_CODE.transactions tr
LEFT JOIN config.subscribers s ON tr.subscriber_id = s.subscriber_id
WHERE tr.txn_timestamp_utc >= :'RPT_DAY'::timestamp - (:DAYS_BACK || ' days')::interval + (:CUTOFF_HOUR || ' hours')::interval
AND tr.txn_timestamp_utc< :'RPT_DAY'::timestamp + (:CUTOFF_HOUR || ' hours')::interval
;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Right Join (cost=0.01..0.06 rows=1 width=44)
Output: tr.subscriber_id, s.subscriber, tr.txn_id
Hash Cond: (s.subscriber_id = tr.subscriber_id)
-> Data Node Scan on subscribers "_REMOTE_TABLE_QUERY_" (cost=0.00..0.00 rows=1000 width=36)
Output: s.subscriber, s.subscriber_id
Node/s: rptdata01
Remote query: SELECT subscriber, subscriber_id FROM ONLY config.subscribers s WHERE true
-> Hash (cost=0.00..0.00 rows=1000 width=12)
Output: tr.subscriber_id, tr.txn_id
-> Data Node Scan on transactions "_REMOTE_TABLE_QUERY_" (cost=0.00..0.00 rows=1000 width=12)
Output: tr.subscriber_id, tr.txn_id
Node/s: rptdata01, rptdata02, rptdata03, rptdata04
Remote query: SELECT subscriber_id, txn_id, txn_timestamp_utc FROM ONLY subscriber_482900.transactions tr WHERE true
Coordinator quals: ((tr.txn_timestamp_utc < ('2014-01-16 00:00:00'::timestamp without time zone + ('0 hours'::cstring)::interval)) AND (tr.txn_timestamp_utc >= (('2014-01-16 00:00:00'::timestamp without time zone - ('1 days'::cstring)::interval) + ('0 hours'::cstring)::interval)))
This line is the problem:
Remote query: SELECT subscriber_id, txn_id, txn_timestamp_utc FROM ONLY subscriber_482900.transactions tr WHERE true
Why aren't the WHERE clauses getting pushed down to the data nodes? That transactions table is very large, and there is an index on txn_timestamp_utc, so I can see no reason why the planner would choose to do a full table scan on every data node and filter on the coordinator. That’s a really poor choice. This forced table scan is eating all the memory, of course.
Is there a reason the planner isn’t pushing down the WHERE clauses?
Thanks,
David
|