From: David E. W. <da...@ju...> - 2014-03-03 20:00:29
|
On Mar 3, 2014, at 10:36 AM, David E. Wheeler <da...@ju...> wrote: > Even with JOIN clauses added to the query such that it *should* run in whole on each node, the plan still fetches an entire (very large) table from each node. I will keep fiddling with the query to see if I can figure out how to get it to run on the nodes. Okay, here’s a simplified example: =# EXPLAIN VERBOSE SELECT tr.subscriber_id AS subscriber_id , s.subscriber AS subscriber_name , tr.txn_id AS tracking_number FROM subscriber_:SUB_CODE.transactions tr LEFT JOIN config.subscribers s ON tr.subscriber_id = s.subscriber_id WHERE tr.txn_timestamp_utc >= :'RPT_DAY'::timestamp - (:DAYS_BACK || ' days')::interval + (:CUTOFF_HOUR || ' hours')::interval AND tr.txn_timestamp_utc< :'RPT_DAY'::timestamp + (:CUTOFF_HOUR || ' hours')::interval ; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Hash Right Join (cost=0.01..0.06 rows=1 width=44) Output: tr.subscriber_id, s.subscriber, tr.txn_id Hash Cond: (s.subscriber_id = tr.subscriber_id) -> Data Node Scan on subscribers "_REMOTE_TABLE_QUERY_" (cost=0.00..0.00 rows=1000 width=36) Output: s.subscriber, s.subscriber_id Node/s: rptdata01 Remote query: SELECT subscriber, subscriber_id FROM ONLY config.subscribers s WHERE true -> Hash (cost=0.00..0.00 rows=1000 width=12) Output: tr.subscriber_id, tr.txn_id -> Data Node Scan on transactions "_REMOTE_TABLE_QUERY_" (cost=0.00..0.00 rows=1000 width=12) Output: tr.subscriber_id, tr.txn_id Node/s: rptdata01, rptdata02, rptdata03, rptdata04 Remote query: SELECT subscriber_id, txn_id, txn_timestamp_utc FROM ONLY subscriber_482900.transactions tr WHERE true Coordinator quals: ((tr.txn_timestamp_utc < ('2014-01-16 00:00:00'::timestamp without time zone + ('0 hours'::cstring)::interval)) AND (tr.txn_timestamp_utc >= (('2014-01-16 00:00:00'::timestamp without time zone - ('1 days'::cstring)::interval) + ('0 hours'::cstring)::interval))) This line is the problem: Remote query: SELECT subscriber_id, txn_id, txn_timestamp_utc FROM ONLY subscriber_482900.transactions tr WHERE true Why aren't the WHERE clauses getting pushed down to the data nodes? That transactions table is very large, and there is an index on txn_timestamp_utc, so I can see no reason why the planner would choose to do a full table scan on every data node and filter on the coordinator. That’s a really poor choice. This forced table scan is eating all the memory, of course. Is there a reason the planner isn’t pushing down the WHERE clauses? Thanks, David |