Re: [Postgres-xc-developers] connection to server was lost

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

On Mar 3, 2014, at 10:36 AM, David E. Wheeler <da...@ju...> wrote:

> Even with JOIN clauses added to the query such that it *should* run in whole on each node, the plan still fetches an entire (very large) table from each node. I will keep fiddling with the query to see if I can figure out how to get it to run on the nodes.

Okay, here’s a simplified example:

=# EXPLAIN VERBOSE
SELECT tr.subscriber_id  AS subscriber_id
     , s.subscriber      AS subscriber_name
     , tr.txn_id         AS tracking_number
  FROM subscriber_:SUB_CODE.transactions tr
  LEFT JOIN config.subscribers s ON tr.subscriber_id = s.subscriber_id
 WHERE  tr.txn_timestamp_utc >= :'RPT_DAY'::timestamp - (:DAYS_BACK || ' days')::interval + (:CUTOFF_HOUR || ' hours')::interval
   AND  tr.txn_timestamp_utc< :'RPT_DAY'::timestamp  + (:CUTOFF_HOUR || ' hours')::interval
;
                                                                                                                                               QUERY PLAN                                                                                                                                               
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Right Join  (cost=0.01..0.06 rows=1 width=44)
   Output: tr.subscriber_id, s.subscriber, tr.txn_id
   Hash Cond: (s.subscriber_id = tr.subscriber_id)
   ->  Data Node Scan on subscribers "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=1000 width=36)
         Output: s.subscriber, s.subscriber_id
         Node/s: rptdata01
         Remote query: SELECT subscriber, subscriber_id FROM ONLY config.subscribers s WHERE true
   ->  Hash  (cost=0.00..0.00 rows=1000 width=12)
         Output: tr.subscriber_id, tr.txn_id
         ->  Data Node Scan on transactions "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=1000 width=12)
               Output: tr.subscriber_id, tr.txn_id
               Node/s: rptdata01, rptdata02, rptdata03, rptdata04
               Remote query: SELECT subscriber_id, txn_id, txn_timestamp_utc FROM ONLY subscriber_482900.transactions tr WHERE true
               Coordinator quals: ((tr.txn_timestamp_utc < ('2014-01-16 00:00:00'::timestamp without time zone + ('0 hours'::cstring)::interval)) AND (tr.txn_timestamp_utc >= (('2014-01-16 00:00:00'::timestamp without time zone - ('1 days'::cstring)::interval) + ('0 hours'::cstring)::interval)))

This line is the problem:

    Remote query: SELECT subscriber_id, txn_id, txn_timestamp_utc FROM ONLY subscriber_482900.transactions tr WHERE true

Why aren't the WHERE clauses getting pushed down to the data nodes? That transactions table is very large, and there is an index on txn_timestamp_utc, so I can see no reason why the planner would choose to do a full table scan on every data node and filter on the coordinator. That’s a really poor choice. This forced table scan is eating all the memory, of course.

Is there a reason the planner isn’t pushing down the WHERE clauses?

Thanks,

David