Re: [Postgres-xc-developers] connection to server was lost

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

On Tue, Mar 4, 2014 at 1:30 AM, David E. Wheeler <da...@ju...>wrote:

> On Mar 3, 2014, at 10:36 AM, David E. Wheeler <da...@ju...>
> wrote:
>
> > Even with JOIN clauses added to the query such that it *should* run in
> whole on each node, the plan still fetches an entire (very large) table
> from each node. I will keep fiddling with the query to see if I can figure
> out how to get it to run on the nodes.
>
> Okay, here's a simplified example:
>
> =# EXPLAIN VERBOSE
> SELECT tr.subscriber_id  AS subscriber_id
>      , s.subscriber      AS subscriber_name
>      , tr.txn_id         AS tracking_number
>   FROM subscriber_:SUB_CODE.transactions tr
>   LEFT JOIN config.subscribers s ON tr.subscriber_id = s.subscriber_id
>  WHERE  tr.txn_timestamp_utc >= :'RPT_DAY'::timestamp - (:DAYS_BACK || '
> days')::interval + (:CUTOFF_HOUR || ' hours')::interval
>    AND  tr.txn_timestamp_utc< :'RPT_DAY'::timestamp  + (:CUTOFF_HOUR || '
> hours')::interval
> ;
>
>                                                                      QUERY
> PLAN
>
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Hash Right Join  (cost=0.01..0.06 rows=1 width=44)
>    Output: tr.subscriber_id, s.subscriber, tr.txn_id
>    Hash Cond: (s.subscriber_id = tr.subscriber_id)
>    ->  Data Node Scan on subscribers "_REMOTE_TABLE_QUERY_"
>  (cost=0.00..0.00 rows=1000 width=36)
>          Output: s.subscriber, s.subscriber_id
>          Node/s: rptdata01
>          Remote query: SELECT subscriber, subscriber_id FROM ONLY
> config.subscribers s WHERE true
>    ->  Hash  (cost=0.00..0.00 rows=1000 width=12)
>          Output: tr.subscriber_id, tr.txn_id
>          ->  Data Node Scan on transactions "_REMOTE_TABLE_QUERY_"
>  (cost=0.00..0.00 rows=1000 width=12)
>                Output: tr.subscriber_id, tr.txn_id
>                Node/s: rptdata01, rptdata02, rptdata03, rptdata04
>                Remote query: SELECT subscriber_id, txn_id,
> txn_timestamp_utc FROM ONLY subscriber_482900.transactions tr WHERE true
>                Coordinator quals: ((tr.txn_timestamp_utc < ('2014-01-16
> 00:00:00'::timestamp without time zone + ('0 hours'::cstring)::interval))
> AND (tr.txn_timestamp_utc >= (('2014-01-16 00:00:00'::timestamp without
> time zone - ('1 days'::cstring)::interval) + ('0
> hours'::cstring)::interval)))
>
> This line is the problem:
>
>     Remote query: SELECT subscriber_id, txn_id, txn_timestamp_utc FROM
> ONLY subscriber_482900.transactions tr WHERE true
>
> Why aren't the WHERE clauses getting pushed down to the data nodes? That
> transactions table is very large, and there is an index on
> txn_timestamp_utc, so I can see no reason why the planner would choose to
> do a full table scan on every data node and filter on the coordinator.
> That's a really poor choice. This forced table scan is eating all the
> memory, of course.
>
> Is there a reason the planner isn't pushing down the WHERE clauses?
>
>
Although, it looks like the quals are shippable, there may be something
which is being deemed as volatile (function or operator or a cast) or some
parameter. Unfortunately, I don't have time to dig through it. Can you
please check what happens to the JOIN if you say removing WHERE clause.
With that the entire join should be shippable, assuming that the two
relations are distributed on subscriber_id.


> Thanks,
>
> David
>
>


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company