From: David E. W. <da...@ju...> - 2014-02-28 00:53:22
PGXC Hackers,

I have finally loaded up my testing PGXC four-node cluster with a nice beefy database similar to a PostgreSQL database we use for long-running reporting queries. I gathered up one of our slower-running queries (26.6m run) and ran it on XC. Alas, after a while, it died with this error:

    psql:slow.sql:73: connection to server was lost

The coordinator log was not much help: nothing was logged. So I trolled through the logs on the data nodes. All four had these messages:

> 2014-02-27 15:45:51 PST dwheeler 10.4.34.1(56968) 22213 530fc838.56c5 LOG:  could not send data to client: Connection reset by peer
> 2014-02-27 15:45:51 PST dwheeler 10.4.34.1(56968) 22213 530fc838.56c5 STATEMENT:  SELECT subscriber_cmd_id, rule_reason, rule_score, txn_uuid, txn_timestamp_utc FROM ONLY subscriber_482900.transactions_rule tma WHERE (subscriber_id = 482900)
> 2014-02-27 15:45:51 PST dwheeler 10.4.34.1(56968) 22213 530fc838.56c5 FATAL:  connection to client lost
> 2014-02-27 15:45:51 PST dwheeler 10.4.34.1(56968) 22213 530fc838.56c5 STATEMENT:  SELECT subscriber_cmd_id, rule_reason, rule_score, txn_uuid, txn_timestamp_utc FROM ONLY subscriber_482900.transactions_rule tma WHERE (subscriber_id = 482900)

No reason given for the dropped connection. I ran the query on the coordinator box, so psql should have connected via a socket rather than TCP. Out of curiosity, I looked at the logs for the other three coordinators. None had any error messages, either.

So, no idea what's timing out; statement_timeout is set to 0. Here are the settings from my coordinator's postgresql.conf:

max_connections = 100
shared_buffers = 32MB
log_destination = 'stderr'
logging_collector = on
log_directory = 'pg_log'
log_filename = 'postgresql-%a.log'
log_truncate_on_rotation = on
log_rotation_age = 1d
log_rotation_size = 0
log_line_prefix = '< %m >'
log_timezone = 'US/Pacific'
datestyle = 'iso, mdy'
timezone = 'US/Pacific'
lc_messages = 'en_US.UTF-8'
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'
default_text_search_config = 'pg_catalog.english'
pgxc_node_name = 'node1'
port = 5432
listen_addresses = '*'
shared_buffers = 250MB
work_mem = 128MB
maintenance_work_mem = 128MB
effective_cache_size = 8GB
log_line_prefix = '%t %u %r %p %c '
timezone = 'UTC'
gtm_host = 'node1.example.com'

And from one of the data nodes (only the names differ on the others):

max_connections = 100
shared_buffers = 32MB
log_destination = 'stderr'
logging_collector = on
log_directory = 'pg_log'
log_filename = 'postgresql-%a.log'
log_truncate_on_rotation = on
log_rotation_age = 1d
log_rotation_size = 0
log_line_prefix = '< %m >'
log_timezone = 'US/Pacific'
datestyle = 'iso, mdy'
timezone = 'US/Pacific'
lc_messages = 'en_US.UTF-8'
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'
default_text_search_config = 'pg_catalog.english'
pgxc_node_name = 'node1'
port = 15432
listen_addresses = '*'
shared_buffers = 750MB
work_mem = 128MB
maintenance_work_mem = 128MB
effective_cache_size = 23GB
log_line_prefix = '%t %u %r %p %c '
timezone = 'UTC'
gtm_host = 'node1.iovationnp.com'

Thoughts? What could be timing out?

Thanks,

David
From: Ashutosh B. <ash...@en...> - 2014-02-28 03:48:40
Hi David,

It might be due to the large amount of data sent from the datanode to the coordinator. When you see the message "connection to client lost" at a datanode, it means that the connection to the coordinator was lost. In XC, coordinators act as clients to the datanodes. Further, no message in the coordinator log implies that there wasn't any segfault or error on the coordinator which could result in losing the client (to the datanode). One way to verify this is to check what happens for smaller amounts of data. There is still some code in the executor which saves data from the datanode in a linked list, and because of a large amount of data that process runs out of memory. You may find something in the system logs if that is true.

Please do the following:

1. Run EXPLAIN VERBOSE on the query which showed this behavior; in that output you will find what query is being sent to the datanode.
2. Reduce your data on the datanode such that that particular query returns maybe a few thousand rows to the coordinator. BTW, I have seen millions of rows being exchanged between the coordinator and datanode without problem, but there is still a case where large data would be a problem.
3. Now, see if the query runs without problem.

On Fri, Feb 28, 2014 at 6:23 AM, David E. Wheeler <da...@ju...> wrote:

> PGXC Hackers,
>
> I have finally loaded up my testing PGXC four-node cluster with a nice beefy database similar to a PostgreSQL database we use for long-running reporting queries. I gathered up one of our slower-running queries (26.6m run) and ran it on XC. Alas, after a while, it died with this error:
>
>     psql:slow.sql:73: connection to server was lost

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
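A minimal sketch of that procedure, reusing the remote statement already visible in the datanode log above; connecting straight to a datanode (port 15432 in the configs above) to count the rows is an assumption about how to gauge the result size, not a step spelled out in the thread:

-- 1. On the coordinator, EXPLAIN VERBOSE on the slow query (slow.sql) prints a
--    "Remote query:" line for each datanode scan; that is the statement that
--    gets shipped to the datanodes.
-- 2. The statement from the datanode log can then be sized up directly on a
--    datanode (e.g. psql -p 15432) to see how much data it would stream back
--    to the coordinator:
SELECT count(*)
  FROM ONLY subscriber_482900.transactions_rule tma
 WHERE subscriber_id = 482900;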
From: David E. W. <da...@ju...> - 2014-02-28 21:17:16
On Feb 27, 2014, at 7:48 PM, Ashutosh Bapat <ash...@en...> wrote:

> It might be due to the large amount of data sent from the datanode to the coordinator. When you see the message "connection to client lost" at a datanode, it means that the connection to the coordinator was lost. In XC, coordinators act as clients to the datanodes. Further, no message in the coordinator log implies that there wasn't any segfault or error on the coordinator which could result in losing the client (to the datanode). One way to verify this is to check what happens for smaller amounts of data. There is still some code in the executor which saves data from the datanode in a linked list, and because of a large amount of data that process runs out of memory. You may find something in the system logs if that is true.

Ah ha. Now that I pay more attention to the statement in the log, I see what the problem is. That's a full table scan it's doing, on a very large table. I think the planner is making a mistake. The query I'm running is far more complicated than that bit in the log. Really, the full query should be able to run on each node, and the results aggregated on the coordinator. I suspect I need to add some more JOIN clauses to make sure that the planner better knows how to run the query on each node.

> 1. Run EXPLAIN VERBOSE on the query which showed this behavior; in that output you will find what query is being sent to the datanode.

So I did this, but even with the EXPLAIN VERBOSE I got the disconnect error. With plain EXPLAIN, too. The query should not run without ANALYZE, right? This is 1.1, BTW.

> 2. Reduce your data on the datanode such that that particular query returns maybe a few thousand rows to the coordinator. BTW, I have seen millions of rows being exchanged between the coordinator and datanode without problem, but there is still a case where large data would be a problem.
> 3. Now, see if the query runs without problem.

I updated my query to make sure that I was joining on partitioned columns, thinking that would get the queries to run more on the data nodes, but it made no difference. I still got an error for a table scan on a very large table. :-(

David
From: Ashutosh B. <ash...@en...> - 2014-03-01 04:23:37
On Sat, Mar 1, 2014 at 2:47 AM, David E. Wheeler <da...@ju...> wrote:

> So I did this, but even with the EXPLAIN VERBOSE I got the disconnect error. With plain EXPLAIN, too. The query should not run without ANALYZE, right? This is 1.1, BTW.

This looks weird. Are you seeing this error in the datanode logs or the coordinator logs? For a mere EXPLAIN only the coordinator is active; coordinator and datanode are not connected. So, if you are seeing a disconnection error on the datanode when you fired EXPLAIN VERBOSE on the coordinator, that's weird. Am I missing something?

> I updated my query to make sure that I was joining on partitioned columns, thinking that would get the queries to run more on the data nodes, but it made no difference. I still got an error for a table scan on a very large table. :-(

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
From: David E. W. <da...@ju...> - 2014-03-03 18:02:32
On Feb 28, 2014, at 8:23 PM, Ashutosh Bapat <ash...@en...> wrote:

> This looks weird. Are you seeing this error in the datanode logs or the coordinator logs? For a mere EXPLAIN only the coordinator is active; coordinator and datanode are not connected. So, if you are seeing a disconnection error on the datanode when you fired EXPLAIN VERBOSE on the coordinator, that's weird. Am I missing something?

The error is in the data node logs. The coordinator has this:

2014-03-03 09:21:43 PST 22325 530659c2.5735 LOG:  server process (PID 24475) was terminated by signal 9: Killed
2014-03-03 09:21:43 PST 22325 530659c2.5735 DETAIL:  Failed process was running: EXPLAIN ANALYZE WITH DTL1 as (Select td.subscriber_id, s.subscriber::varchar(40) as subscriber_name, td.txn_id::char(20) as tracking_number, to_char(td.txn_timestamp_local,'yyyy/mm/dd-hh24:mi:ss')::varchar(50) as transaction_time_local, to_char(td.txn_timestamp_utc,'yyyy/mm/dd-hh24:mi:ss')::varchar(50) as transaction_time, td.local_timezone as local_timezone, td.account_code::varchar(60), case when td.account_code is not null then valueof_node_id(node_id_to_string( node_id( subscriber_id := s.subscriber_id, account_code := td.account_code)))::varchar(80) else null::varchar(80) end as account_link, td.device_id::char(20) as device_id, CASE WHEN td.result_code = 0 THEN 'Allow'::varchar(10) WHEN td.result_code = 1 THEN 'Deny'::varchar(10) WHEN td.result_code = 2 THEN 'Review'::varchar(10) ELSE 'Other'::varchar(10) END as result, td.ruleset,
2014-03-03 09:21:43 PST 22325 530659c2.5735 LOG:  terminating any other active server processes
2014-03-03 09:21:44 PST 9611 53110262.258b WARNING:  terminating connection because of crash of another server process
2014-03-03 09:21:44 PST 9611 53110262.258b DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2014-03-03 09:21:44 PST 9611 53110262.258b HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2014-03-03 09:21:44 PST 22325 530659c2.5735 LOG:  all server processes terminated; reinitializing
2014-03-03 09:21:48 PST 26198 5314ba2c.6656 LOG:  database system was interrupted; last known up at 2014-03-03 09:16:56 PST
2014-03-03 09:21:48 PST 26198 5314ba2c.6656 LOG:  database system was not properly shut down; automatic recovery in progress
2014-03-03 09:21:48 PST 26198 5314ba2c.6656 LOG:  redo starts at 0/2F47B70
2014-03-03 09:21:48 PST 26198 5314ba2c.6656 LOG:  record with zero length at 0/2F47D50
2014-03-03 09:21:48 PST 26198 5314ba2c.6656 LOG:  redo done at 0/2F47D20
2014-03-03 09:21:48 PST 26198 5314ba2c.6656 LOG:  last completed transaction was at log time 2014-03-03 17:21:13.279991+00
2014-03-03 09:21:48 PST 26202 5314ba2c.665a LOG:  autovacuum launcher started
2014-03-03 09:21:48 PST 22325 530659c2.5735 LOG:  database system is ready to accept connections

The query I’m running has EXPLAIN VERBOSE, not EXPLAIN ANALYZE. Is the parser mucking with it somehow? This is 1.1.

The possibility of corrupted shared memory is distressing; I wish the log would say which node exited abnormally.

David
From: David E. W. <da...@ju...> - 2014-03-03 18:36:44
On Mar 3, 2014, at 10:02 AM, David E. Wheeler <da...@ju...> wrote:

> The query I’m running has EXPLAIN VERBOSE, not EXPLAIN ANALYZE. Is the parser mucking with it somehow? This is 1.1.

Crap, just realized that I have two query files. I was editing one and running the other. Sheesh, sorry about that.

Even with JOIN clauses added to the query such that it *should* run in whole on each node, the plan still fetches an entire (very large) table from each node. I will keep fiddling with the query to see if I can figure out how to get it to run on the nodes.

> The possibility of corrupted shared memory is distressing; I wish the log would say which node exited abnormally.

This is a pretty significant issue, though. The thing really should not run out of memory, and even if it does, it should tell me where.

Thanks,

David
From: David E. W. <da...@ju...> - 2014-03-03 20:00:29
On Mar 3, 2014, at 10:36 AM, David E. Wheeler <da...@ju...> wrote:

> Even with JOIN clauses added to the query such that it *should* run in whole on each node, the plan still fetches an entire (very large) table from each node. I will keep fiddling with the query to see if I can figure out how to get it to run on the nodes.

Okay, here’s a simplified example:

=# EXPLAIN VERBOSE
   SELECT tr.subscriber_id AS subscriber_id
        , s.subscriber AS subscriber_name
        , tr.txn_id AS tracking_number
     FROM subscriber_:SUB_CODE.transactions tr
     LEFT JOIN config.subscribers s ON tr.subscriber_id = s.subscriber_id
    WHERE tr.txn_timestamp_utc >= :'RPT_DAY'::timestamp - (:DAYS_BACK || ' days')::interval + (:CUTOFF_HOUR || ' hours')::interval
      AND tr.txn_timestamp_utc <  :'RPT_DAY'::timestamp + (:CUTOFF_HOUR || ' hours')::interval
   ;

                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Hash Right Join  (cost=0.01..0.06 rows=1 width=44)
   Output: tr.subscriber_id, s.subscriber, tr.txn_id
   Hash Cond: (s.subscriber_id = tr.subscriber_id)
   ->  Data Node Scan on subscribers "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=1000 width=36)
         Output: s.subscriber, s.subscriber_id
         Node/s: rptdata01
         Remote query: SELECT subscriber, subscriber_id FROM ONLY config.subscribers s WHERE true
   ->  Hash  (cost=0.00..0.00 rows=1000 width=12)
         Output: tr.subscriber_id, tr.txn_id
         ->  Data Node Scan on transactions "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=1000 width=12)
               Output: tr.subscriber_id, tr.txn_id
               Node/s: rptdata01, rptdata02, rptdata03, rptdata04
               Remote query: SELECT subscriber_id, txn_id, txn_timestamp_utc FROM ONLY subscriber_482900.transactions tr WHERE true
               Coordinator quals: ((tr.txn_timestamp_utc < ('2014-01-16 00:00:00'::timestamp without time zone + ('0 hours'::cstring)::interval)) AND (tr.txn_timestamp_utc >= (('2014-01-16 00:00:00'::timestamp without time zone - ('1 days'::cstring)::interval) + ('0 hours'::cstring)::interval)))

This line is the problem:

    Remote query: SELECT subscriber_id, txn_id, txn_timestamp_utc FROM ONLY subscriber_482900.transactions tr WHERE true

Why aren't the WHERE clauses getting pushed down to the data nodes? That transactions table is very large, and there is an index on txn_timestamp_utc, so I can see no reason why the planner would choose to do a full table scan on every data node and filter on the coordinator. That’s a really poor choice. This forced table scan is eating all the memory, of course.

Is there a reason the planner isn’t pushing down the WHERE clauses?

Thanks,

David
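A sketch of one way to narrow this down, reusing the same table and psql variables as David's session: run each predicate through EXPLAIN VERBOSE on its own and check whether it shows up on the "Remote query" line (shipped to the datanodes) or under "Coordinator quals" (filtered on the coordinator).

-- A plain timestamp literal as the bound:
EXPLAIN VERBOSE
SELECT tr.txn_id
  FROM subscriber_:SUB_CODE.transactions tr
 WHERE tr.txn_timestamp_utc >= '2014-01-15'::timestamp;

-- The same bound written with the interval arithmetic from the report query;
-- if only this variant ends up in "Coordinator quals", the date math is what
-- keeps the qual from being shipped.
EXPLAIN VERBOSE
SELECT tr.txn_id
  FROM subscriber_:SUB_CODE.transactions tr
 WHERE tr.txn_timestamp_utc >= :'RPT_DAY'::timestamp - (:DAYS_BACK || ' days')::interval + (:CUTOFF_HOUR || ' hours')::interval;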
From: Ashutosh B. <ash...@en...> - 2014-03-04 03:50:25
On Tue, Mar 4, 2014 at 1:30 AM, David E. Wheeler <da...@ju...> wrote:

> This line is the problem:
>
>     Remote query: SELECT subscriber_id, txn_id, txn_timestamp_utc FROM ONLY subscriber_482900.transactions tr WHERE true
>
> Why aren't the WHERE clauses getting pushed down to the data nodes? That transactions table is very large, and there is an index on txn_timestamp_utc, so I can see no reason why the planner would choose to do a full table scan on every data node and filter on the coordinator. That’s a really poor choice. This forced table scan is eating all the memory, of course.
>
> Is there a reason the planner isn’t pushing down the WHERE clauses?

Although it looks like the quals are shippable, there may be something which is being deemed volatile (a function, operator, or cast) or some parameter. Unfortunately, I don't have time to dig through it. Can you please check what happens to the JOIN if you, say, remove the WHERE clause? With that, the entire join should be shippable, assuming that the two relations are distributed on subscriber_id.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
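A sketch of the check Ashutosh is asking for, with the same tables and psql variable as above; what the plan actually does with the bare join depends on the distribution, which David clarifies in his reply.

-- No WHERE clause at all: if the join is shippable it should appear as a
-- single remote statement (an "__REMOTE_FQS_QUERY__"-style scan) instead of
-- two "_REMOTE_TABLE_QUERY_" scans joined on the coordinator.
EXPLAIN VERBOSE
SELECT tr.subscriber_id, s.subscriber, tr.txn_id
  FROM subscriber_:SUB_CODE.transactions tr
  LEFT JOIN config.subscribers s ON tr.subscriber_id = s.subscriber_id;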
From: David E. W. <da...@ju...> - 2014-03-04 04:45:37
On Mar 3, 2014, at 7:50 PM, Ashutosh Bapat <ash...@en...> wrote:

> Although it looks like the quals are shippable, there may be something which is being deemed volatile (a function, operator, or cast) or some parameter. Unfortunately, I don't have time to dig through it. Can you please check what happens to the JOIN if you, say, remove the WHERE clause? With that, the entire join should be shippable, assuming that the two relations are distributed on subscriber_id.

config.subscribers is replicated to all nodes, and subscriber_482900.transactions is a partition, and the only subscriber_id in the whole table is 482900. It's distributed on another column, account_code. Taking out the WHERE clause is by definition a table scan:

=# EXPLAIN VERBOSE
-# SELECT tr.subscriber_id AS subscriber_id
-#      , s.subscriber AS subscriber_name
-#      , tr.txn_id AS tracking_number
-#   FROM subscriber_:SUB_CODE.transactions tr
-#   LEFT JOIN config.subscribers s ON tr.subscriber_id = s.subscriber_id
-#  WHERE tr.subscriber_id = :SUB_CODE
-#    AND tr.txn_timestamp_utc >= :'RPT_DAY'::timestamp - (:DAYS_BACK || ' days')::interval + (:CUTOFF_HOUR || ' hours')::interval
-#    AND tr.txn_timestamp_utc <  :'RPT_DAY'::timestamp + (:CUTOFF_HOUR || ' hours')::interval
-# ;

                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..0.01 rows=1 width=44)
   Output: tr.subscriber_id, s.subscriber, tr.txn_id
   Join Filter: (tr.subscriber_id = s.subscriber_id)
   ->  Data Node Scan on transactions "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=1000 width=12)
         Output: tr.subscriber_id, tr.txn_id
         Node/s: rptdata01, rptdata02, rptdata03, rptdata04
         Remote query: SELECT subscriber_id, txn_id, txn_timestamp_utc FROM ONLY subscriber_482900.transactions tr WHERE (subscriber_id = 482900)
         Coordinator quals: ((tr.txn_timestamp_utc < ('2014-01-16 00:00:00'::timestamp without time zone + ('0 hours'::cstring)::interval)) AND (tr.txn_timestamp_utc >= (('2014-01-16 00:00:00'::timestamp without time zone - ('1 days'::cstring)::interval) + ('0 hours'::cstring)::interval)))
   ->  Data Node Scan on subscribers "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=1000 width=36)
         Output: s.subscriber, s.subscriber_id
         Node/s: rptdata01
         Remote query: SELECT subscriber, subscriber_id FROM ONLY config.subscribers s WHERE (subscriber_id = 482900)

David
From: David E. W. <da...@ju...> - 2014-03-04 16:57:57
On Mar 3, 2014, at 8:45 PM, David E. Wheeler <da...@ju...> wrote:

>> Although it looks like the quals are shippable, there may be something which is being deemed volatile (a function, operator, or cast) or some parameter. Unfortunately, I don't have time to dig through it. Can you please check what happens to the JOIN if you, say, remove the WHERE clause? With that, the entire join should be shippable, assuming that the two relations are distributed on subscriber_id.
>
> config.subscribers is replicated to all nodes, and subscriber_482900.transactions is a partition, and the only subscriber_id in the whole table is 482900. It's distributed on another column, account_code. Taking out the WHERE clause is by definition a table scan:

Sleeping on it, I realized that it might be thinking that the date math is volatile. So I tried a query using constant values, instead:

=# EXPLAIN VERBOSE
-# SELECT tr.subscriber_id AS subscriber_id
-#      , s.subscriber AS subscriber_name
-#      , tr.txn_id AS tracking_number
-#   FROM subscriber_:SUB_CODE.transactions tr
-#   LEFT JOIN config.subscribers s ON tr.subscriber_id = s.subscriber_id
-#  WHERE tr.subscriber_id = :SUB_CODE
-#    AND tr.txn_timestamp_utc >= '2014-01-15'::timestamp
-#    AND tr.txn_timestamp_utc < '2014-01-16'::timestamp
-# ;

                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Data Node Scan on "__REMOTE_FQS_QUERY__"  (cost=0.00..0.00 rows=0 width=0)
   Output: tr.subscriber_id, s.subscriber, tr.txn_id
   Node/s: rptdata01, rptdata02, rptdata03, rptdata04
   Remote query: SELECT tr.subscriber_id, s.subscriber AS subscriber_name, tr.txn_id AS tracking_number FROM (subscriber_482900.transactions tr LEFT JOIN config.subscribers s ON ((tr.subscriber_id = s.subscriber_id))) WHERE (((tr.subscriber_id = 482900) AND (tr.txn_timestamp_utc >= '2014-01-15 00:00:00'::timestamp without time zone)) AND (tr.txn_timestamp_utc < '2014-01-16 00:00:00'::timestamp without time zone))
(4 rows)

That’s *so* much better. Makes me realize we probably have the same problem in the original Postgres version of this query.

Thanks,

David
From: Ashutosh B. <ash...@en...> - 2014-03-05 04:40:15
On Tue, Mar 4, 2014 at 10:27 PM, David E. Wheeler <da...@ju...> wrote:

> Sleeping on it, I realized that it might be thinking that the date math is volatile. So I tried a query using constant values, instead:
>
>     Remote query: SELECT tr.subscriber_id, s.subscriber AS subscriber_name, tr.txn_id AS tracking_number FROM (subscriber_482900.transactions tr LEFT JOIN config.subscribers s ON ((tr.subscriber_id = s.subscriber_id))) WHERE (((tr.subscriber_id = 482900) AND (tr.txn_timestamp_utc >= '2014-01-15 00:00:00'::timestamp without time zone)) AND (tr.txn_timestamp_utc < '2014-01-16 00:00:00'::timestamp without time zone))
>
> That’s *so* much better. Makes me realize we probably have the same problem in the original Postgres version of this query.

Good, that's what I was pointing to in my earlier mail. Since the quals are not shippable because of the volatility, the LEFT JOIN too is not shippable. Notice that the PostgreSQL planner has pushed those quals down to the respective relation. BTW, I am still not able to spot the volatility in those quals, but that's a different question, I think.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
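A possible workaround following from David's constant-value test — a sketch only; the window_start/window_end psql variable names are hypothetical, the idea being that whatever drives the report computes the window once, so the coordinator sees plain, shippable timestamp literals instead of interval arithmetic:

-- Hypothetical variables holding a precomputed reporting window (here the
-- same window as the constant-value example above).
\set window_start '2014-01-15 00:00:00'
\set window_end   '2014-01-16 00:00:00'

SELECT tr.subscriber_id AS subscriber_id
     , s.subscriber     AS subscriber_name
     , tr.txn_id        AS tracking_number
  FROM subscriber_482900.transactions tr
  LEFT JOIN config.subscribers s ON tr.subscriber_id = s.subscriber_id
 WHERE tr.subscriber_id = 482900
   AND tr.txn_timestamp_utc >= :'window_start'::timestamp
   AND tr.txn_timestamp_utc <  :'window_end'::timestamp;

With constants in place of the date math, the plan above shows the whole LEFT JOIN shipped to the datanodes as a single "__REMOTE_FQS_QUERY__" remote statement.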