From: David E. W. <da...@ju...> - 2014-02-28 00:53:22
PGXC Hackers,

I have finally loaded up my four-node PGXC testing cluster with a nice beefy database similar to a PostgreSQL database we use for long-running reporting queries. I gathered up one of our slower-running queries (a 26.6-minute run) and ran it on XC. Alas, after a while, it died with this error:

    psql:slow.sql:73: connection to server was lost

The coordinator log was not much help: nothing was logged. So I trolled through the logs on the data nodes. All four had these messages:

> 2014-02-27 15:45:51 PST dwheeler 10.4.34.1(56968) 22213 530fc838.56c5 LOG: could not send data to client: Connection reset by peer
> 2014-02-27 15:45:51 PST dwheeler 10.4.34.1(56968) 22213 530fc838.56c5 STATEMENT: SELECT subscriber_cmd_id, rule_reason, rule_score, txn_uuid, txn_timestamp_utc FROM ONLY subscriber_482900.transactions_rule tma WHERE (subscriber_id = 482900)
> 2014-02-27 15:45:51 PST dwheeler 10.4.34.1(56968) 22213 530fc838.56c5 FATAL: connection to client lost
> 2014-02-27 15:45:51 PST dwheeler 10.4.34.1(56968) 22213 530fc838.56c5 STATEMENT: SELECT subscriber_cmd_id, rule_reason, rule_score, txn_uuid, txn_timestamp_utc FROM ONLY subscriber_482900.transactions_rule tma WHERE (subscriber_id = 482900)

No reason given for the dropped connection. I ran the query on the coordinator box, so psql should have connected via a Unix socket rather than TCP. Out of curiosity, I looked at the logs for the other three coordinators. None had any error messages, either. So, no idea what’s timing out; statement_timeout is set to 0.
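One detail can be read out of the log lines above. The field `530fc838.56c5` appears to be the `%c` session ID from the `log_line_prefix` of `'%t %u %r %p %c '`. In stock PostgreSQL that ID is the backend start time and the backend PID, both in hex. Assuming Postgres-XC keeps that meaning, the Python sketch below decodes it and estimates how long the data node session had been alive when the connection dropped. The helper name `decode_session_id` is made up for illustration, and the fixed UTC-8 offset assumes the `PST` label in the log is accurate.

```python
from datetime import datetime, timedelta, timezone


def decode_session_id(session_id):
    """Split a %c session ID into (backend start time in UTC, backend PID)."""
    start_hex, pid_hex = session_id.split(".")
    started = datetime.fromtimestamp(int(start_hex, 16), tz=timezone.utc)
    return started, int(pid_hex, 16)


# Session ID copied from the data node log lines.
started, pid = decode_session_id("530fc838.56c5")

# Timestamp of the FATAL line; PST is taken to be UTC-8 (an assumption).
pst = timezone(timedelta(hours=-8))
failed = datetime(2014, 2, 27, 15, 45, 51, tzinfo=pst)

print(pid)               # 22213, which matches the %p field in the log
print(started)           # 2014-02-27 23:20:24+00:00
print(failed - started)  # 0:25:27
```

If that reading is right, the data node backend had been up for roughly 25 and a half minutes when the send failed. That measures the session lifetime, not necessarily the statement's run time, so treat it as a rough upper bound on how long the query ran before the drop.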
Here are the settings from my coordinator’s postgresql.conf:

    max_connections = 100
    shared_buffers = 32MB
    log_destination = 'stderr'
    logging_collector = on
    log_directory = 'pg_log'
    log_filename = 'postgresql-%a.log'
    log_truncate_on_rotation = on
    log_rotation_age = 1d
    log_rotation_size = 0
    log_line_prefix = '< %m >'
    log_timezone = 'US/Pacific'
    datestyle = 'iso, mdy'
    timezone = 'US/Pacific'
    lc_messages = 'en_US.UTF-8'
    lc_monetary = 'en_US.UTF-8'
    lc_numeric = 'en_US.UTF-8'
    lc_time = 'en_US.UTF-8'
    default_text_search_config = 'pg_catalog.english'
    pgxc_node_name = 'node1'
    port = 5432
    listen_addresses = '*'
    shared_buffers = 250MB
    work_mem = 128MB
    maintenance_work_mem = 128MB
    effective_cache_size = 8GB
    log_line_prefix = '%t %u %r %p %c '
    timezone = 'UTC'
    gtm_host = 'node1.example.com'

And from one of the data nodes (only the names differ on the others):

    max_connections = 100
    shared_buffers = 32MB
    log_destination = 'stderr'
    logging_collector = on
    log_directory = 'pg_log'
    log_filename = 'postgresql-%a.log'
    log_truncate_on_rotation = on
    log_rotation_age = 1d
    log_rotation_size = 0
    log_line_prefix = '< %m >'
    log_timezone = 'US/Pacific'
    datestyle = 'iso, mdy'
    timezone = 'US/Pacific'
    lc_messages = 'en_US.UTF-8'
    lc_monetary = 'en_US.UTF-8'
    lc_numeric = 'en_US.UTF-8'
    lc_time = 'en_US.UTF-8'
    default_text_search_config = 'pg_catalog.english'
    pgxc_node_name = 'node1'
    port = 15432
    listen_addresses = '*'
    shared_buffers = 750MB
    work_mem = 128MB
    maintenance_work_mem = 128MB
    effective_cache_size = 23GB
    log_line_prefix = '%t %u %r %p %c '
    timezone = 'UTC'
    gtm_host = 'node1.iovationnp.com'

Thoughts? What could be timing out?

Thanks,

David
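A note on the two listings above: each sets `shared_buffers`, `log_line_prefix`, and `timezone` twice. PostgreSQL uses the last occurrence of a parameter in the file, so the later values should be the ones in effect. The Python sketch below shows that resolution for the repeated coordinator parameters. The parser `effective_settings` is a hypothetical helper, not part of PostgreSQL or Postgres-XC, and it only handles simple `name = value` lines.

```python
def effective_settings(conf_text):
    """Return {name: value}, letting later assignments override earlier ones."""
    settings = {}
    for raw in conf_text.splitlines():
        # Naive comment stripping: assumes no '#' inside quoted values.
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip()
    return settings


# The repeated parameters from the coordinator listing, in file order.
coordinator_conf = """
shared_buffers = 32MB
log_line_prefix = '< %m >'
timezone = 'US/Pacific'
shared_buffers = 250MB
log_line_prefix = '%t %u %r %p %c '
timezone = 'UTC'
"""

effective = effective_settings(coordinator_conf)
for name, value in effective.items():
    print(name, "=", value)
```

On a live node, `SHOW shared_buffers;` or a query against `pg_settings` is the authoritative check; this sketch only mirrors the last-one-wins rule for the file itself.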