Re: [Postgres-xc-developers] connection to server was lost

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Hi David,
It might be due to the large amount of data sent from the datanode to the
connector. When you see message "connection to client lost" at datanode, it
means that the connection to the coordinator was lost. In XC, coordinators
act as client to the datanode. Further, no message in the coordinator log
implies that there wasn't any segfault or error on the coordinator which
can result in loosing client (to the datanode). One way to verify this is
to check what happens for smaller amounts of the data. There is still some
code in executor, which saves data from datanode in a linked list and
because of large amount of data that process runs out of memory. You may
find something in system logs, if that is true.

Please do the following,
Run explain verbose on the query which showed this behavior and in that
output you will find what query is being sent to the datanode
Reduce your data on the datanode such that, that particular query returns
may be a few thousand rows to the coordinator. BTW, I have seen millions of
rows being exchanged between the coordinator and datanode without problem.
But still there is a case where large data would be a problem.
Now, see if the query runs without problem.

On Fri, Feb 28, 2014 at 6:23 AM, David E. Wheeler <da...@ju...>wrote:

> PGXC Hakers,
>
> I have finally loaded up my testing PGXC four-node cluster with a nice
> beefy database similar to a PostgreSQL database we use for long-running
> reporting queries. I gathered up one of our slower-running queries (26.6m
> run) and ran it on XC. Alas, after a while, it died with this error:
>
>     psql:slow.sql:73: connection to server was lost
>
> The coordinator log was not much help: nothing was logged. So I trolled
> through the logs on the data nodes. All four had these messages:
>
> > 2014-02-27 15:45:51 PST dwheeler 10.4.34.1(56968) 22213 530fc838.56c5
> LOG:  could not send data to client: Connection reset by peer
> > 2014-02-27 15:45:51 PST dwheeler 10.4.34.1(56968) 22213 530fc838.56c5
> STATEMENT:  SELECT subscriber_cmd_id, rule_reason, rule_score, txn_uuid,
> txn_timestamp_utc FROM ONLY subscriber_482900.transactions_rule tma WHERE
> (subscriber_id = 482900)
> > 2014-02-27 15:45:51 PST dwheeler 10.4.34.1(56968) 22213 530fc838.56c5
> FATAL:  connection to client lost
> > 2014-02-27 15:45:51 PST dwheeler 10.4.34.1(56968) 22213 530fc838.56c5
> STATEMENT:  SELECT subscriber_cmd_id, rule_reason, rule_score, txn_uuid,
> txn_timestamp_utc FROM ONLY subscriber_482900.transactions_rule tma WHERE
> (subscriber_id = 482900)
>
> No reason given for the dropped connection. I ran the query on the
> coordinator box, so psql should have connected via a socket rather than
> TCP. Out of curiosity, I looked at the logs for the other three
> coordinators. None had any error messages, either.
>
> So, no idea what's timing out; statement_timeout is set to 0. Here are the
> settings from my coordinator's postgreql.conf:
>
>     max_connections = 100
>     shared_buffers = 32MB
>     log_destination = 'stderr'
>     logging_collector = on
>     log_directory = 'pg_log'
>     log_filename = 'postgresql-%a.log'
>     log_truncate_on_rotation = on
>     log_rotation_age = 1d
>     log_rotation_size = 0
>     log_line_prefix = '< %m >'
>     log_timezone = 'US/Pacific'
>     datestyle = 'iso, mdy'
>     timezone = 'US/Pacific'
>     lc_messages = 'en_US.UTF-8'
>     lc_monetary = 'en_US.UTF-8'
>     lc_numeric = 'en_US.UTF-8'
>     lc_time = 'en_US.UTF-8'
>     default_text_search_config = 'pg_catalog.english'
>     pgxc_node_name = 'node1'
>     port = 5432
>     listen_addresses = '*'
>     shared_buffers = 250MB
>     work_mem = 128MB
>     maintenance_work_mem = 128MB
>     effective_cache_size = 8GB
>     log_line_prefix = '%t %u %r %p %c '
>     timezone = 'UTC'
>     gtm_host = 'node1.example.com'
>
> And from one of the data nodes (only the names differ on the others):
>
>     max_connections = 100
>     shared_buffers = 32MB
>     log_destination = 'stderr'
>     logging_collector = on
>     log_directory = 'pg_log'
>     log_filename = 'postgresql-%a.log'
>     log_truncate_on_rotation = on
>     log_rotation_age = 1d
>     log_rotation_size = 0
>     log_line_prefix = '< %m >'
>     log_timezone = 'US/Pacific'
>     datestyle = 'iso, mdy'
>     timezone = 'US/Pacific'
>     lc_messages = 'en_US.UTF-8'
>     lc_monetary = 'en_US.UTF-8'
>     lc_numeric = 'en_US.UTF-8'
>     lc_time = 'en_US.UTF-8'
>     default_text_search_config = 'pg_catalog.english'
>     pgxc_node_name = 'node1'
>     port = 15432
>     listen_addresses = '*'
>     shared_buffers = 750MB
>     work_mem = 128MB
>     maintenance_work_mem = 128MB
>     effective_cache_size = 23GB
>     log_line_prefix = '%t %u %r %p %c '
>     timezone = 'UTC'
>     gtm_host = 'node1.iovationnp.com'
>
> Thoughts? What could be timing out?
>
> Thanks,
>
> David
>
>
>
>
> ------------------------------------------------------------------------------
> Flow-based real-time traffic analytics software. Cisco certified tool.
> Monitor traffic, SLAs, QoS, Medianet, WAAS etc. with NetFlow Analyzer
> Customize your own dashboards, set traffic alerts and generate reports.
> Network behavioral analysis & security monitoring. All-in-one tool.
>
> http://pubads.g.doubleclick.net/gampad/clk?id=126839071&iu=/4140/ostg.clktrk
> _______________________________________________
> Postgres-xc-developers mailing list
> Pos...@li...
> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers
>

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company