[Postgres-xc-developers] connection to server was lost

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

PGXC Hakers,

I have finally loaded up my testing PGXC four-node cluster with a nice beefy database similar to a PostgreSQL database we use for long-running reporting queries. I gathered up one of our slower-running queries (26.6m run) and ran it on XC. Alas, after a while, it died with this error:

    psql:slow.sql:73: connection to server was lost

The coordinator log was not much help: nothing was logged. So I trolled through the logs on the data nodes. All four had these messages:

> 2014-02-27 15:45:51 PST dwheeler 10.4.34.1(56968) 22213 530fc838.56c5 LOG:  could not send data to client: Connection reset by peer
> 2014-02-27 15:45:51 PST dwheeler 10.4.34.1(56968) 22213 530fc838.56c5 STATEMENT:  SELECT subscriber_cmd_id, rule_reason, rule_score, txn_uuid, txn_timestamp_utc FROM ONLY subscriber_482900.transactions_rule tma WHERE (subscriber_id = 482900)
> 2014-02-27 15:45:51 PST dwheeler 10.4.34.1(56968) 22213 530fc838.56c5 FATAL:  connection to client lost
> 2014-02-27 15:45:51 PST dwheeler 10.4.34.1(56968) 22213 530fc838.56c5 STATEMENT:  SELECT subscriber_cmd_id, rule_reason, rule_score, txn_uuid, txn_timestamp_utc FROM ONLY subscriber_482900.transactions_rule tma WHERE (subscriber_id = 482900)

No reason given for the dropped connection. I ran the query on the coordinator box, so psql should have connected via a socket rather than TCP. Out of curiosity, I looked at the logs for the other three coordinators. None had any error messages, either.

So, no idea what’s timing out; statement_timeout is set to 0. Here are the settings from my coordinator’s postgreql.conf:

    max_connections = 100
    shared_buffers = 32MB
    log_destination = 'stderr'
    logging_collector = on
    log_directory = 'pg_log'
    log_filename = 'postgresql-%a.log'
    log_truncate_on_rotation = on
    log_rotation_age = 1d
    log_rotation_size = 0
    log_line_prefix = '< %m >'
    log_timezone = 'US/Pacific'
    datestyle = 'iso, mdy'
    timezone = 'US/Pacific'
    lc_messages = 'en_US.UTF-8'
    lc_monetary = 'en_US.UTF-8'
    lc_numeric = 'en_US.UTF-8'
    lc_time = 'en_US.UTF-8'
    default_text_search_config = 'pg_catalog.english'
    pgxc_node_name = 'node1'
    port = 5432
    listen_addresses = '*'
    shared_buffers = 250MB
    work_mem = 128MB
    maintenance_work_mem = 128MB
    effective_cache_size = 8GB
    log_line_prefix = '%t %u %r %p %c '
    timezone = 'UTC'
    gtm_host = 'node1.example.com'

And from one of the data nodes (only the names differ on the others):

    max_connections = 100			
    shared_buffers = 32MB			
    log_destination = 'stderr'
    logging_collector = on	
    log_directory = 'pg_log'
    log_filename = 'postgresql-%a.log'
    log_truncate_on_rotation = on	
    log_rotation_age = 1d			
    log_rotation_size = 0			
    log_line_prefix = '< %m >'		
    log_timezone = 'US/Pacific'
    datestyle = 'iso, mdy'
    timezone = 'US/Pacific'
    lc_messages = 'en_US.UTF-8'		
    lc_monetary = 'en_US.UTF-8'		
    lc_numeric = 'en_US.UTF-8'		
    lc_time = 'en_US.UTF-8'			
    default_text_search_config = 'pg_catalog.english'
    pgxc_node_name = 'node1'	
    port = 15432
    listen_addresses = '*'
    shared_buffers = 750MB
    work_mem = 128MB
    maintenance_work_mem = 128MB
    effective_cache_size = 23GB
    log_line_prefix = '%t %u %r %p %c '
    timezone = 'UTC'
    gtm_host = 'node1.iovationnp.com'

Thoughts? What could be timing out?

Thanks,

David