From: David E. W. <da...@ju...> - 2014-03-03 18:02:32
On Feb 28, 2014, at 8:23 PM, Ashutosh Bapat <ash...@en...> wrote:

> This looks weird. Are you seeing this error in datanode logs or coordinator
> logs? For mere EXPLAIN only coordinator is active. Coordinator and datanode
> are not connected. So, if you are seeing disconnection error on datanode when
> you fired EXPLAIN VERBOSE on coordinator, that's weird. Am I missing
> something?

The error is in the data node logs. The coordinator has this:

    2014-03-03 09:21:43 PST 22325 530659c2.5735 LOG: server process (PID 24475) was terminated by signal 9: Killed
    2014-03-03 09:21:43 PST 22325 530659c2.5735 DETAIL: Failed process was running: EXPLAIN ANALYZE WITH DTL1 as (Select td.subscriber_id, s.subscriber::varchar(40) as subscriber_name, td.txn_id::char(20) as tracking_number, to_char(td.txn_timestamp_local,'yyyy/mm/dd-hh24:mi:ss')::varchar(50) as transaction_time_local, to_char(td.txn_timestamp_utc,'yyyy/mm/dd-hh24:mi:ss')::varchar(50) as transaction_time, td.local_timezone as local_timezone, td.account_code::varchar(60), case when td.account_code is not null then valueof_node_id(node_id_to_string( node_id( subscriber_id := s.subscriber_id, account_code := td.account_code)))::varchar(80) else null::varchar(80) end as account_link, td.device_id::char(20) as device_id, CASE WHEN td.result_code = 0 THEN 'Allow'::varchar(10) WHEN td.result_code = 1 THEN 'Deny'::varchar(10) WHEN td.result_code = 2 THEN 'Review'::varchar(10) ELSE 'Other'::varchar(10) END as result, td.ruleset,
    2014-03-03 09:21:43 PST 22325 530659c2.5735 LOG: terminating any other active server processes
    2014-03-03 09:21:44 PST 9611 53110262.258b WARNING: terminating connection because of crash of another server process
    2014-03-03 09:21:44 PST 9611 53110262.258b DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
    2014-03-03 09:21:44 PST 9611 53110262.258b HINT: In a moment you should be able to reconnect to the database and repeat your command.
    2014-03-03 09:21:44 PST 22325 530659c2.5735 LOG: all server processes terminated; reinitializing
    2014-03-03 09:21:48 PST 26198 5314ba2c.6656 LOG: database system was interrupted; last known up at 2014-03-03 09:16:56 PST
    2014-03-03 09:21:48 PST 26198 5314ba2c.6656 LOG: database system was not properly shut down; automatic recovery in progress
    2014-03-03 09:21:48 PST 26198 5314ba2c.6656 LOG: redo starts at 0/2F47B70
    2014-03-03 09:21:48 PST 26198 5314ba2c.6656 LOG: record with zero length at 0/2F47D50
    2014-03-03 09:21:48 PST 26198 5314ba2c.6656 LOG: redo done at 0/2F47D20
    2014-03-03 09:21:48 PST 26198 5314ba2c.6656 LOG: last completed transaction was at log time 2014-03-03 17:21:13.279991+00
    2014-03-03 09:21:48 PST 26202 5314ba2c.665a LOG: autovacuum launcher started
    2014-03-03 09:21:48 PST 22325 530659c2.5735 LOG: database system is ready to accept connections

The query I'm running uses EXPLAIN VERBOSE, not EXPLAIN ANALYZE. Is the parser mucking with it somehow? This is 1.1.

The possibility of corrupted shared memory is distressing; I wish the log would say which node exited abnormally.

David