From: <xu...@cm...> - 2015-01-27 11:29:58
Hi,

In coordinator restore mode, I get a core dump when a client issues DDL.

Steps to reproduce:

1. Start the coordinator in restore mode:

    pg_ctl start -Z restoremode -D /rdbdata/bcrdb_data/coord

2. Connect with psql:

    psql -hzhcx5i -p20015 cxdb

3. Create a table:

    CREATE TABLE cm_busi_handle_201301 (
        so_nbr bigint,
        region_code integer,
        process_id integer,
        process_result integer,
        handle_seq integer,
        op_id integer,
        oper_date timestamp without time zone,
        oper_end_date timestamp without time zone,
        invoice_no character varying(20),
        property character varying(20),
        oper_desc text
    )
    DISTRIBUTE BY MODULO (region_code)
    TO NODE (datanode1,datanode2,datanode3,datanode4,datanode5,datanode6,datanode7,datanode8)

4. The backend dumps core:

    cxdb=# CREATE TABLE cm_busi_handle_201301 (
    cxdb(# so_nbr bigint,
    cxdb(# region_code integer,
    cxdb(# process_id integer,
    cxdb(# process_result integer,
    cxdb(# handle_seq integer,
    cxdb(# op_id integer,
    cxdb(# oper_date timestamp without time zone,
    cxdb(# oper_end_date timestamp without time zone,
    cxdb(# invoice_no character varying(20),
    cxdb(# property character varying(20),
    cxdb(# oper_desc text
    cxdb(# )
    cxdb-# DISTRIBUTE BY MODULO (region_code)
    cxdb-# TO NODE (datanode1,datanode2,datanode3,datanode4,datanode5,datanode6,datanode7,datanode8);
    The connection to the server was lost. Attempting reset: Failed.
    !>

5. Stack trace from the core file:

    gdb /rdbdata/bcrdb_install/bin/postgres /tmp/corefile/core.postgres.48524
    (gdb) bt
    #0  0x00000036a48328a5 in raise () from /lib64/libc.so.6
    #1  0x00000036a4834085 in abort () from /lib64/libc.so.6
    #2  0x00000036a486fa37 in __libc_message () from /lib64/libc.so.6
    #3  0x00000036a4875366 in malloc_printerr () from /lib64/libc.so.6
    #4  0x00000036a4877e93 in _int_free () from /lib64/libc.so.6
    #5  0x0000000000769879 in AllocSetDelete (context=<value optimized out>) at aset.c:551
    #6  0x0000000000769dad in MemoryContextDelete (context=0x12231e8) at mcxt.c:193
    #7  0x000000000076aa70 in PortalDrop (portal=0x122d0c0, isTopCommit=<value optimized out>) at portalmem.c:588
    #8  0x000000000067ddaa in exec_simple_query (
        query_string=0x114c1e0 "CREATE TABLE cm_busi_handle_201301 (\n so_nbr bigint,\n region_code integer,\n process_id integer,\n process_result integer,\n handle_seq integer,\n op_id integer,\n oper_date timestamp "...) at postgres.c:1149
    #9  0x000000000067f82f in PostgresMain (argc=<value optimized out>, argv=<value optimized out>, dbname=0x1166708 "cxdb", username=<value optimized out>) at postgres.c:4243
    #10 0x000000000063b84a in BackendRun (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:4202
    #11 BackendStartup (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:3891
    #12 ServerLoop (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1702
    #13 PostmasterMain (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1369
    #14 0x00000000005d1420 in main (argc=4, argv=0x1131c70) at main.c:206

My analysis is as follows. When I start the coordinator in restore mode, the pooler process is not running, so NumDataNodes is zero. This causes a problem in the function BuildRelationDistributionNodes:

    {
        /*
         * In restore mode NumDataNodes is 0, so palloc0 is called with a
         * size of 0 and allocates only the smallest chunk for nodeoids.
         * If more memory than that is needed, the writes overflow the
         * chunk, and postgres dumps core when it later frees the memory.
         */
        nodeoids = (Oid *) palloc0(NumDataNodes * sizeof(Oid));
    }

The corrected code is as follows:

    BuildRelationDistributionNodes(List *nodes, int *numnodes)
    {
        Oid        *nodeoids;
        ListCell   *item;
        int         numdatanotes;

        *numnodes = 0;

        numdatanotes = list_length(nodes);
        nodeoids = (Oid *) palloc0(numdatanotes * sizeof(Oid));
    }

This change passes testing in my team. Please review the code and let me know if there are any problems.

xu...@cm...
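
The sizing problem described above can be illustrated outside the server with a minimal standalone sketch. It is not the Postgres-XC code: calloc stands in for palloc0, a plain array stands in for the List of nodes, and the helper names size_from_global, size_from_list and build_node_oids are hypothetical. It only shows that sizing the array from the list being copied keeps every write inside the allocation, even when the global count is 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef unsigned int Oid;   /* stand-in for the PostgreSQL Oid type */

/*
 * Hypothetical stand-in for the coordinator's global node count.
 * It is 0 here, matching the restore-mode situation in the report.
 */
static int NumDataNodes = 0;

/* Bytes requested when the array is sized from the global count. */
static size_t
size_from_global(void)
{
    return (size_t) NumDataNodes * sizeof(Oid);
}

/* Bytes requested when the array is sized from the node list itself. */
static size_t
size_from_list(int list_len)
{
    return (size_t) list_len * sizeof(Oid);
}

/*
 * Copy list_len node OIDs into a zeroed array sized from list_len,
 * so the loop below can never write past the end of the allocation.
 * Returns NULL for an empty list or on allocation failure.
 */
static Oid *
build_node_oids(const Oid *list, int list_len, int *numnodes)
{
    Oid *nodeoids;
    int  i;

    assert(numnodes != NULL);
    *numnodes = 0;

    if (list_len <= 0)
        return NULL;

    nodeoids = calloc((size_t) list_len, sizeof(Oid));
    if (nodeoids == NULL)
        return NULL;

    for (i = 0; i < list_len; i++)
        nodeoids[(*numnodes)++] = list[i];

    return nodeoids;
}
```

With eight nodes in the list, size_from_global() asks for 0 bytes while size_from_list(8) asks for room for all eight entries, which is the difference between the two palloc0 calls quoted in the report.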