Hi,
In coordinator restore mode, I get a core dump when a client issues DDL.
Steps to reproduce:
1. pg_ctl start -Z restoremode -D /rdbdata/bcrdb_data/coord
2. psql -hzhcx5i -p20015 cxdb
3. Create the table:
CREATE TABLE cm_busi_handle_201301 (
so_nbr bigint,
region_code integer,
process_id integer,
process_result integer,
handle_seq integer,
op_id integer,
oper_date timestamp without time zone,
oper_end_date timestamp without time zone,
invoice_no character varying(20),
property character varying(20),
oper_desc text
)
DISTRIBUTE BY MODULO (region_code)
TO NODE (datanode1,datanode2,datanode3,datanode4,datanode5,datanode6,datanode7,datanode8)
4. The backend crashes with a core dump:
cxdb=# CREATE TABLE cm_busi_handle_201301 (
cxdb(# so_nbr bigint,
cxdb(# region_code integer,
cxdb(# process_id integer,
cxdb(# process_result integer,
cxdb(# handle_seq integer,
cxdb(# op_id integer,
cxdb(# oper_date timestamp without time zone,
cxdb(# oper_end_date timestamp without time zone,
cxdb(# invoice_no character varying(20),
cxdb(# property character varying(20),
cxdb(# oper_desc text
cxdb(# )
cxdb-# DISTRIBUTE BY MODULO (region_code)
cxdb-# TO NODE (datanode1,datanode2,datanode3,datanode4,datanode5,datanode6,datanode7,datanode8);
The connection to the server was lost. Attempting reset: Failed.
!>
5. Stack trace from the core dump:
gdb /rdbdata/bcrdb_install/bin/postgres /tmp/corefile/core.postgres.48524
(gdb) bt
#0 0x00000036a48328a5 in raise () from /lib64/libc.so.6
#1 0x00000036a4834085 in abort () from /lib64/libc.so.6
#2 0x00000036a486fa37 in __libc_message () from /lib64/libc.so.6
#3 0x00000036a4875366 in malloc_printerr () from /lib64/libc.so.6
#4 0x00000036a4877e93 in _int_free () from /lib64/libc.so.6
#5 0x0000000000769879 in AllocSetDelete (context=<value optimized out>) at aset.c:551
#6 0x0000000000769dad in MemoryContextDelete (context=0x12231e8) at mcxt.c:193
#7 0x000000000076aa70 in PortalDrop (portal=0x122d0c0, isTopCommit=<value optimized out>) at portalmem.c:588
#8 0x000000000067ddaa in exec_simple_query (
query_string=0x114c1e0 "CREATE TABLE cm_busi_handle_201301 (\n so_nbr bigint,\n region_code integer,\n process_id integer,\n process_result integer,\n handle_seq integer,\n op_id integer,\n oper_date timestamp "...) at postgres.c:1149
#9 0x000000000067f82f in PostgresMain (argc=<value optimized out>, argv=<value optimized out>, dbname=0x1166708 "cxdb",
username=<value optimized out>) at postgres.c:4243
#10 0x000000000063b84a in BackendRun (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:4202
#11 BackendStartup (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:3891
#12 ServerLoop (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1702
#13 PostmasterMain (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1369
#14 0x00000000005d1420 in main (argc=4, argv=0x1131c70) at main.c:206
My analysis is as follows:
When the coordinator is started in restore mode, the pooler process is not running, so NumDataNodes is zero.
This causes a problem in the function BuildRelationDistributionNodes:
{
/*
 * In restore mode NumDataNodes is 0, so palloc0 is called with a size of 0
 * and returns the smallest possible chunk for nodeoids. Storing more node
 * OIDs than that chunk can hold overflows it, and the corruption is only
 * detected later, when postgres frees the memory and dumps core.
 */
nodeoids = (Oid *) palloc0(NumDataNodes * sizeof(Oid));
}
The corrected code is as follows:
BuildRelationDistributionNodes(List *nodes, int *numnodes)
{
Oid *nodeoids;
ListCell *item;
int numdatanotes;
*numnodes = 0;
/* Size the array from the requested node list, not from NumDataNodes */
numdatanotes = list_length(nodes);
nodeoids = (Oid *) palloc0(numdatanotes * sizeof(Oid));
}
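To illustrate the sizing idea outside of the server, here is a minimal standalone sketch. It is not the Postgres-XC code: calloc stands in for palloc0, and the Oid typedef, the NumDataNodes global and the helper names capacity_from_global and build_node_oids are assumptions made up for this example. It only shows that a buffer sized from the requested node list always fits the entries written into it, even when the global count is 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef unsigned int Oid;   /* stand-in for the PostgreSQL Oid type */

/* Hypothetical stand-in for the global; 0 mimics restore mode. */
static int NumDataNodes = 0;

/* Capacity the original code would use: driven by the global counter. */
size_t capacity_from_global(void)
{
    return (size_t) NumDataNodes;
}

/*
 * Sketch of the proposed sizing: the capacity comes from the request
 * itself, so every write in the loop stays inside the allocation.
 */
Oid *build_node_oids(const Oid *requested, size_t nrequested, int *numnodes)
{
    size_t capacity = nrequested;
    Oid *nodeoids;
    size_t i;

    *numnodes = 0;

    /* calloc zero-fills like palloc0; ask for at least one element. */
    nodeoids = calloc(capacity > 0 ? capacity : 1, sizeof(Oid));
    if (nodeoids == NULL)
        return NULL;

    for (i = 0; i < nrequested; i++)
    {
        /* Bounds check: would fail if capacity were taken from the global. */
        assert((size_t) *numnodes < capacity);
        nodeoids[(*numnodes)++] = requested[i];
    }
    return nodeoids;
}
```

Calling build_node_oids with eight OIDs while NumDataNodes is 0 fills all eight slots, whereas capacity_from_global() reports room for none, which is the mismatch described in the analysis above.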
In my team's testing, the result is OK. Please review this code; if there is any problem, please let me know.
xu...@cm...