
#348 Slave crashes in pgbench

Milestone: V1.0 maintenance
Status: closed
Priority: 7
Updated: 2013-01-18
Created: 2012-11-05
Private: No

The coordinator slave crashes while pgbench is running, complaining "too many KnownAssignedXids".

This seems to happen because some GXIDs are missing on coordinators/datanodes. The same issue will occur on datanodes later.
We need to analyze KnownAssignedXids handling in the PG slave and improve it to handle missing GXIDs.

Discussion

  • Koichi Suzuki

    Koichi Suzuki - 2012-11-06

    Reproduced with pgbench.

    The startup process hits a FATAL error. The backtrace and related info are as follows:

    ----------------------
    Breakpoint 1, KnownAssignedXidsAdd (from_xid=7941, to_xid=5789, exclusive_lock=0 '\000') at procarray.c:3482
    3482 elog(ERROR, "too many KnownAssignedXids");
    (gdb) bt
    #0 KnownAssignedXidsAdd (from_xid=7941, to_xid=5789, exclusive_lock=0 '\000') at procarray.c:3482
    #1 0x00000000006497bb in RecordKnownAssignedTransactionIds (xid=13730) at procarray.c:3182
    #2 0x00000000004a5841 in StartupXLOG () at xlog.c:6725
    #3 0x0000000000626a48 in StartupProcessMain () at startup.c:220
    #4 0x00000000004c3c51 in AuxiliaryProcessMain (argc=2, argv=<value optimized out>) at bootstrap.c:438
    #5 0x0000000000621b16 in StartChildProcess (type=StartupProcess) at postmaster.c:4713
    #6 0x0000000000625d37 in PostmasterMain (argc=5, argv=0x12c3e710) at postmaster.c:1179
    #7 0x00000000005bf99e in main (argc=5, argv=<value optimized out>) at main.c:199
    (gdb) p from_xid
    $1 = 7941
    (gdb) p to_xid
    $2 = 5789
    (gdb) p head
    $3 = 6252
    (gdb) p nxids
    $4 = 5790
    (gdb) p pArray->maxKnownAssignedXids
    $5 = 7410
    (gdb)
    -----------------------

    The cause is as follows:

    1. The current slave assumes there are no missing XIDs and stores each XID in the array using the XID value as an index.
    2. In XC, some XIDs never reach a given node. Although the number of XIDs is within the limit, the XID used as an index easily overflows the array.

    Suggested fix (can be local to procarray.c):
    1. Do not use the XID as an index into KnownAssignedXids.
    2. Instead, keep KnownAssignedXids packed, storing the XIDs in ascending order.
    3. The KnownAssignedXidsValid array needs a similar modification.

    Care must be taken not to slow down handling of this array.
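    A minimal sketch of the suggested packed, ascending-order array (hypothetical code, not the actual procarray.c implementation): each XID is inserted at the position found by binary search, so the space used depends on how many XIDs are tracked rather than on their values.

    ```c
    #include <assert.h>
    #include <string.h>

    typedef unsigned int TransactionId;

    #define MAX_KNOWN_ASSIGNED_XIDS 7410    /* illustrative limit only */

    static TransactionId knownXids[MAX_KNOWN_ASSIGNED_XIDS];
    static int nKnownXids = 0;

    /* Insert xid, keeping the packed array in ascending order.
     * Returns 0 on success, -1 when the array is full. */
    static int
    KnownAssignedXidsInsert(TransactionId xid)
    {
        int lo = 0, hi = nKnownXids;

        if (nKnownXids >= MAX_KNOWN_ASSIGNED_XIDS)
            return -1;          /* "too many KnownAssignedXids" */

        /* binary search for the insertion point */
        while (lo < hi)
        {
            int mid = (lo + hi) / 2;

            if (knownXids[mid] < xid)
                lo = mid + 1;
            else
                hi = mid;
        }

        /* shift the tail and insert; the array stays packed, so gaps in
         * the XID sequence cost nothing */
        memmove(&knownXids[lo + 1], &knownXids[lo],
                (nKnownXids - lo) * sizeof(TransactionId));
        knownXids[lo] = xid;
        nKnownXids++;
        return 0;
    }
    ```

    The insertion shift is O(n), which is why the comment above warns against slowing this path down; the real code would need to weigh this against the lookup cost.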

     
  • mason_s

    mason_s - 2012-11-06

    We recently observed this, too.

    A short-term solution is to configure the slaves as warm standby rather than hot standby until read-only slaves are implemented; warm standby does not use this part of the code.

     
  • Koichi Suzuki

    Koichi Suzuki - 2012-11-06

    Finally found the following:

    The current code can store XIDs only for locally generated ones (TOTAL_MAX_CACHED_SUBXIDS). In XC, this should be multiplied by (MaxCoords + MaxDataNodes).

    See the proposed patch.
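    The sizing change can be sketched as follows; the capacity constant here is an illustrative stand-in, not the actual value from the patch:

    ```c
    #include <assert.h>

    /* Illustrative stand-in; the real value comes from PostgreSQL's
     * procarray.c. */
    #define TOTAL_MAX_CACHED_SUBXIDS 4225   /* assumed per-node XID capacity */

    /* In vanilla PostgreSQL the KnownAssignedXids array is sized for
     * locally generated XIDs only; in XC it must also hold XIDs that
     * originate on every other coordinator and datanode, so the size
     * scales with the configured node counts. */
    static int
    max_known_assigned_xids(int max_coords, int max_datanodes)
    {
        return TOTAL_MAX_CACHED_SUBXIDS * (max_coords + max_datanodes);
    }
    ```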

    Test result is as follows:

    ------
    PGXC$ pgbench -k -t 100000 -p 20004 -h node06
    starting vacuum...end.
    WARNING: unexpected EOF on datanode connection
    WARNING: unexpected EOF on datanode connection
    Client 0 aborted in state 11: ERROR: Failed to read response from Datanodes
    transaction type: TPC-B (sort of)
    scaling factor: 1
    query mode: simple
    number of clients: 1
    number of threads: 1
    number of transactions per client: 100000
    number of transactions actually processed: 66171/100000
    tps = 87.665172 (including connections establishing)
    tps = 87.665392 (excluding connections establishing)
    PGXC$
    ----
    Although warnings are printed, all the nodes (master and slave) were observed to be still running.

    ------
    PGXC$ monitor all
    GTM master (gtm): running. host: node13, port: 20001, dir: /home/koichi/pgxc/nodes/gtm
    GTM slave (gtm): running. host: node12, port: 20001, dir: /home/koichi/pgxc/nodes/gtm
    GTM proxy (gtm_pxy1): running. host: node06, port: 20001, dir: /home/koichi/pgxc/nodes/gtm_pxy
    GTM proxy (gtm_pxy2): running. host: node07, port: 20001, dir: /home/koichi/pgxc/nodes/gtm_pxy
    GTM proxy (gtm_pxy3): running. host: node08, port: 20001, dir: /home/koichi/pgxc/nodes/gtm_pxy
    GTM proxy (gtm_pxy4): running. host: node09, port: 20001, dir: /home/koichi/pgxc/nodes/gtm_pxy
    Coordinator master (coord1): running. host: node06, port: 20004, dir: /home/koichi/pgxc/nodes/coord
    Coordinator master (coord2): running. host: node07, port: 20005, dir: /home/koichi/pgxc/nodes/coord
    Coordinator master (coord3): running. host: node08, port: 20004, dir: /home/koichi/pgxc/nodes/coord
    Coordinator master (coord4): running. host: node09, port: 20005, dir: /home/koichi/pgxc/nodes/coord
    Coordinator slave (coord1): running. host: node07, port: 20004, dir: /home/koichi/pgxc/nodes/coord_slave
    Coordinator slave (coord2): running. host: node08, port: 20005, dir: /home/koichi/pgxc/nodes/coord_slave
    Coordinator slave (coord3): running. host: node09, port: 20004, dir: /home/koichi/pgxc/nodes/coord_slave
    Coordinator slave (coord4): running. host: node06, port: 20005, dir: /home/koichi/pgxc/nodes/coord_slave
    Datanode master (datanode1): running. host: node06, port: 20008, dir: /home/koichi/pgxc/nodes/dn_master
    Datanode master (datanode2): running. host: node07, port: 20009, dir: /home/koichi/pgxc/nodes/dn_master
    Datanode master (datanode3): running. host: node08, port: 20008, dir: /home/koichi/pgxc/nodes/dn_master
    Datanode master (datanode4): running. host: node09, port: 20009, dir: /home/koichi/pgxc/nodes/dn_master
    Datanode slave (datanode1): running. host: node07, port: 20008, dir: /home/koichi/pgxc/nodes/dn_slave
    Datanode slave (datanode2): running. host: node08, port: 20009, dir: /home/koichi/pgxc/nodes/dn_slave
    Datanode slave (datanode3): running. host: node09, port: 20008, dir: /home/koichi/pgxc/nodes/dn_slave
    Datanode slave (datanode4): running. host: node06, port: 20009, dir: /home/koichi/pgxc/nodes/dn_slave
    PGXC$ Psql
    psql (PGXC 1.1devel, based on PG 9.2beta2)
    Type "help" for help.

    koichi=# \d
    List of relations
    Schema | Name | Type | Owner
    --------+------------------+-------+--------
    public | pgbench_accounts | table | koichi
    public | pgbench_branches | table | koichi
    public | pgbench_history | table | koichi
    public | pgbench_tellers | table | koichi
    public | s | table | koichi
    public | t | table | koichi
    public | x | table | koichi
    public | y | table | koichi
    (8 rows)

    koichi=# select * from s;
    a | b
    ----+---
    1 | a
    2 | b
    3 | c
    4 | d
    5 | e
    6 | f
    7 | g
    8 | h
    9 | i
    10 | j
    (10 rows)

    koichi=#
    -------

     
  • Koichi Suzuki

    Koichi Suzuki - 2012-11-06

    Suggested fix.

     
  • mason_s

    mason_s - 2012-11-06

    There is a theoretical danger: if a particular node is not involved in any transactions for a long time compared to the others, the delta between the last XID it has seen and a newly starting one could grow large. We may want to update all of the nodes periodically, every n transactions. (A simple BEGIN-COMMIT on all nodes every n transactions?)

    This could theoretically happen if, say, a new node is added but no tables are placed on it until later, or if a table was created on a single datanode and many operations are then executed involving only that table (and that one datanode).
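    The periodic-update idea could be sketched like this; every name here is hypothetical, and a real implementation would issue the BEGIN-COMMIT through XC's pooler/executor rather than a stub:

    ```c
    #include <assert.h>

    /* Hypothetical hook: run "BEGIN; COMMIT;" on every node so that each
     * one observes a recent XID.  Stubbed out here for illustration. */
    static int heartbeats_sent = 0;

    static void
    run_begin_commit_on_all_nodes(void)
    {
        heartbeats_sent++;      /* stand-in for real remote execution */
    }

    #define HEARTBEAT_INTERVAL 1000 /* assumed "every n transactions" */

    static long committed = 0;

    /* Called once per committed transaction on the coordinator: after
     * every HEARTBEAT_INTERVAL commits, touch all nodes so no node's
     * last-seen XID falls too far behind. */
    static void
    on_transaction_commit(void)
    {
        committed++;
        if (committed % HEARTBEAT_INTERVAL == 0)
            run_begin_commit_on_all_nodes();
    }
    ```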

     
  • Koichi Suzuki

    Koichi Suzuki - 2012-11-07
    • status: open --> closed
     
  • Koichi Suzuki

    Koichi Suzuki - 2012-11-07

    Fixed by the commit b3208942ba78f26c97cc737a3d33bb359fd78132 (master) and 18b4519c82045646d202d467ee325a4a36a97108 (REL1_0_STABLE)

     
  • Koichi Suzuki

    Koichi Suzuki - 2012-11-27

    This is re-opened because the patch does not work in some cases, even in the REL1_0_STABLE branch.

    This now moves to being a REL1_0_STABLE branch bug.

     
  • Koichi Suzuki

    Koichi Suzuki - 2012-11-27
    • milestone: 2663467 --> V1.0 maintenance
    • status: closed --> open
     
  • Koichi Suzuki

    Koichi Suzuki - 2012-11-27

    The related code deals with snapshots in hot standby. The original code assumes the PostgreSQL architecture, namely:
    1. There are no missing XIDs in the transferred WAL.
    2. XIDs arrive in order; that is, any incoming XID must follow all the known XIDs.
    Neither assumption holds in Postgres-XC: there can be missing XIDs, and a younger XID may arrive later.

    We need to improve procarray.c to deal with these situations.
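    The broken assumption can be illustrated with a toy model of the in-order append (hypothetical code, not the actual procarray.c): vanilla PostgreSQL can simply append because assumption 2 guarantees each new XID is the largest seen so far, and an XC-style out-of-order arrival violates that check.

    ```c
    #include <assert.h>

    typedef unsigned int TransactionId;

    #define ARRAY_SIZE 8

    static TransactionId xids[ARRAY_SIZE];
    static int nxids = 0;

    /* Toy model of the original append path: assumes each new XID is
     * larger than everything already stored.  Returns 0 on success,
     * -1 when the array is full or the in-order assumption is violated. */
    static int
    append_in_order(TransactionId xid)
    {
        if (nxids >= ARRAY_SIZE)
            return -1;
        if (nxids > 0 && xid <= xids[nxids - 1])
            return -1;          /* out-of-order arrival: PG never expects this */
        xids[nxids++] = xid;
        return 0;
    }
    ```

    In XC a younger XID arriving after an older one (the second case above) is normal, which is why the append-only design has to be replaced by something like a sorted insert.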

     
  • Koichi Suzuki

    Koichi Suzuki - 2012-12-25

    The fix was committed as 185f790dd913c72b29a73436f9d0b1e749c40dd6, for the REL1_0_STABLE branch only. This patch simply disables management of KnownAssignedTransactionIds, which works for 1.0.x because hot standby is not correctly supported there. To provide consistent database views on slaves and support hot standby, this has to be handled correctly; further code will be submitted for this purpose.

     
  • Koichi Suzuki

    Koichi Suzuki - 2013-01-18

    KnownAssignedXids is used to generate snapshots for hot standby. For XC to provide hot standby, we need connections between coordinators and datanodes, and their status should be reasonably synchronized: not only receiving WAL but also replaying it. Correct handling of KnownAssignedXids should be implemented when XC supports such hot standby. In the current master, we just disable KnownAssignedXids management and apply the same patch as in REL1_0_STABLE.

     
  • Koichi Suzuki

    Koichi Suzuki - 2013-01-18

    Test script for the fix, for REL1_0_STABLE

     
  • Koichi Suzuki

    Koichi Suzuki - 2013-01-18

    Test script for the fix, for master (1.1 dev.)

     
  • Koichi Suzuki

    Koichi Suzuki - 2013-01-18
    • status: open --> closed
     
  • Koichi Suzuki

    Koichi Suzuki - 2013-01-18

    Patch 8a7e02a16924d6caaa5ceec5272bd2d72a8bc2ff fixes this problem for master (1.1 dev).

     
