I am testing synchronous streaming replication of the XC nodes.
I configured a cluster with two synchronous standbys for each node (Coordinator and Datanode) and started testing.
After a few hours some standbys went down, with
FATAL: too many KnownAssignedXids
in their logs. I could not restart and continue replicating - the standbys went down immediately with the same error, and I had to re-create the failed standbys.
Eventually (within 3 days) all configured standbys died with the same error in the log.
With this bug streaming replication is not practically usable - the system would have to continuously re-create standbys.
From searching the PostgreSQL archives, it appears they had a similar problem:
http://archives.postgresql.org/pgsql-hackers/2010-11/msg01303.php
It might be that they only partially fixed it, or that the code was not merged in correctly.
I found a bug regarding this in the release notes for 9.0.2:
http://www.postgresql.org/docs/9.1/static/release-9-0-2.html
Fix "too many KnownAssignedXids" error during Hot Standby replay (Heikki Linnakangas)
Is it related to that?
I am, however, testing with the latest release of XC, 0.9.6, which is based on PostgreSQL 9.1.1.
This bug may not be related to standby nodes at all but to a memory leak. PostgreSQL returns the error "too many KnownAssignedXids" only when there is no more memory available.
I am monitoring my tests and can see the system memory increasing slowly over time.
I encountered the bug on the latest code from the master branch as it was at the moment I filed the bug. I believe it is close to 0.9.6.
I don't believe this is a memory leak, for two reasons:
First - after the standby went down because of this bug, the error happened again every time the standby was restarted, immediately after startup, but a shutdown should release all allocated memory.
Second - the error actually happens when too many xids are written to a fixed-size array; the question is why there are too many xids.
I will comment on the bug if I figure out more.
I think I can shed more light on the problem.
There is a function GetRunningTransactionData() in procarray.c which scans the master's procs during checkpoints. The snapshot is then sent to the standbys, where some shared variables get updated, in particular ShmemVariableCache->nextXid.
When one node is idle for a long time while other nodes are working, ShmemVariableCache->nextXid on that node stays unchanged while the GXID keeps advancing. So if the node executes a transaction after a long idle period, the xid of that transaction may be much greater than the nextXid from the last checkpoint.
When the standby redoes the transaction, it considers all xids between the local nextXid and the xid of the current transaction as "known assigned", and the number of such xids may be greater than the maximum, MaxNumProcs*(1+MaxSubtrans).
After that the standby cannot recover - while redoing, it keeps receiving a WAL record for the checkpoint with a too-old nextXid, followed by a WAL record for a transaction with an unexpectedly high xid.
In the case of Postgres, transaction IDs are limited to the local node, so we need a fix which interacts with GTM to get a correct transaction ID value. It looks like the only way to solve this would be to keep the local ShmemVariableCache->nextXid updated with the latest transaction ID from GTM.
When calling KnownAssignedXidsAdd, we could then use the global value instead of the local one as from_xid.
A better solution would be to get, not the latest GXID, but directly from GTM the oldest transaction ID among the active transactions saved inside GTM. There is no API yet to get this value from GTM.
I'd like to know whether hot standby was enabled on the standbys. If so, does the same problem occur with hot standby disabled? If it happens only when hot standby is enabled, XC may need a fix when it provides hot standby on Datanode standbys.
The issue is not reproduced in the latest version.
I added extra logging in GetRunningTransactionData() and can now see the shared variables being updated over time.
I will investigate this some more and close the bug if it is no longer an issue.
This issue also happens in vanilla PostgreSQL 9.1. It appears to have been fixed in the Git repository, and the fix was applied in the PG 9.1 updates. It is fixed.