From: 张仲良 <jul...@ou...> - 2014-01-09 02:39:58
Your name : Julian
Your email address : jul...@ou...
System Configuration:
---------------------
Architecture (example: Intel Pentium) : Intel Pentium
Operating System (example: Linux 2.4.18) : 2.6.32-358.el6.x86_64
Postgres-XC version (example: Postgres-XC 1.1devel): Github master
Compiler used (example: gcc 3.3.5) : gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3)
Please enter a FULL description of your problem:
------------------------------------------------
When testing TPC-C (100 warehouses) on PGXC using HammerDB with 20 concurrent users, all the sessions become blocked after about 2 minutes, waiting to acquire the TwoPhaseStateLock:

execute direct on (datanode2) $$select * from pg_stat_activity where state != 'idle' order by query_start$$;
 datid |  pid  |          query_start          | query
-------+-------+-------------------------------+--------------------------------------------------------------------------
 16384 | 19392 | 2014-01-08 14:24:24.274622+08 | autovacuum: VACUUM ANALYZE public.stock
 16384 | 19384 | 2014-01-08 14:25:49.073815+08 | PREPARE TRANSACTION 'T27146'
 16384 | 19383 | 2014-01-08 14:25:49.084483+08 | COMMIT PREPARED 'T27077'
 16384 | 19382 | 2014-01-08 14:25:49.087827+08 | COMMIT PREPARED 'T27052'
 16384 | 19385 | 2014-01-08 14:25:49.109279+08 | COMMIT PREPARED 'T27118'
 16384 | 19373 | 2014-01-08 14:25:49.114323+08 | COMMIT PREPARED 'T27111'
 16384 | 19372 | 2014-01-08 14:25:49.114784+08 | COMMIT PREPARED 'T27063'
 16384 | 19376 | 2014-01-08 14:25:49.131651+08 | COMMIT PREPARED 'T27102'
 16384 | 19371 | 2014-01-08 14:25:49.147467+08 | COMMIT PREPARED 'T27023'
 16384 | 19374 | 2014-01-08 14:25:49.156297+08 | COMMIT PREPARED 'T27123'
 16384 | 19386 | 2014-01-08 14:25:49.168084+08 | COMMIT PREPARED 'T27128'
 16384 | 19389 | 2014-01-08 14:25:49.179543+08 | PREPARE TRANSACTION 'T27161'
 16384 | 19380 | 2014-01-08 14:25:49.222886+08 | COMMIT PREPARED 'T27083'
 16384 | 19377 | 2014-01-08 14:25:49.373674+08 | PREPARE TRANSACTION 'T27178'
 16384 | 19388 | 2014-01-08 14:25:49.386222+08 | PREPARE TRANSACTION 'T27180'
 16384 | 19378 | 2014-01-08 14:25:49.493811+08 | PREPARE TRANSACTION 'T27176'
 16384 | 19381 | 2014-01-08 14:25:49.662885+08 | PREPARE TRANSACTION 'T27148'
 16384 | 19375 | 2014-01-08 14:25:49.680977+08 | PREPARE TRANSACTION 'T27156'
 16384 | 19387 | 2014-01-08 14:25:49.744282+08 | PREPARE TRANSACTION 'T27157'
 16384 | 19370 | 2014-01-08 14:25:49.7463+08   | PREPARE TRANSACTION 'T27173'
 16384 | 19379 | 2014-01-08 14:25:49.866666+08 | PREPARE TRANSACTION 'T27171'
 16384 | 18687 | 2014-01-08 14:30:46.506894+08 | select * from pg_stat_activity where state != 'idle' order by query_start
(22 rows)
One of the sessions has the following stack:

#0 0x0000003ce42eaf37 in semop () from /lib64/libc.so.6
#1 0x00000000006d9d7a in PGSemaphoreLock (sema=0x7faf6078c490, interruptOK=0 '\000') at pg_sema.c:415
#2 0x000000000072dce3 in LWLockAcquire (lockid=TwoPhaseStateLock, mode=LW_EXCLUSIVE) at lwlock.c:474
#3 0x00000000004adae4 in MarkAsPreparing (xid=69984, gid=0xe9f060 "T69983", prepared_at=442376096193723, owner=10, databaseid=16450) at twophase.c:267
#4 0x00000000004a567a in PrepareTransaction () at xact.c:2684
#5 0x00000000004a5e55 in CommitTransactionCommand () at xact.c:3248
#6 0x000000000073d025 in finish_xact_command () at postgres.c:2551
#7 0x000000000073ac89 in exec_simple_query (query_string=0xd7bc60 "PREPARE TRANSACTION 'T69983'") at postgres.c:1159
#8 0x000000000073f018 in PostgresMain (argc=2, argv=0xd63828, username=0xd636b0 "zhangzl") at postgres.c:4212
#9 0x00000000006eafca in BackendRun (port=0xd86960) at postmaster.c:3803
#10 0x00000000006ea6b9 in BackendStartup (port=0xd86960) at postmaster.c:3488
#11 0x00000000006e7473 in ServerLoop () at postmaster.c:1466
#12 0x00000000006e6e7c in PostmasterMain (argc=5, argv=0xd61870) at postmaster.c:1226
#13 0x0000000000650b1d in main (argc=5, argv=0xd61870) at main.c:199
But the session holding the TwoPhaseStateLock is itself blocked at:

(gdb) bt
#0 0x0000003ce42eaf37 in semop () from /lib64/libc.so.6
#1 0x00000000006d9e1a in PGSemaphoreLock (sema=0x7f103f857b90, interruptOK=1 '\001') at pg_sema.c:415
#2 0x000000000072b15d in ProcSleep (locallock=0x1712650, lockMethodTable=0x9dcb20) at proc.c:1086
#3 0x0000000000726566 in WaitOnLock (locallock=0x1712650, owner=0x179ed40) at lock.c:1537
#4 0x00000000007258be in LockAcquireExtendedXC (locktag=0x7fff84dca5b0, lockmode=7, sessionLock=0 '\000', dontWait=0 '\000', reportMemoryError=1 '\001', only_increment=0 '\000') at lock.c:914
#5 0x0000000000724fcd in LockAcquireExtended (locktag=0x7fff84dca5b0, lockmode=7, sessionLock=0 '\000', dontWait=0 '\000', reportMemoryError=1 '\001') at lock.c:616
#6 0x0000000000724f36 in LockAcquire (locktag=0x7fff84dca5b0, lockmode=7, sessionLock=0 '\000', dontWait=0 '\000') at lock.c:575
#7 0x0000000000724665 in XactLockTableInsert (xid=25010) at lmgr.c:433
#8 0x00000000004a390c in AssignTransactionId (s=0xcbd9c0) at xact.c:619
#9 0x00000000004a3657 in GetTopTransactionId () at xact.c:429
#10 0x00000000004ae2e7 in LockGXact (gid=0x170e3a8 "T24965", user=10) at twophase.c:460
#11 0x00000000004afa05 in FinishPreparedTransaction (gid=0x170e3a8 "T24965", isCommit=1 '\001') at twophase.c:1298
#12 0x0000000000742fa6 in standard_ProcessUtility (parsetree=0x170e3c0, queryString=0x170d9e0 "COMMIT PREPARED 'T24965'", params=0x0, isTopLevel=1 '\001', dest=0x170e700, sentToRemote=0 '\000', completionTag=0x7fff84dcaee0 "") at utility.c:520
#13 0x0000000000742c10 in ProcessUtility (parsetree=0x170e3c0, queryString=0x170d9e0 "COMMIT PREPARED 'T24965'", params=0x0, isTopLevel=1 '\001', dest=0x170e700, sentToRemote=0 '\000', completionTag=0x7fff84dcaee0 "") at utility.c:377
#14 0x0000000000741b9b in PortalRunUtility (portal=0x1713330, utilityStmt=0x170e3c0, isTopLevel=1 '\001', dest=0x170e700, completionTag=0x7fff84dcaee0 "") at pquery.c:1284
#15 0x0000000000741dc8 in PortalRunMulti (portal=0x1713330, isTopLevel=1 '\001', dest=0x170e700, altdest=0x170e700, completionTag=0x7fff84dcaee0 "") at pquery.c:1431
#16 0x000000000074126f in PortalRun (portal=0x1713330, count=9223372036854775807, isTopLevel=1 '\001', dest=0x170e700, altdest=0x170e700, completionTag=0x7fff84dcaee0 "") at pquery.c:881
#17 0x000000000073ae2d in exec_simple_query (query_string=0x170d9e0 "COMMIT PREPARED 'T24965'") at postgres.c:1142
#18 0x000000000073f1f0 in PostgresMain (argc=2, argv=0x16f55a8, username=0x16f5430 "zhangzl") at postgres.c:4212
#19 0x00000000006eb06a in BackendRun (port=0x17186b0) at postmaster.c:3803
#20 0x00000000006ea759 in BackendStartup (port=0x17186b0) at postmaster.c:3488
#21 0x00000000006e7513 in ServerLoop () at postmaster.c:1466
#22 0x00000000006e6f1c in PostmasterMain (argc=5, argv=0x16f3610) at postmaster.c:1226
#23 0x0000000000650bbd in main (argc=5, argv=0x16f3610) at main.c:199
Please describe a way to repeat the problem. Please try to provide a concise reproducible example, if at all possible:
----------------------------------------------------------------------
Using the TPC-C test tool HammerDB, create a workload with 100 warehouses, then run the TPC-C test with 20 users.
If you know how this problem might be fixed, list the solution below:
---------------------------------------------------------------------
According to the stack of the session that holds the TwoPhaseStateLock, the problem is in the function LockGXact:

static GlobalTransaction
LockGXact(const char *gid, Oid user)
{
    ......
    LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
    for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
    {
        ......
        gxact->locking_xid = GetTopTransactionId();
        LWLockRelease(TwoPhaseStateLock);
        return gxact;
    }
    LWLockRelease(TwoPhaseStateLock);
    ......
}
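GetTopTransactionId() blocks while acquiring another lock, but it is invoked between the acquire and the release of the TwoPhaseStateLock, so every other session that needs the TwoPhaseStateLock is stuck behind it.
To fix it, I call GetTopTransactionId() before "LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE)", so that the top transaction ID is already assigned and the later call inside the locked section can return it directly without waiting.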
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index c39d9e6..1312e88 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -414,6 +414,8 @@ LockGXact(const char *gid, Oid user)
 {
 	int i;
+	GetTopTransactionId();
+
 	LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
 	for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
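
To illustrate why the ordering matters, here is a minimal standalone model in C. It is only a sketch, not Postgres-XC code: state_lock_acquire(), get_top_xid(), lock_gxact_original() and lock_gxact_patched() are hypothetical stand-ins invented for this example. It assumes, as in GetTopTransactionId(), that only the first call in a transaction performs the assignment (the step that can wait on another lock) and that later calls return the already-assigned value. The model counts how many potentially blocking assignments happen while the modelled TwoPhaseStateLock is held.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; none of this is Postgres-XC code. */
static bool state_lock_held = false;      /* models TwoPhaseStateLock */
static bool xid_assigned = false;         /* models "top-level XID already assigned" */
static int blocking_calls_under_lock = 0; /* assignments attempted while the lock is held */

static void state_lock_acquire(void)
{
    assert(!state_lock_held);   /* the modelled lock is not re-entrant */
    state_lock_held = true;
}

static void state_lock_release(void)
{
    assert(state_lock_held);
    state_lock_held = false;
}

/*
 * Models GetTopTransactionId(): only the first call does the assignment,
 * which is the step that may wait on another lock. Later calls just
 * return the value that was assigned earlier.
 */
static int get_top_xid(void)
{
    if (!xid_assigned)
    {
        if (state_lock_held)
            blocking_calls_under_lock++;   /* would block with the lock held */
        xid_assigned = true;
    }
    return 1;   /* placeholder XID */
}

/* Resets the model so each variant starts from a fresh backend state. */
static void reset_model(void)
{
    state_lock_held = false;
    xid_assigned = false;
    blocking_calls_under_lock = 0;
}

/* Original ordering: the XID is first requested inside the locked section. */
int lock_gxact_original(void)
{
    reset_model();
    state_lock_acquire();
    (void) get_top_xid();
    state_lock_release();
    return blocking_calls_under_lock;
}

/* Patched ordering: assign the XID first, then take the lock. */
int lock_gxact_patched(void)
{
    reset_model();
    (void) get_top_xid();       /* may wait, but no state lock is held yet */
    state_lock_acquire();
    (void) get_top_xid();       /* already assigned: returns immediately */
    state_lock_release();
    return blocking_calls_under_lock;
}
```

In this model, lock_gxact_original() reports one potentially blocking call made under the lock and lock_gxact_patched() reports none. One trade-off to keep in mind when reviewing: with the patched ordering the transaction ID is assigned unconditionally, even when no matching prepared transaction is found afterwards.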
Could a committer please review this and commit it to the GitHub master branch?
Thanks,
Julian