From: 张仲良 <jul...@ou...> - 2014-01-09 02:39:58
|
Your name : Julian Your email address : jul...@ou... System Configuration: --------------------- Architecture (example: Intel Pentium) : Intel Pentium Operating System (example: Linux 2.4.18) : 2.6.32-358.el6.x86_64 Postgres-XC version (example: Postgres-XC 1.1devel): Github master Compiler used (example: gcc 3.3.5) : gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3) Please enter a FULL description of your problem: ------------------------------------------------ When testing tpcc (100 warehouses) on PGXC using HammerDB with 20 concurrent users, about 2 minutes later, all the sessions are blocked by acquiring the TwoPhaseStateLock:execute direct on (datanode2) $$select * from pg_stat_activity where state != 'idle' order by query_start$$; datid | pid | query_start | query -------+-------+------------------------------+ -------------------------------------------------------------------------- 16384 | 19392 |2014-01-08 14:24:24.274622+08 | autovacuum: VACUUM ANALYZE public.stock 16384 | 19384 |2014-01-08 14:25:49.073815+08 | PREPARE TRANSACTION 'T27146' 16384 | 19383 |2014-01-08 14:25:49.084483+08 | COMMIT PREPARED 'T27077' 16384 | 19382 |2014-01-08 14:25:49.087827+08 | COMMIT PREPARED 'T27052' 16384 | 19385 |2014-01-08 14:25:49.109279+08 | COMMIT PREPARED 'T27118' 16384 | 19373 |2014-01-08 14:25:49.114323+08 | COMMIT PREPARED 'T27111' 16384 | 19372 |2014-01-08 14:25:49.114784+08 | COMMIT PREPARED 'T27063' 16384 | 19376 |2014-01-08 14:25:49.131651+08 | COMMIT PREPARED 'T27102' 16384 | 19371 |2014-01-08 14:25:49.147467+08 | COMMIT PREPARED 'T27023' 16384 | 19374 |2014-01-08 14:25:49.156297+08 | COMMIT PREPARED 'T27123' 16384 | 19386 |2014-01-08 14:25:49.168084+08 | COMMIT PREPARED 'T27128' 16384 | 19389 |2014-01-08 14:25:49.179543+08 | PREPARE TRANSACTION 'T27161' 16384 | 19380 |2014-01-08 14:25:49.222886+08 | COMMIT PREPARED 'T27083' 16384 | 19377 |2014-01-08 14:25:49.373674+08 | PREPARE TRANSACTION 'T27178' 16384 | 19388 |2014-01-08 14:25:49.386222+08 | PREPARE TRANSACTION 'T27180' 16384 | 19378 |2014-01-08 14:25:49.493811+08 | PREPARE TRANSACTION 'T27176' 16384 | 19381 |2014-01-08 14:25:49.662885+08 | PREPARE TRANSACTION 'T27148' 16384 | 19375 |2014-01-08 14:25:49.680977+08 | PREPARE TRANSACTION 'T27156' 16384 | 19387 |2014-01-08 14:25:49.744282+08 | PREPARE TRANSACTION 'T27157' 16384 | 19370 |2014-01-08 14:25:49.7463+08 | PREPARE TRANSACTION 'T27173' 16384 | 19379 |2014-01-08 14:25:49.866666+08 | PREPARE TRANSACTION 'T27171' 16384 | 18687 |2014-01-08 14:30:46.506894+08 | select * from pg_stat_activity where state != 'idle' order by query_start (22 rows) One of the sessions has the stack as below:#0 0x0000003ce42eaf37 in semop () from /lib64/libc.so.6#1 0x00000000006d9d7a in PGSemaphoreLock (sema=0x7faf6078c490, interruptOK=0 '\000') at pg_sema.c:415#2 0x000000000072dce3 in LWLockAcquire (lockid=TwoPhaseStateLock, mode=LW_EXCLUSIVE) at lwlock.c:474#3 0x00000000004adae4 in MarkAsPreparing (xid=69984, gid=0xe9f060 "T69983", prepared_at=442376096193723, owner=10, databaseid=16450) at twophase.c:267#4 0x00000000004a567a in PrepareTransaction () at xact.c:2684#5 0x00000000004a5e55 in CommitTransactionCommand () at xact.c:3248#6 0x000000000073d025 in finish_xact_command () at postgres.c:2551#7 0x000000000073ac89 in exec_simple_query (query_string=0xd7bc60 "PREPARE TRANSACTION 'T69983'") at postgres.c:1159#8 0x000000000073f018 in PostgresMain (argc=2, argv=0xd63828, username=0xd636b0 "zhangzl") at postgres.c:4212#9 0x00000000006eafca in BackendRun (port=0xd86960) at postmaster.c:3803#10 0x00000000006ea6b9 in BackendStartup (port=0xd86960) at postmaster.c:3488#11 0x00000000006e7473 in ServerLoop () at postmaster.c:1466#12 0x00000000006e6e7c in PostmasterMain (argc=5, argv=0xd61870) at postmaster.c:1226#13 0x0000000000650b1d in main (argc=5, argv=0xd61870) at main.c:199 But the session holding the TwoPhaseStatLock is blocked at:(gdb) bt#0 0x0000003ce42eaf37 in semop () from /lib64/libc.so.6#1 0x00000000006d9e1a in PGSemaphoreLock (sema=0x7f103f857b90, interruptOK=1 '\001') at pg_sema.c:415#2 0x000000000072b15d in ProcSleep (locallock=0x1712650, lockMethodTable=0x9dcb20) at proc.c:1086#3 0x0000000000726566 in WaitOnLock (locallock=0x1712650, owner=0x179ed40) at lock.c:1537#4 0x00000000007258be in LockAcquireExtendedXC (locktag=0x7fff84dca5b0, lockmode=7, sessionLock=0 '\000', dontWait=0 '\000', reportMemoryError=1 '\001', only_increment=0 '\000') at lock.c:914#5 0x0000000000724fcd in LockAcquireExtended (locktag=0x7fff84dca5b0, lockmode=7, sessionLock=0 '\000', dontWait=0 '\000', reportMemoryError=1 '\001') at lock.c:616#6 0x0000000000724f36 in LockAcquire (locktag=0x7fff84dca5b0, lockmode=7, sessionLock=0 '\000', dontWait=0 '\000') at lock.c:575#7 0x0000000000724665 in XactLockTableInsert (xid=25010) at lmgr.c:433#8 0x00000000004a390c in AssignTransactionId (s=0xcbd9c0) at xact.c:619#9 0x00000000004a3657 in GetTopTransactionId () at xact.c:429#10 0x00000000004ae2e7 in LockGXact (gid=0x170e3a8 "T24965", user=10) at twophase.c:460#11 0x00000000004afa05 in FinishPreparedTransaction (gid=0x170e3a8 "T24965", isCommit=1 '\001') at twophase.c:1298#12 0x0000000000742fa6 in standard_ProcessUtility (parsetree=0x170e3c0, queryString=0x170d9e0 "COMMIT PREPARED 'T24965'", params=0x0, isTopLevel=1 '\001', dest=0x170e700, sentToRemote=0 '\000', completionTag=0x7fff84dcaee0 "") at utility.c:520#13 0x0000000000742c10 in ProcessUtility (parsetree=0x170e3c0, queryString=0x170d9e0 "COMMIT PREPARED 'T24965'", params=0x0, isTopLevel=1 '\001', dest=0x170e700, sentToRemote=0 '\000', completionTag=0x7fff84dcaee0 "") at utility.c:377#14 0x0000000000741b9b in PortalRunUtility (portal=0x1713330, utilityStmt=0x170e3c0, isTopLevel=1 '\001', dest=0x170e700, completionTag=0x7fff84dcaee0 "") at pquery.c:1284#15 0x0000000000741dc8 in PortalRunMulti (portal=0x1713330, isTopLevel=1 '\001', dest=0x170e700, altdest=0x170e700, completionTag=0x7fff84dcaee0 "") at pquery.c:1431#16 0x000000000074126f in PortalRun (portal=0x1713330, count=9223372036854775807, isTopLevel=1 '\001', dest=0x170e700, altdest=0x170e700, completionTag=0x7fff84dcaee0 "") at pquery.c:881#17 0x000000000073ae2d in exec_simple_query (query_string=0x170d9e0 "COMMIT PREPARED 'T24965'") at postgres.c:1142#18 0x000000000073f1f0 in PostgresMain (argc=2, argv=0x16f55a8, username=0x16f5430 "zhangzl") at postgres.c:4212#19 0x00000000006eb06a in BackendRun (port=0x17186b0) at postmaster.c:3803#20 0x00000000006ea759 in BackendStartup (port=0x17186b0) at postmaster.c:3488#21 0x00000000006e7513 in ServerLoop () at postmaster.c:1466#22 0x00000000006e6f1c in PostmasterMain (argc=5, argv=0x16f3610) at postmaster.c:1226#23 0x0000000000650bbd in main (argc=5, argv=0x16f3610) at main.c:199 Please describe a way to repeat the problem. Please try to provide a concise reproducible example, if at all possible: ---------------------------------------------------------------------- Run a TPCC test tool named HammerDB, create a workload with 100 warehouses, run tpcc tests with 20 users. If you know how this problem might be fixed, list the solution below: --------------------------------------------------------------------- According to the stack of the session which holds the TwoPhaseStateLock, the error is in the function of LockGXact:static GlobalTransaction LockGXact(const char *gid, Oid user) { ...... LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE); for (i = 0; i < TwoPhaseState->numPrepXacts; i++) { ...... gxact->locking_xid = GetTopTransactionId(); LWLockRelease(TwoPhaseStateLock); return gxact; } LWLockRelease(TwoPhaseStateLock); ...... } GetTopTransactionId() is blocked by acquiring another lock, but it is invoked between the TwoPhaseStatLock's Acquire and Release. To fix it, I just call GetTopTransactionId() before "LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE)" to enable the TopTransactionId can be got directly later. diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c index c39d9e6..1312e88 100644 --- a/src/backend/access/transam/twophase.c +++ b/src/backend/access/transam/twophase.c @@ -414,6 +414,8 @@ LockGXact(const char *gid, Oid user) { int i; + GetTopTransactionId(); + LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE); for (i = 0; i < TwoPhaseState->numPrepXacts; i++) Any commiter can help to review it and commit it to the Github master branch? ThanksJulian |