| 
      
      
      From: 张仲良 <jul...@ou...> - 2014-01-09 02:39:58
      
     | 
| 
Your
name                :
Julian
Your email
address        : jul...@ou...
 
 
System
Configuration:
---------------------
  Architecture
(example: Intel
Pentium)                :  Intel
Pentium
 
  Operating
System (example: Linux 2.4.18)        : 2.6.32-358.el6.x86_64
 
  Postgres-XC
version (example: Postgres-XC 1.1devel): Github master
 
  Compiler
used (example: gcc
3.3.5)                :  gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3)
 
 
Please enter a
FULL description of your problem:
------------------------------------------------
 When testing tpcc (100 warehouses) on PGXC using HammerDB with 20 concurrent users, about 2 minutes later, all the sessions are blocked by acquiring the TwoPhaseStateLock:execute direct on (datanode2) $$select * from pg_stat_activity where state != 'idle' order by query_start$$;                                
 datid |  pid  |         query_start          |                                   query                                   
-------+-------+------------------------------+ --------------------------------------------------------------------------
 16384 | 19392 |2014-01-08 14:24:24.274622+08 | autovacuum: VACUUM ANALYZE public.stock
 16384 | 19384 |2014-01-08 14:25:49.073815+08 | PREPARE TRANSACTION 'T27146'
 16384 | 19383 |2014-01-08 14:25:49.084483+08 | COMMIT PREPARED 'T27077'
 16384 | 19382 |2014-01-08 14:25:49.087827+08 | COMMIT PREPARED 'T27052'
 16384 | 19385 |2014-01-08 14:25:49.109279+08 | COMMIT PREPARED 'T27118'
 16384 | 19373 |2014-01-08 14:25:49.114323+08 | COMMIT PREPARED 'T27111'
 16384 | 19372 |2014-01-08 14:25:49.114784+08 | COMMIT PREPARED 'T27063'
 16384 | 19376 |2014-01-08 14:25:49.131651+08 | COMMIT PREPARED 'T27102'
 16384 | 19371 |2014-01-08 14:25:49.147467+08 | COMMIT PREPARED 'T27023'
 16384 | 19374 |2014-01-08 14:25:49.156297+08 | COMMIT PREPARED 'T27123'
 16384 | 19386 |2014-01-08 14:25:49.168084+08 | COMMIT PREPARED 'T27128'
 16384 | 19389 |2014-01-08 14:25:49.179543+08 | PREPARE TRANSACTION 'T27161'
 16384 | 19380 |2014-01-08 14:25:49.222886+08 | COMMIT PREPARED 'T27083'
 16384 | 19377 |2014-01-08 14:25:49.373674+08 | PREPARE TRANSACTION 'T27178'
 16384 | 19388 |2014-01-08 14:25:49.386222+08 | PREPARE TRANSACTION 'T27180'
 16384 | 19378 |2014-01-08 14:25:49.493811+08 | PREPARE TRANSACTION 'T27176'
 16384 | 19381 |2014-01-08 14:25:49.662885+08 | PREPARE TRANSACTION 'T27148'
 16384 | 19375 |2014-01-08 14:25:49.680977+08 | PREPARE TRANSACTION 'T27156'
 16384 | 19387 |2014-01-08 14:25:49.744282+08 | PREPARE TRANSACTION 'T27157'
 16384 | 19370 |2014-01-08 14:25:49.7463+08   | PREPARE TRANSACTION 'T27173'
 16384 | 19379 |2014-01-08 14:25:49.866666+08 | PREPARE TRANSACTION 'T27171'
 16384 | 18687 |2014-01-08 14:30:46.506894+08 | select * from pg_stat_activity where state != 'idle' order by query_start
(22 rows)
One of the sessions has the stack as below:#0  0x0000003ce42eaf37 in semop () from /lib64/libc.so.6#1  0x00000000006d9d7a in PGSemaphoreLock (sema=0x7faf6078c490, interruptOK=0 '\000') at pg_sema.c:415#2  0x000000000072dce3 in LWLockAcquire (lockid=TwoPhaseStateLock, mode=LW_EXCLUSIVE) at lwlock.c:474#3  0x00000000004adae4 in MarkAsPreparing (xid=69984, gid=0xe9f060 "T69983", prepared_at=442376096193723, owner=10, databaseid=16450) at twophase.c:267#4  0x00000000004a567a in PrepareTransaction () at xact.c:2684#5  0x00000000004a5e55 in CommitTransactionCommand () at xact.c:3248#6  0x000000000073d025 in finish_xact_command () at postgres.c:2551#7  0x000000000073ac89 in exec_simple_query (query_string=0xd7bc60 "PREPARE TRANSACTION 'T69983'") at postgres.c:1159#8  0x000000000073f018 in PostgresMain (argc=2, argv=0xd63828, username=0xd636b0 "zhangzl") at postgres.c:4212#9  0x00000000006eafca in BackendRun (port=0xd86960) at postmaster.c:3803#10 0x00000000006ea6b9 in BackendStartup (port=0xd86960) at postmaster.c:3488#11 0x00000000006e7473 in ServerLoop () at postmaster.c:1466#12 0x00000000006e6e7c in PostmasterMain (argc=5, argv=0xd61870) at postmaster.c:1226#13 0x0000000000650b1d in main (argc=5, argv=0xd61870) at main.c:199
 
But the session holding the TwoPhaseStatLock is blocked at:(gdb) bt#0  0x0000003ce42eaf37 in semop () from /lib64/libc.so.6#1  0x00000000006d9e1a in PGSemaphoreLock (sema=0x7f103f857b90, interruptOK=1 '\001') at pg_sema.c:415#2  0x000000000072b15d in ProcSleep (locallock=0x1712650, lockMethodTable=0x9dcb20) at proc.c:1086#3  0x0000000000726566 in WaitOnLock (locallock=0x1712650, owner=0x179ed40) at lock.c:1537#4  0x00000000007258be in LockAcquireExtendedXC (locktag=0x7fff84dca5b0, lockmode=7, sessionLock=0 '\000', dontWait=0 '\000', reportMemoryError=1 '\001', only_increment=0 '\000') at lock.c:914#5  0x0000000000724fcd in LockAcquireExtended (locktag=0x7fff84dca5b0, lockmode=7, sessionLock=0 '\000', dontWait=0 '\000', reportMemoryError=1 '\001') at lock.c:616#6  0x0000000000724f36 in LockAcquire (locktag=0x7fff84dca5b0, lockmode=7, sessionLock=0 '\000', dontWait=0 '\000') at lock.c:575#7  0x0000000000724665 in XactLockTableInsert (xid=25010) at lmgr.c:433#8  0x00000000004a390c in AssignTransactionId (s=0xcbd9c0) at xact.c:619#9  0x00000000004a3657 in GetTopTransactionId () at xact.c:429#10 0x00000000004ae2e7 in LockGXact (gid=0x170e3a8 "T24965", user=10) at twophase.c:460#11 0x00000000004afa05 in FinishPreparedTransaction (gid=0x170e3a8 "T24965", isCommit=1 '\001') at twophase.c:1298#12 0x0000000000742fa6 in standard_ProcessUtility (parsetree=0x170e3c0, queryString=0x170d9e0 "COMMIT PREPARED 'T24965'", params=0x0, isTopLevel=1 '\001', dest=0x170e700, sentToRemote=0 '\000', completionTag=0x7fff84dcaee0 "") at utility.c:520#13 0x0000000000742c10 in ProcessUtility (parsetree=0x170e3c0, queryString=0x170d9e0 "COMMIT PREPARED 'T24965'", params=0x0, isTopLevel=1 '\001', dest=0x170e700, sentToRemote=0 '\000', completionTag=0x7fff84dcaee0 "") at utility.c:377#14 0x0000000000741b9b in PortalRunUtility (portal=0x1713330, utilityStmt=0x170e3c0, isTopLevel=1 '\001', dest=0x170e700, completionTag=0x7fff84dcaee0 "") at pquery.c:1284#15 0x0000000000741dc8 in PortalRunMulti (portal=0x1713330, isTopLevel=1 '\001', dest=0x170e700, altdest=0x170e700, completionTag=0x7fff84dcaee0 "") at pquery.c:1431#16 0x000000000074126f in PortalRun (portal=0x1713330, count=9223372036854775807, isTopLevel=1 '\001', dest=0x170e700, altdest=0x170e700, completionTag=0x7fff84dcaee0 "") at pquery.c:881#17 0x000000000073ae2d in exec_simple_query (query_string=0x170d9e0 "COMMIT PREPARED 'T24965'") at postgres.c:1142#18 0x000000000073f1f0 in PostgresMain (argc=2, argv=0x16f55a8, username=0x16f5430 "zhangzl") at postgres.c:4212#19 0x00000000006eb06a in BackendRun (port=0x17186b0) at postmaster.c:3803#20 0x00000000006ea759 in BackendStartup (port=0x17186b0) at postmaster.c:3488#21 0x00000000006e7513 in ServerLoop () at postmaster.c:1466#22 0x00000000006e6f1c in PostmasterMain (argc=5, argv=0x16f3610) at postmaster.c:1226#23 0x0000000000650bbd in main (argc=5, argv=0x16f3610) at main.c:199
 
 
Please describe a
way to repeat the problem. Please try to provide a
concise
reproducible example, if at all possible:
----------------------------------------------------------------------
 Run a TPCC test tool named HammerDB, create a workload with 100 warehouses, run tpcc tests with 20 users.  
 
 
 
If you know how
this problem might be fixed, list the solution below:
---------------------------------------------------------------------
According to the stack of the session which holds the TwoPhaseStateLock, the error is in the function of 
LockGXact:static GlobalTransaction
LockGXact(const char *gid, Oid user)
{
    ......
    LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
    for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
    {
        ......
        gxact->locking_xid = GetTopTransactionId(); 
        LWLockRelease(TwoPhaseStateLock);
        return gxact;
    }
    LWLockRelease(TwoPhaseStateLock);
    ......
}
GetTopTransactionId() is blocked by acquiring another lock, but it is invoked between the TwoPhaseStatLock's Acquire and Release. 
To fix it, I just call GetTopTransactionId() before "LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE)" to enable the TopTransactionId can be got directly later. 
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index c39d9e6..1312e88 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -414,6 +414,8 @@ LockGXact(const char *gid, Oid user)
 {
     int            i;
 
+    GetTopTransactionId();
+
     LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
 
     for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
Any commiter can help to review it and commit it to the Github master branch? 
ThanksJulian
 		 	   		   |