From: Ashutosh B. <ash...@en...> - 2014-01-09 03:43:16
Hi Julian,

Can you please provide a patch that fixes this problem?

On Thu, Jan 9, 2014 at 8:09 AM, 张仲良 <jul...@ou...> wrote:

> Your name : Julian
> Your email address : jul...@ou...
>
> System Configuration:
> ---------------------
> Architecture (example: Intel Pentium) : Intel Pentium
> Operating System (example: Linux 2.4.18) : 2.6.32-358.el6.x86_64
> Postgres-XC version (example: Postgres-XC 1.1devel): Github master
> Compiler used (example: gcc 3.3.5) : gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3)
>
> Please enter a FULL description of your problem:
> ------------------------------------------------
> When testing tpcc (100 warehouses) on PGXC using HammerDB with 20
> concurrent users, after about 2 minutes all the sessions are blocked
> while acquiring the TwoPhaseStateLock:
>
> execute direct on (datanode2) $$select * from pg_stat_activity where state != 'idle' order by query_start$$;
>  datid |  pid  |          query_start          | query
> -------+-------+-------------------------------+----------------------------------------------------------------------------
>  16384 | 19392 | 2014-01-08 14:24:24.274622+08 | autovacuum: VACUUM ANALYZE public.stock
>  16384 | 19384 | 2014-01-08 14:25:49.073815+08 | PREPARE TRANSACTION 'T27146'
>  16384 | 19383 | 2014-01-08 14:25:49.084483+08 | COMMIT PREPARED 'T27077'
>  16384 | 19382 | 2014-01-08 14:25:49.087827+08 | COMMIT PREPARED 'T27052'
>  16384 | 19385 | 2014-01-08 14:25:49.109279+08 | COMMIT PREPARED 'T27118'
>  16384 | 19373 | 2014-01-08 14:25:49.114323+08 | COMMIT PREPARED 'T27111'
>  16384 | 19372 | 2014-01-08 14:25:49.114784+08 | COMMIT PREPARED 'T27063'
>  16384 | 19376 | 2014-01-08 14:25:49.131651+08 | COMMIT PREPARED 'T27102'
>  16384 | 19371 | 2014-01-08 14:25:49.147467+08 | COMMIT PREPARED 'T27023'
>  16384 | 19374 | 2014-01-08 14:25:49.156297+08 | COMMIT PREPARED 'T27123'
>  16384 | 19386 | 2014-01-08 14:25:49.168084+08 | COMMIT PREPARED 'T27128'
>  16384 | 19389 | 2014-01-08 14:25:49.179543+08 | PREPARE TRANSACTION 'T27161'
>  16384 | 19380 | 2014-01-08 14:25:49.222886+08 | COMMIT PREPARED 'T27083'
>  16384 | 19377 | 2014-01-08 14:25:49.373674+08 | PREPARE TRANSACTION 'T27178'
>  16384 | 19388 | 2014-01-08 14:25:49.386222+08 | PREPARE TRANSACTION 'T27180'
>  16384 | 19378 | 2014-01-08 14:25:49.493811+08 | PREPARE TRANSACTION 'T27176'
>  16384 | 19381 | 2014-01-08 14:25:49.662885+08 | PREPARE TRANSACTION 'T27148'
>  16384 | 19375 | 2014-01-08 14:25:49.680977+08 | PREPARE TRANSACTION 'T27156'
>  16384 | 19387 | 2014-01-08 14:25:49.744282+08 | PREPARE TRANSACTION 'T27157'
>  16384 | 19370 | 2014-01-08 14:25:49.7463+08   | PREPARE TRANSACTION 'T27173'
>  16384 | 19379 | 2014-01-08 14:25:49.866666+08 | PREPARE TRANSACTION 'T27171'
>  16384 | 18687 | 2014-01-08 14:30:46.506894+08 | select * from pg_stat_activity where state != 'idle' order by query_start
> (22 rows)
>
> One of the blocked sessions has the following stack:
> #0  0x0000003ce42eaf37 in semop () from /lib64/libc.so.6
> #1  0x00000000006d9d7a in PGSemaphoreLock (sema=0x7faf6078c490, interruptOK=0 '\000') at pg_sema.c:415
> #2  0x000000000072dce3 in LWLockAcquire (lockid=TwoPhaseStateLock, mode=LW_EXCLUSIVE) at lwlock.c:474
> #3  0x00000000004adae4 in MarkAsPreparing (xid=69984, gid=0xe9f060 "T69983", prepared_at=442376096193723, owner=10, databaseid=16450) at twophase.c:267
> #4  0x00000000004a567a in PrepareTransaction () at xact.c:2684
> #5  0x00000000004a5e55 in CommitTransactionCommand () at xact.c:3248
> #6  0x000000000073d025 in finish_xact_command () at postgres.c:2551
> #7  0x000000000073ac89 in exec_simple_query (query_string=0xd7bc60 "PREPARE TRANSACTION 'T69983'") at postgres.c:1159
> #8  0x000000000073f018 in PostgresMain (argc=2, argv=0xd63828, username=0xd636b0 "zhangzl") at postgres.c:4212
> #9  0x00000000006eafca in BackendRun (port=0xd86960) at postmaster.c:3803
> #10 0x00000000006ea6b9 in BackendStartup (port=0xd86960) at postmaster.c:3488
> #11 0x00000000006e7473 in ServerLoop () at postmaster.c:1466
> #12 0x00000000006e6e7c in PostmasterMain (argc=5, argv=0xd61870) at postmaster.c:1226
> #13 0x0000000000650b1d in main (argc=5, argv=0xd61870) at main.c:199
>
> But the session holding the TwoPhaseStateLock is blocked at:
> (gdb) bt
> #0  0x0000003ce42eaf37 in semop () from /lib64/libc.so.6
> #1  0x00000000006d9e1a in PGSemaphoreLock (sema=0x7f103f857b90, interruptOK=1 '\001') at pg_sema.c:415
> #2  0x000000000072b15d in ProcSleep (locallock=0x1712650, lockMethodTable=0x9dcb20) at proc.c:1086
> #3  0x0000000000726566 in WaitOnLock (locallock=0x1712650, owner=0x179ed40) at lock.c:1537
> #4  0x00000000007258be in LockAcquireExtendedXC (locktag=0x7fff84dca5b0, lockmode=7, sessionLock=0 '\000', dontWait=0 '\000', reportMemoryError=1 '\001', only_increment=0 '\000') at lock.c:914
> #5  0x0000000000724fcd in LockAcquireExtended (locktag=0x7fff84dca5b0, lockmode=7, sessionLock=0 '\000', dontWait=0 '\000', reportMemoryError=1 '\001') at lock.c:616
> #6  0x0000000000724f36 in LockAcquire (locktag=0x7fff84dca5b0, lockmode=7, sessionLock=0 '\000', dontWait=0 '\000') at lock.c:575
> #7  0x0000000000724665 in XactLockTableInsert (xid=25010) at lmgr.c:433
> #8  0x00000000004a390c in AssignTransactionId (s=0xcbd9c0) at xact.c:619
> #9  0x00000000004a3657 in GetTopTransactionId () at xact.c:429
> #10 0x00000000004ae2e7 in LockGXact (gid=0x170e3a8 "T24965", user=10) at twophase.c:460
> #11 0x00000000004afa05 in FinishPreparedTransaction (gid=0x170e3a8 "T24965", isCommit=1 '\001') at twophase.c:1298
> #12 0x0000000000742fa6 in standard_ProcessUtility (parsetree=0x170e3c0, queryString=0x170d9e0 "COMMIT PREPARED 'T24965'", params=0x0, isTopLevel=1 '\001', dest=0x170e700, sentToRemote=0 '\000', completionTag=0x7fff84dcaee0 "") at utility.c:520
> #13 0x0000000000742c10 in ProcessUtility (parsetree=0x170e3c0, queryString=0x170d9e0 "COMMIT PREPARED 'T24965'", params=0x0, isTopLevel=1 '\001', dest=0x170e700, sentToRemote=0 '\000', completionTag=0x7fff84dcaee0 "") at utility.c:377
> #14 0x0000000000741b9b in PortalRunUtility (portal=0x1713330, utilityStmt=0x170e3c0, isTopLevel=1 '\001', dest=0x170e700, completionTag=0x7fff84dcaee0 "") at pquery.c:1284
> #15 0x0000000000741dc8 in PortalRunMulti (portal=0x1713330, isTopLevel=1 '\001', dest=0x170e700, altdest=0x170e700, completionTag=0x7fff84dcaee0 "") at pquery.c:1431
> #16 0x000000000074126f in PortalRun (portal=0x1713330, count=9223372036854775807, isTopLevel=1 '\001', dest=0x170e700, altdest=0x170e700, completionTag=0x7fff84dcaee0 "") at pquery.c:881
> #17 0x000000000073ae2d in exec_simple_query (query_string=0x170d9e0 "COMMIT PREPARED 'T24965'") at postgres.c:1142
> #18 0x000000000073f1f0 in PostgresMain (argc=2, argv=0x16f55a8, username=0x16f5430 "zhangzl") at postgres.c:4212
> #19 0x00000000006eb06a in BackendRun (port=0x17186b0) at postmaster.c:3803
> #20 0x00000000006ea759 in BackendStartup (port=0x17186b0) at postmaster.c:3488
> #21 0x00000000006e7513 in ServerLoop () at postmaster.c:1466
> #22 0x00000000006e6f1c in PostmasterMain (argc=5, argv=0x16f3610) at postmaster.c:1226
> #23 0x0000000000650bbd in main (argc=5, argv=0x16f3610) at main.c:199
>
> Please describe a way to repeat the problem. Please try to provide a
> concise reproducible example, if at all possible:
> ----------------------------------------------------------------------
> Run the TPCC test tool HammerDB: create a workload with 100
> warehouses, then run the tpcc tests with 20 users.
>
> If you know how this problem might be fixed, list the solution below:
> ---------------------------------------------------------------------
> According to the stack of the session that holds the TwoPhaseStateLock,
> the problem is in the function LockGXact:
>
> static GlobalTransaction
> LockGXact(const char *gid, Oid user)
> {
>     ......
>     LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
>
>     for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
>     {
>         ......
>         gxact->locking_xid = GetTopTransactionId();
>
>         LWLockRelease(TwoPhaseStateLock);
>
>         return gxact;
>     }
>
>     LWLockRelease(TwoPhaseStateLock);
>     ......
> }
>
> GetTopTransactionId() blocks while acquiring another lock, yet it is
> invoked between the acquire and release of TwoPhaseStateLock.
>
> To fix it, I call GetTopTransactionId() before
> "LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE)", so that the top
> transaction ID is already assigned and the later call can return it
> directly.
>
> diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
> index c39d9e6..1312e88 100644
> --- a/src/backend/access/transam/twophase.c
> +++ b/src/backend/access/transam/twophase.c
> @@ -414,6 +414,8 @@ LockGXact(const char *gid, Oid user)
>  {
>      int i;
>
> +    GetTopTransactionId();
> +
>      LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
>
>      for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
>
> Could a committer review it and commit it to the Github master branch?
>
> Thanks
> Julian
>
> _______________________________________________
> Postgres-xc-bugs mailing list
> Pos...@li...
> https://lists.sourceforge.net/lists/listinfo/postgres-xc-bugs

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
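The ordering idea in the quoted report (make sure the transaction ID is assigned before TwoPhaseStateLock is taken, so the call inside the locked section cannot block) can be modelled with a small single-threaded C sketch. Everything here is hypothetical and for illustration only: `state_lock_held`, `get_top_xid_sketch`, `lock_gxact_sketch`, and the value 25010 are stand-ins, not Postgres-XC code, and the sketch only records whether the slow assignment path runs while the modelled lock is held. It does not show whether the proposed one-line patch is the right fix for the server.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration only; none of this is Postgres-XC code. */
static bool state_lock_held = false;   /* models holding TwoPhaseStateLock */
static unsigned int cached_xid = 0;    /* 0 means no transaction ID assigned yet */
static int slow_path_under_lock = 0;   /* times the slow path ran with the lock held */

/* Clear the model so each scenario starts from a fresh backend state. */
static void reset_sketch(void)
{
    state_lock_held = false;
    cached_xid = 0;
    slow_path_under_lock = 0;
}

/*
 * Models a lazily assigned transaction ID: the first call takes a slow path
 * (where a real server might wait on another lock); later calls return the
 * cached value without any waiting.
 */
static unsigned int get_top_xid_sketch(void)
{
    if (cached_xid == 0)
    {
        if (state_lock_held)
            slow_path_under_lock++;    /* the hazard described in the report */
        cached_xid = 25010;            /* arbitrary nonzero value */
    }
    return cached_xid;
}

/*
 * Models the locked section of LockGXact. With assign_first set, the ID is
 * assigned before the lock is taken, so the call made under the lock only
 * reads the cached value.
 */
static unsigned int lock_gxact_sketch(bool assign_first)
{
    unsigned int locking_xid;

    if (assign_first)
        (void) get_top_xid_sketch();       /* slow path runs outside the lock */

    state_lock_held = true;                /* "LWLockAcquire" */
    locking_xid = get_top_xid_sketch();
    state_lock_held = false;               /* "LWLockRelease" */

    assert(locking_xid != 0);
    return locking_xid;
}
```

Calling `reset_sketch()` and then `lock_gxact_sketch(false)` leaves `slow_path_under_lock` at 1, while `lock_gxact_sketch(true)` after a reset leaves it at 0, which mirrors the difference between the original call order and the order in the proposed diff.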