From: Ashutosh B. <ash...@en...> - 2014-01-09 03:43:16
Hi Julian,
Can you please provide a patch that fixes this problem?
On Thu, Jan 9, 2014 at 8:09 AM, 张仲良 <jul...@ou...> wrote:
> Your name : Julian
>
> Your email address : jul...@ou...
>
>
>
>
>
> System Configuration:
>
> ---------------------
>
> Architecture (example: Intel Pentium) : Intel Pentium
>
>
>
> Operating System (example: Linux 2.4.18) : 2.6.32-358.el6.x86_64
>
>
>
> Postgres-XC version (example: Postgres-XC 1.1devel): Github master
>
>
>
> Compiler used (example: gcc 3.3.5) : gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3)
>
>
>
>
>
> Please enter a FULL description of your problem:
>
> ------------------------------------------------
>
> When testing TPC-C (100 warehouses) on PGXC using HammerDB with 20
> concurrent users, after about 2 minutes all the sessions block while
> trying to acquire the TwoPhaseStateLock:
>
> execute direct on (datanode2) $$select * from pg_stat_activity where state != 'idle' order by query_start$$;
>  datid |  pid  |          query_start          | query
> -------+-------+-------------------------------+--------------------------------------------------------------------------
>  16384 | 19392 | 2014-01-08 14:24:24.274622+08 | autovacuum: VACUUM ANALYZE public.stock
>  16384 | 19384 | 2014-01-08 14:25:49.073815+08 | PREPARE TRANSACTION 'T27146'
>  16384 | 19383 | 2014-01-08 14:25:49.084483+08 | COMMIT PREPARED 'T27077'
>  16384 | 19382 | 2014-01-08 14:25:49.087827+08 | COMMIT PREPARED 'T27052'
>  16384 | 19385 | 2014-01-08 14:25:49.109279+08 | COMMIT PREPARED 'T27118'
>  16384 | 19373 | 2014-01-08 14:25:49.114323+08 | COMMIT PREPARED 'T27111'
>  16384 | 19372 | 2014-01-08 14:25:49.114784+08 | COMMIT PREPARED 'T27063'
>  16384 | 19376 | 2014-01-08 14:25:49.131651+08 | COMMIT PREPARED 'T27102'
>  16384 | 19371 | 2014-01-08 14:25:49.147467+08 | COMMIT PREPARED 'T27023'
>  16384 | 19374 | 2014-01-08 14:25:49.156297+08 | COMMIT PREPARED 'T27123'
>  16384 | 19386 | 2014-01-08 14:25:49.168084+08 | COMMIT PREPARED 'T27128'
>  16384 | 19389 | 2014-01-08 14:25:49.179543+08 | PREPARE TRANSACTION 'T27161'
>  16384 | 19380 | 2014-01-08 14:25:49.222886+08 | COMMIT PREPARED 'T27083'
>  16384 | 19377 | 2014-01-08 14:25:49.373674+08 | PREPARE TRANSACTION 'T27178'
>  16384 | 19388 | 2014-01-08 14:25:49.386222+08 | PREPARE TRANSACTION 'T27180'
>  16384 | 19378 | 2014-01-08 14:25:49.493811+08 | PREPARE TRANSACTION 'T27176'
>  16384 | 19381 | 2014-01-08 14:25:49.662885+08 | PREPARE TRANSACTION 'T27148'
>  16384 | 19375 | 2014-01-08 14:25:49.680977+08 | PREPARE TRANSACTION 'T27156'
>  16384 | 19387 | 2014-01-08 14:25:49.744282+08 | PREPARE TRANSACTION 'T27157'
>  16384 | 19370 | 2014-01-08 14:25:49.7463+08   | PREPARE TRANSACTION 'T27173'
>  16384 | 19379 | 2014-01-08 14:25:49.866666+08 | PREPARE TRANSACTION 'T27171'
>  16384 | 18687 | 2014-01-08 14:30:46.506894+08 | select * from pg_stat_activity where state != 'idle' order by query_start
> (22 rows)
>
> One of the blocked sessions has the following stack:
> #0 0x0000003ce42eaf37 in semop () from /lib64/libc.so.6
> #1 0x00000000006d9d7a in PGSemaphoreLock (sema=0x7faf6078c490,
> interruptOK=0 '\000') at pg_sema.c:415
> #2 0x000000000072dce3 in LWLockAcquire (lockid=TwoPhaseStateLock,
> mode=LW_EXCLUSIVE) at lwlock.c:474
> #3 0x00000000004adae4 in MarkAsPreparing (xid=69984, gid=0xe9f060
> "T69983", prepared_at=442376096193723, owner=10, databaseid=16450) at
> twophase.c:267
> #4 0x00000000004a567a in PrepareTransaction () at xact.c:2684
> #5 0x00000000004a5e55 in CommitTransactionCommand () at xact.c:3248
> #6 0x000000000073d025 in finish_xact_command () at postgres.c:2551
> #7 0x000000000073ac89 in exec_simple_query (query_string=0xd7bc60
> "PREPARE TRANSACTION 'T69983'") at postgres.c:1159
> #8 0x000000000073f018 in PostgresMain (argc=2, argv=0xd63828,
> username=0xd636b0 "zhangzl") at postgres.c:4212
> #9 0x00000000006eafca in BackendRun (port=0xd86960) at postmaster.c:3803
> #10 0x00000000006ea6b9 in BackendStartup (port=0xd86960) at
> postmaster.c:3488
> #11 0x00000000006e7473 in ServerLoop () at postmaster.c:1466
> #12 0x00000000006e6e7c in PostmasterMain (argc=5, argv=0xd61870) at
> postmaster.c:1226
> #13 0x0000000000650b1d in main (argc=5, argv=0xd61870) at main.c:199
>
>
>
> But the session holding the TwoPhaseStateLock is blocked at:
> (gdb) bt
> #0 0x0000003ce42eaf37 in semop () from /lib64/libc.so.6
> #1 0x00000000006d9e1a in PGSemaphoreLock (sema=0x7f103f857b90,
> interruptOK=1 '\001') at pg_sema.c:415
> #2 0x000000000072b15d in ProcSleep (locallock=0x1712650,
> lockMethodTable=0x9dcb20) at proc.c:1086
> #3 0x0000000000726566 in WaitOnLock (locallock=0x1712650,
> owner=0x179ed40) at lock.c:1537
> #4 0x00000000007258be in LockAcquireExtendedXC (locktag=0x7fff84dca5b0,
> lockmode=7, sessionLock=0 '\000', dontWait=0 '\000', reportMemoryError=1
> '\001', only_increment=0 '\000') at lock.c:914
> #5 0x0000000000724fcd in LockAcquireExtended (locktag=0x7fff84dca5b0,
> lockmode=7, sessionLock=0 '\000', dontWait=0 '\000', reportMemoryError=1
> '\001') at lock.c:616
> #6 0x0000000000724f36 in LockAcquire (locktag=0x7fff84dca5b0, lockmode=7,
> sessionLock=0 '\000', dontWait=0 '\000') at lock.c:575
> #7 0x0000000000724665 in XactLockTableInsert (xid=25010) at lmgr.c:433
> #8 0x00000000004a390c in AssignTransactionId (s=0xcbd9c0) at xact.c:619
> #9 0x00000000004a3657 in GetTopTransactionId () at xact.c:429
> #10 0x00000000004ae2e7 in LockGXact (gid=0x170e3a8 "T24965", user=10) at
> twophase.c:460
> #11 0x00000000004afa05 in FinishPreparedTransaction (gid=0x170e3a8
> "T24965", isCommit=1 '\001') at twophase.c:1298
> #12 0x0000000000742fa6 in standard_ProcessUtility (parsetree=0x170e3c0,
> queryString=0x170d9e0 "COMMIT PREPARED 'T24965'", params=0x0, isTopLevel=1
> '\001', dest=0x170e700, sentToRemote=0 '\000', completionTag=0x7fff84dcaee0
> "") at utility.c:520
> #13 0x0000000000742c10 in ProcessUtility (parsetree=0x170e3c0,
> queryString=0x170d9e0 "COMMIT PREPARED 'T24965'", params=0x0, isTopLevel=1
> '\001', dest=0x170e700, sentToRemote=0 '\000', completionTag=0x7fff84dcaee0
> "") at utility.c:377
> #14 0x0000000000741b9b in PortalRunUtility (portal=0x1713330,
> utilityStmt=0x170e3c0, isTopLevel=1 '\001', dest=0x170e700,
> completionTag=0x7fff84dcaee0 "") at pquery.c:1284
> #15 0x0000000000741dc8 in PortalRunMulti (portal=0x1713330, isTopLevel=1
> '\001', dest=0x170e700, altdest=0x170e700, completionTag=0x7fff84dcaee0 "")
> at pquery.c:1431
> #16 0x000000000074126f in PortalRun (portal=0x1713330,
> count=9223372036854775807, isTopLevel=1 '\001', dest=0x170e700,
> altdest=0x170e700, completionTag=0x7fff84dcaee0 "") at pquery.c:881
> #17 0x000000000073ae2d in exec_simple_query (query_string=0x170d9e0
> "COMMIT PREPARED 'T24965'") at postgres.c:1142
> #18 0x000000000073f1f0 in PostgresMain (argc=2, argv=0x16f55a8,
> username=0x16f5430 "zhangzl") at postgres.c:4212
> #19 0x00000000006eb06a in BackendRun (port=0x17186b0) at postmaster.c:3803
> #20 0x00000000006ea759 in BackendStartup (port=0x17186b0) at
> postmaster.c:3488
> #21 0x00000000006e7513 in ServerLoop () at postmaster.c:1466
> #22 0x00000000006e6f1c in PostmasterMain (argc=5, argv=0x16f3610) at
> postmaster.c:1226
> #23 0x0000000000650bbd in main (argc=5, argv=0x16f3610) at main.c:199
>
>
>
>
>
> Please describe a way to repeat the problem. Please try to provide a
>
> concise reproducible example, if at all possible:
>
> ----------------------------------------------------------------------
>
> Using the HammerDB TPC-C test tool, create a workload with 100
> warehouses, then run the TPC-C test with 20 users.
>
>
>
>
>
>
>
>
> If you know how this problem might be fixed, list the solution below:
>
> ---------------------------------------------------------------------
>
> According to the stack of the session that holds the TwoPhaseStateLock,
> the problem is in the function LockGXact:
>
> static GlobalTransaction
> LockGXact(const char *gid, Oid user)
> {
>     ......
>     LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
>
>     for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
>     {
>         ......
>         gxact->locking_xid = GetTopTransactionId();
>
>         LWLockRelease(TwoPhaseStateLock);
>
>         return gxact;
>     }
>
>     LWLockRelease(TwoPhaseStateLock);
>     ......
> }
>
>
> GetTopTransactionId() blocks while acquiring another lock (the
> transaction lock taken in XactLockTableInsert, as the stack above shows),
> yet it is invoked between the acquire and release of TwoPhaseStateLock.
>
> To fix it, I call GetTopTransactionId() before
> "LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE)", so that the top
> transaction ID is already assigned and the later call inside the locked
> section can return it directly.
>
> diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
> index c39d9e6..1312e88 100644
> --- a/src/backend/access/transam/twophase.c
> +++ b/src/backend/access/transam/twophase.c
> @@ -414,6 +414,8 @@ LockGXact(const char *gid, Oid user)
>  {
>      int i;
>
> +    GetTopTransactionId();
> +
>      LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
>
>      for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
>
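The ordering problem behind this patch can be sketched in isolation. What follows is a minimal single-threaded C model, not PostgreSQL code: ToyLWLock, ToyXact, toy_get_top_xid and the two lock_gxact_* functions are hypothetical names invented for illustration. It assumes, as the second backtrace suggests, that the first GetTopTransactionId() call in a transaction assigns an XID and may sleep, while later calls simply return the value already assigned. The model only counts how often the "may sleep" path runs while the toy lock is held.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a lightweight lock such as TwoPhaseStateLock. */
typedef struct { bool held; } ToyLWLock;

/* Hypothetical stand-in for the backend's top-level transaction state. */
typedef struct {
    unsigned xid;                /* assigned transaction id, if any        */
    bool     assigned;           /* has an xid been assigned yet?          */
    int      sleeps_under_lock;  /* times the slow path ran with lock held */
} ToyXact;

static void toy_lock_acquire(ToyLWLock *l) { assert(!l->held); l->held = true; }
static void toy_lock_release(ToyLWLock *l) { assert(l->held);  l->held = false; }

/*
 * Models the assumed behaviour of GetTopTransactionId(): the first call
 * takes a slow path that may sleep on another lock; later calls are cheap.
 */
static unsigned toy_get_top_xid(ToyXact *x, const ToyLWLock *l)
{
    if (!x->assigned) {
        if (l->held)
            x->sleeps_under_lock++;  /* the hazard: sleeping while holding l */
        x->xid = 1000;               /* arbitrary illustrative value */
        x->assigned = true;
    }
    return x->xid;
}

/* Ordering as in the reported code: xid is first requested under the lock. */
int lock_gxact_original(void)
{
    ToyLWLock l = { false };
    ToyXact   x = { 0, false, 0 };

    toy_lock_acquire(&l);
    (void) toy_get_top_xid(&x, &l);  /* slow path runs with lock held */
    toy_lock_release(&l);
    return x.sleeps_under_lock;
}

/* Ordering as in the proposed patch: xid is assigned before locking. */
int lock_gxact_patched(void)
{
    ToyLWLock l = { false };
    ToyXact   x = { 0, false, 0 };

    (void) toy_get_top_xid(&x, &l);  /* slow path runs with no lock held */
    toy_lock_acquire(&l);
    (void) toy_get_top_xid(&x, &l);  /* cheap path only */
    toy_lock_release(&l);
    return x.sleeps_under_lock;
}
```

In this model lock_gxact_original() returns 1 and lock_gxact_patched() returns 0: the patched ordering never enters the slow path while the toy lock is held. Whether that alone is sufficient in the real code depends on details of twophase.c that this sketch does not capture.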
>
> Could a committer please review it and commit it to the GitHub master
> branch?
>
>
> Thanks
>
> Julian
>
>
>
>
> _______________________________________________
> Postgres-xc-bugs mailing list
> Pos...@li...
> https://lists.sourceforge.net/lists/listinfo/postgres-xc-bugs
>
>
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company