|
From: 张仲良 <jul...@ou...> - 2014-01-09 02:39:58
|
Your name : Julian
Your email address : jul...@ou...

System Configuration:
---------------------
Architecture (example: Intel Pentium) : Intel Pentium
Operating System (example: Linux 2.4.18) : 2.6.32-358.el6.x86_64
Postgres-XC version (example: Postgres-XC 1.1devel): Github master
Compiler used (example: gcc 3.3.5) : gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3)

Please enter a FULL description of your problem:
------------------------------------------------
While running a TPC-C test (100 warehouses) on PGXC using HammerDB with 20 concurrent users, all the sessions become blocked after about 2 minutes, waiting to acquire the TwoPhaseStateLock:

execute direct on (datanode2) $$select * from pg_stat_activity where state != 'idle' order by query_start$$;
datid | pid | query_start | query
-------+-------+------------------------------+ --------------------------------------------------------------------------
16384 | 19392 |2014-01-08 14:24:24.274622+08 | autovacuum: VACUUM ANALYZE public.stock
16384 | 19384 |2014-01-08 14:25:49.073815+08 | PREPARE TRANSACTION 'T27146'
16384 | 19383 |2014-01-08 14:25:49.084483+08 | COMMIT PREPARED 'T27077'
16384 | 19382 |2014-01-08 14:25:49.087827+08 | COMMIT PREPARED 'T27052'
16384 | 19385 |2014-01-08 14:25:49.109279+08 | COMMIT PREPARED 'T27118'
16384 | 19373 |2014-01-08 14:25:49.114323+08 | COMMIT PREPARED 'T27111'
16384 | 19372 |2014-01-08 14:25:49.114784+08 | COMMIT PREPARED 'T27063'
16384 | 19376 |2014-01-08 14:25:49.131651+08 | COMMIT PREPARED 'T27102'
16384 | 19371 |2014-01-08 14:25:49.147467+08 | COMMIT PREPARED 'T27023'
16384 | 19374 |2014-01-08 14:25:49.156297+08 | COMMIT PREPARED 'T27123'
16384 | 19386 |2014-01-08 14:25:49.168084+08 | COMMIT PREPARED 'T27128'
16384 | 19389 |2014-01-08 14:25:49.179543+08 | PREPARE TRANSACTION 'T27161'
16384 | 19380 |2014-01-08 14:25:49.222886+08 | COMMIT PREPARED 'T27083'
16384 | 19377 |2014-01-08 14:25:49.373674+08 | PREPARE TRANSACTION 'T27178'
16384 | 19388 |2014-01-08 14:25:49.386222+08 | PREPARE TRANSACTION 'T27180'
16384 | 19378 |2014-01-08 14:25:49.493811+08 | PREPARE TRANSACTION 'T27176'
16384 | 19381 |2014-01-08 14:25:49.662885+08 | PREPARE TRANSACTION 'T27148'
16384 | 19375 |2014-01-08 14:25:49.680977+08 | PREPARE TRANSACTION 'T27156'
16384 | 19387 |2014-01-08 14:25:49.744282+08 | PREPARE TRANSACTION 'T27157'
16384 | 19370 |2014-01-08 14:25:49.7463+08 | PREPARE TRANSACTION 'T27173'
16384 | 19379 |2014-01-08 14:25:49.866666+08 | PREPARE TRANSACTION 'T27171'
16384 | 18687 |2014-01-08 14:30:46.506894+08 | select * from pg_stat_activity where state != 'idle' order by query_start
(22 rows)
One of the sessions has the stack as below:
#0 0x0000003ce42eaf37 in semop () from /lib64/libc.so.6
#1 0x00000000006d9d7a in PGSemaphoreLock (sema=0x7faf6078c490, interruptOK=0 '\000') at pg_sema.c:415
#2 0x000000000072dce3 in LWLockAcquire (lockid=TwoPhaseStateLock, mode=LW_EXCLUSIVE) at lwlock.c:474
#3 0x00000000004adae4 in MarkAsPreparing (xid=69984, gid=0xe9f060 "T69983", prepared_at=442376096193723, owner=10, databaseid=16450) at twophase.c:267
#4 0x00000000004a567a in PrepareTransaction () at xact.c:2684
#5 0x00000000004a5e55 in CommitTransactionCommand () at xact.c:3248
#6 0x000000000073d025 in finish_xact_command () at postgres.c:2551
#7 0x000000000073ac89 in exec_simple_query (query_string=0xd7bc60 "PREPARE TRANSACTION 'T69983'") at postgres.c:1159
#8 0x000000000073f018 in PostgresMain (argc=2, argv=0xd63828, username=0xd636b0 "zhangzl") at postgres.c:4212
#9 0x00000000006eafca in BackendRun (port=0xd86960) at postmaster.c:3803
#10 0x00000000006ea6b9 in BackendStartup (port=0xd86960) at postmaster.c:3488
#11 0x00000000006e7473 in ServerLoop () at postmaster.c:1466
#12 0x00000000006e6e7c in PostmasterMain (argc=5, argv=0xd61870) at postmaster.c:1226
#13 0x0000000000650b1d in main (argc=5, argv=0xd61870) at main.c:199
But the session holding the TwoPhaseStateLock is blocked at:
(gdb) bt
#0 0x0000003ce42eaf37 in semop () from /lib64/libc.so.6
#1 0x00000000006d9e1a in PGSemaphoreLock (sema=0x7f103f857b90, interruptOK=1 '\001') at pg_sema.c:415
#2 0x000000000072b15d in ProcSleep (locallock=0x1712650, lockMethodTable=0x9dcb20) at proc.c:1086
#3 0x0000000000726566 in WaitOnLock (locallock=0x1712650, owner=0x179ed40) at lock.c:1537
#4 0x00000000007258be in LockAcquireExtendedXC (locktag=0x7fff84dca5b0, lockmode=7, sessionLock=0 '\000', dontWait=0 '\000', reportMemoryError=1 '\001', only_increment=0 '\000') at lock.c:914
#5 0x0000000000724fcd in LockAcquireExtended (locktag=0x7fff84dca5b0, lockmode=7, sessionLock=0 '\000', dontWait=0 '\000', reportMemoryError=1 '\001') at lock.c:616
#6 0x0000000000724f36 in LockAcquire (locktag=0x7fff84dca5b0, lockmode=7, sessionLock=0 '\000', dontWait=0 '\000') at lock.c:575
#7 0x0000000000724665 in XactLockTableInsert (xid=25010) at lmgr.c:433
#8 0x00000000004a390c in AssignTransactionId (s=0xcbd9c0) at xact.c:619
#9 0x00000000004a3657 in GetTopTransactionId () at xact.c:429
#10 0x00000000004ae2e7 in LockGXact (gid=0x170e3a8 "T24965", user=10) at twophase.c:460
#11 0x00000000004afa05 in FinishPreparedTransaction (gid=0x170e3a8 "T24965", isCommit=1 '\001') at twophase.c:1298
#12 0x0000000000742fa6 in standard_ProcessUtility (parsetree=0x170e3c0, queryString=0x170d9e0 "COMMIT PREPARED 'T24965'", params=0x0, isTopLevel=1 '\001', dest=0x170e700, sentToRemote=0 '\000', completionTag=0x7fff84dcaee0 "") at utility.c:520
#13 0x0000000000742c10 in ProcessUtility (parsetree=0x170e3c0, queryString=0x170d9e0 "COMMIT PREPARED 'T24965'", params=0x0, isTopLevel=1 '\001', dest=0x170e700, sentToRemote=0 '\000', completionTag=0x7fff84dcaee0 "") at utility.c:377
#14 0x0000000000741b9b in PortalRunUtility (portal=0x1713330, utilityStmt=0x170e3c0, isTopLevel=1 '\001', dest=0x170e700, completionTag=0x7fff84dcaee0 "") at pquery.c:1284
#15 0x0000000000741dc8 in PortalRunMulti (portal=0x1713330, isTopLevel=1 '\001', dest=0x170e700, altdest=0x170e700, completionTag=0x7fff84dcaee0 "") at pquery.c:1431
#16 0x000000000074126f in PortalRun (portal=0x1713330, count=9223372036854775807, isTopLevel=1 '\001', dest=0x170e700, altdest=0x170e700, completionTag=0x7fff84dcaee0 "") at pquery.c:881
#17 0x000000000073ae2d in exec_simple_query (query_string=0x170d9e0 "COMMIT PREPARED 'T24965'") at postgres.c:1142
#18 0x000000000073f1f0 in PostgresMain (argc=2, argv=0x16f55a8, username=0x16f5430 "zhangzl") at postgres.c:4212
#19 0x00000000006eb06a in BackendRun (port=0x17186b0) at postmaster.c:3803
#20 0x00000000006ea759 in BackendStartup (port=0x17186b0) at postmaster.c:3488
#21 0x00000000006e7513 in ServerLoop () at postmaster.c:1466
#22 0x00000000006e6f1c in PostmasterMain (argc=5, argv=0x16f3610) at postmaster.c:1226
#23 0x0000000000650bbd in main (argc=5, argv=0x16f3610) at main.c:199
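The two backtraces describe a wait chain: PREPARE TRANSACTION backends wait for TwoPhaseStateLock, while the COMMIT PREPARED backend that holds it waits inside XactLockTableInsert for a transaction lock. As a rough illustration only, the chain can be modelled as a toy wait-for table and walked to the backend everything ultimately depends on. The function name, the array layout, and the fourth backend (whoever holds the transaction lock, which the traces do not show) are assumptions made for this sketch; none of this is Postgres-XC code.

```c
#include <assert.h>
#include <stddef.h>

#define NOT_WAITING (-1)

/*
 * Toy wait-for table (hypothetical, not a PostgreSQL structure):
 * waits_for[i] is the index of the backend that backend i is blocked
 * behind, or NOT_WAITING if backend i is free to run.
 *
 * Returns the index of the backend at the end of the chain that starts
 * at 'start', or -1 if the chain is malformed or loops back on itself
 * (a true deadlock cycle).
 */
int find_root_blocker(const int *waits_for, size_t n, size_t start)
{
    size_t cur = start;
    size_t steps;

    if (start >= n)
        return -1;

    /* A chain without a cycle has at most n - 1 edges. */
    for (steps = 0; steps < n; steps++)
    {
        int next = waits_for[cur];

        if (next == NOT_WAITING)
            return (int) cur;
        if (next < 0 || (size_t) next >= n)
            return -1;          /* malformed entry */
        cur = (size_t) next;
    }
    return -1;                  /* followed n edges: must be a cycle */
}
```

With backends 0 and 1 standing for two PREPARE TRANSACTION sessions, backend 2 for the TwoPhaseStateLock holder, and backend 3 for the unknown transaction-lock holder, the table {2, 2, 3, NOT_WAITING} resolves every waiter to backend 3. This only restates what the traces suggest: freeing the lightweight lock depends on a heavyweight lock wait finishing first.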
Please describe a way to repeat the problem. Please try to provide a
concise reproducible example, if at all possible:
----------------------------------------------------------------------
Using the HammerDB TPC-C test tool, create a workload with 100 warehouses, then run the TPC-C tests with 20 users.
If you know how this problem might be fixed, list the solution below:
---------------------------------------------------------------------
According to the stack of the session that holds the TwoPhaseStateLock, the problem is in the function LockGXact:

static GlobalTransaction
LockGXact(const char *gid, Oid user)
{
    ......
    LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);

    for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
    {
        ......
        gxact->locking_xid = GetTopTransactionId();

        LWLockRelease(TwoPhaseStateLock);

        return gxact;
    }

    LWLockRelease(TwoPhaseStateLock);
    ......
}

GetTopTransactionId() blocks while acquiring another lock, yet it is invoked between the acquire and the release of TwoPhaseStateLock, so every other session that needs TwoPhaseStateLock has to wait as well.
To fix it, I call GetTopTransactionId() before "LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE)", so that the top transaction ID is already assigned and can be returned directly later.
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index c39d9e6..1312e88 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -414,6 +414,8 @@ LockGXact(const char *gid, Oid user)
 {
     int i;
 
+    GetTopTransactionId();
+
     LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
 
     for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
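
The reordering in the patch can be sketched with a small, self-contained model. Everything here is a hypothetical stand-in (ToyBackend, toy_get_top_xid, and the two toy_lock_gxact_* functions are not Postgres-XC code). The sketch assumes only what the backtrace shows: the first request for the top transaction ID goes through AssignTransactionId and may sleep, while later requests return the already assigned ID. The model counts how often that potentially sleeping step runs while the toy lock is held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins (hypothetical; not PostgreSQL's real data structures). */
typedef struct
{
    bool     state_lock_held;             /* models holding TwoPhaseStateLock */
    uint32_t top_xid;                     /* 0 means "not assigned yet" */
    uint32_t next_xid;                    /* next ID the toy allocator hands out */
    int      blocking_calls_under_lock;   /* XID assignments made while locked */
} ToyBackend;

/* Models GetTopTransactionId(): assign on first use, then reuse. */
uint32_t toy_get_top_xid(ToyBackend *b)
{
    if (b->top_xid == 0)
    {
        /* In the real code this is the step that can sleep on a lock. */
        if (b->state_lock_held)
            b->blocking_calls_under_lock++;
        b->top_xid = b->next_xid++;
    }
    return b->top_xid;
}

/* Original order: the ID is first requested while the lock is held. */
uint32_t toy_lock_gxact_original(ToyBackend *b)
{
    uint32_t xid;

    b->state_lock_held = true;
    xid = toy_get_top_xid(b);
    b->state_lock_held = false;
    return xid;
}

/* Patched order: force the assignment before taking the lock. */
uint32_t toy_lock_gxact_patched(ToyBackend *b)
{
    uint32_t xid;

    (void) toy_get_top_xid(b);   /* may sleep, but holds nothing */
    b->state_lock_held = true;
    xid = toy_get_top_xid(b);    /* already assigned: no sleep under the lock */
    b->state_lock_held = false;
    return xid;
}
```

Both variants return the same ID. The only difference is that the patched order never performs the assignment while the lock is held, which is the property the one-line fix relies on.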
Could a committer please review it and commit it to the Github master branch?

Thanks,
Julian
|
|
From: Ashutosh B. <ash...@en...> - 2014-01-09 03:43:16
|
Hi Julian,
Can you please provide a patch that fixes this problem?
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
|
|
From: Michael P. <mic...@gm...> - 2014-01-09 05:22:52
|
> Can you please provide a patch that fixes this problem?
The patch is a one-line fix and is in the email content... There is
even an analysis and backtraces. There is enough content to review it.
Regards,
--
Michael
|
|
From: Ashutosh B. <ash...@en...> - 2014-01-09 05:27:59
|
A patch is a patch! You cannot expect people to pull code out of an email
and try it out.
Also, we need regression results with the patch, which are absent from the
mail.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
|
|
From: Koichi S. <koi...@gm...> - 2014-01-09 05:33:58
|
Thanks a lot for the analysis and the patch.
The patch is not for XC-specific code. Do you think the patch should
go to PG as well?
Regards;
---
Koichi Suzuki
|
|
From: Michael P. <mic...@gm...> - 2014-01-09 05:44:08
|
On Thu, Jan 9, 2014 at 2:33 PM, Koichi Suzuki <koi...@gm...> wrote:
> Thanks a lot for the analysis and the patch.
>
> The patch is not for XC-specific code. Do you think the patch should
> go to PG as well?
No. The code path taken in the first section is exclusive to XC: it
uses an internal 2PC. IMO the patch is fine as I recall (some
souvenir, feel free to correct me) that in this code path a
transaction ID is already assigned, so GetTopTransactionId is not
going to call GTM.
Regards,
--
Michael
|
|
From: Koichi S. <koi...@gm...> - 2014-01-09 06:04:07
|
I understand this. If the code is specific to XC, we should enclose
the additional line with #ifdef PGXC --- #endif.

Regards;
---
Koichi Suzuki


2014/1/9 Michael Paquier <mic...@gm...>:
> On Thu, Jan 9, 2014 at 2:33 PM, Koichi Suzuki <koi...@gm...> wrote:
>> Thanks a lot for the analysis and the patch.
>>
>> The patch is not for XC-specific code. Do you think the patch should
>> go to PG as well?
> No. The code path taken in the first section is exclusive to XC: it
> uses an internal 2PC. IMO the patch is fine as I recall (some
> souvenir, feel free to correct me) that in this code path a
> transaction ID is already assigned, so GetTopTransactionId is not
> going to call GTM.
> Regards,
> --
> Michael
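The `#ifdef PGXC --- #endif` suggestion can be sketched as a small standalone C file. This is only an illustration under stated assumptions: `model_get_top_transaction_id` and `guarded_lock_gxact` are hypothetical stand-ins rather than the real functions in twophase.c, and `PGXC` is defined locally so that the guarded line is compiled in this sketch (in a real build the macro would come from the build configuration).

```c
#include <assert.h>

/* Assumption: defined here only so the guarded call below is compiled. */
#define PGXC 1

int model_xid_calls = 0;    /* counts calls to the stand-in below */

/* Hypothetical stand-in for GetTopTransactionId(). */
void model_get_top_transaction_id(void)
{
    model_xid_calls++;
}

/* Hypothetical stand-in for LockGXact(), showing only the guard. */
void guarded_lock_gxact(void)
{
#ifdef PGXC
    /* XC-only: make sure a transaction ID exists before taking the lock. */
    model_get_top_transaction_id();
    assert(model_xid_calls > 0);
#endif

    /* ... the lock would be acquired and the prepared transactions scanned here ... */
}
```

Removing the `#define PGXC 1` line compiles the function without the extra call, which is the point of the guard: builds without the macro keep the original code path.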
|
From: Koichi S. <koi...@gm...> - 2014-01-09 06:17:20
|
I understand that PG may need this additional call to make a request to
GTM. GTM requires the calling backend's transaction ID.

If I'm right, then the patch should go to all the releases. As much
as I tested with REL1_0_STABLE and REL1_1_STABLE, it does not affect
any of the regression tests. Also this looks good to go to the
master and REL1_2_STABLE, which is under preparation.

Any more inputs?
---
Koichi Suzuki


2014/1/9 Koichi Suzuki <koi...@gm...>:
> I understand this. If the code is specific to XC, we should enclose
> the additional line with #ifdef PGXC --- #endif.
>
> Regards;
> ---
> Koichi Suzuki
>
>
> 2014/1/9 Michael Paquier <mic...@gm...>:
>> On Thu, Jan 9, 2014 at 2:33 PM, Koichi Suzuki <koi...@gm...> wrote:
>>> Thanks a lot for the analysis and the patch.
>>>
>>> The patch is not for XC-specific code. Do you think the patch should
>>> go to PG as well?
>> No. The code path taken in the first section is exclusive to XC: it
>> uses an internal 2PC. IMO the patch is fine as I recall (some
>> souvenir, feel free to correct me) that in this code path a
>> transaction ID is already assigned, so GetTopTransactionId is not
>> going to call GTM.
>> Regards,
>> --
>> Michael
|
From: Michael P. <mic...@gm...> - 2014-01-09 06:32:07
|
On Thu, Jan 9, 2014 at 3:17 PM, Koichi Suzuki <koi...@gm...> wrote:
> If I'm right, then the patch should go to all the releases. As much
> as I tested with REL1_0_STABLE and REL1_1_STABLE, it does not affect
> any of the regression tests. Also this looks good to go to the
> master and REL1_2_STABLE, which is under preparation.

Agreed.
--
Michael
|
From: Mason S. <ms...@tr...> - 2014-01-09 14:49:29
|
On Thu, Jan 9, 2014 at 12:27 AM, Ashutosh Bapat <ash...@en...> wrote:

> A patch is a patch! You can not expect people to pull code out of email
> and try it out.
>
> Also, we need regression results with the patch, which are absent in the
> mail.
>

Julian, please do not be discouraged. I appreciate you taking the time to
participate in the list and send along a patch, and I hope that you continue
to do so in order for us to grow the Postgres-XC community.

To make it a little easier for everyone, in the future please include the
patch as an attachment if you can.

Thanks much,

--
Mason Sharp

TransLattice - http://www.translattice.com
Distributed and Clustered Database Solutions
|
From: Koichi S. <koi...@gm...> - 2014-01-09 15:48:25
|
Don't worry. I made a patch from Julian's info. Sorry, it's in my
office, so I will post it Friday morning, my time.

Hope this fixes several relevant transaction management problems.

Regards;
---
Koichi Suzuki


2014/1/9 Mason Sharp <ms...@tr...>:
>
> On Thu, Jan 9, 2014 at 12:27 AM, Ashutosh Bapat
> <ash...@en...> wrote:
>>
>> A patch is a patch! You can not expect people to pull code out of email
>> and try it out.
>>
>> Also, we need regression results with the patch, which are absent in the
>> mail.
>>
>>
>
> Julian, please do not be discouraged. I appreciate you taking the time to
> participate in the list and send along a patch, and I hope that you continue
> to do so in order for us to grow the Postgres-XC community.
>
> To make it a little easier for everyone, in the future please include the
> patch as an attachment if you can.
>
> Thanks much,
>
>
> --
> Mason Sharp
>
> TransLattice - http://www.translattice.com
> Distributed and Clustered Database Solutions
|
From: ZhangJulian <jul...@ou...> - 2014-01-10 01:24:26
|
Hi Mason,

I am very glad to know the fix could pass your review. I will continue
focusing and working on the amazing product.

It is the first time I wrote the mail to the mail group, next time, I will
append the regression test result and the patch as the attachments. :)

Thanks

Date: Thu, 9 Jan 2014 09:49:22 -0500
Subject: Re: [Postgres-xc-bugs] A bug and related fix: Dead lock when testing TPCC on PGXC by HammerDB
From: ms...@tr...
To: jul...@ou...
CC: mic...@gm...; ash...@en...; pos...@li...

On Thu, Jan 9, 2014 at 12:27 AM, Ashutosh Bapat <ash...@en...> wrote:

A patch is a patch! You can not expect people to pull code out of email
and try it out.

Also, we need regression results with the patch, which are absent in the
mail.

Julian, please do not be discouraged. I appreciate you taking the time to
participate in the list and send along a patch, and I hope that you continue
to do so in order for us to grow the Postgres-XC community.

To make it a little easier for everyone, in the future please include the
patch as an attachment if you can.

Thanks much,

--
Mason Sharp

TransLattice - http://www.translattice.com
Distributed and Clustered Database Solutions
|
From: Koichi S. <koi...@gm...> - 2014-01-10 01:34:36
Attachments:
20140109_two_phase.patch
|
Here's the patch. It passes regressions.

BTW, I'm afraid we may need a similar fix in other parts of the
GTM-related code. At present, GTM requires the backend's GXID to obtain a
snapshot, which may not be needed, especially for read-only TXNs. At
the next code clean-up, we should consider removing this requirement,
or allowing an invalid GXID (zero value), to eliminate this case and to
save GXID values. That would be a better fix for the issue.

Other thoughts?

Regards;
---
Koichi Suzuki


2014/1/10 ZhangJulian <jul...@ou...>:
> Hi Mason,
>
> I am very glad to know the fix could pass your review. I will continue
> focusing and working on the amazing product.
>
> It is the first time I wrote the mail to the mail group, next time, I will
> append the regression test result and the patch as the attachments. :)
>
> Thanks
>
> ________________________________
> Date: Thu, 9 Jan 2014 09:49:22 -0500
> Subject: Re: [Postgres-xc-bugs] A bug and related fix: Dead lock when
> testing TPCC on PGXC by HammerDB
> From: ms...@tr...
> To: jul...@ou...
> CC: mic...@gm...; ash...@en...;
> pos...@li...
>
> On Thu, Jan 9, 2014 at 12:27 AM, Ashutosh Bapat
> <ash...@en...> wrote:
>
> A patch is a patch! You can not expect people to pull code out of email and
> try it out.
>
> Also, we need regression results with the patch, which are absent in the
> mail.
>
> Julian, please do not be discouraged. I appreciate you taking the time to
> participate in the list and send along a patch, and I hope that you continue
> to do so in order for us to grow the Postgres-XC community.
>
> To make it a little easier for everyone, in the future please include the
> patch as an attachment if you can.
>
> Thanks much,
>
> --
> Mason Sharp
>
> TransLattice - http://www.translattice.com
> Distributed and Clustered Database Solutions
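The idea of letting a read-only transaction obtain a snapshot with an invalid (zero) GXID might look roughly like the C sketch below. Everything in it is a hypothetical model for discussion: `model_gtm`, `model_gtm_assign_gxid` and `model_gtm_get_snapshot` are invented names and do not correspond to the actual GTM API or wire protocol.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t model_gxid;
#define MODEL_INVALID_GXID ((model_gxid) 0)   /* zero value means "no GXID" */

/* Hypothetical GTM state: just the next GXID to hand out. */
typedef struct {
    model_gxid next_gxid;
} model_gtm;

/* Hypothetical snapshot: who asked, and the first GXID not yet assigned. */
typedef struct {
    model_gxid requester;
    model_gxid xmax;
} model_snapshot;

/* Consumes one GXID; only transactions that write would need this. */
model_gxid model_gtm_assign_gxid(model_gtm *gtm)
{
    assert(gtm != 0);
    return gtm->next_gxid++;
}

/* Accepts MODEL_INVALID_GXID, so a read-only caller consumes no GXID. */
model_snapshot model_gtm_get_snapshot(const model_gtm *gtm, model_gxid requester)
{
    model_snapshot snap;

    assert(gtm != 0);
    snap.requester = requester;     /* may be MODEL_INVALID_GXID */
    snap.xmax = gtm->next_gxid;     /* GXIDs at or above this are not visible */
    return snap;
}
```

In this model a read-only caller passes `MODEL_INVALID_GXID` and the GXID counter does not advance, which is the saving mentioned in the message. Whether the real GTM could accept such requests safely is exactly the open question raised there.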
|
From: Michael P. <mic...@gm...> - 2014-01-10 01:56:37
|
On Fri, Jan 10, 2014 at 10:24 AM, ZhangJulian <jul...@ou...> wrote:
> I am very glad to know the fix could pass your review. I will continue
> focusing and working on the amazing product.
>
> It is the first time I wrote the mail to the mail group, next time, I will
> append the regression test result and the patch as the attachments. :)

Yeah, actually I personally think that you have provided enough for this
bug, and I am amazed to see so many details in your first bug report on
this mailing list. Having a patch (though not directly attached to the
mail), an analysis AND a test case is something that you rarely see when
bugs are reported. So don't feel discouraged and keep up the nice
efforts!

Regards,
--
Michael