From: 鈴木 幸市 <ko...@in...> - 2014-07-22 09:48:35
On 2014/07/22 12:20, ZhangJulian <jul...@ou...> wrote:
Hi Koichi,
Thanks for the reply! How should this type of issue be fixed, then?
================code segment1:
if (TransactionIdIsValid(next_xid))
{
    xid = next_xid;
    elog(DEBUG1, "TransactionId = %d", next_xid);
    next_xid = InvalidTransactionId; /* reset */
    if (!TransactionIdFollowsOrEquals(xid, ShmemVariableCache->nextXid))
    {
        /* This should be ok, due to concurrency from multiple coords
         * passing down the xids.
         * We later do not want to bother incrementing the value
         * in shared memory though.
         */
        increment_xid = false;
        elog(DEBUG1, "xid (%d) does not follow ShmemVariableCache->nextXid (%d)",
             xid, ShmemVariableCache->nextXid);
    }
    else
        ShmemVariableCache->nextXid = xid;
}
else
{
    /* Fallback to default */
    if (!useLocalXid)
        elog(LOG, "Falling back to local Xid. Was = %d, now is = %d",
             next_xid, ShmemVariableCache->nextXid);
    xid = ShmemVariableCache->nextXid;
}
The above else clause looks bad, as you suggested.
===============
Should we just fix the ELSE branch by replacing it with an elog(ERROR, ...)?
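To make the option in this question concrete, here is a minimal standalone sketch of "report an error instead of falling back". It is not taken from the Postgres-XC sources: `assign_xid`, `XidStatus`, and `xid_follows_or_equals` are hypothetical names, the `TransactionId` definitions are simplified stand-ins for the PostgreSQL ones, and a return code stands in for what would presumably be an `elog(ERROR, ...)` call in the real code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the PostgreSQL definitions (assumption). */
typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)
#define TransactionIdIsValid(xid) ((xid) != InvalidTransactionId)

/* Hypothetical result codes; real code would raise elog(ERROR, ...). */
typedef enum { XID_OK, XID_ERROR_NO_GXID } XidStatus;

/* Wraparound-aware "a >= b"; simplified, ignores special xids. */
static bool
xid_follows_or_equals(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) >= 0;
}

/*
 * Accept the xid handed down from GTM, or fail. There is deliberately
 * no branch that substitutes the local counter for a missing GXID.
 */
static XidStatus
assign_xid(TransactionId gtm_xid, TransactionId *shared_next_xid,
           TransactionId *xid_out)
{
    if (!TransactionIdIsValid(gtm_xid))
        return XID_ERROR_NO_GXID;      /* was: fall back to local xid */

    *xid_out = gtm_xid;

    /* Advance the shared counter only if the new xid is not behind it. */
    if (xid_follows_or_equals(gtm_xid, *shared_next_xid))
        *shared_next_xid = gtm_xid;

    return XID_OK;
}
```

With this shape, a caller that receives an invalid xid from GTM stops immediately, rather than continuing with an xid that GTM does not know about.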
===============code segment2:
if (MyPgXact->vacuumFlags & PROC_IN_VACUUM)
    next_xid = (TransactionId) BeginTranAutovacuumGTM();
else
    next_xid = (TransactionId) BeginTranGTM(timestamp);
===============
Or should BeginTranAutovacuumGTM() and BeginTranGTM(timestamp) be fixed to ensure that they return a valid xid?
This is correct. The difference is that the GXID for vacuum does not propagate to the snapshots of other transactions. Although vacuum runs as a transaction, its GXID
never appears in xmin or xmax. Vacuum is a long transaction, and including it in snapshots has a bad effect on overall performance.
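As a rough illustration of the point about vacuum GXIDs staying out of snapshots, the sketch below computes a snapshot xmin while skipping vacuum transactions. It is a toy model, not GTM code: `OpenTxn`, `is_vacuum`, and `snapshot_xmin` are hypothetical names, and xid wraparound is ignored for brevity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical record of one open global transaction. */
typedef struct
{
    TransactionId gxid;
    bool          is_vacuum;   /* true if started for (auto)vacuum */
} OpenTxn;

/*
 * Smallest gxid among open non-vacuum transactions, or next_gxid if
 * there are none. Uses plain '<'; wraparound is ignored in this sketch.
 */
static TransactionId
snapshot_xmin(const OpenTxn *txns, size_t n, TransactionId next_gxid)
{
    TransactionId xmin = next_gxid;

    for (size_t i = 0; i < n; i++)
    {
        if (txns[i].is_vacuum)
            continue;           /* vacuum GXIDs are not reported */
        if (txns[i].gxid < xmin)
            xmin = txns[i].gxid;
    }
    return xmin;
}
```

In this model a long-running vacuum with an old GXID does not hold back the xmin seen by other transactions, which is the performance concern described above.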
Regards,
---
Koichi Suzuki
Thanks
Julian
________________________________
From: ko...@in...
To: jul...@ou...
CC: pos...@li...
Subject: Re: [Postgres-xc-bugs] A stack trace for your investigating.
Date: Tue, 22 Jul 2014 02:09:55 +0000
There is still some code that falls back to a local XID. The initial XC code included such a fallback, which should not happen, and there must still be code of this type somewhere.
Thank you.
---
Koichi Suzuki
On 2014/07/21 13:08, ZhangJulian <jul...@ou...> wrote:
Hi All,
This is hard to reproduce, so I am just pasting it here for your information.
An autovacuum process got an invalid xid, then fell back to a local xid. But no snapshot could be obtained from GTM for that xid.
It looks like a design error, but I know little about the local transaction design. Is there any document that describes it?
===On the datanode:
(gdb) bt
#0 0x000000340b232925 in raise () from /lib64/libc.so.6
#1 0x000000340b234105 in abort () from /lib64/libc.so.6
#2 0x0000000000870fef in errfinish (dummy=0) at elog.c:566
#3 0x000000000087322f in elog_finish (elevel=22, fmt=0x8def08 "cannot abort transaction %u, it was already committed") at elog.c:1334
#4 0x00000000004cd162 in RecordTransactionAbort (isSubXact=0 '\000') at xact.c:1686
#5 0x00000000004ce48f in AbortTransaction () at xact.c:2998
#6 0x00000000004d00b8 in AbortOutOfAnyTransaction () at xact.c:4663
#7 0x0000000000883801 in ShutdownPostgres (code=0, arg=0) at postinit.c:1033
#8 0x0000000000738990 in shmem_exit (code=0) at ipc.c:221
#9 0x0000000000738892 in proc_exit_prepare (code=0) at ipc.c:181
#10 0x00000000007387f9 in proc_exit (code=0) at ipc.c:96
#11 0x00000000006f26f2 in AutoVacWorkerMain (argc=0, argv=0x0) at autovacuum.c:1566
#12 0x00000000006f258b in StartAutoVacWorker () at autovacuum.c:1464
#13 0x00000000007034f3 in StartAutovacuumWorker () at postmaster.c:5317
#14 0x0000000000702d8d in sigusr1_handler (postgres_signal_arg=10) at postmaster.c:4963
#15 <signal handler called>
#16 0x000000340b2e15c3 in __select_nocancel () from /lib64/libc.so.6
#17 0x00000000006fea75 in ServerLoop () at postmaster.c:1662
#18 0x00000000006fe411 in PostmasterMain (argc=5, argv=0x2a4b790) at postmaster.c:1369
#19 0x000000000066326d in main (argc=5, argv=0x2a4b790) at main.c:206
====Datanode Log:
postgresql-2014-07-18_105851.log:14716 pgxc 2014-07-18 11:42:39 CSTLOG: statement: select autoanalyze_count from pg_stat_user_tables where schemaname = 'public' and relname = 'imei_historyseristatus'
postgresql-2014-07-18_105851.log:14716 2014-07-18 13:09:22 CSTLOG: Falling back to local Xid. Was = 0, now is = 307616
postgresql-2014-07-18_105851.log:14716 2014-07-18 13:09:23 CSTERROR: GTM error, could not obtain snapshot
postgresql-2014-07-18_105851.log:14716 2014-07-18 13:09:23 CSTPANIC: cannot abort transaction 307616, it was already committed
postgresql-2014-07-18_105851.log:27533 2014-07-18 13:10:35 CSTLOG: server process (PID 14716) was terminated by signal 6: Aborted
====GTM Log:
1:140085928392448:2014-07-18 13:05:19.803 CST -LOG: Saving transaction restoration info, backed-up gxid: 308842
LOCATION: GTM_WriteRestorePointXid, gtm_txn.c:2649
1:140085928392448:2014-07-18 13:09:23.001 CST -WARNING: No transaction handle for gxid: 307616
LOCATION: GTM_GXIDToHandle, gtm_txn.c:163
1:140085928392448:2014-07-18 13:09:23.020 CST -WARNING: Invalid transaction handle: -1
LOCATION: GTM_HandleToTransactionInfo, gtm_txn.c:213
1:140085928392448:2014-07-18 13:09:23.068 CST -ERROR: Failed to get a snapshot
LOCATION: ProcessGetSnapshotCommandMulti, gtm_snap.c:420
1:140085928392448:2014-07-18 13:18:44.968 CST -LOG: Saving transaction restoration info, backed-up gxid: 310846
Thanks
Julian