From: ZhangJulian <jul...@ou...> - 2014-07-22 03:20:18
Hi Koichi,
Thanks for the reply! How should this type of issue be fixed?
================code segment1:
        if (TransactionIdIsValid(next_xid))
        {
            xid = next_xid;
            elog(DEBUG1, "TransactionId = %d", next_xid);
            next_xid = InvalidTransactionId; /* reset */
            if (!TransactionIdFollowsOrEquals(xid, ShmemVariableCache->nextXid))
            {
                /* This should be ok, due to concurrency from multiple coords
                 * passing down the xids.
                 * We later do not want to bother incrementing the value
                 * in shared memory though.
                 */
                increment_xid = false;
                elog(DEBUG1, "xid (%d) does not follow ShmemVariableCache->nextXid (%d)",
                    xid, ShmemVariableCache->nextXid);
            }
            else
                ShmemVariableCache->nextXid = xid;
        }
        else
        {
            /* Fallback to default */
            if (!useLocalXid)
                elog(LOG, "Falling back to local Xid. Was = %d, now is = %d",
                    next_xid, ShmemVariableCache->nextXid);
            xid = ShmemVariableCache->nextXid;
        }
===============
Should we just fix the ELSE branch by replacing it with an elog(ERROR, ...)?
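For illustration only, the first option could look roughly like the standalone sketch here. It is not a patch against the XC tree: `resolve_gtm_xid` is a hypothetical helper, the `TransactionId` typedef and macros are local stand-ins that mirror the PostgreSQL definitions, and a `false` return marks the place where the server code would call elog(ERROR, ...) instead of reading ShmemVariableCache->nextXid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Local stand-ins mirroring the PostgreSQL definitions; no server headers. */
typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)
#define TransactionIdIsValid(xid) ((xid) != InvalidTransactionId)

/*
 * Hypothetical helper: decide which xid a new transaction should use.
 * Returns true and stores the xid when GTM supplied a valid one.
 * Returns false when it did not; the server would raise elog(ERROR, ...)
 * at this point rather than fall back to the local counter.
 */
bool
resolve_gtm_xid(TransactionId gtm_xid, TransactionId *xid)
{
    assert(xid != NULL);

    if (TransactionIdIsValid(gtm_xid))
    {
        *xid = gtm_xid;
        return true;
    }

    /* Old behaviour here: xid = ShmemVariableCache->nextXid (local fallback). */
    fprintf(stderr, "GTM did not supply a valid xid; refusing local fallback\n");
    return false;
}
```

With this shape, a caller that receives `false` stops before any snapshot request is made, so GTM is never asked about an xid it did not issue.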
===============code segment2:
            if (MyPgXact->vacuumFlags & PROC_IN_VACUUM)
                next_xid = (TransactionId) BeginTranAutovacuumGTM();
            else
                next_xid = (TransactionId) BeginTranGTM(timestamp);
===============
Or should BeginTranAutovacuumGTM() and BeginTranGTM(timestamp) be fixed to ensure that they return a valid xid?
Thanks
Julian
From: ko...@in...
To: jul...@ou...
CC: pos...@li...
Subject: Re: [Postgres-xc-bugs] A stack trace for your investigating.
Date: Tue, 22 Jul 2014 02:09:55 +0000
There is still some code that falls back to a local XID. The initial XC code included such a fallback, which should not happen, and there must still be code of this type somewhere.
Thank you.
---
Koichi Suzuki
2014/07/21 13:08, ZhangJulian <jul...@ou...> wrote:
Hi All,
This is hard to reproduce, so I am just pasting it here for your information.
An autovacuum process got an invalid xid, so it fell back to a local xid. But GTM could not provide a snapshot for that xid.
It looks like a design error, but I know little about the local transaction design. Is there any document that describes it?
===On the datanode:
(gdb) bt
#0  0x000000340b232925 in raise () from /lib64/libc.so.6
#1  0x000000340b234105 in abort () from /lib64/libc.so.6
#2  0x0000000000870fef in errfinish (dummy=0) at elog.c:566
#3  0x000000000087322f in elog_finish (elevel=22, fmt=0x8def08 "cannot abort transaction %u, it was already committed") at elog.c:1334
#4  0x00000000004cd162 in RecordTransactionAbort (isSubXact=0 '\000') at xact.c:1686
#5  0x00000000004ce48f in AbortTransaction () at xact.c:2998
#6  0x00000000004d00b8 in AbortOutOfAnyTransaction () at xact.c:4663
#7  0x0000000000883801 in ShutdownPostgres (code=0, arg=0) at postinit.c:1033
#8  0x0000000000738990 in shmem_exit (code=0) at ipc.c:221
#9  0x0000000000738892 in proc_exit_prepare (code=0) at ipc.c:181
#10 0x00000000007387f9 in proc_exit (code=0) at ipc.c:96
#11 0x00000000006f26f2 in AutoVacWorkerMain (argc=0, argv=0x0) at autovacuum.c:1566
#12 0x00000000006f258b in StartAutoVacWorker () at autovacuum.c:1464
#13 0x00000000007034f3 in StartAutovacuumWorker () at postmaster.c:5317
#14 0x0000000000702d8d in sigusr1_handler (postgres_signal_arg=10) at postmaster.c:4963
#15 <signal handler called>
#16 0x000000340b2e15c3 in __select_nocancel () from /lib64/libc.so.6
#17 0x00000000006fea75 in ServerLoop () at postmaster.c:1662
#18 0x00000000006fe411 in PostmasterMain (argc=5, argv=0x2a4b790) at postmaster.c:1369
#19 0x000000000066326d in main (argc=5, argv=0x2a4b790) at main.c:206
====Datanode Log:
postgresql-2014-07-18_105851.log:14716 pgxc 2014-07-18 11:42:39 CSTLOG:  statement: select autoanalyze_count from pg_stat_user_tables where schemaname = 'public' and relname = 'imei_historyseristatus'
postgresql-2014-07-18_105851.log:14716  2014-07-18 13:09:22 CSTLOG:  Falling back to local Xid. Was = 0, now is = 307616
postgresql-2014-07-18_105851.log:14716  2014-07-18 13:09:23 CSTERROR:  GTM error, could not obtain snapshot
postgresql-2014-07-18_105851.log:14716  2014-07-18 13:09:23 CSTPANIC:  cannot abort transaction 307616, it was already committed
postgresql-2014-07-18_105851.log:27533  2014-07-18 13:10:35 CSTLOG:  server process (PID 14716) was terminated by signal 6: Aborted
====GTM Log:
1:140085928392448:2014-07-18 13:05:19.803 CST -LOG:  Saving transaction restoration info, backed-up gxid: 308842
LOCATION:  GTM_WriteRestorePointXid, gtm_txn.c:2649
1:140085928392448:2014-07-18 13:09:23.001 CST -WARNING:  No transaction handle for gxid: 307616
LOCATION:  GTM_GXIDToHandle, gtm_txn.c:163
1:140085928392448:2014-07-18 13:09:23.020 CST -WARNING:  Invalid transaction handle: -1
LOCATION:  GTM_HandleToTransactionInfo, gtm_txn.c:213
1:140085928392448:2014-07-18 13:09:23.068 CST -ERROR:  Failed to get a snapshot
LOCATION:  ProcessGetSnapshotCommandMulti, gtm_snap.c:420
1:140085928392448:2014-07-18 13:18:44.968 CST -LOG:  Saving transaction restoration info, backed-up gxid: 310846
Thanks
Julian