From: ZhangJulian <jul...@ou...> - 2014-07-21 04:08:20
Hi All,

This is hard to reproduce, so I am just pasting it here for your information. An autovacuum process got an invalid xid and then fell back to a local xid, but the snapshot for that xid could not be obtained from GTM. It looks like a design error, but I know little about the local transaction design. Is there any document that describes it?

===On the datanode:
(gdb) bt
#0 0x000000340b232925 in raise () from /lib64/libc.so.6
#1 0x000000340b234105 in abort () from /lib64/libc.so.6
#2 0x0000000000870fef in errfinish (dummy=0) at elog.c:566
#3 0x000000000087322f in elog_finish (elevel=22, fmt=0x8def08 "cannot abort transaction %u, it was already committed") at elog.c:1334
#4 0x00000000004cd162 in RecordTransactionAbort (isSubXact=0 '\000') at xact.c:1686
#5 0x00000000004ce48f in AbortTransaction () at xact.c:2998
#6 0x00000000004d00b8 in AbortOutOfAnyTransaction () at xact.c:4663
#7 0x0000000000883801 in ShutdownPostgres (code=0, arg=0) at postinit.c:1033
#8 0x0000000000738990 in shmem_exit (code=0) at ipc.c:221
#9 0x0000000000738892 in proc_exit_prepare (code=0) at ipc.c:181
#10 0x00000000007387f9 in proc_exit (code=0) at ipc.c:96
#11 0x00000000006f26f2 in AutoVacWorkerMain (argc=0, argv=0x0) at autovacuum.c:1566
#12 0x00000000006f258b in StartAutoVacWorker () at autovacuum.c:1464
#13 0x00000000007034f3 in StartAutovacuumWorker () at postmaster.c:5317
#14 0x0000000000702d8d in sigusr1_handler (postgres_signal_arg=10) at postmaster.c:4963
#15 <signal handler called>
#16 0x000000340b2e15c3 in __select_nocancel () from /lib64/libc.so.6
#17 0x00000000006fea75 in ServerLoop () at postmaster.c:1662
#18 0x00000000006fe411 in PostmasterMain (argc=5, argv=0x2a4b790) at postmaster.c:1369
#19 0x000000000066326d in main (argc=5, argv=0x2a4b790) at main.c:206

====Datanode Log:
postgresql-2014-07-18_105851.log:14716 pgxc 2014-07-18 11:42:39 CSTLOG: statement: select autoanalyze_count from pg_stat_user_tables where schemaname = 'public' and relname = 'imei_historyseristatus'
postgresql-2014-07-18_105851.log:14716 2014-07-18 13:09:22 CSTLOG: Falling back to local Xid. Was = 0, now is = 307616
postgresql-2014-07-18_105851.log:14716 2014-07-18 13:09:23 CSTERROR: GTM error, could not obtain snapshot
postgresql-2014-07-18_105851.log:14716 2014-07-18 13:09:23 CSTPANIC: cannot abort transaction 307616, it was already committed
postgresql-2014-07-18_105851.log:27533 2014-07-18 13:10:35 CSTLOG: server process (PID 14716) was terminated by signal 6: Aborted

====GTM Log:
1:140085928392448:2014-07-18 13:05:19.803 CST -LOG: Saving transaction restoration info, backed-up gxid: 308842 LOCATION: GTM_WriteRestorePointXid, gtm_txn.c:2649
1:140085928392448:2014-07-18 13:09:23.001 CST -WARNING: No transaction handle for gxid: 307616 LOCATION: GTM_GXIDToHandle, gtm_txn.c:163
1:140085928392448:2014-07-18 13:09:23.020 CST -WARNING: Invalid transaction handle: -1 LOCATION: GTM_HandleToTransactionInfo, gtm_txn.c:213
1:140085928392448:2014-07-18 13:09:23.068 CST -ERROR: Failed to get a snapshot LOCATION: ProcessGetSnapshotCommandMulti, gtm_snap.c:420
1:140085928392448:2014-07-18 13:18:44.968 CST -LOG: Saving transaction restoration info, backed-up gxid: 310846

Thanks
Julian