From: Koichi S. <koi...@gm...> - 2014-02-24 05:26:59
Sorry I didn't respond for a while. I took a look at your configuration and did not find anything special.

Yes, the connection between the coordinator and the datanode can stay open. It is owned by the pooler process, which manages the connections between coordinators and datanodes. Even when a coordinator backend no longer needs a connection to a datanode, the pooler can keep it open for subsequent use, to save connection overhead.

Regards;
---
Koichi Suzuki

2014-02-16 17:19 GMT+09:00 Rishi Ramraj <the...@gm...>:
> FYI: I've been running some more tests on the REL1_2_STABLE branch to see if I can find more information on what's wrong with my system. To keep my setup simple, I have reverted to using initdb and initgtm to create the cluster and am running only the gtm, coordinator and datanode. They are all hosted on the same machine and communicate using TCP sockets on localhost.
>
> TL;DR there is a stack trace at the end of the email.
>
> I've added some more debug logging to src/backend/storage/ipc/procarray.c 436:
>
> elog(LOG, "failed to find proc %p in ProcArray", proc);
> + elog(LOG, "pid %d", proc->pid);
> + elog(LOG, "pgprocno %d", proc->pgprocno);
>
> The logs only seem to start happening after I connect using psql.
Here's > what they look like on the coordinator without any psql commands issued: > > SID 5300263f.163d LOG: 00000: database system was shut down at 2014-02-15 > 19:13:59 EST > SID 5300263f.163d LOCATION: StartupXLOG, xlog.c:4959 > SID 5300263f.163b LOG: 00000: database system is ready to accept > connections > SID 5300263f.163b LOCATION: reaper, postmaster.c:2768 > SID 5300263f.1642 LOG: 00000: autovacuum launcher started > SID 5300263f.1642 LOCATION: AutoVacLauncherMain, autovacuum.c:417 > SID 5300276c.16c3 LOG: 00000: failed to find proc 0x7f777d08ab80 in > ProcArray > SID 5300276c.16c3 LOCATION: ProcArrayRemove, procarray.c:436 > SID 5300276c.16c3 LOG: 00000: pid 5827 > SID 5300276c.16c3 LOCATION: ProcArrayRemove, procarray.c:437 > SID 5300276c.16c3 LOG: 00000: pgprocno 102 > SID 5300276c.16c3 LOCATION: ProcArrayRemove, procarray.c:438 > SID 530027a8.16f9 LOG: 00000: failed to find proc 0x7f777d08ab80 in > ProcArray > SID 530027a8.16f9 LOCATION: ProcArrayRemove, procarray.c:436 > SID 530027a8.16f9 LOG: 00000: pid 5881 > SID 530027a8.16f9 LOCATION: ProcArrayRemove, procarray.c:437 > SID 530027a8.16f9 LOG: 00000: pgprocno 102 > SID 530027a8.16f9 LOCATION: ProcArrayRemove, procarray.c:438 > > Here is what the logs in the coordinator look like after I issued drop > database test; > > SID 53002923.1764 LOG: 00000: statement: drop database test; > SID 53002923.1764 LOCATION: exec_simple_query, postgres.c:966 > SID 53002923.1764 LOG: 00000: failed to find proc 0x7f777d08d980 in > ProcArray > SID 53002923.1764 LOCATION: ProcArrayRemove, procarray.c:436 > SID 53002923.1764 STATEMENT: drop database test; > SID 53002923.1764 LOG: 00000: pid 0 > SID 53002923.1764 LOCATION: ProcArrayRemove, procarray.c:437 > SID 53002923.1764 STATEMENT: drop database test; > SID 53002923.1764 LOG: 00000: pgprocno 118 > SID 53002923.1764 LOCATION: ProcArrayRemove, procarray.c:438 > SID 53002923.1764 STATEMENT: drop database test; > SID 5300294c.176a LOG: 00000: failed to find proc 
0x7f777d08ab80 in > ProcArray > SID 5300294c.176a LOCATION: ProcArrayRemove, procarray.c:436 > SID 5300294c.176a LOG: 00000: pid 5994 > SID 5300294c.176a LOCATION: ProcArrayRemove, procarray.c:437 > SID 5300294c.176a LOG: 00000: pgprocno 102 > SID 5300294c.176a LOCATION: ProcArrayRemove, procarray.c:438 > SID 53002988.1784 LOG: 00000: failed to find proc 0x7f777d08ab80 in > ProcArray > SID 53002988.1784 LOCATION: ProcArrayRemove, procarray.c:436 > SID 53002988.1784 LOG: 00000: pid 6020 > SID 53002988.1784 LOCATION: ProcArrayRemove, procarray.c:437 > SID 53002988.1784 LOG: 00000: pgprocno 102 > SID 53002988.1784 LOCATION: ProcArrayRemove, procarray.c:438 > SID 530029c4.1791 LOG: 00000: failed to find proc 0x7f777d08ab80 in > ProcArray > SID 530029c4.1791 LOCATION: ProcArrayRemove, procarray.c:436 > SID 530029c4.1791 LOG: 00000: pid 6033 > SID 530029c4.1791 LOCATION: ProcArrayRemove, procarray.c:437 > SID 530029c4.1791 LOG: 00000: pgprocno 102 > SID 530029c4.1791 LOCATION: ProcArrayRemove, procarray.c:438 > SID 53002a00.17c6 LOG: 00000: failed to find proc 0x7f777d08ab80 in > ProcArray > SID 53002a00.17c6 LOCATION: ProcArrayRemove, procarray.c:436 > SID 53002a00.17c6 LOG: 00000: pid 6086 > SID 53002a00.17c6 LOCATION: ProcArrayRemove, procarray.c:437 > SID 53002a00.17c6 LOG: 00000: pgprocno 102 > SID 53002a00.17c6 LOCATION: ProcArrayRemove, procarray.c:438 > > At this point, these logs start appearing in the datanode's log: > > SID 53002927.1766 LOG: 00000: failed to find proc 0x7f46c031e980 in > ProcArray > SID 53002927.1766 LOCATION: ProcArrayRemove, procarray.c:436 > SID 53002927.1766 STATEMENT: COMMIT PREPARED 'T10226' > SID 53002927.1766 LOG: 00000: pid 0 > SID 53002927.1766 LOCATION: ProcArrayRemove, procarray.c:437 > SID 53002927.1766 STATEMENT: COMMIT PREPARED 'T10226' > SID 53002927.1766 LOG: 00000: pgprocno 118 > SID 53002927.1766 LOCATION: ProcArrayRemove, procarray.c:438 > SID 53002927.1766 STATEMENT: COMMIT PREPARED 'T10226' > SID 5300293d.1767 LOG: 
00000: failed to find proc 0x7f46c031bb80 in > ProcArray > SID 5300293d.1767 LOCATION: ProcArrayRemove, procarray.c:436 > SID 5300293d.1767 LOG: 00000: pid 5991 > SID 5300293d.1767 LOCATION: ProcArrayRemove, procarray.c:437 > SID 5300293d.1767 LOG: 00000: pgprocno 102 > SID 5300293d.1767 LOCATION: ProcArrayRemove, procarray.c:438 > SID 53002979.1781 LOG: 00000: failed to find proc 0x7f46c031bb80 in > ProcArray > SID 53002979.1781 LOCATION: ProcArrayRemove, procarray.c:436 > SID 53002979.1781 LOG: 00000: pid 6017 > SID 53002979.1781 LOCATION: ProcArrayRemove, procarray.c:437 > SID 53002979.1781 LOG: 00000: pgprocno 102 > SID 53002979.1781 LOCATION: ProcArrayRemove, procarray.c:438 > SID 530029b5.178e LOG: 00000: failed to find proc 0x7f46c031bb80 in > ProcArray > ... > > When I disconnect psql, the logs stop in the coordinator but continue in the > datanode (there's an open connection between the coordinator and the > datanode). After I set autovacuum = off, the logs with pgprocno 102 > disappeared in both the datanode and coordinator. > > I originally set out to get stack traces for both proc 118 and 102. > Unfortunately, it seems that the autovacuum launcher spawns multiple > processes making it difficult to intercept the process with gdb. 
I was able > to get a stack trace for pgprocno 118 from the open connection between the > datanode and coordinator: > > #0 ProcArrayRemove (proc=proc@entry=0x7ffcb25f9980, > latestXid=latestXid@entry=10437) at procarray.c:436 > #1 0x00000000004b69ed in FinishPreparedTransaction (gid=<optimized out>, > isCommit=<optimized out>) at twophase.c:1368 > #2 0x0000000000674551 in standard_ProcessUtility (parsetree=0x29d4110, > queryString=0x29d3730 "COMMIT PREPARED 'T10437'", > context=PROCESS_UTILITY_TOPLEVEL, params=0x0, > dest=<optimized out>, sentToRemote=<optimized out>, > completionTag=0x7fff2c308910 "") at utility.c:574 > #3 0x0000000000670d02 in PortalRunUtility (portal=portal@entry=0x29d99c0, > utilityStmt=utilityStmt@entry=0x29d4110, > isTopLevel=isTopLevel@entry=1 '\001', dest=dest@entry=0x29d4450, > completionTag=completionTag@entry=0x7fff2c308910 "") at pquery.c:1285 > #4 0x00000000006718cd in PortalRunMulti (portal=portal@entry=0x29d99c0, > isTopLevel=isTopLevel@entry=1 '\001', > dest=dest@entry=0x29d4450, altdest=altdest@entry=0x29d4450, > completionTag=completionTag@entry=0x7fff2c308910 "") > at pquery.c:1432 > #5 0x00000000006723e9 in PortalRun (portal=portal@entry=0x29d99c0, > count=count@entry=9223372036854775807, > isTopLevel=isTopLevel@entry=1 '\001', dest=dest@entry=0x29d4450, > altdest=altdest@entry=0x29d4450, > completionTag=completionTag@entry=0x7fff2c308910 "") at pquery.c:882 > #6 0x00000000006702d2 in exec_simple_query (query_string=0x29d3730 "COMMIT > PREPARED 'T10437'") at postgres.c:1140 > #7 PostgresMain (argc=<optimized out>, argv=argv@entry=0x29bb360, > dbname=0x29bb288 "postgres", > username=<optimized out>) at postgres.c:4251 > #8 0x00000000004621c6 in BackendRun (port=0x29dd9a0) at postmaster.c:4205 > #9 BackendStartup (port=0x29dd9a0) at postmaster.c:3894 > #10 ServerLoop () at postmaster.c:1705 > #11 0x000000000062f768 in PostmasterMain (argc=argc@entry=4, > argv=argv@entry=0x29b9340) at postmaster.c:1374 > #12 0x0000000000462b47 in 
main (argc=4, argv=0x29b9340) at main.c:196 > > I'm not familiar with the codebase, so I'm not entirely sure what's > happening. As far as I can tell, FinishPreparedTransaction is trying to end > a two phase commit, which it cannot find. I did a select * from > pg_prepared_xacts on both the coordinator and the datanode, but both tables > were empty. The commits seem to complete successfully despite the log. I'll > keep digging to see what I can find. > > > On Sat, Feb 15, 2014 at 2:44 AM, Rishi Ramraj <the...@gm...> > wrote: >> >> I just recompiled XC using the REL1_2_STABLE branch. I used pgxc_ctl to >> configure and create the cluster. The physical configuration is still the >> same, except now there's a GTM proxy on the machine as well. >> >> The database seems to be functioning correctly, but I'm still getting the >> ProcArray error. Here's the log from the coordinator: >> >> 2014-02-15 02:26:28 EST SID 52ff16a4.3cb0 XID 0LOG: 00000: database >> system was shut down at 2014-02-15 02:23:33 EST >> 2014-02-15 02:26:28 EST SID 52ff16a4.3cb0 XID 0LOCATION: StartupXLOG, >> xlog.c:4959 >> 2014-02-15 02:26:28 EST SID 52ff16a4.3ca5 XID 0LOG: 00000: database >> system is ready to accept connections >> 2014-02-15 02:26:28 EST SID 52ff16a4.3ca5 XID 0LOCATION: reaper, >> postmaster.c:2768 >> 2014-02-15 02:26:28 EST SID 52ff16a4.3cb5 XID 0LOG: 00000: autovacuum >> launcher started >> 2014-02-15 02:26:28 EST SID 52ff16a4.3cb5 XID 0LOCATION: >> AutoVacLauncherMain, autovacuum.c:417 >> 2014-02-15 02:28:53 EST SID 52ff1727.3cd2 XID 0ERROR: 42601: syntax error >> at or near "asdf" at character 1 >> 2014-02-15 02:28:53 EST SID 52ff1727.3cd2 XID 0LOCATION: scanner_yyerror, >> scan.l:1044 >> 2014-02-15 02:28:53 EST SID 52ff1727.3cd2 XID 0STATEMENT: asdf >> ; >> 2014-02-15 02:29:30 EST SID 52ff1759.3cd3 XID 0LOG: 00000: failed to find >> proc 0x7f3c12ff5b80 in ProcArray >> 2014-02-15 02:29:30 EST SID 52ff1759.3cd3 XID 0LOCATION: ProcArrayRemove, >> procarray.c:436 >> 2014-02-15 
02:30:30 EST SID 52ff1795.3cd5 XID 0LOG: 00000: failed to find >> proc 0x7f3c12ff5b80 in ProcArray >> 2014-02-15 02:30:30 EST SID 52ff1795.3cd5 XID 0LOCATION: ProcArrayRemove, >> procarray.c:436 >> 2014-02-15 02:31:01 EST SID 52ff1727.3cd2 XID 0LOG: 00000: statement: >> create node data with (type='datanode', host='localhost', port=20008); >> 2014-02-15 02:31:01 EST SID 52ff1727.3cd2 XID 0LOCATION: >> exec_simple_query, postgres.c:966 >> 2014-02-15 02:31:08 EST SID 52ff1727.3cd2 XID 0ERROR: 42601: syntax error >> at or near "exit" at character 1 >> 2014-02-15 02:31:08 EST SID 52ff1727.3cd2 XID 0LOCATION: scanner_yyerror, >> scan.l:1044 >> 2014-02-15 02:31:08 EST SID 52ff1727.3cd2 XID 0STATEMENT: exit >> ; >> 2014-02-15 02:32:29 EST SID 52ff1808.3d1e XID 0LOG: 00000: statement: >> create database test; >> 2014-02-15 02:32:29 EST SID 52ff1808.3d1e XID 0LOCATION: >> exec_simple_query, postgres.c:966 >> 2014-02-15 02:32:29 EST SID 52ff180d.3d20 XID 0LOG: 00000: failed to find >> proc 0x7f3c12ff5b80 in ProcArray >> 2014-02-15 02:32:29 EST SID 52ff180d.3d20 XID 0LOCATION: ProcArrayRemove, >> procarray.c:436 >> 2014-02-15 02:32:31 EST SID 52ff1808.3d1e XID 2054LOG: 00000: failed to >> find proc 0x7f3c12ff8980 in ProcArray >> 2014-02-15 02:32:31 EST SID 52ff1808.3d1e XID 2054LOCATION: >> ProcArrayRemove, procarray.c:436 >> 2014-02-15 02:32:31 EST SID 52ff1808.3d1e XID 2054STATEMENT: create >> database test; >> 2014-02-15 02:33:30 EST SID 52ff1849.3e12 XID 0LOG: 00000: failed to find >> proc 0x7f3c12ff5b80 in ProcArray >> 2014-02-15 02:33:30 EST SID 52ff1849.3e12 XID 0LOCATION: ProcArrayRemove, >> procarray.c:436 >> 2014-02-15 02:34:30 EST SID 52ff1885.3e2e XID 0LOG: 00000: failed to find >> proc 0x7f3c12ff5b80 in ProcArray >> 2014-02-15 02:34:30 EST SID 52ff1885.3e2e XID 0LOCATION: ProcArrayRemove, >> procarray.c:436 >> 2014-02-15 02:35:30 EST SID 52ff18c1.3e30 XID 0LOG: 00000: failed to find >> proc 0x7f3c12ff5b80 in ProcArray >> 2014-02-15 02:35:30 EST SID 52ff18c1.3e30 
XID 0LOCATION: ProcArrayRemove, >> procarray.c:436 >> >> Here's the log from the datanode: >> >> LOG: database system was shut down at 2014-02-15 02:23:35 EST >> LOG: database system is ready to accept connections >> LOG: autovacuum launcher started >> ERROR: PGXC Node data: object already defined >> STATEMENT: create node data with (type='coordinator', host='localhost', >> port=20004); >> LOG: failed to find proc 0x7f5a2cc55b80 in ProcArray >> LOG: failed to find proc 0x7f5a2cc58980 in ProcArray >> STATEMENT: COMMIT PREPARED 'T2051' >> LOG: failed to find proc 0x7f5a2cc55b80 in ProcArray >> LOG: failed to find proc 0x7f5a2cc55b80 in ProcArray >> LOG: failed to find proc 0x7f5a2cc55b80 in ProcArray >> LOG: failed to find proc 0x7f5a2cc55b80 in ProcArray >> LOG: failed to find proc 0x7f5a2cc55b80 in ProcArray >> >> I've also attached the pgxcConf file I used to generate the cluster. I'm >> going to try to run it under gdb and get a stack trace. Let me know if there >> are any other diagnostics you would like me to run. That being said, the >> database in its current condition is fine for my testing :) >> >> >> On Thu, Feb 13, 2014 at 11:14 PM, Rishi Ramraj >> <the...@gm...> wrote: >>> >>> Will give it a shot. Thanks for the help! >>> >>> >>> On Thu, Feb 13, 2014 at 11:05 PM, Koichi Suzuki <koi...@gm...> >>> wrote: >>>> >>>> I'm afraid something is wrong inside but sorry I couldn't locate what >>>> it is. Series of error message is supposed to be from COMMIT, ABORT, >>>> COMMIT PREPARED or ABORT PREPARED. >>>> >>>> I'd advise to recreate the cluster with pgxc_ctl. I hope this is >>>> better to track what is going on. You will find materials for this >>>> from PGXC wiki page. 
Try www.postgres-xc.org, >>>> >>>> Regards; >>>> --- >>>> Koichi Suzuki >>>> >>>> >>>> 2014-02-14 12:57 GMT+09:00 Rishi Ramraj <the...@gm...>: >>>> > Apparently I left the datanode running, and after a while the logs >>>> > were >>>> > filled with the following: >>>> > >>>> > FATAL: sorry, too many clients already >>>> > >>>> > Here are the logs with the increased log levels. First on the >>>> > coordinator: >>>> > >>>> > 2014-02-13 22:24:31 EST SID 52fd8c6f.386 XID 0LOG: database system >>>> > was shut >>>> > down at 2014-02-13 22:07:11 EST >>>> > 2014-02-13 22:24:31 EST SID 52fd8c6e.382 XID 0LOG: database system is >>>> > ready >>>> > to accept connections >>>> > 2014-02-13 22:24:31 EST SID 52fd8c6f.38b XID 0LOG: autovacuum >>>> > launcher >>>> > started >>>> > 2014-02-13 22:45:18 EST SID 52fd914a.59f XID 0LOG: statement: drop >>>> > database >>>> > test; >>>> > 2014-02-13 22:45:18 EST SID 52fd914a.59f XID 11571ERROR: database >>>> > "test" >>>> > does not exist >>>> > 2014-02-13 22:45:18 EST SID 52fd914a.59f XID 11571STATEMENT: drop >>>> > database >>>> > test; >>>> > 2014-02-13 22:45:32 EST SID 52fd915c.5a2 XID 0LOG: failed to find >>>> > proc >>>> > 0x7f0297db9c60 in ProcArray >>>> > 2014-02-13 22:45:33 EST SID 52fd914a.59f XID 0LOG: statement: create >>>> > database test; >>>> > 2014-02-13 22:45:35 EST SID 52fd914a.59f XID 11575LOG: failed to find >>>> > proc >>>> > 0x7f0297dbca60 in ProcArray >>>> > 2014-02-13 22:45:35 EST SID 52fd914a.59f XID 11575STATEMENT: create >>>> > database test; >>>> > 2014-02-13 22:46:32 EST SID 52fd9198.5b1 XID 0LOG: failed to find >>>> > proc >>>> > 0x7f0297db9c60 in ProcArray >>>> > 2014-02-13 22:47:02 EST SID 52fd91b6.5b6 XID 0LOG: failed to find >>>> > proc >>>> > 0x7f0297db9c60 in ProcArray >>>> > 2014-02-13 22:47:32 EST SID 52fd91d4.5bb XID 0LOG: failed to find >>>> > proc >>>> > 0x7f0297db9c60 in ProcArray >>>> > 2014-02-13 22:48:02 EST SID 52fd91f2.5be XID 0LOG: failed to find >>>> > proc >>>> > 0x7f0297db9c60 in ProcArray 
>>>> > 2014-02-13 22:48:32 EST SID 52fd9210.5c7 XID 0LOG: failed to find >>>> > proc >>>> > 0x7f0297db9c60 in ProcArray >>>> > 2014-02-13 22:49:02 EST SID 52fd922e.5d1 XID 0LOG: failed to find >>>> > proc >>>> > 0x7f0297db9c60 in ProcArray >>>> > 2014-02-13 22:49:32 EST SID 52fd924c.5d6 XID 0LOG: failed to find >>>> > proc >>>> > 0x7f0297db9c60 in ProcArray >>>> > 2014-02-13 22:50:02 EST SID 52fd926a.5da XID 0LOG: failed to find >>>> > proc >>>> > 0x7f0297db9c60 in ProcArray >>>> > 2014-02-13 22:50:32 EST SID 52fd9288.5e0 XID 0LOG: failed to find >>>> > proc >>>> > 0x7f0297db9c60 in ProcArray >>>> > 2014-02-13 22:51:02 EST SID 52fd92a6.5e3 XID 0LOG: failed to find >>>> > proc >>>> > 0x7f0297db9c60 in ProcArray >>>> > 2014-02-13 22:51:32 EST SID 52fd92c4.5ee XID 0LOG: failed to find >>>> > proc >>>> > 0x7f0297db9c60 in ProcArray >>>> > 2014-02-13 22:52:02 EST SID 52fd92e2.5f1 XID 0LOG: failed to find >>>> > proc >>>> > 0x7f0297db9c60 in ProcArray >>>> > >>>> > Next, on the datanode: >>>> > >>>> > 2014-02-13 22:24:21 EST SID 52fd8c65.33a XID 0LOG: database system >>>> > was shut >>>> > down at 2014-02-13 22:23:35 EST >>>> > 2014-02-13 22:24:21 EST SID 52fd8c64.336 XID 0LOG: database system is >>>> > ready >>>> > to accept connections >>>> > 2014-02-13 22:24:21 EST SID 52fd8c65.33e XID 0LOG: autovacuum >>>> > launcher >>>> > started >>>> > 2014-02-13 22:45:34 EST SID 52fd915e.5a4 XID 0LOG: statement: START >>>> > TRANSACTION ISOLATION LEVEL read committed READ WRITE >>>> > 2014-02-13 22:45:34 EST SID 52fd915e.5a4 XID 0LOG: statement: create >>>> > database test; >>>> > 2014-02-13 22:45:35 EST SID 52fd915e.5a4 XID 11574LOG: statement: >>>> > PREPARE >>>> > TRANSACTION 'T11574' >>>> > 2014-02-13 22:45:35 EST SID 52fd915e.5a4 XID 0LOG: statement: COMMIT >>>> > PREPARED 'T11574' >>>> > 2014-02-13 22:45:35 EST SID 52fd915e.5a4 XID 11575LOG: failed to find >>>> > proc >>>> > 0x7f1c3de46a60 in ProcArray >>>> > 2014-02-13 22:45:35 EST SID 52fd915e.5a4 XID 11575STATEMENT: 
COMMIT PREPARED 'T11574'
>>>> > 2014-02-13 22:46:22 EST SID 52fd918e.5ae XID 0LOG: failed to find proc 0x7f1c3de43c60 in ProcArray
>>>> > 2014-02-13 22:47:22 EST SID 52fd91ca.5b8 XID 0LOG: failed to find proc 0x7f1c3de43c60 in ProcArray
>>>> > 2014-02-13 22:48:22 EST SID 52fd9206.5c2 XID 0LOG: failed to find proc 0x7f1c3de43c60 in ProcArray
>>>> > 2014-02-13 22:49:22 EST SID 52fd9242.5d3 XID 0LOG: failed to find proc 0x7f1c3de43c60 in ProcArray
>>>> > 2014-02-13 22:50:22 EST SID 52fd927e.5dc XID 0LOG: failed to find proc 0x7f1c3de43c60 in ProcArray
>>>> > 2014-02-13 22:51:22 EST SID 52fd92ba.5e5 XID 0LOG: failed to find proc 0x7f1c3de43c60 in ProcArray
>>>> > 2014-02-13 22:52:22 EST SID 52fd92f6.5f5 XID 0LOG: failed to find proc 0x7f1c3de43c60 in ProcArray
>>>> > 2014-02-13 22:53:22 EST SID 52fd9332.5fd XID 0LOG: failed to find proc 0x7f1c3de43c60 in ProcArray
>>>> > 2014-02-13 22:54:22 EST SID 52fd936e.60d XID 0LOG: failed to find proc 0x7f1c3de43c60 in ProcArray
>>>> >
>>>> > Would you like me to keep experimenting on this installation, or should I recreate the cluster?
>>>> >
>>>> >
>>>> > On Thu, Feb 13, 2014 at 9:49 PM, Koichi Suzuki <koi...@gm...> wrote:
>>>> >>
>>>> >> I don't see anything strange in the configuration file.
>>>> >>
>>>> >> I found regression test sets up gtm port explicitly (value is the default, 6666, though). Could you try to configure your cluster with pgxc_ctl, which is far simpler and you have much less chance to encounter errors? This utility comes with built-in command sequences needed to configure and operate your cluster.
>>>> >>
>>>> >> Or, could you re-run the same with different log_min_messages level?
>>>> >> >>>> >> Regards; >>>> >> --- >>>> >> Koichi Suzuki >>>> >> >>>> >> >>>> >> 2014-02-14 11:16 GMT+09:00 Rishi Ramraj >>>> >> <the...@gm...>: >>>> >> > Missed the coordinator.conf file. Find attached. >>>> >> > >>>> >> > >>>> >> > On Thu, Feb 13, 2014 at 9:15 PM, Rishi Ramraj >>>> >> > <the...@gm...> >>>> >> > wrote: >>>> >> >> >>>> >> >> Find all of the files attached. I have the GTM, coordinator and >>>> >> >> datanode >>>> >> >> services all hosted on the same Linux Mint machine for testing. >>>> >> >> They >>>> >> >> communicate using the loopback interface. >>>> >> >> >>>> >> >> To initialize the cluster, I ran the following: >>>> >> >> >>>> >> >> /var/lib/postgres-xc/gtm $ initgtm -Z >>>> >> >> /var/lib/postgres-xc/data $ initdb -D . --nodename data >>>> >> >> /var/lib/postgres-xc/coord $ initdb -D . --nodename coord >>>> >> >> >>>> >> >> To start the cluster, I use upstart: >>>> >> >> >>>> >> >> /usr/local/pgsql/bin/gtm -D /var/lib/postgres-xc/gtm >>>> >> >> /usr/local/pgsql/bin/postgres --datanode -D >>>> >> >> /var/lib/postgres-xc/data >>>> >> >> /usr/local/pgsql/bin/postgres --coordinator -D >>>> >> >> /var/lib/postgres-xc/coord >>>> >> >> >>>> >> >> I will increase the log level and try to reproduce the errors. >>>> >> >> >>>> >> >> >>>> >> >> On Thu, Feb 13, 2014 at 8:39 PM, 鈴木 幸市 <ko...@in...> >>>> >> >> wrote: >>>> >> >>> >>>> >> >>> Hello; >>>> >> >>> >>>> >> >>> You need to set log_statement GUC to appropriate value. Default >>>> >> >>> is >>>> >> >>> “none”. “all” prints all statements accepted. Also, it will >>>> >> >>> be a >>>> >> >>> good >>>> >> >>> idea to include session ID in the log_line_prefix GUC. Your >>>> >> >>> postgresql.conf has some information how to set this. I tested >>>> >> >>> REL1_1_STABLE with four coordinators, four datanodes and four >>>> >> >>> gtm_proxy and >>>> >> >>> did not have any additional log for any coordinators/datanodes. >>>> >> >>> It >>>> >> >>> will >>>> >> >>> be helpful what statements you issued. 
>>>> >> >>>
>>>> >> >>> Here’s what I got in REL1_1_STABLE (I configured the cluster using pgxc_ctl).
>>>> >> >>>
>>>> >> >>> Coordinator:
>>>> >> >>> LOG: database system was shut down at 2014-02-14 10:2sd
>>>> >> >>> LOG: autovacuum launcher started
>>>> >> >>>
>>>> >> >>> Datanode:
>>>> >> >>> LOG: database system was shut down at 2014-02-14 10:21:08 JST
>>>> >> >>> LOG: database system is ready to accept connections
>>>> >> >>> LOG: autovacuum launcher started
>>>> >> >>>
>>>> >> >>> I’d like to have your configuration, each postgresql.conf file and what you’ve done to initialize, start and run your application.
>>>> >> >>>
>>>> >> >>> Best;
>>>> >> >>> ---
>>>> >> >>> Koichi Suzuki
>>>> >> >>>
>>>> >> >>> On 2014/02/14 0:38, Rishi Ramraj <the...@gm...> wrote:
>>>> >> >>>
>>>> >> >>> I just tried with REL1_1_STABLE. Here are my logs from the coordinator:
>>>> >> >>>
>>>> >> >>> LOG: database system was shut down at 2014-02-13 10:14:16 EST
>>>> >> >>> LOG: database system is ready to accept connections
>>>> >> >>> LOG: autovacuum launcher started
>>>> >> >>> ERROR: syntax error at or near "asdf" at character 1
>>>> >> >>> STATEMENT: asdf;
>>>> >> >>> LOG: failed to find proc 0x7f7d4a79ea60 in ProcArray
>>>> >> >>> STATEMENT: create database test;
>>>> >> >>> LOG: failed to find proc 0x7f7d4a79bc60 in ProcArray
>>>> >> >>> ERROR: cannot drop the currently open database
>>>> >> >>> STATEMENT: drop database test;
>>>> >> >>> LOG: failed to find proc 0x7f7d4a79ea60 in ProcArray
>>>> >> >>> STATEMENT: drop database test;
>>>> >> >>> ERROR: syntax error at or near "blah" at character 1
>>>> >> >>> STATEMENT: blah blah blah;
>>>> >> >>> LOG: failed to find proc 0x7f7d4a79bc60 in ProcArray
>>>> >> >>> LOG: failed to find proc 0x7f7d4a79bc60 in ProcArray
>>>> >> >>> LOG: failed to find proc 0x7f7d4a79bc60 in ProcArray
>>>> >> >>>
>>>> >> >>> The logs from the datanode:
>>>> >> >>>
>>>> >> >>> LOG:
database system was shut down at 2014-02-13 10:19:33 EST >>>> >> >>> LOG: database system is ready to accept connections >>>> >> >>> LOG: autovacuum launcher started >>>> >> >>> LOG: failed to find proc 0x7f06bee5ea60 in ProcArray >>>> >> >>> STATEMENT: COMMIT PREPARED 'T10014' >>>> >> >>> LOG: failed to find proc 0x7f06bee5bc60 in ProcArray >>>> >> >>> LOG: failed to find proc 0x7f06bee5ea60 in ProcArray >>>> >> >>> STATEMENT: COMMIT PREPARED 'T10023' >>>> >> >>> LOG: failed to find proc 0x7f06bee5bc60 in ProcArray >>>> >> >>> LOG: failed to find proc 0x7f06bee5bc60 in ProcArray >>>> >> >>> LOG: failed to find proc 0x7f06bee5bc60 in ProcArray >>>> >> >>> LOG: failed to find proc 0x7f06bee5bc60 in ProcArray >>>> >> >>> LOG: failed to find proc 0x7f06bee5bc60 in ProcArray >>>> >> >>> LOG: failed to find proc 0x7f06bee5bc60 in ProcArray >>>> >> >>> >>>> >> >>> The datanode seems to continuously produce logs but the >>>> >> >>> coordinator >>>> >> >>> does >>>> >> >>> not. I issued CREATE NODE commands to both services but they >>>> >> >>> don't >>>> >> >>> seem to >>>> >> >>> show up in the log. >>>> >> >>> >>>> >> >>> >>>> >> >>> On Thu, Feb 13, 2014 at 9:49 AM, Rishi Ramraj >>>> >> >>> <the...@gm...> wrote: >>>> >> >>>> >>>> >> >>>> I haven't been issuing commits or aborts every 30 seconds. So >>>> >> >>>> far, >>>> >> >>>> I've >>>> >> >>>> only issued five commands to the cluster using psql, but I have >>>> >> >>>> over >>>> >> >>>> 100 >>>> >> >>>> logs. I will try 1.1 today and let you know. >>>> >> >>>> >>>> >> >>>> Do you use a specific branching methodology, like gitflow? Do >>>> >> >>>> you >>>> >> >>>> have a >>>> >> >>>> bug tracking system? If you need any help with release >>>> >> >>>> engineering, >>>> >> >>>> let me >>>> >> >>>> know; I don't mind volunteering some time. >>>> >> >>>> >>>> >> >>>> >>>> >> >>>> On Thu, Feb 13, 2014 at 1:41 AM, 鈴木 幸市 >>>> >> >>>> <ko...@in...> >>>> >> >>>> wrote: >>>> >> >>>>> >>>> >> >>>>> GTM message is just a report. 
When GTM starts, it tries to read if there is any slave connected in previous run and it didn’t find one.
>>>> >> >>>>>
>>>> >> >>>>> The first message looks not harmful but could be some potential issues. Please let me look into it. The chance to have this message is at the initialization of each datanode/coordinator, COMMIT, ABORT, COMMIT PREPARED or ABORT PREPARED.
>>>> >> >>>>>
>>>> >> >>>>> Do you have a chance to issue them every 30 seconds?
>>>> >> >>>>>
>>>> >> >>>>> If possible, could you try release 1.1 and see if you have the same issue (first message)? I think release 1.1 is better because master is anyway development branch.
>>>> >> >>>>>
>>>> >> >>>>> Best;
>>>> >> >>>>> ---
>>>> >> >>>>> Koichi Suzuki
>>>> >> >>>>>
>>>> >> >>>>> On 2014/02/13 14:51, Rishi Ramraj <the...@gm...> wrote:
>>>> >> >>>>>
>>>> >> >>>>> > Hello All,
>>>> >> >>>>> >
>>>> >> >>>>> > I just installed postgres-xc from the git master branch on a test machine. All processes are running on the same box. On both the coordinator and data processes, I'm getting the following logs about every 30 seconds:
>>>> >> >>>>> >
>>>> >> >>>>> > LOG: failed to find proc 0x7fd9ee703f80 in ProcArray
>>>> >> >>>>> >
>>>> >> >>>>> > On the GTM process, I'm getting the following logs at about the same frequency:
>>>> >> >>>>> >
>>>> >> >>>>> > LOG: Any GTM standby node not found in registered node(s).
>>>> >> >>>>> > LOCATION: gtm_standby_connect_to_standby_int, gtm_standby.c:381
>>>> >> >>>>> >
>>>> >> >>>>> > The cluster seems to be working properly.
I was able to >>>> >> >>>>> > create a >>>> >> >>>>> > new >>>> >> >>>>> > database and a table within that database without any >>>> >> >>>>> > problem. I >>>> >> >>>>> > restarted >>>> >> >>>>> > all services and the data was persisted, but the logs >>>> >> >>>>> > persist. Any >>>> >> >>>>> > idea >>>> >> >>>>> > what's causing these logs? >>>> >> >>>>> > >>>> >> >>>>> > Thanks, >>>> >> >>>>> > - Rishi >>>> >> >>>>> > >>>> >> >>>>> > >>>> >> >>>>> > >>>> >> >>>>> > ------------------------------------------------------------------------------ >>>> >> >>>>> > Android apps run on BlackBerry 10 >>>> >> >>>>> > Introducing the new BlackBerry 10.2.1 Runtime for Android >>>> >> >>>>> > apps. >>>> >> >>>>> > Now with support for Jelly Bean, Bluetooth, Mapview and more. >>>> >> >>>>> > Get your Android app in front of a whole new audience. Start >>>> >> >>>>> > now. >>>> >> >>>>> > >>>> >> >>>>> > >>>> >> >>>>> > >>>> >> >>>>> > http://pubads.g.doubleclick.net/gampad/clk?id=124407151&iu=/4140/ostg.clktrk_______________________________________________ >>>> >> >>>>> > Postgres-xc-general mailing list >>>> >> >>>>> > Pos...@li... >>>> >> >>>>> > >>>> >> >>>>> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-general >>>> >> >>>>> >>>> >> >>>> >>>> >> >>>> >>>> >> >>>> >>>> >> >>>> -- >>>> >> >>>> Cheers, >>>> >> >>>> - Rishi >>>> >> >>> >>>> >> >>> >>>> >> >>> >>>> >> >>> >>>> >> >>> -- >>>> >> >>> Cheers, >>>> >> >>> - Rishi >>>> >> >>> >>>> >> >>> >>>> >> >> >>>> >> >> >>>> >> >> >>>> >> >> -- >>>> >> >> Cheers, >>>> >> >> - Rishi >>>> >> > >>>> >> > >>>> >> > >>>> >> > >>>> >> > -- >>>> >> > Cheers, >>>> >> > - Rishi >>>> >> > >>>> >> > >>>> >> > >>>> >> > ------------------------------------------------------------------------------ >>>> >> > Android apps run on BlackBerry 10 >>>> >> > Introducing the new BlackBerry 10.2.1 Runtime for Android apps. >>>> >> > Now with support for Jelly Bean, Bluetooth, Mapview and more. 
>>>> >> > Get your Android app in front of a whole new audience. Start now. >>>> >> > >>>> >> > >>>> >> > http://pubads.g.doubleclick.net/gampad/clk?id=124407151&iu=/4140/ostg.clktrk >>>> >> > _______________________________________________ >>>> >> > Postgres-xc-general mailing list >>>> >> > Pos...@li... >>>> >> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-general >>>> >> > >>>> > >>>> > >>>> > >>>> > >>>> > -- >>>> > Cheers, >>>> > - Rishi >>> >>> >>> >>> >>> -- >>> Cheers, >>> - Rishi >> >> >> >> >> -- >> Cheers, >> - Rishi > > > > > -- > Cheers, > - Rishi |
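
For anyone reading logs like the ones quoted in this thread, the Python sketch below counts the "failed to find proc" messages per proc address and per pgprocno. It is only an illustration and not part of Postgres-XC: the names tally and SAMPLE are made up for the example, the sample entries are copied from the coordinator log in the 2014-02-16 message, and the "pgprocno" lines exist only when the extra elog() calls from that message are compiled in. It also assumes each log entry sits on a single line, as in the server log file rather than the wrapped mail quote.

```python
import re
from collections import Counter

# Message texts as they appear in the logs quoted in this thread.
# The "pgprocno" message comes from the extra debug elog() calls only.
PROC_RE = re.compile(r"failed to find proc (0x[0-9a-f]+) in ProcArray")
PGPROCNO_RE = re.compile(r"pgprocno (\d+)")

# A few coordinator log entries from the thread, one entry per line.
SAMPLE = [
    "SID 5300276c.16c3 LOG: 00000: failed to find proc 0x7f777d08ab80 in ProcArray",
    "SID 5300276c.16c3 LOG: 00000: pid 5827",
    "SID 5300276c.16c3 LOG: 00000: pgprocno 102",
    "SID 53002923.1764 LOG: 00000: failed to find proc 0x7f777d08d980 in ProcArray",
    "SID 53002923.1764 LOG: 00000: pid 0",
    "SID 53002923.1764 LOG: 00000: pgprocno 118",
    "SID 5300294c.176a LOG: 00000: failed to find proc 0x7f777d08ab80 in ProcArray",
    "SID 5300294c.176a LOG: 00000: pid 5994",
    "SID 5300294c.176a LOG: 00000: pgprocno 102",
]


def tally(lines):
    """Return (count per proc address, count per pgprocno) for the given log lines."""
    by_proc = Counter()
    by_pgprocno = Counter()
    for line in lines:
        match = PROC_RE.search(line)
        if match:
            by_proc[match.group(1)] += 1
            continue
        match = PGPROCNO_RE.search(line)
        if match:
            by_pgprocno[int(match.group(1))] += 1
    return by_proc, by_pgprocno


if __name__ == "__main__":
    by_proc, by_pgprocno = tally(SAMPLE)
    for address, count in by_proc.most_common():
        print(f"proc {address}: {count} message(s)")
    for procno, count in by_pgprocno.most_common():
        print(f"pgprocno {procno}: {count} message(s)")
```

To run it on a real log, pass the open file instead of SAMPLE, for example tally(open(path)) with path pointing at the coordinator or datanode log. In the thread, the pgprocno 102 messages disappeared once autovacuum was set to off, while pgprocno 118 showed up together with the drop database and COMMIT PREPARED statements, so splitting the counts this way separates the two sources.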