From: Lionel F. <lio...@gm...> - 2011-06-06 09:06:28
Hello,

I've turned off autovacuum on each node (including the coordinator) and the problem persists, even on small tables:

Start District Data for 10 Dists @ Mon Jun 06 10:53:50 CEST 2011
... Elasped Time(ms): 0.018
Writing record 10 of 10
ERROR: Could not commit prepared transaction implicitely
End District Load @ Mon Jun 06 10:53:50 CEST 2011

As it looks like autovacuum is not the source of the problem, I'll turn it back on on all nodes.

As for the logs, there is nothing on the first node, but on the second and third the same message appears:

ERROR: prepared transaction with identifier "T454" does not exist
STATEMENT: COMMIT PREPARED 'T454'

I'm reinitializing the cluster so that it starts with gxid > 628 and will keep you posted on progress (including the max_prepared_transactions parameter).

Lionel F.
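A rough sketch of the checks involved, assuming a stock Postgres-XC node and using only standard PostgreSQL facilities; the identifier and values below are illustrative, not taken from this cluster:

    -- Run on each datanode and on the coordinator: lists any two-phase
    -- transaction still pending (a stray identifier such as 'T454' would
    -- show up here).
    SELECT gid, prepared, owner, database FROM pg_prepared_xacts;

    -- Two-phase commit fails on any node where this is 0.
    SHOW max_prepared_transactions;

On the configuration side, a sketch of the relevant postgresql.conf lines on every node (illustrative values):

    max_prepared_transactions = 100   # changing this requires a restart
    autovacuum = off                  # only while testing whether autovacuum triggers the error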
2011/6/2 Mason <ma...@us...>

> On Wed, Jun 1, 2011 at 9:09 PM, Michael Paquier <mic...@gm...> wrote:
> > The problem you are facing with the pooler may be related to this bug
> > that has been found recently:
> > https://sourceforge.net/tracker/?func=detail&aid=3310399&group_id=311227&atid=1310232
> >
> > It looks like the datanode is not able to handle autovacuum commits
> > efficiently. This problem may cause data consistency issues, making a
> > node crash in the worst case.
> >
> > This could explain why you cannot begin a transaction correctly on the
> > nodes, the connections to the backends having been closed by a crash or
> > a consistency problem. Can you provide a backtrace or give hints about
> > the problem you have? Some clues in the node logs, perhaps?
>
> To see if it is autovacuum, Lionel, you could temporarily disable it
> and try to reproduce the error.
>
> Mason
>
> > On Wed, Jun 1, 2011 at 8:12 PM, Lionel Frachon <lio...@gm...> wrote:
> >>
> >> Hello,
> >>
> >> I was forced to distribute data by replication and not by hash, as I'm
> >> constantly getting "ERROR: Could not commit prepared transaction
> >> implicitely" on tables other than Warehouse (w_id), using 10
> >> warehouses (this error appears both on data loading, when using hash,
> >> and when performing distributed queries).
> >>
> >> I used a slightly different setup:
> >> - 1 GTM-only node
> >> - 1 Coordinator-only node
> >> - 3 Datanodes
> >>
> >> The Coordinator has 256MB of RAM, the Datanodes 768MB. At no point did
> >> they reach full usage of their dedicated RAM.
> >>
> >> However, running the benchmark for more than a few minutes (2 or 3)
> >> leads to the following errors:
> >>
> >> --- Unexpected SQLException caught in NEW-ORDER Txn ---
> >> Message: ERROR: Could not begin transaction on data nodes.
> >> SQLState: XX000
> >> ErrorCode: 0
> >>
> >> Then, a bit later:
> >>
> >> --- Unexpected SQLException caught in NEW-ORDER Txn ---
> >> Message: ERROR: Failed to get pooled connections
> >> SQLState: 53000
> >> ErrorCode: 0
> >>
> >> then (and I assume they are linked):
> >>
> >> --- Unexpected SQLException caught in NEW-ORDER Txn ---
> >> Message: ERROR: Could not begin transaction on data nodes.
> >> SQLState: XX000
> >> ErrorCode: 0
> >>
> >> Additionally, the test ends with many:
> >>
> >> --- Unexpected SQLException caught in NEW-ORDER Txn ---
> >> Message: This connection has been closed.
> >> SQLState: 08003
> >> ErrorCode: 0
> >>
> >> I'm using 10 terminals and 10 warehouses.
> >>
> >> Any clue about this error (and about distribution by hash; I
> >> understand they're probably linked...)?
> >>
> >> Lionel F.
> >>
> >> 2011/5/31 Lionel Frachon <lio...@gm...>:
> >> > Hi,
> >> >
> >> > yes, persistent_datanode_connections is now set to off - it may not
> >> > be related to the issues I have.
> >> >
> >> > What amount of memory do you have on your datanodes & coordinator?
> >> >
> >> > Here are my settings:
> >> > datanode: shared_buffers = 512MB
> >> > coordinator: 256MB (now, was 96MB)
> >> >
> >> > I still get, for some distributed tables (by hash):
> >> > "ERROR: Could not commit prepared transaction implicitely"
> >> >
> >> > For the distribution syntax, yes, I found your webpage talking about
> >> > the regression tests.
> >> >
> >> >> You also have to know that it is important to set the limit of
> >> >> connections on datanodes equal to the sum of max connections on all
> >> >> coordinators. For example, if your cluster is using 2 coordinators
> >> >> with 20 max connections each, you may have a maximum of 40
> >> >> connections to datanodes.
> >> >
> >> > Ok, tweaking this today and launching the tests again...
> >> >
> >> > Lionel F.
> >> >
> >> > 2011/5/31 Michael Paquier <mic...@gm...>:
> >> >> On Mon, May 30, 2011 at 7:34 PM, Lionel Frachon <lio...@gm...> wrote:
> >> >>>
> >> >>> Hi again,
> >> >>>
> >> >>> I turned off connection pooling on the coordinator (dunno why it
> >> >>> stayed on), raised the coordinator's shared_buffers, allowed 1000
> >> >>> connections, and the error disappeared.
> >> >>
> >> >> I am not really sure I get the meaning of this; how did you turn off
> >> >> the pooler on the coordinator? Did you use the parameter
> >> >> persistent_connections?
> >> >> Connection pooling from the coordinator is an automatic feature and
> >> >> you have to use it if you want to connect from a remote coordinator
> >> >> to backend XC nodes.
> >> >>
> >> >> You also have to know that it is important to set the limit of
> >> >> connections on datanodes equal to the sum of max connections on all
> >> >> coordinators. For example, if your cluster is using 2 coordinators
> >> >> with 20 max connections each, you may have a maximum of 40
> >> >> connections to datanodes.
> >> >> This uses a lot of shared buffers on a node, but typically this
> >> >> maximum number of connections is never reached, thanks to the
> >> >> connection pooling.
> >> >>
> >> >> Please note also that the number of Coordinator <-> Coordinator
> >> >> connections may increase if DDL is used from several coordinators.
> >> >>
> >> >>> However, all data is still going to one node (whatever I choose as
> >> >>> the primary datanode), with 40 warehouses... any specific syntax to
> >> >>> load-balance warehouses over the nodes?
> >> >>
> >> >> CREATE TABLE foo (column_key type, other_column int) DISTRIBUTE BY
> >> >> HASH(column_key);
> >> >> --
> >> >> Michael Paquier
> >> >> http://michael.otacoo.com
> >
> > --
> > Michael Paquier
> > http://michael.otacoo.com