From: Michael P. <mic...@gm...> - 2011-06-02 01:09:34
The problem you are facing with the pooler may be related to this bug that
has been found recently:
https://sourceforge.net/tracker/?func=detail&aid=3310399&group_id=311227&atid=1310232
It looks like the Datanode is not able to manage autovacuum commits
efficiently. This can cause data consistency problems, making a node crash
in the worst case. It could explain why you cannot begin a transaction
correctly on nodes, with connections to backends being closed by a crash
or a consistency problem.

Can you provide a backtrace or give hints about the problem you have? Some
hints in the node logs, perhaps?

On Wed, Jun 1, 2011 at 8:12 PM, Lionel Frachon <lio...@gm...> wrote:
> Hello,
>
> I was forced to distribute data by replication and not by hash, as I'm
> constantly getting "ERROR: Could not commit prepared transaction
> implicitely" on tables other than Warehouse (w_id), using 10
> warehouses (this error appears both on data loading, when using hash,
> and when performing distributed queries).
>
> I used a slightly different setup:
> - 1 GTM-only node
> - 1 Coordinator-only node
> - 3 Datanodes
>
> The Coordinator has 256MB of RAM, the Datanodes 768MB. They did not
> reach full usage of their dedicated RAM at any moment.
>
> However, running the benchmark for more than a few minutes (2 or 3)
> leads to the following errors:
>
> --- Unexpected SQLException caught in NEW-ORDER Txn ---
> Message: ERROR: Could not begin transaction on data nodes.
> SQLState: XX000
> ErrorCode: 0
>
> Then a bit later:
>
> --- Unexpected SQLException caught in NEW-ORDER Txn ---
> Message: ERROR: Failed to get pooled connections
> SQLState: 53000
> ErrorCode: 0
>
> Then (and I assume they are linked):
>
> --- Unexpected SQLException caught in NEW-ORDER Txn ---
> Message: ERROR: Could not begin transaction on data nodes.
> SQLState: XX000
> ErrorCode: 0
>
> Additionally, the test ends with many:
>
> --- Unexpected SQLException caught in NEW-ORDER Txn ---
> Message: This connection has been closed.
> SQLState: 08003
> ErrorCode: 0
>
> I'm using 10 terminals and 10 warehouses.
>
> Any clue for this error (and for distribution by hash; I understand
> they're probably linked...)?
>
> Lionel F.
>
>
> 2011/5/31 Lionel Frachon <lio...@gm...>:
> > Hi,
> >
> > Yes, persistent_datanode_connections is now set to off - it may not
> > be related to the issues I have.
> >
> > What amount of memory do you have on your Datanodes & Coordinator?
> >
> > Here are my settings:
> > Datanode: shared_buffers = 512MB
> > Coordinator: 256MB (now; was 96MB)
> >
> > For some tables distributed by hash, I still get:
> > "ERROR: Could not commit prepared transaction implicitely"
> >
> > For the distribution syntax, yes, I found your webpage talking about
> > the regression tests.
> >
> >> You also have to know that it is important to set the limit of
> >> connections on Datanodes equal to the sum of max connections on all
> >> Coordinators.
> >> For example, if your cluster uses 2 Coordinators with 20 max
> >> connections each, you may have a maximum of 40 connections to the
> >> Datanodes.
> >
> > OK, tweaking this today and launching the tests again...
> >
> >
> > Lionel F.
> >
> >
> >
> > 2011/5/31 Michael Paquier <mic...@gm...>:
> >>
> >>
> >> On Mon, May 30, 2011 at 7:34 PM, Lionel Frachon <lio...@gm...>
> >> wrote:
> >>>
> >>> Hi again,
> >>>
> >>> I turned off connection pooling on the Coordinator (dunno why it
> >>> stayed on), raised the shared_buffers of the Coordinator, allowed
> >>> 1000 connections, and the error disappeared.
> >>
> >> I am not really sure I get the meaning of this, but how did you
> >> turn off the pooler on the Coordinator?
> >> Did you use the parameter persistent_datanode_connections?
> >> Connection pooling from a Coordinator is an automatic feature, and
> >> you have to use it if you want to connect from a remote Coordinator
> >> to backend XC nodes.
> >>
> >> You also have to know that it is important to set the limit of
> >> connections on Datanodes equal to the sum of max connections on all
> >> Coordinators.
> >> For example, if your cluster uses 2 Coordinators with 20 max
> >> connections each, you may have a maximum of 40 connections to the
> >> Datanodes.
> >> This uses a lot of shared buffers on a node, but typically this
> >> maximum number of connections is never reached thanks to the
> >> connection pooling.
> >>
> >> Please note also that the number of Coordinator <-> Coordinator
> >> connections may also increase if DDLs are issued from several
> >> Coordinators.
> >>
> >>> However, all data is still going to one node (whatever I choose as
> >>> the primary Datanode), with 40 warehouses... any specific syntax
> >>> to load-balance warehouses over nodes?
> >>
> >> CREATE TABLE foo (column_key type, other_column int) DISTRIBUTE BY
> >> HASH(column_key);
> >> --
> >> Michael Paquier
> >> http://michael.otacoo.com
> >
>

--
Michael Paquier
http://michael.otacoo.com
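
As a concrete illustration of the connection-limit rule described in the
thread, here is a minimal postgresql.conf sketch for the two-Coordinator
example (20 connections each). max_connections is the standard PostgreSQL
parameter; the specific values simply follow the example's arithmetic:

    # postgresql.conf on each of the 2 Coordinators
    max_connections = 20

    # postgresql.conf on each Datanode: at least the sum of
    # max_connections over all Coordinators, i.e. 2 x 20
    max_connections = 40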
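
Applying the DISTRIBUTE BY HASH syntax above to the benchmark schema
discussed in the thread, a sketch of hash distribution on the warehouse
key might look as follows (the thread only names the Warehouse table and
its w_id key; the w_name column is illustrative):

    -- Spread warehouse rows across all Datanodes by warehouse id,
    -- instead of replicating the table on every node.
    CREATE TABLE warehouse (
        w_id   int PRIMARY KEY,
        w_name varchar(10)
    ) DISTRIBUTE BY HASH(w_id);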