From: Lionel F. <lio...@gm...> - 2011-06-01 11:12:25
Hello,

I was forced to distribute the data by replication rather than by hash, as I'm
constantly getting "ERROR: Could not commit prepared transaction implicitely"
on tables other than Warehouse (w_id), using 10 warehouses. This error appears
both on data loading, when using hash, and when performing distributed queries.

I used a slightly different setup:
- 1 GTM-only node
- 1 Coordinator-only node
- 3 Datanodes

The Coordinator has 256MB of RAM, the Datanodes 768MB. At no point did they
reach full usage of their dedicated RAM. However, running the benchmark for
more than a few minutes (2 or 3) leads to the following errors:

--- Unexpected SQLException caught in NEW-ORDER Txn ---
Message: ERROR: Could not begin transaction on data nodes.
SQLState: XX000
ErrorCode: 0

Then a bit later:

--- Unexpected SQLException caught in NEW-ORDER Txn ---
Message: ERROR: Failed to get pooled connections
SQLState: 53000
ErrorCode: 0

then (and I assume they are linked):

--- Unexpected SQLException caught in NEW-ORDER Txn ---
Message: ERROR: Could not begin transaction on data nodes.
SQLState: XX000
ErrorCode: 0

Additionally, the test ends with many:

--- Unexpected SQLException caught in NEW-ORDER Txn ---
Message: This connection has been closed.
SQLState: 08003
ErrorCode: 0

I'm using 10 terminals and 10 warehouses. Any clue about these errors (and
about the distribution by hash -- I understand they're probably linked)?

Lionel F.

2011/5/31 Lionel Frachon <lio...@gm...>:
> Hi,
>
> Yes, persistent_datanode_connections is now set to off -- it may not be
> related to the issues I have.
>
> What amount of memory do you have on your datanodes & coordinator?
>
> Here are my settings:
> datanode: shared_buffers = 512MB
> coordinator = 256MB (now, was 96MB)
>
> I still get, for some tables distributed by hash:
> "ERROR: Could not commit prepared transaction implicitely"
>
> For the distribution syntax, yes, I found your webpage about the
> regression tests.
>
>> You also have to know that it is important to set the limit of connections on
>> datanodes equal to the sum of max connections on all coordinators.
>> For example, if your cluster uses 2 coordinators with 20 max connections
>> each, you may have a maximum of 40 connections to the datanodes.
>
> OK, tweaking this today and launching the tests again...
>
> Lionel F.
>
> 2011/5/31 Michael Paquier <mic...@gm...>:
>>
>> On Mon, May 30, 2011 at 7:34 PM, Lionel Frachon <lio...@gm...>
>> wrote:
>>>
>>> Hi again,
>>>
>>> I turned off connection pooling on the coordinator (not sure why it stayed
>>> on), raised the coordinator's shared_buffers, allowed 1000
>>> connections, and the error disappeared.
>>
>> I am not really sure I get the meaning of this, but how did you turn off
>> the pooler on the coordinator?
>> Did you use the parameter persistent_connections?
>> Connection pooling from the coordinator is an automatic feature, and you have
>> to use it if you want to connect from a remote coordinator to backend XC nodes.
>>
>> You also have to know that it is important to set the limit of connections on
>> datanodes equal to the sum of max connections on all coordinators.
>> For example, if your cluster uses 2 coordinators with 20 max connections
>> each, you may have a maximum of 40 connections to the datanodes.
>> This uses a lot of shared buffers on a node, but typically this maximum
>> number of connections is never reached, thanks to the connection pooling.
>>
>> Please note also that the number of Coordinator <-> Coordinator connections
>> may also increase if DDLs are used from several coordinators.
>>
>>> However, all data is still going to one node (whatever I
>>> choose as the primary datanode), with 40 warehouses... is there any
>>> specific syntax to load-balance warehouses over the nodes?
>>
>> CREATE TABLE foo (column_key type, other_column int) DISTRIBUTE BY
>> HASH(column_key);
>> --
>> Michael Paquier
>> http://michael.otacoo.com
>>
>
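For the archives: applied to the DBT-2 tables discussed in the thread, Michael's DISTRIBUTE BY syntax might look like the sketch below. The table and column definitions are assumed from the standard TPC-C/DBT-2 schema, not taken from the thread, and are trimmed for illustration.

```sql
-- Hedged sketch: spreading DBT-2 data across Postgres-XC datanodes.
-- Column lists are abbreviated; names assumed from the TPC-C schema.

-- Warehouse-keyed tables hash on w_id so each datanode owns a
-- subset of warehouses:
CREATE TABLE warehouse (
    w_id   integer NOT NULL,
    w_name varchar(10)
) DISTRIBUTE BY HASH (w_id);

-- Small, read-mostly tables can instead be replicated to every
-- datanode, which is the fallback Lionel ended up using:
CREATE TABLE item (
    i_id   integer NOT NULL,
    i_name varchar(24)
) DISTRIBUTE BY REPLICATION;
```

Replication avoids the prepared-transaction path on multi-node writes, which may be why it sidestepped the "Could not commit prepared transaction implicitely" error, at the cost of duplicating the data on every datanode.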
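The connection-sizing rule Michael describes (datanode max_connections at least the sum of max_connections across all coordinators) is simple arithmetic; a quick sketch with illustrative values:

```shell
# Hedged sketch of the sizing rule from the thread; the values are
# illustrative, not taken from Lionel's actual configuration.
COORD_MAX_CONNS="20 20"   # two coordinators, 20 max connections each

TOTAL=0
for n in $COORD_MAX_CONNS; do
  TOTAL=$((TOTAL + n))    # sum the coordinators' max_connections
done

# Each datanode's postgresql.conf should then allow at least this many:
echo "datanode max_connections >= $TOTAL"
```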