From: Michael P. <mic...@gm...> - 2011-06-02 01:09:34
The problem you are facing with the pooler may be related to this bug that
has been found recently:
https://sourceforge.net/tracker/?func=detail&aid=3310399&group_id=311227&atid=1310232
It looks like the Datanode is not able to manage autovacuum commits
efficiently. This can cause data consistency problems, making a node crash
in the worst case. It could explain why you cannot begin a transaction
correctly on nodes, with connections to backends being closed by a crash
or a consistency problem.

Can you provide a backtrace or give hints about the problem you have? Some
hints in the node logs, perhaps?

On Wed, Jun 1, 2011 at 8:12 PM, Lionel Frachon <lio...@gm...> wrote:
> Hello,
>
> I was forced to distribute data by replication and not by hash, as I'm
> constantly getting "ERROR: Could not commit prepared transaction
> implicitely" on tables other than Warehouse (w_id), using 10
> warehouses (this error appears both on data loading, when using hash,
> and when performing distributed queries).
>
> I used a slightly different setup:
> - 1 GTM-only node
> - 1 Coordinator-only node
> - 3 Datanodes
>
> The Coordinator has 256MB of RAM, the Datanodes 768MB. They did not
> reach full usage of their dedicated RAM at any moment.
>
> However, running the benchmark for more than a few minutes (2 or 3)
> leads to the following errors:
>
> --- Unexpected SQLException caught in NEW-ORDER Txn ---
> Message: ERROR: Could not begin transaction on data nodes.
> SQLState: XX000
> ErrorCode: 0
>
> Then a bit later:
>
> --- Unexpected SQLException caught in NEW-ORDER Txn ---
> Message: ERROR: Failed to get pooled connections
> SQLState: 53000
> ErrorCode: 0
>
> Then (and I assume they are linked):
>
> --- Unexpected SQLException caught in NEW-ORDER Txn ---
> Message: ERROR: Could not begin transaction on data nodes.
> SQLState: XX000
> ErrorCode: 0
>
> Additionally, the test ends with many:
>
> --- Unexpected SQLException caught in NEW-ORDER Txn ---
> Message: This connection has been closed.
> SQLState: 08003
> ErrorCode: 0
>
> I'm using 10 terminals and 10 warehouses.
>
> Any clue for this error (and for distribution by hash; I understand
> they're probably linked...)?
>
> Lionel F.
>
>
> 2011/5/31 Lionel Frachon <lio...@gm...>:
> > Hi,
> >
> > Yes, persistent_datanode_connections is now set to off - it may not
> > be related to the issues I have.
> >
> > What amount of memory do you have on your Datanodes & Coordinator?
> >
> > Here are my settings:
> > Datanode: shared_buffers = 512MB
> > Coordinator: 256MB (now; was 96MB)
> >
> > For some tables distributed by hash, I still get:
> > "ERROR: Could not commit prepared transaction implicitely"
> >
> > For the distribution syntax, yes, I found your webpage talking about
> > the regression tests.
> >
> >> You also have to know that it is important to set the limit of
> >> connections on Datanodes equal to the sum of max connections on all
> >> Coordinators.
> >> For example, if your cluster uses 2 Coordinators with 20 max
> >> connections each, you may have a maximum of 40 connections to the
> >> Datanodes.
> >
> > OK, tweaking this today and launching the tests again...
> >
> >
> > Lionel F.
> >
> >
> >
> > 2011/5/31 Michael Paquier <mic...@gm...>:
> >>
> >>
> >> On Mon, May 30, 2011 at 7:34 PM, Lionel Frachon <lio...@gm...>
> >> wrote:
> >>>
> >>> Hi again,
> >>>
> >>> I turned off connection pooling on the Coordinator (dunno why it
> >>> stayed on), raised the shared_buffers of the Coordinator, allowed
> >>> 1000 connections, and the error disappeared.
> >>
> >> I am not really sure I get the meaning of this, but how did you
> >> turn off the pooler on the Coordinator?
> >> Did you use the parameter persistent_datanode_connections?
> >> Connection pooling from a Coordinator is an automatic feature, and
> >> you have to use it if you want to connect from a remote Coordinator
> >> to backend XC nodes.
> >>
> >> You also have to know that it is important to set the limit of
> >> connections on Datanodes equal to the sum of max connections on all
> >> Coordinators.
> >> For example, if your cluster uses 2 Coordinators with 20 max
> >> connections each, you may have a maximum of 40 connections to the
> >> Datanodes.
> >> This uses a lot of shared buffers on a node, but typically this
> >> maximum number of connections is never reached thanks to the
> >> connection pooling.
> >>
> >> Please note also that the number of Coordinator <-> Coordinator
> >> connections may also increase if DDLs are issued from several
> >> Coordinators.
> >>
> >>> However, all data is still going to one node (whatever I choose as
> >>> the primary Datanode), with 40 warehouses... any specific syntax
> >>> to load-balance warehouses over nodes?
> >>
> >> CREATE TABLE foo (column_key type, other_column int) DISTRIBUTE BY
> >> HASH(column_key);
> >> --
> >> Michael Paquier
> >> http://michael.otacoo.com
> >
>

--
Michael Paquier
http://michael.otacoo.com
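
As a concrete illustration of the connection-limit rule described in the
thread, here is a minimal postgresql.conf sketch for the two-Coordinator
example (20 connections each). max_connections is the standard PostgreSQL
parameter; the specific values simply follow the example's arithmetic:

    # postgresql.conf on each of the 2 Coordinators
    max_connections = 20

    # postgresql.conf on each Datanode: at least the sum of
    # max_connections over all Coordinators, i.e. 2 x 20
    max_connections = 40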
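
Applying the DISTRIBUTE BY HASH syntax above to the benchmark schema
discussed in the thread, a sketch of hash distribution on the warehouse
key might look as follows (the thread only names the Warehouse table and
its w_id key; the w_name column is illustrative):

    -- Spread warehouse rows across all Datanodes by warehouse id,
    -- instead of replicating the table on every node.
    CREATE TABLE warehouse (
        w_id   int PRIMARY KEY,
        w_name varchar(10)
    ) DISTRIBUTE BY HASH(w_id);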