From: Lionel F. <lio...@gm...> - 2011-06-06 12:39:33
Hello,

Looking at the debug1 mode log on datanode3, I found some interesting points hereafter (vacuum on, max_prepared_transactions=5000):

(with normal inserts)
[....]
DEBUG: unset snapshot info
DEBUG: global snapshot info: gxmin: 101, gxmax: 101, gxcnt: 0
DEBUG: unset snapshot info
DEBUG: global snapshot info: gxmin: 101, gxmax: 101, gxcnt: 0
DEBUG: unset snapshot info
DEBUG: [re]setting xid = 0, old_value = 0
DEBUG: unset snapshot info
DEBUG: Received new gxid 102
DEBUG: [re]setting xid = 102, old_value = 0
DEBUG: TransactionId = 102
DEBUG: xid (102) does not follow ShmemVariableCache->nextXid (665)
DEBUG: Record transaction commit 101
DEBUG: Record transaction commit 102
DEBUG: [re]setting xid = 0, old_value = 0
DEBUG: unset snapshot info
DEBUG: Received new gxid 103
DEBUG: [re]setting xid = 103, old_value = 0
DEBUG: unset snapshot info
DEBUG: global snapshot info: gxmin: 103, gxmax: 103, gxcnt: 0
DEBUG: TransactionId = 103
DEBUG: xid (103) does not follow ShmemVariableCache->nextXid (665)
DEBUG: unset snapshot info
DEBUG: global snapshot info: gxmin: 103, gxmax: 103, gxcnt: 0

While inserting with distributed hashed keys:
[...]
DEBUG: [re]setting xid = 0, old_value = 0
DEBUG: unset snapshot info
DEBUG: Received new gxid 522
DEBUG: [re]setting xid = 522, old_value = 0
DEBUG: TransactionId = 522
DEBUG: xid (522) does not follow ShmemVariableCache->nextXid (665)
DEBUG: Record transaction commit 521
DEBUG: Record transaction commit 522
DEBUG: [re]setting xid = 0, old_value = 0
DEBUG: unset snapshot info
DEBUG: Received new gxid 524
DEBUG: [re]setting xid = 524, old_value = 0
ERROR: prepared transaction with identifier "T523" does not exist
STATEMENT: COMMIT PREPARED 'T523'
DEBUG: [re]setting xid = 0, old_value = 524
DEBUG: unset snapshot info
DEBUG: Received new gxid 526
DEBUG: [re]setting xid = 526, old_value = 0
ERROR: prepared transaction with identifier "T525" does not exist
STATEMENT: COMMIT PREPARED 'T525'
DEBUG: [re]setting xid = 0, old_value = 526
DEBUG: unset snapshot info
DEBUG: Received new gxid 528
DEBUG: [re]setting xid = 528, old_value = 0
ERROR: prepared transaction with identifier "T527" does not exist
[...]

No special info on the gtm node regarding the same transactions, though.

Hope this helps.

Regards,
Lionel F.

2011/6/2 Michael Paquier <mic...@gm...>
> The problem you are facing with the pooler may be related to this bug
> that was found recently:
>
> https://sourceforge.net/tracker/?func=detail&aid=3310399&group_id=311227&atid=1310232
>
> It looks like the datanode is not able to handle autovacuum commits
> efficiently. This may cause data consistency problems, making a node
> crash in the worst scenario.
>
> This could explain why you cannot begin a transaction correctly on
> nodes, connections to backends being closed by a crash or a consistency
> problem. Can you provide a backtrace or give hints about the problem you
> have? Some tips in node logs perhaps?
>
>
> On Wed, Jun 1, 2011 at 8:12 PM, Lionel Frachon <lio...@gm...> wrote:
>
>> Hello,
>>
>> I was forced to distribute data by replication and not by hash, as I'm
>> constantly getting "ERROR: Could not commit prepared transaction
>> implicitely" on tables other than Warehouse (w_id), using 10
>> warehouses (this error appears both on data loading, when using hash,
>> and when performing distributed queries).
>>
>> I used a slightly different setup:
>> - 1 GTM-only node
>> - 1 Coordinator-only node
>> - 3 Datanodes
>>
>> The Coordinator has 256MB RAM, the Datanodes 768MB. They did not at
>> any moment reach full usage of their dedicated RAM.
>>
>> However, running the benchmark for more than a few minutes (2 or 3)
>> leads to the following errors:
>>
>> --- Unexpected SQLException caught in NEW-ORDER Txn ---
>> Message: ERROR: Could not begin transaction on data nodes.
>> SQLState: XX000
>> ErrorCode: 0
>>
>> Then a bit later:
>>
>> --- Unexpected SQLException caught in NEW-ORDER Txn ---
>> Message: ERROR: Failed to get pooled connections
>> SQLState: 53000
>> ErrorCode: 0
>>
>> then (and I assume they are linked):
>>
>> --- Unexpected SQLException caught in NEW-ORDER Txn ---
>> Message: ERROR: Could not begin transaction on data nodes.
>> SQLState: XX000
>> ErrorCode: 0
>>
>> Additionally, the test ends with many:
>>
>> --- Unexpected SQLException caught in NEW-ORDER Txn ---
>> Message: This connection has been closed.
>> SQLState: 08003
>> ErrorCode: 0
>>
>> I'm using 10 terminals and 10 warehouses.
>>
>> Any clue for this error (and for distribution by hash, I understand
>> they're probably linked...)?
>>
>> Lionel F.
>>
>>
>> 2011/5/31 Lionel Frachon <lio...@gm...>:
>> > Hi,
>> >
>> > yes, persistent_datanode_connections is now set to off - it may not
>> > be related to the issues I have.
>> >
>> > What amount of memory do you have on your datanodes & coordinator?
>> >
>> > Here are my settings:
>> > datanode: shared_buffers = 512MB
>> > coordinator: 256MB (now; was 96MB)
>> >
>> > I still get, for some distributed tables (by hash):
>> > "ERROR: Could not commit prepared transaction implicitely"
>> >
>> > For the distribution syntax, yes, I found your webpage talking about
>> > regression tests.
>> >
>> >> You also have to know that it is important to set a limit of
>> >> connections on datanodes equal to the sum of max connections on all
>> >> coordinators. For example, if your cluster is using 2 coordinators
>> >> with 20 max connections each, you may have a maximum of 40
>> >> connections to datanodes.
>> >
>> > Ok, tweaking this today and launching the tests again...
>> >
>> >
>> > Lionel F.
>> >
>> >
>> > 2011/5/31 Michael Paquier <mic...@gm...>:
>> >>
>> >> On Mon, May 30, 2011 at 7:34 PM, Lionel Frachon <lio...@gm...>
>> >> wrote:
>> >>>
>> >>> Hi again,
>> >>>
>> >>> I turned off connection pooling on the coordinator (dunno why it
>> >>> stayed on), raised the shared_buffers of the coordinator, allowed
>> >>> 1000 connections, and the error disappeared.
>> >>
>> >> I am not really sure I get the meaning of this, but how did you turn
>> >> off the pooler on the coordinator?
>> >> Did you use the parameter persistent_connections?
>> >> Connection pooling from the coordinator is an automatic feature, and
>> >> you have to use it if you want to connect from a remote coordinator
>> >> to backend XC nodes.
>> >>
>> >> You also have to know that it is important to set a limit of
>> >> connections on datanodes equal to the sum of max connections on all
>> >> coordinators. For example, if your cluster is using 2 coordinators
>> >> with 20 max connections each, you may have a maximum of 40
>> >> connections to datanodes.
>> >> This uses a lot of shared buffers on a node, but typically this
>> >> maximum number of connections is never reached thanks to the
>> >> connection pooling.
>> >>
>> >> Please note also that the number of Coordinator <-> Coordinator
>> >> connections may increase if DDL is used from several coordinators.
>> >>
>> >>> However, all data is still going to one node (whatever I choose as
>> >>> the primary datanode), with 40 warehouses... any specific syntax to
>> >>> load-balance warehouses over nodes?
>> >>
>> >> CREATE TABLE foo (column_key type, other_column int) DISTRIBUTE BY
>> >> HASH(column_key);
>> >> --
>> >> Michael Paquier
>> >> http://michael.otacoo.com
>> >>
>> >
>
>
> --
> Michael Paquier
> http://michael.otacoo.com
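For anyone hitting the "prepared transaction with identifier ... does not exist" errors from the datanode3 log above: stock PostgreSQL (and therefore an XC datanode backend) exposes pending prepared transactions in the pg_prepared_xacts system view. A minimal check, assuming you can connect to the datanode directly with psql:

```sql
-- List transactions that were PREPAREd but not yet committed or rolled back.
-- If a GID (e.g. 'T523') is missing here, COMMIT PREPARED 'T523' fails with
-- "prepared transaction with identifier "T523" does not exist".
SELECT gid, prepared, owner, database
FROM pg_prepared_xacts
ORDER BY prepared;
```

In the logs above, the GIDs T523/T525/T527 never appear as prepared on datanode3, which would be consistent with the implicit PREPARE step failing or being skipped before the COMMIT PREPARED arrives.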
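Michael's connection-sizing rule (each datanode must allow at least the sum of the coordinators' max connections) can be sketched as postgresql.conf fragments. The numbers below are the hypothetical ones from his 2-coordinator example, not values taken from this cluster:

```
# postgresql.conf on each of the 2 coordinators
max_connections = 20

# postgresql.conf on each datanode:
# at least the sum over all coordinators (2 x 20 = 40)
max_connections = 40
```

max_prepared_transactions on the datanodes must likewise be high enough to cover concurrent implicit two-phase commits (it is set to 5000 in the log run above).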