From: Lionel F. <lio...@gm...> - 2011-06-06 09:12:01
Hi again,
I've run the test (with 3 initial warehouses, distributed by hash on their ID).
The expected behaviour is that they are spread across the nodes, but (connected
through the coordinator):
testperfs=# EXECUTE DIRECT ON NODE 3 'select * from warehouse';
 w_id | w_ytd | w_tax | w_name | w_street_1 | w_street_2 | w_city | w_state | w_zip
------+-------+-------+--------+------------+------------+--------+---------+-------
(0 rows)

testperfs=# EXECUTE DIRECT ON NODE 2 'select * from warehouse';
 w_id | w_ytd | w_tax | w_name | w_street_1 | w_street_2 | w_city | w_state | w_zip
------+-------+-------+--------+------------+------------+--------+---------+-------
(0 rows)

testperfs=# EXECUTE DIRECT ON NODE 1 'select * from warehouse';
 w_id |   w_ytd   | w_tax  |  w_name  |    w_street_1     |  w_street_2  |       w_city        | w_state |   w_zip
------+-----------+--------+----------+-------------------+--------------+---------------------+---------+-----------
    1 | 300000.00 | 0.0253 | awmmmaRe | sKsjzyBoATkSdQCKv | gzWxflQdxagP | kEcZGWmkZRQuPTEnJYq | HA      | 123456789
(1 row)
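Once more warehouses are loaded, a per-node row count is easier to compare than
a full select; something like this should work (same EXECUTE DIRECT syntax as
above, node numbers from my setup):

testperfs=# EXECUTE DIRECT ON NODE 1 'select count(*) from warehouse';
testperfs=# EXECUTE DIRECT ON NODE 2 'select count(*) from warehouse';
testperfs=# EXECUTE DIRECT ON NODE 3 'select count(*) from warehouse';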
Lionel F.
2011/6/3 Michael Paquier <mic...@gm...>
> I am also wondering if the status of your connections is OK. It is not
> normal that you are getting error messages like these:
>
> ERROR: Could not begin transaction on data nodes.
> ERROR: prepared transaction with identifier "T711" does not exist
>
> Do you know about EXECUTE DIRECT?
>
> With a query like this:
> EXECUTE DIRECT ON NODE 1 'select * from a';
> you can check the results stored only on node 1.
>
> It could be worth checking once from a psql terminal that the data is loaded
> correctly.
> If EXECUTE DIRECT returns an error, it would mean that something is missing
> in your settings.
> If there are no errors, then something with JDBC is not working correctly.
>
> Also, I have something else in mind: do you start up GTM with a first GXID
> greater than 628?
> There may be visibility issues, as initdb uses transaction IDs lower than
> that for initialization.
>
>
>> On Thu, Jun 2, 2011 at 8:46 PM, Mason <ma...@us...> wrote:
>
>> On Wed, Jun 1, 2011 at 9:09 PM, Michael Paquier
>> <mic...@gm...> wrote:
>> > The problem you are facing with the pooler may be related to this bug
>> > that was found recently:
>> > https://sourceforge.net/tracker/?func=detail&aid=3310399&group_id=311227&atid=1310232
>> >
>> > It looks like the datanode is not able to manage autovacuum commits
>> > efficiently. This problem may cause data consistency issues, making a
>> > node crash in the worst-case scenario.
>> >
>> > This could explain why you cannot begin a transaction correctly on the
>> > nodes, with connections to backends being closed by a crash or a
>> > consistency problem. Can you provide a backtrace or give hints about the
>> > problem you have? Some clues in the node logs perhaps?
>>
>> To see if it is autovacuum, Lionel, you could temporarily disable it
>> and try to reproduce the error.
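>> Something like this in each datanode's postgresql.conf should do it (just a
>> sketch; autovacuum is a standard setting and only needs a reload):
>>
>>   # postgresql.conf on each datanode
>>   autovacuum = off
>>
>>   # then reload that node, e.g. pg_ctl reload -D <datanode_data_dir>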
>>
>> Mason
>>
>> >
>> > On Wed, Jun 1, 2011 at 8:12 PM, Lionel Frachon <lio...@gm...>
>> > wrote:
>> >>
>> >> Hello,
>> >>
>> >> I was forced to distribute data by replication and not by hash, as I'm
>> >> constantly getting "ERROR: Could not commit prepared transaction
>> >> implicitely" on other tables than Warehouse (w_id), using 10
>> >> warehouses (this error appears both on data loading, when using hash,
>> >> and when performing distributed queries).
>> >>
>> >> I used a slightly different setup:
>> >> - 1 GTM-only node
>> >> - 1 Coordinator-only node
>> >> - 3 Datanodes
>> >>
>> >> The coordinator has 256MB of RAM and the datanodes have 768MB each. They
>> >> never reached full usage of their dedicated RAM at any point.
>> >>
>> >> However, running the benchmark for more than a few minutes (2 or 3) leads
>> >> to the following errors:
>> >>
>> >> --- Unexpected SQLException caught in NEW-ORDER Txn ---
>> >> Message: ERROR: Could not begin transaction on data nodes.
>> >> SQLState: XX000
>> >> ErrorCode: 0
>> >>
>> >> Then a bit later
>> >> --- Unexpected SQLException caught in NEW-ORDER Txn ---
>> >>
>> >> Message: ERROR: Failed to get pooled connections
>> >> SQLState: 53000
>> >> ErrorCode: 0
>> >>
>> >> Then (and I assume they are linked):
>> >> --- Unexpected SQLException caught in NEW-ORDER Txn ---
>> >> Message: ERROR: Could not begin transaction on data nodes.
>> >> SQLState: XX000
>> >> ErrorCode: 0
>> >>
>> >> Additionally, the test ends with many
>> >> --- Unexpected SQLException caught in NEW-ORDER Txn ---
>> >> Message: This connection has been closed.
>> >> SQLState: 08003
>> >> ErrorCode: 0
>> >>
>> >> I'm using 10 terminals, using 10 warehouses.
>> >>
>> >> Any clue about this error (and about distribution by hash; I understand
>> >> they're probably linked...)?
>> >>
>> >> Lionel F.
>> >>
>> >>
>> >>
>> >> 2011/5/31 Lionel Frachon <lio...@gm...>:
>> >> > Hi,
>> >> >
>> >> > yes, persistent_datanode_connections is now set to off - it may not be
>> >> > related to the issues I have.
>> >> >
>> >> > How much memory do you have on your datanodes & coordinator?
>> >> >
>> >> > Here are my settings:
>> >> > datanode: shared_buffers = 512MB
>> >> > coordinator: shared_buffers = 256MB (now; it was 96MB)
>> >> >
>> >> > For some tables distributed by hash, I still get
>> >> > "ERROR: Could not commit prepared transaction implicitely"
>> >> >
>> >> > As for the distribution syntax, yes, I found your webpage about the
>> >> > regression tests.
>> >> >
>> >> >> You also have to know that it is important to set the connection limit
>> >> >> on the datanodes equal to the sum of max_connections across all
>> >> >> coordinators. For example, if your cluster uses 2 coordinators with 20
>> >> >> max connections each, you may have a maximum of 40 connections to the
>> >> >> datanodes.
>> >> >
>> >> > Ok, tweaking this today and launching the tests again...
>> >> >
>> >> >
>> >> > Lionel F.
>> >> >
>> >> >
>> >> >
>> >> > 2011/5/31 Michael Paquier <mic...@gm...>:
>> >> >>
>> >> >>
>> >> >> On Mon, May 30, 2011 at 7:34 PM, Lionel Frachon
>> >> >> <lio...@gm...>
>> >> >> wrote:
>> >> >>>
>> >> >>> Hi again,
>> >> >>>
>> >> >>> I turned off connection pooling on the coordinator (dunno why it stayed
>> >> >>> on), raised the coordinator's shared_buffers, allowed 1000 connections,
>> >> >>> and the error disappeared.
>> >> >>
>> >> >> I am not really sure I get the meaning of this, but how did you turn
>> >> >> off the pooler on the coordinator?
>> >> >> Did you use the parameter persistent_connections?
>> >> >> Connection pooling from the coordinator is an automatic feature, and
>> >> >> you have to use it if you want to connect from a remote coordinator to
>> >> >> backend XC nodes.
>> >> >>
>> >> >> You also have to know that it is important to set the connection limit
>> >> >> on the datanodes equal to the sum of max_connections across all
>> >> >> coordinators. For example, if your cluster uses 2 coordinators with 20
>> >> >> max connections each, you may have a maximum of 40 connections to the
>> >> >> datanodes.
>> >> >> This uses a lot of shared buffers on a node, but typically this maximum
>> >> >> number of connections is never reached thanks to the connection pooling.
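>> >> >> As a sketch, with numbers matching the example above (the values are
>> >> >> only illustrative):
>> >> >>
>> >> >>   # postgresql.conf on each of the 2 coordinators
>> >> >>   max_connections = 20
>> >> >>
>> >> >>   # postgresql.conf on each datanode (sum of the coordinators' limits)
>> >> >>   max_connections = 40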
>> >> >>
>> >> >> Please note also that the number of Coordinator <-> Coordinator
>> >> >> connections may also increase if DDL is run from several coordinators.
>> >> >>
>> >> >>> However, all data is still going to one node (whichever one I choose
>> >> >>> as the primary datanode), with 40 warehouses... is there any specific
>> >> >>> syntax to load-balance warehouses over the nodes?
>> >> >>
>> >> >> CREATE TABLE foo (column_key type, other_column int) DISTRIBUTE BY
>> >> >> HASH(column_key);
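>> >> >> For your warehouse table that would give something like this (the
>> >> >> column types here are only a guess; adjust them to your schema):
>> >> >>
>> >> >>   CREATE TABLE warehouse (
>> >> >>       w_id       integer,
>> >> >>       w_ytd      numeric(12,2),
>> >> >>       w_tax      numeric(4,4),
>> >> >>       w_name     varchar(10),
>> >> >>       w_street_1 varchar(20),
>> >> >>       w_street_2 varchar(20),
>> >> >>       w_city     varchar(20),
>> >> >>       w_state    char(2),
>> >> >>       w_zip      char(9)
>> >> >>   ) DISTRIBUTE BY HASH(w_id);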
>> >> >> --
>> >> >> Michael Paquier
>> >> >> http://michael.otacoo.com
>> >> >>
>> >> >
>> >
>> >
>> >
>> > --
>> > Michael Paquier
>> > http://michael.otacoo.com
>> >
>> >
>> > _______________________________________________
>> > Postgres-xc-general mailing list
>> > Pos...@li...
>> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-general
>> >
>> >
>>
>
>
>
> --
> Michael Paquier
> http://michael.otacoo.com
>