From: Lionel F. <lio...@gm...> - 2011-06-06 09:12:01
Hi again,

I've run the test (with 3 initial warehouses, distributed by hash on their ID). The expected behaviour is that they are distributed amongst the nodes, but (connected through the coordinator):

testperfs=# EXECUTE DIRECT ON NODE 3 'select * from warehouse';
 w_id | w_ytd | w_tax | w_name | w_street_1 | w_street_2 | w_city | w_state | w_zip
------+-------+-------+--------+------------+------------+--------+---------+-------
(0 rows)

testperfs=# EXECUTE DIRECT ON NODE 2 'select * from warehouse';
 w_id | w_ytd | w_tax | w_name | w_street_1 | w_street_2 | w_city | w_state | w_zip
------+-------+-------+--------+------------+------------+--------+---------+-------
(0 rows)

testperfs=# EXECUTE DIRECT ON NODE 1 'select * from warehouse';
 w_id |   w_ytd   | w_tax  |  w_name  |    w_street_1     |  w_street_2  |       w_city        | w_state |   w_zip
------+-----------+--------+----------+-------------------+--------------+---------------------+---------+-----------
    1 | 300000.00 | 0.0253 | awmmmaRe | sKsjzyBoATkSdQCKv | gzWxflQdxagP | kEcZGWmkZRQuPTEnJYq | HA      | 123456789
(1 row)

Lionel F.

2011/6/3 Michael Paquier <mic...@gm...>

> I am also wondering if the status of your connections is OK. It is not
> really normal that you get error messages like:
>
> ERROR: Could not begin transaction on data nodes.
> ERROR: prepared transaction with identifier "T711" does not exist
>
> Do you know about EXECUTE DIRECT?
>
> With a query like this:
> EXECUTE DIRECT ON NODE 1 'select * from a';
> you can check the results that are only on node 1.
>
> It could be worth checking once with a psql terminal that the data is
> loaded correctly.
> If EXECUTE DIRECT returns an error, it would mean that something is
> missing in your settings.
> If there are no errors, something with JDBC is not working correctly.
>
> Also, I have something else in mind: do you start up GTM with a first
> GXID higher than 628?
> There may be visibility issues, as initdb uses transaction IDs lower
> than that for initialization.
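The per-node check above generalizes into a quick distribution audit. A minimal sketch, assuming the same three-node numbering and the warehouse table from this thread:

```sql
-- Count the rows held by each datanode for the hash-distributed table.
-- With DISTRIBUTE BY HASH working, the three counts should be roughly even
-- (here, one warehouse per node for the 3-warehouse test).
EXECUTE DIRECT ON NODE 1 'SELECT count(*) FROM warehouse';
EXECUTE DIRECT ON NODE 2 'SELECT count(*) FROM warehouse';
EXECUTE DIRECT ON NODE 3 'SELECT count(*) FROM warehouse';

-- Run through the coordinator: the global count should equal the sum
-- of the three per-node counts.
SELECT count(*) FROM warehouse;
```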
>
> On Thu, Jun 2, 2011 at 8:46 PM, Mason <ma...@us...> wrote:
>
>> On Wed, Jun 1, 2011 at 9:09 PM, Michael Paquier
>> <mic...@gm...> wrote:
>> > The problem you are facing with the pooler may be related to this bug
>> > that has been found recently:
>> > https://sourceforge.net/tracker/?func=detail&aid=3310399&group_id=311227&atid=1310232
>> >
>> > It looks like the datanode is not able to handle autovacuum commits
>> > efficiently. This may cause data consistency problems, making a node
>> > crash in the worst scenario.
>> >
>> > This could explain why you cannot begin a transaction correctly on
>> > nodes, connections to backends being closed by a crash or a
>> > consistency problem.
>> > Can you provide a backtrace or give hints about the problem you have?
>> > Some tips in the node logs, perhaps?
>>
>> To see if it is autovacuum, Lionel, you could temporarily disable it
>> and try to reproduce the error.
>>
>> Mason
>>
>> >
>> > On Wed, Jun 1, 2011 at 8:12 PM, Lionel Frachon
>> > <lio...@gm...> wrote:
>> >>
>> >> Hello,
>> >>
>> >> I was forced to distribute data by replication and not by hash, as
>> >> I'm constantly getting "ERROR: Could not commit prepared transaction
>> >> implicitely" on tables other than Warehouse (w_id), using 10
>> >> warehouses (this error appears both on data loading, when using
>> >> hash, and when performing distributed queries).
>> >>
>> >> I used a slightly different setup:
>> >> - 1 GTM-only node
>> >> - 1 Coordinator-only node
>> >> - 3 Datanodes
>> >>
>> >> The Coordinator has 256MB of RAM, the Datanodes 768MB. At no point
>> >> did they reach full usage of the dedicated RAM.
>> >>
>> >> However, running the benchmark for more than a few minutes (2 or 3)
>> >> leads to the following errors:
>> >>
>> >> --- Unexpected SQLException caught in NEW-ORDER Txn ---
>> >> Message: ERROR: Could not begin transaction on data nodes.
>> >> SQLState: XX000
>> >> ErrorCode: 0
>> >>
>> >> Then a bit later:
>> >>
>> >> --- Unexpected SQLException caught in NEW-ORDER Txn ---
>> >> Message: ERROR: Failed to get pooled connections
>> >> SQLState: 53000
>> >> ErrorCode: 0
>> >>
>> >> then (and I assume they are linked):
>> >>
>> >> --- Unexpected SQLException caught in NEW-ORDER Txn ---
>> >> Message: ERROR: Could not begin transaction on data nodes.
>> >> SQLState: XX000
>> >> ErrorCode: 0
>> >>
>> >> Additionally, the test ends with many:
>> >>
>> >> --- Unexpected SQLException caught in NEW-ORDER Txn ---
>> >> Message: This connection has been closed.
>> >> SQLState: 08003
>> >> ErrorCode: 0
>> >>
>> >> I'm using 10 terminals and 10 warehouses.
>> >>
>> >> Any clue for this error (and for distribution by hash? I understand
>> >> they're probably linked...)
>> >>
>> >> Lionel F.
>> >>
>> >> 2011/5/31 Lionel Frachon <lio...@gm...>:
>> >> > Hi,
>> >> >
>> >> > Yes, persistent_datanode_connections is now set to off; it may
>> >> > not be related to the issues I have.
>> >> >
>> >> > What amount of memory do you have on your datanodes & coordinator?
>> >> >
>> >> > Here are my settings:
>> >> > datanode: shared_buffers = 512MB
>> >> > coordinator: 256MB (now; was 96MB)
>> >> >
>> >> > I still get, for some tables distributed by hash:
>> >> > "ERROR: Could not commit prepared transaction implicitely"
>> >> >
>> >> > For the distribution syntax, yes, I found your webpage about the
>> >> > regression tests.
>> >> >
>> >> >> You also have to know that it is important to set a limit of
>> >> >> connections on datanodes equal to the sum of max connections on
>> >> >> all coordinators. For example, if your cluster is using 2
>> >> >> coordinators with 20 max connections each, you may have a
>> >> >> maximum of 40 connections to datanodes.
>> >> >
>> >> > Ok, tweaking this today and launching the tests again...
>> >> >
>> >> > Lionel F.
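The sizing rule quoted above (datanode max connections = sum of max connections over all coordinators) can be written down as a postgresql.conf sketch. The numbers are illustrative for the 1-coordinator / 3-datanode layout described in this thread, not tuned values:

```
# postgresql.conf on the (single) coordinator
max_connections = 20

# postgresql.conf on each of the 3 datanodes:
# at least the sum of max_connections across all coordinators.
# One coordinator x 20 = 20 here; with 2 coordinators of 20 each,
# this would need to be 40.
max_connections = 20
```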
>> >> >
>> >> > 2011/5/31 Michael Paquier <mic...@gm...>:
>> >> >>
>> >> >> On Mon, May 30, 2011 at 7:34 PM, Lionel Frachon
>> >> >> <lio...@gm...> wrote:
>> >> >>>
>> >> >>> Hi again,
>> >> >>>
>> >> >>> I turned off connection pooling on the coordinator (dunno why it
>> >> >>> said on), raised the shared_buffers of the coordinator, allowed
>> >> >>> 1000 connections, and the error disappeared.
>> >> >>
>> >> >> I am not really sure I get the meaning of this, but how did you
>> >> >> turn off the pooler on the coordinator?
>> >> >> Did you use the parameter persistent_connections?
>> >> >> Connection pooling from the coordinator is an automatic feature,
>> >> >> and you have to use it if you want to connect from a remote
>> >> >> coordinator to backend XC nodes.
>> >> >>
>> >> >> You also have to know that it is important to set a limit of
>> >> >> connections on datanodes equal to the sum of max connections on
>> >> >> all coordinators. For example, if your cluster is using 2
>> >> >> coordinators with 20 max connections each, you may have a
>> >> >> maximum of 40 connections to datanodes.
>> >> >> This uses a lot of shared buffers on a node, but typically this
>> >> >> maximum number of connections is never reached, thanks to the
>> >> >> connection pooling.
>> >> >>
>> >> >> Please note also that the number of Coordinator <-> Coordinator
>> >> >> connections may also increase if DDL is used from several
>> >> >> coordinators.
>> >> >>
>> >> >>> However, all data is still going to one node (whatever I choose
>> >> >>> as primary datanode), with 40 warehouses... any specific syntax
>> >> >>> to load-balance warehouses over nodes?
>> >> >>
>> >> >> CREATE TABLE foo (column_key type, other_column int) DISTRIBUTE
>> >> >> BY HASH(column_key);
>> >> >> --
>> >> >> Michael Paquier
>> >> >> http://michael.otacoo.com
>> >
>> > --
>> > Michael Paquier
>> > http://michael.otacoo.com
>> >
>> > _______________________________________________
>> > Postgres-xc-general mailing list
>> > Pos...@li...
>> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-general
>
> --
> Michael Paquier
> http://michael.otacoo.com
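Applied to the schema discussed in this thread, the distribution clause would look something like the sketch below. The column names come from the query output earlier in the thread; the column types are assumptions (abbreviated from the standard TPC-C definition), not taken from Lionel's actual DDL:

```sql
-- Hash-distribute warehouses on their ID so rows are spread across
-- the datanodes instead of all landing on node 1.
CREATE TABLE warehouse (
    w_id       integer NOT NULL,
    w_ytd      numeric(12,2),
    w_tax      numeric(4,4),
    w_name     varchar(10),
    w_street_1 varchar(20),
    w_street_2 varchar(20),
    w_city     varchar(20),
    w_state    char(2),
    w_zip      char(9)
) DISTRIBUTE BY HASH(w_id);
```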