|
From: seikath <se...@gm...> - 2013-06-05 13:36:58
|
Hello all,
this is in short the idea of the "HA" implementation at AWS:
We use Amazon VPC, several frontends, they use TCP Amazon Loadbalancer to balance the dblink to the 4 nodes XC.
Each XC node has nginx which is used by the AWS LB to perform various checks on XC node,
if one of them fails, nginx responce is 404 and the AWS LoadBalancer excludes the XC node from the round robin.
Every XC node has pgboucer in front of the coordinators.
So in short :
internet --port 443-->[AWS LB SSL certificate offloading ] --port 80--> [apps]x4 --port 5432 - > [AWS Internal LB TCP ] --pgbouncer-port 5434 -- > [ [pgboucer]
--port 5432-->[coordinator] --port 6543-->[datanode] ]x4
The database is small, and we do expect massive concurrent load.
Later on bigger db size we will separate the load on two XC clusters, write only and read only, I plan to use streaming/other type of replication from the write
group
to the massive read only group.
But this is just the idea, atm we just test the Rails Framework with XC.
In a view to the expected huge load, we want to use XC in production.
Regariding the select nodeoids from pgxc_class result, I see 4 columns on all the 4 nodes:
boxes@vpc-xc-coord-01:5432::boxes_production=[Wed Jun 5 13:34:27 UTC 2013]> select nodeoids from pgxc_class limit 4;
nodeoids
-------------------------
16384 16386 16388 16390
16384 16386 16388 16390
16384 16386 16388 16390
16384 16386 16388 16390
(4 rows)
Time: 4.444 ms
postgres@vpc-xc-coord-02:5432::boxes_production=[Wed Jun 5 13:34:31 UTC 2013]# select nodeoids from pgxc_class limit 4;
nodeoids
-------------------------
16386 16384 16388 16390
16386 16384 16388 16390
16386 16384 16388 16390
16386 16384 16388 16390
(4 rows)
postgres@vpc-xc-coord-03:5432::boxes_production=[Wed Jun 5 13:34:34 UTC 2013]# select nodeoids from pgxc_class limit 4;
nodeoids
-------------------------
16386 16388 16384 16390
16386 16388 16384 16390
16386 16388 16384 16390
16386 16388 16384 16390
(4 rows)
postgres@vpc-xc-coord-04:5432::boxes_production=[Wed Jun 5 13:34:37 UTC 2013]# select nodeoids from pgxc_class limit 4;
nodeoids
-------------------------
16386 16388 16390 16384
16386 16388 16390 16384
16386 16388 16390 16384
16386 16388 16390 16384
(4 rows)
Kind regards, and again, thank you all.
Cheers
Ivan
On 06/05/2013 02:29 PM, Mason Sharp wrote:
>
>
>
> On Wed, Jun 5, 2013 at 7:37 AM, Andrei Martsinchyk <and...@gm... <mailto:and...@gm...>> wrote:
>
> Hi Iván,
>
> First of all, you should not replicate all your tables. No such case when it might be reasonable, except maybe some ultimate test case. Single Postgres
> server would perform better then your four-node cluster in any application. So think again about the distribution planning.
>
>
> It might be ok if the data set is relatively small (fits in memory on all nodes) and there is very high read-only concurrency with multiple coordinators.
> Anyway, I agree, in general distributing tables is the thing to do.
>
>
> Regarding your problem, I guess node definitions was different at the moment when you created your tables.
> To verify, please run following query on all your coordinators:
>
> select nodeoids from pgxc_class;
>
> The result should look like this:
>
> nodeoids
> ---------------------------
> 16386 16387 16388 16389
> 16386 16387 16388 16389
> ...
> (N rows)
>
> If you see less then four node OIDs in some or all rows that is the cause.
>
>
>
>
> 2013/6/5 seikath <se...@gm... <mailto:se...@gm...>>
>
> Hello guys,
>
> I am facing one problem with XC I would like to know if its common or not.
>
> 4 AWS based XC nodes installed with datanode and coordinator on each one, then I have separated one gtm-proxy and one gtm node.
> version used:
> psql (Postgres-XC) 1.0.3
> (based on PostgreSQL) 9.1.9
>
> once installed, it operates OK as db/roles creation , database import etc.
>
> The issue is, on db import, I get only three of the nodes replicated , as all the tables definitions are with *distribute by replication* :
>
> xzcat prod-db01-new.2013-06-04.12.31.34.sql.xz | sed 'h;/^CREATE TABLE/,/^);/s/;/ DISTRIBUTE BY REPLICATION;/' | psql -U postgres dbname
> I see no errors at the time of the dbimport ,
> the tables , indexes , pkeys are replicated, just the data is only at three of the 4 active nodes.
>
> Details :
>
> # coordinatods config:
>
> boxes@vpc-xc-coord-01:5432::boxes_production=[Wed Jun 5 08:41:26 UTC 2013]> select * from pgxc_node;
> node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred | node_id
> ------------+-----------+-----------+-----------+----------------+------------------+-------------
> coord01 | C | 5432 | localhost | f | f | -951114102
> datanode01 | D | 6543 | localhost | t | t | -561864558
> coord02 | C | 5432 | 10.0.1.12 | f | f | -1523582700
> datanode02 | D | 6543 | 10.0.1.12 | f | f | 670480207
> coord03 | C | 5432 | 10.0.1.13 | f | f | 1641506819
> datanode03 | D | 6543 | 10.0.1.13 | f | f | -1804036519
> coord04 | C | 5432 | 10.0.1.14 | f | f | -1385444041
> datanode04 | D | 6543 | 10.0.1.14 | f | f | 1005050720
> (8 rows)
>
> postgres@vpc-xc-coord-02:5432::boxes_production=[Wed Jun 5 08:42:24 UTC 2013]# select * from pgxc_node;
> node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred | node_id
> ------------+-----------+-----------+-----------+----------------+------------------+-------------
> coord02 | C | 5432 | localhost | f | f | -1523582700
> datanode02 | D | 6543 | localhost | f | t | 670480207
> coord01 | C | 5432 | 10.0.1.11 | f | f | -951114102
> datanode01 | D | 6543 | 10.0.1.11 | t | f | -561864558
> coord03 | C | 5432 | 10.0.1.13 | f | f | 1641506819
> datanode03 | D | 6543 | 10.0.1.13 | f | f | -1804036519
> coord04 | C | 5432 | 10.0.1.14 | f | f | -1385444041
> datanode04 | D | 6543 | 10.0.1.14 | f | f | 1005050720
> (8 rows)
> postgres@vpc-xc-coord-03:5432::boxes_production=[Wed Jun 5 08:42:57 UTC 2013]# select * from pgxc_node;
> node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred | node_id
> ------------+-----------+-----------+-----------+----------------+------------------+-------------
> coord03 | C | 5432 | localhost | f | f | 1641506819
> datanode03 | D | 6543 | localhost | f | t | -1804036519
> coord01 | C | 5432 | 10.0.1.11 | f | f | -951114102
> datanode01 | D | 6543 | 10.0.1.11 | t | f | -561864558
> coord02 | C | 5432 | 10.0.1.12 | f | f | -1523582700
> datanode02 | D | 6543 | 10.0.1.12 | f | f | 670480207
> coord04 | C | 5432 | 10.0.1.14 | f | f | -1385444041
> datanode04 | D | 6543 | 10.0.1.14 | f | f | 1005050720
>
> postgres@vpc-xc-coord-04:5432::boxes_production=[Wed Jun 5 08:20:35 UTC 2013]# select * from pgxc_node;
> node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred | node_id
> ------------+-----------+-----------+-----------+----------------+------------------+-------------
> coord04 | C | 5432 | localhost | f | f | -1385444041
> datanode04 | D | 6543 | localhost | f | t | 1005050720
> coord01 | C | 5432 | 10.0.1.11 | f | f | -951114102
> datanode01 | D | 6543 | 10.0.1.11 | t | f | -561864558
> coord02 | C | 5432 | 10.0.1.12 | f | f | -1523582700
> datanode02 | D | 6543 | 10.0.1.12 | f | f | 670480207
> coord03 | C | 5432 | 10.0.1.13 | f | f | 1641506819
> datanode03 | D | 6543 | 10.0.1.13 | f | f | -1804036519
>
> fist node coordinator and datanodes config:
> ====================================================================
>
> postgres@vpc-xc-coord-01:[Wed Jun 05 08:40:00][/usr/local/pgsql]$ cat datanode.postgresql.conf
> listen_addresses = '*' # what IP address(es) to listen on;
> port = 6543 # (change requires restart)
> max_connections = 100 # (change requires restart)
> shared_buffers = 320MB # min 128kB
> max_prepared_transactions = 100 # zero disables the feature
> datestyle = 'iso, mdy'
> lc_messages = 'en_US.UTF-8' # locale for system error message
> lc_monetary = 'en_US.UTF-8' # locale for monetary formatting
> lc_numeric = 'en_US.UTF-8' # locale for number formatting
> lc_time = 'en_US.UTF-8' # locale for time formatting
> default_text_search_config = 'pg_catalog.english'
> include '/usr/local/pgsql/gtm.include.conf'
> include '/usr/local/pgsql/datanode_node_name.conf'
> #pgxc_node_name = 'datanode04' # Coordinator or Datanode name
> enforce_two_phase_commit = on # Enforce the usage of two-phase commit on transactions
> enable_fast_query_shipping = on
> enable_remotejoin = on
> enable_remotegroup = on
>
>
> postgres@vpc-xc-coord-01:[Wed Jun 05 08:43:29][/usr/local/pgsql]$ cat coordinator.postgresql.conf
> listen_addresses = '*' # what IP address(es) to listen on;
> port = 5432 # (change requires restart)
> max_connections = 100 # (change requires restart)
> shared_buffers = 120MB # min 128kB
> max_prepared_transactions = 100 # zero disables the feature
> datestyle = 'iso, mdy'
> lc_messages = 'en_US.UTF-8' # locale for system error message
> lc_monetary = 'en_US.UTF-8' # locale for monetary formatting
> lc_numeric = 'en_US.UTF-8' # locale for number formatting
> lc_time = 'en_US.UTF-8' # locale for time formatting
> default_text_search_config = 'pg_catalog.english'
> pooler_port = 6667 # Pool Manager TCP port
> min_pool_size = 1 # Initial pool size
> max_pool_size = 100 # Maximum pool size
> max_coordinators = 16 # Maximum number of Coordinators
> max_datanodes = 16 # Maximum number of Datanodes
> include '/usr/local/pgsql/gtm.include.conf'
> include '/usr/local/pgsql/coordinator_node_name.conf'
> enforce_two_phase_commit = on # Enforce the usage of two-phase commit on transactions
> enable_fast_query_shipping = on
> enable_remotejoin = on
> enable_remotegroup = on
>
>
> postgres@vpc-xc-coord-01:[Wed Jun 05 08:43:38][/usr/local/pgsql]$ cat /usr/local/pgsql/gtm.include.conf
> gtm_host = '10.0.1.16' # Host name or address of GTM Proxy, if not - direct link to GTM
> gtm_port = 6543
>
> #gtm_host = '127.0.0.1'
> #gtm_port = 5434
>
> boxes@vpc-xc-coord-01:5432::boxes_production=[Wed Jun 5 08:42:04 UTC 2013]> select count(*) from friends;
> count
> -------
> 1
> (1 row)
>
> Time: 4.698 ms
>
> postgres@vpc-xc-coord-02:5432::boxes_production=[Wed Jun 5 08:42:25 UTC 2013]# select count(*) from friends;
> count
> -------
> 41416
> (1 row)
>
> postgres@vpc-xc-coord-03:5432::boxes_production=[Wed Jun 5 08:43:01 UTC 2013]# select count(*) from friends;
> count
> -------
> 41416
> (1 row)
>
> postgres@vpc-xc-coord-04:5432::boxes_production=[Wed Jun 5 08:42:53 UTC 2013]# select count(*) from friends;
> count
> -------
> 41416
> (1 row)
>
> # identical configs :
> postgres@vpc-xc-coord-01:[Wed Jun 05 08:53:58][/usr/local/pgsql]$ md5sum coordinator.postgresql.conf
> b7f61b5d8baeec83cd82d9f1ee744728 coordinator.postgresql.conf
> postgres@vpc-xc-coord-02:[Wed Jun 05 08:53:53][~]$ md5sum coordinator.postgresql.conf
> b7f61b5d8baeec83cd82d9f1ee744728 coordinator.postgresql.conf
>
> postgres@vpc-xc-coord-01:[Wed Jun 05 08:54:58][/usr/local/pgsql]$ md5sum datanode.postgresql.conf
> 00d6a5736b6401dc6cc3d820fb412082 datanode.postgresql.conf
> postgres@vpc-xc-coord-02:[Wed Jun 05 08:54:37][~]$ md5sum datanode.postgresql.conf
> 00d6a5736b6401dc6cc3d820fb412082 datanode.postgresql.conf
>
> postgres@vpc-xc-coord-01:[Wed Jun 05 08:55:24][/usr/local/pgsql]$ md5sum /usr/local/pgsql/gtm.include.conf
> a6e7c3a21958a23bfb5054dc645d9576 /usr/local/pgsql/gtm.include.conf
> postgres@vpc-xc-coord-02:[Wed Jun 05 08:55:01][~]$ md5sum /usr/local/pgsql/gtm.include.conf
> a6e7c3a21958a23bfb5054dc645d9576 /usr/local/pgsql/gtm.include.conf
>
>
> ====================================================================
> I
> In general I suspect wrong configuration, but atm can not find any.
>
> I did a test importing the same db at the second XC node, and its the same issue: the fixt XC gets eveyrthing replicated but the actual data.
>
> My plan is to launch new clone instance of the vpc-xc-coord-02 as a replacement of the vpc-xc-coord-01, but this is the desperate plan C
> I want to know what happens .. :)
>
>
> Kind regards,
>
> Iván
>
>
>
>
> ------------------------------------------------------------------------------
> How ServiceNow helps IT people transform IT departments:
> 1. A cloud service to automate IT design, transition and operations
> 2. Dashboards that offer high-level views of enterprise services
> 3. A single system of record for all IT processes
> http://p.sf.net/sfu/servicenow-d2d-j
> _______________________________________________
> Postgres-xc-general mailing list
> Pos...@li... <mailto:Pos...@li...>
> https://lists.sourceforge.net/lists/listinfo/postgres-xc-general
>
>
>
>
> --
> Andrei Martsinchyk
>
> StormDB - http://www.stormdb.com <http://www.stormdb.com/>
> The Database Cloud
>
>
> ------------------------------------------------------------------------------
> How ServiceNow helps IT people transform IT departments:
> 1. A cloud service to automate IT design, transition and operations
> 2. Dashboards that offer high-level views of enterprise services
> 3. A single system of record for all IT processes
> http://p.sf.net/sfu/servicenow-d2d-j
> _______________________________________________
> Postgres-xc-general mailing list
> Pos...@li... <mailto:Pos...@li...>
> https://lists.sourceforge.net/lists/listinfo/postgres-xc-general
>
>
>
>
> --
> Mason Sharp
>
> StormDB - http://www.stormdb.com
> The Database Cloud
> Postgres-XC Support and Services
|