|
From: <pos...@gz...> - 2013-08-12 14:59:22
|
All 8 datanodes and the coordinator responded, but not the gtm.
psql -p 6666 -c 'select 1'
psql: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/tmp/.s.PGSQL.6666"?
If I do a ps aux | grep gtm I get this:
/usr/local/pgsql/bin/gtm -D /var/lib/pgsql/9.2/postgres-xc/data_gtm
So it is running. If I tail gtm.log I get lots of these (one every
couple seconds):
1:139982497588992:2013-08-12 10:54:49.677 EDT -LOG: Any GTM standby
node not found in registered node(s).
LOCATION: gtm_standby_connect_to_standby_int, gtm_standby.c:378
1:139981775435520:2013-08-12 10:54:49.677 EDT -LOG: Assigning new
transaction ID = 79640
LOCATION: GTM_GetGlobalTransactionIdMulti, gtm_txn.c:581
1:139981775435520:2013-08-12 10:54:49.677 EDT -LOG: Sending transaction
id 79640
LOCATION: ProcessBeginTransactionGetGXIDCommand, gtm_txn.c:1172
1:139981775435520:2013-08-12 10:54:49.677 EDT -LOG: Received
transaction ID 79640 for snapshot obtention
LOCATION: ProcessGetSnapshotCommand, gtm_snap.c:307
1:139981775435520:2013-08-12 10:54:49.679 EDT -LOG: Committing
transaction id 79640
LOCATION: ProcessCommitTransactionCommand, gtm_txn.c:1592
1:139981775435520:2013-08-12 10:54:49.679 EDT -LOG: Cleaning up thread
state
LOCATION: GTM_ThreadCleanup, gtm_thread.c:265
If I do:
select * from pgxc_node;
I get:
node_name | node_type | node_port | node_host | nodeis_primary |
nodeis_preferred | node_id
-----------+-----------+-----------+-----------+----------------+------------------+-------------
datanode1 | D | 15432 | localhost | f |
f | 888802358
datanode2 | D | 15433 | localhost | f |
f | -905831925
datanode3 | D | 15434 | localhost | f |
f | -1894792127
datanode4 | D | 15435 | localhost | f |
f | -1307323892
datanode5 | D | 15436 | localhost | f |
f | 1797586929
datanode6 | D | 15437 | localhost | f |
f | 587455710
datanode7 | D | 15438 | localhost | f |
f | -1685037427
datanode8 | D | 15439 | localhost | f |
f | -993847320
coord1 | C | 5477 | localhost | f |
f | 1885696643
(9 rows)
Is the GTM supposed to be listed there?
Here are the values defined in gtm.conf (everything else is commented out):
nodename = 'one'
port = 6666
Anything else I can check?
Thanks,
Brian
On 8/8/13 8:53 PM, Koichi Suzuki wrote:
> Could you check the log of the coordinator you connected? This may
> tell something about the issue.
>
> Also, could you check if all the nodes (gtm, coordinators and
> datanodes) are working correctly? You can do it with psql like:
>
> psql -p port -h host -c 'select 1'
>
> where port and host is these of each node you're testing. If there're
> anything wrong, psql will report an error.
>
> Regards;
> ---
> Koichi Suzuki
>
>
>
> 2013/8/9 <pos...@gz...
> <mailto:pos...@gz...>>
>
> We have a server with 64 cores and 384GB of RAM. We'd like to
> take advantage of that hardware to speed up some queries that take
> 8+ hours to run, and Postgres-XC seems like a good fit.
>
> I've setup a cluster of 8 data nodes (I'll increase that to 48 for
> real usage), 1 coordinator, and 1 GTM, all running on the same
> physical server. I'm using 1.1 beta, on Postgres 9.2.
>
> Here are the relevant commands (near-identical repeated commands
> are omitted):
> /usr/local/pgsql/bin/initdb -D
> /var/lib/pgsql/9.2/postgres-xc/data_coord1 --nodename coord1
> /usr/local/pgsql/bin/initdb -D
> /var/lib/pgsql/9.2/postgres-xc/data_datanode1 --nodename datanode1
> ...
> /usr/local/pgsql/bin/initdb -D
> /var/lib/pgsql/9.2/postgres-xc/data_datanode8 --nodename datanode8
> /usr/local/pgsql/bin/initgtm -D
> /var/lib/pgsql/9.2/postgres-xc/data_gtm -Z gtm
> /usr/local/pgsql/bin/gtm -D
> /var/lib/pgsql/9.2/postgres-xc/data_gtm >>
> /var/lib/pgsql/9.2/postgres-xc/logfile 2>&1 &
> /usr/local/pgsql/bin/postgres -X -p 15432 -D
> /var/lib/pgsql/9.2/postgres-xc/data_datanode1 >>
> /var/lib/pgsql/9.2/postgres-xc/logfile 2>&1 &
> ...
> /usr/local/pgsql/bin/postgres -X -p 15439 -D
> /var/lib/pgsql/9.2/postgres-xc/data_datanode8 >>
> /var/lib/pgsql/9.2/postgres-xc/logfile 2>&1 &
> /usr/local/pgsql/bin/postgres -C -p 5477 -D
> /var/lib/pgsql/9.2/postgres-xc/data_coord1 >>
> /var/lib/pgsql/9.2/postgres-xc/logfile 2>&1 &
> /usr/local/pgsql/bin/psql -p 5477 -c "CREATE NODE datanode1 WITH
> (TYPE = 'datanode', PORT = 15432)" postgres
> ...
> /usr/local/pgsql/bin/psql -p 5477 -c "CREATE NODE datanode8 WITH
> (TYPE = 'datanode', PORT = 15439)" postgres
> /usr/local/pgsql/bin/psql -p 5477 -c "SELECT pgxc_pool_reload()"
> postgres
> /usr/local/pgsql/bin/createdb -p 5477 test
> /usr/local/pgsql/bin/psql -p 5477 test
>
> I then created the following tables:
> CREATE TABLE trails1 (
> id text,
> a_lat double precision,
> a_long double precision,
> b_lat double precision,
> b_long double precision,
> trail_id character varying(20),
> type character varying(4),
> distance numeric(10,5)
> );
> CREATE INDEX table1_a_lat ON table1 USING btree (a_lat);
> CREATE INDEX table1_a_long ON table1 USING btree (a_long);
> CREATE INDEX table1_b_lat ON table1 USING btree (b_lat);
> CREATE INDEX table1_b_long ON table1 USING btree (b_long);
> CREATE INDEX table1_type ON table1 USING btree (type);
> CREATE INDEX table1_distance ON table1 USING btree (distance);
>
> CREATE TABLE trails2 (
> a_lat double precision,
> a_long double precision,
> b_lat double precision,
> b_long double precision,
> type character varying(5),
> distance numeric(16,8)
> );
> CREATE INDEX table2_a_lat ON table2 USING btree (a_lat);
> CREATE INDEX table2_a_long ON table2 USING btree (a_long);
> CREATE INDEX table2_b_lat ON table2 USING btree (b_lat);
> CREATE INDEX table2_b_long ON table2 USING btree (b_long);
> CREATE INDEX table2_type ON table2 USING btree (type);
> CREATE INDEX table2_distance ON table2 USING btree (distance);
>
> I think I should have had something in the table definition about
> how to partition the data.
>
> I populated them using copy, with 331,106 rows in table1, and
> 1,124,421 rows in table2.
>
> Simple queries return the right values:
>
> select count(*) as count from table1;
> 331106
>
> select count(*) as count from table2;
> 1124421
>
> When I do a big query (the one that normally takes hours to
> complete) it is only using one CPU core. I expected to see
> Postgres processes using near 100% CPU on 8 cores.
>
> SELECT
> count(*) as count
> FROM
> trails1
> WHERE
> not exists (
> SELECT
> 'x'
> FROM
> trails2
> WHERE
> trails2.a_lat >= trails1.a_lat - 0.000833 AND
> trails2.a_lat <= trails1.a_lat + 0.000833 AND
> trails2.a_long >= trails1.a_long - 0.000833 AND
> trails2.a_long <= trails1.a_long + 0.000833 AND
> trails2.b_lat >= trails1.b_lat - 0.000833 AND
> trails2.b_lat <= trails1.b_lat + 0.000833 AND
> trails2.b_long >= trails1.b_long - 0.000833 AND
> trails2.b_long <= trails1.b_long + 0.000833 AND
> (
> trails2.type = trails1.type OR
> trails2.type = 'S'
> ) AND
> trails2.distance >= trails1.distance - 1.0 AND
> trails2.distance <= trails1.distance + 1.0
> );
>
> I haven't let it run to completion. After seeing the lack of CPU
> usage I went looking for other problems (and found them).
>
> If I reload the pool and then check it, I get true:
> SELECT pgxc_pool_reload();
> t
>
> SELECT pgxc_pool_check();
> t
>
> But after I interact with the data at all, it fails:
> select count(*) as count from table1;
> 331106
>
> SELECT pgxc_pool_check();
> f
>
> So I don't think it's working properly. The logs don't show anything.
>
> Any suggestions?
>
> Thanks!
>
>
>
>
>
>
> ------------------------------------------------------------------------------
> Get 100% visibility into Java/.NET code with AppDynamics Lite!
> It's a free troubleshooting tool designed for production.
> Get down to code-level detail for bottlenecks, with <2% overhead.
> Download for free and get started troubleshooting in minutes.
> http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.clktrk
> _______________________________________________
> Postgres-xc-general mailing list
> Pos...@li...
> <mailto:Pos...@li...>
> https://lists.sourceforge.net/lists/listinfo/postgres-xc-general
>
>
|