Re: [Postgres-xc-general] ERROR: Failed to get pooled connections

On Fri, Jul 20, 2012 at 5:32 PM, Vladimir Stavrinov <vst...@gm...> wrote:

> I have a freshly installed XC cluster consisting of two data nodes with
> all default settings. Nothing special was configured. I created a
> database and one table with a single text field. Then I inserted a text
> string and tried SELECT. At this point all was OK. But after shutting
> down one data node, SELECT fails, returning the message in the subject.
> This is not what I expected. DROP DATANODE doesn't help. If it is not a
> bug, then I have questions:
>
This is not a bug. What you did here was remove a component from the
cluster, and an incomplete cluster will not work.


>
> 1. What should failover and then recovery procedures be after one
> data node fails?
>
As with PostgreSQL, you can attach a slave node to a Datanode and then
perform a failover on it.
If the master node fails for one reason or another, you need to promote
the slave waiting behind it.
Something like pg_ctl promote -D $DN_FOLDER is enough.
That covers the Datanode side.
Then you need to update the node catalogs on each Coordinator so that
they redirect to the newly promoted node.
Let's suppose that the node that failed was called datanodeN (the master
and the slave must have the same node name).
To do that, issue "ALTER NODE datanodeN WITH (HOST = '$new_ip',
PORT = $NEW_PORT); SELECT pgxc_pool_reload();"
Do that on each Coordinator; the promoted slave will then be visible to
each Coordinator and will be part of the cluster.
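
The sequence above can be sketched as a small dry-run script. This is
only an illustration, not an official tool: it prints the steps instead
of executing them, and the host names, port, data directory and the
Coordinator names coord1/coord2 are made-up placeholders.

```python
import shlex
from typing import List


def coordinator_statements(node_name: str, new_host: str, new_port: int) -> List[str]:
    """SQL to run on every Coordinator once the slave has been promoted."""
    if not node_name.isidentifier():
        raise ValueError(f"unexpected node name: {node_name!r}")
    host = new_host.replace("'", "''")  # escape quotes for the SQL literal
    return [
        f"ALTER NODE {node_name} WITH (HOST = '{host}', PORT = {int(new_port)});",
        "SELECT pgxc_pool_reload();",
    ]


def failover_plan(dn_folder: str, node_name: str, new_host: str,
                  new_port: int, coordinators: List[str]) -> List[str]:
    """Ordered list of steps: promote the slave first, then fix each Coordinator."""
    steps = [f"on {new_host}: pg_ctl promote -D {shlex.quote(dn_folder)}"]
    for coord in coordinators:
        for stmt in coordinator_statements(node_name, new_host, new_port):
            steps.append(f"on {coord}: {stmt}")
    return steps


if __name__ == "__main__":
    # All values below are hypothetical examples.
    for step in failover_plan("/data/dn2_slave", "datanode2",
                              "192.168.0.12", 15432, ["coord1", "coord2"]):
        print(step)
```

The point of generating the statements in a loop is that no Coordinator
gets skipped: one that still points at the old master would keep failing
to get pooled connections for that node.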


> 2. Does this mean that XC is scalable in only one direction, that is,
> it can be expanded but not shrunk? In other words, we cannot remove
> a data node.
>
You can remove a Datanode; just be sure that before doing so you move the
data of distributed tables to the nodes that remain.
In 1.0 you can do that with something like the following (to move the
data off datanodeN):
CREATE TABLE new_table TO NODE (datanode1, ... datanode(N-1),
datanode(N+1), datanodeP) AS SELECT * from old_table;
DROP TABLE old_table;
ALTER TABLE new_table RENAME TO old_table;
Once you are sure that the Datanode you want to remove holds no unique data
(replicated tables do not matter here), perform a DROP NODE on each
Coordinator, then run pgxc_pool_reload(), and the node will be removed
correctly.
Please note that I am working on a patch that does this automatically.
It will be committed soon.
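
Until that patch exists, the manual steps can be generated with a short
script. The sketch below only builds the SQL strings described above; the
table name, the node names and the "_new" suffix for the temporary table
are assumptions made for the example. Keep in mind that CREATE TABLE ...
AS copies rows only, so indexes and constraints of the old table have to
be recreated by hand.

```python
from typing import List


def redistribute_statements(table: str, all_nodes: List[str],
                            node_to_remove: str) -> List[str]:
    """SQL that rebuilds one distributed table on every node except the one leaving."""
    if node_to_remove not in all_nodes:
        raise ValueError(f"{node_to_remove} is not part of the cluster")
    remaining = [n for n in all_nodes if n != node_to_remove]
    if not remaining:
        raise ValueError("cannot remove the last Datanode")
    tmp = f"{table}_new"  # hypothetical temporary name
    return [
        f"CREATE TABLE {tmp} TO NODE ({', '.join(remaining)}) AS SELECT * FROM {table};",
        f"DROP TABLE {table};",
        f"ALTER TABLE {tmp} RENAME TO {table};",
    ]


def drop_node_statements(node: str) -> List[str]:
    """SQL to run on each Coordinator after all tables have been moved."""
    return [f"DROP NODE {node};", "SELECT pgxc_pool_reload();"]


if __name__ == "__main__":
    nodes = ["datanode1", "datanode2", "datanode3"]  # example cluster
    for stmt in redistribute_statements("old_table", nodes, "datanode2"):
        print(stmt)
    for stmt in drop_node_statements("datanode2"):
        print(stmt)
```

Run the first group once per distributed table, from any Coordinator, and
the second group on every Coordinator.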

>
> 3. Does this mean that without external infrastructure (like drbd +
> corosync + pacemaker) and with the default setup (CREATE TABLE ...
> DISTRIBUTE BY REPLICATION), XC itself has neither HA nor LB (at least
> for writes) capabilities?
>
Basically it has both. I know some people who are already building an
HA/LB solution based on that...
-- 
Michael Paquier
http://michael.otacoo.com