Re: [Postgres-xc-general] Our general use case

On Thu, Oct 25, 2012 at 5:41 AM, David Hofstee <pg...@c0...> wrote:

> Hi,
>
> I've been reading the '*ERROR: Failed to get pooled connections*' thread
> about what XC should and should not do. I opted to start a new thread
> (instead of replying) about how I would like XC to be.
>
> Some background. I work for a SaaS company (mostly dev, some ops) which
> has to be online 24/7. We are now running apache/tomcat/mysql for each set
> of customers on about 30 nodes and we want to centralize and make our
> application more robust, efficient and simple. It basically means creating
> layers: LB, webservers, application servers, database cluster. Some easy
> parts are already done (haproxy, nginx). Our 'platform' is pretty complex
> and I have so many tasks, I prefer to *not* dig into details. We are now
> discussing the db issue (mysql cluster is not that great).
>
> My dream DB cluster:
>
> Scalability - that means read and write scalability. XC should do that
> right now. Nice.
>
> High availability - a node can go offline and it should not hinder
> availability (only processing capacity)
>
> Maintainability - Since maintenance/change is our primary cause of
> downtime, it should be possible to kill a node and add it later. This can
> be because the VM is being moved, the OS is updated/upgraded, etc. Also,
> think about how a cluster is updated from major version to major version
> (lets say 9.x to 10.x). Maybe that is not an issue (but I don't know about
> it yet).
>
> Simplicity - It would be nice if the default package+config file is all I
> need. If it is too complex I cannot go on holidays. Some points:
>
>    - I read that *'...even the stock postgresql.conf configuration file
>    is pretty conservative and users tweak it as per their requirements...*'.
>    For me that translates as 'if you are new to Postgres it works bad'. Not
>    simple (for e.g. some of our dev-ers).
>    - For HA *'...Like Postgres, you need an external application to
>    provide it'*. When using a cluster I think HA is very often wanted. I
>    need to explain all this to every ops-colleague of mine and some are not
>    very accurate. Not simple again.
>
XC is a fork of Postgres, and we try to share the parent project's
philosophy of being really conservative about what should or should not be
added to core.
For example, let's take the case of HA. It is of course possible to
implement an HA solution directly in the core of XC, but there are 2 things
that argue against that:
1) It is not our goal to force users into one HA solution or another, and
I do not believe it is the role of the core people to integrate into XC
core a solution that might be good for one type of application without
regard for the others. Postgres is popular because it leaves users free to
use what they want, and depending on the application they run on XC, they
might prefer one HA solution or another.
2) If Postgres integrates a native HA solution in the future (I do not
believe it will, as the community is really conservative, but let's
assume), and if XC had at some point integrated an HA solution directly in
its core, we would certainly have to drop the XC solution and rely on the
Postgres one, as XC is a fork of Postgres. This would be a waste of time
for the core people who integrated the HA solution and for the people
merging Postgres code into XC. One of the reasons XC is able to keep pace
with Postgres code easily is that we avoid implementing solutions in core
that might unnecessarily affect its interactions with Postgres.

>
>
> Quick setup - I want to setup an NxM cluster quickly (N times duplication
> for HA, M times distributed writes for performance). I prefer to setup a
> single node with a given config file, add nodes and be ready to go. Maybe
> an hour in case of disaster recovery?
>
There are already tools for that, like this one written in Ruby:
https://sourceforge.net/projects/postgres-xc/files/misc/pgxc_config_v0_9_3.tar.gz/download
It has not been maintained since 0.9.3, as it is honestly not part of core.
You might have a look at it.

> Managability - I want to manage a cluster easily (add node, remove node,
> spare nodes, monitoring, ...). It cannot be simple enough.
>
Sure. I don't know of any utility able to do that, but if you could build
one running on top of XC and sell it, you might be able to make some money
if XC becomes popular, which is not really the case now ;)

> Backup - I'm not familiar with running backups on Postgres but we
> currently run a blocking backup on the mysql, for consistency, and it
> causes issues. We use Bacula on a file level. Which brings up a question:
> How do you backup a cluster (if you don't know which nodes are hot)?
>
In the case of XC, you can take a dump directly from a Coordinator with
pg_dump and then restore the dump file with pg_restore. pg_restore reads
archive-format dumps, so you might want to have pg_dump write an archive
file.
There are many ways to accomplish that, as in Postgres. The only
difference in the case of XC is that you need to do it for each node, as
the architecture is shared-nothing.
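
As a rough sketch of the Coordinator dump and restore mentioned above, the
commands could be assembled like this. The host name coord1, port 5432,
database appdb and file name appdb.dump are placeholders made up for the
example, not values from any real setup; only the standard pg_dump and
pg_restore options -h, -p, -Fc, -f and -d are used.

```python
import shlex
from typing import List


def dump_command(host: str, port: int, dbname: str, outfile: str) -> List[str]:
    """Build a pg_dump call that writes a custom-format archive (-Fc)."""
    return ["pg_dump", "-h", host, "-p", str(port), "-Fc", "-f", outfile, dbname]


def restore_command(host: str, port: int, dbname: str, infile: str) -> List[str]:
    """Build a pg_restore call that loads an archive into an existing database."""
    return ["pg_restore", "-h", host, "-p", str(port), "-d", dbname, infile]


if __name__ == "__main__":
    # Placeholder connection details (assumptions for illustration only).
    host, port, dbname, archive = "coord1", 5432, "appdb", "appdb.dump"

    # Print the commands instead of running them, so the sketch is safe to
    # try on a machine without a cluster. Pass the lists to subprocess.run()
    # to actually execute them against a Coordinator.
    print(shlex.join(dump_command(host, port, dbname, archive)))
    print(shlex.join(restore_command(host, port, dbname, archive)))
```

The custom format is chosen here only because pg_restore cannot read a
plain SQL dump; the directory or tar archive formats would work as well.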

> Logging - Yes...
>
> Some may respond that things are not that simple. I know. But I still want
> it to be simple. It would make PGXC a no-brainer for everyone. Thanks for
> listening and keep up the good work! I appreciate it.
>
Utilities already implemented for Postgres can work natively with XC; for
logging, for example, you might want to use a log analyzer like pgbadger.
You should look at those first for each thing you want to do, then
evaluate the effort needed to achieve each of your goals.

Thanks,
-- 
Michael Paquier
http://michael.otacoo.com