From: Joseph G. <jos...@or...> - 2012-07-06 07:32:15
On 6 July 2012 15:25, Michael Paquier <mic...@gm...> wrote:
>
> On Fri, Jul 6, 2012 at 2:17 PM, Ashutosh Bapat
> <ash...@en...> wrote:
>>
>> Hi Joseph,
>> I have come across this question about supporting a mixed distribution
>> strategy a few times by now.
>>
>> We have to judge its advantages (taking into consideration that there
>> can be solutions outside of core XC for the same) against the effort
>> required to implement and maintain it. If the pain of 1. using
>> third-party/outside-XC solutions and 2. implementing and maintaining it
>> in core is more or less the same, we may have to leave it out of the
>> core, at least for the near future. If we take option 2 and find that
>> using it is as painful as option 1, we have wasted our effort. To judge
>> the 2nd point, we can look at other DBMSs that offer these features and
>> how they perform from various aspects. So the following questions are
>> relevant: Is there another distributed database with a similar mixed
>> distribution scheme available? How (and how widely) is that feature
>> used in the field? What are the pain points in using such a feature?
>
> Good point here, indeed. Thanks for pointing that out.

Indeed, all excellent points. In terms of how difficult it is to integrate
into core versus using other middleware to achieve HA properties, I don't
think it's easy to come up with an answer (at least not one that isn't
highly opinionated). I spent a few days building an XC cluster with
streaming replication for each datanode, plus scripting failover events,
recovery and so on. The main issues I found were, effectively, a lack of
integration. Configuring each datanode with a different WAL archive store
and recovery command is very painful, and it is difficult to understand the
implications.
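To make the pain concrete, this is the kind of per-datanode configuration I
mean (paths and hostnames here are made up for illustration; this is a rough
sketch, not a working setup):

```
# datanode1 primary, postgresql.conf
archive_mode    = on
archive_command = 'cp %p /wal_archive/datanode1/%f'

# datanode1 standby, recovery.conf
standby_mode     = 'on'
primary_conninfo = 'host=dn1-primary port=5432'
restore_command  = 'cp /wal_archive/datanode1/%f %p'

# ...and the same again, with different paths and hosts,
# for every other datanode in the cluster
```

Multiply that by the number of datanodes, then add the failover scripting
that has to keep all of it consistent, and none of it is visible to XC
itself.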
I did make an attempt at fixing this with even more middleware
(pgpool + repmgr), but gave up after deciding that it's far too many moving
parts for a DBMS for me to consider using it. I just can't see how so many
pieces of completely disparate software can possibly know enough about the
state of the system to make reasonable decisions with my data, which leaves
me developing my own manager to control them all. Streaming replication is
also quite limited, as it only allows you to replicate entire nodes.

But enough opinion. Some facts from current DBMSs that use similar
replication strategies -- I say similar because none of them has quite the
same architecture as XC.

Cassandra[1] uses consistent hashing plus a replica count to achieve both
horizontal partitioning and replication for read/write scalability. This
poses some interesting challenges for them, mostly stemming from the
cluster size changing dynamically: maintaining the consistent hashing ring
and resilvering it. In my opinion this is made harder by the fact that
Cassandra uses cluster gossip without any node coordinator, along with its
eventual consistency guarantees.

Riak[2] also uses consistent hashing, but on a per-'bucket' basis, where
you can set a replication count. There are a bunch more too, like
LightCloud, Voldemort, DynamoDB, BigTable, HBase etc.

I appreciate these aren't RDBMS systems, but I don't believe that is a big
deal; it's perfectly viable to have a fully horizontally scaling RDBMS too,
it just doesn't exist yet. In fact, I think proper global transaction
management makes this considerably easier and more reliable. Eventual
consistency and the absence of an actual master node are not, I think, good
concessions to make. For the most part, having a global picture of the
state of all data is probably the biggest advantage of implementing this in
XC versus other solutions.
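For concreteness, here is a toy sketch of the consistent hashing + replica
count idea that Cassandra and Riak build on (Python; all names are
illustrative, and neither system actually implements it this way -- both
use virtual-node rings far more sophisticated than this):

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a key to a point on the ring (MD5, for illustration only)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with a per-request replica count."""

    def __init__(self, nodes, vnodes=8):
        # Each physical node owns several virtual points on the ring,
        # which smooths the key distribution between nodes.
        self._ring = sorted(
            (_hash(f"{node}:{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._points = [p for p, _ in self._ring]
        self._nodes = set(nodes)

    def replicas(self, key, n=2):
        """Return the n distinct nodes responsible for key, walking
        clockwise around the ring from the key's hash position."""
        if not self._ring:
            return []
        idx = bisect.bisect(self._points, _hash(key))
        out = []
        while len(out) < min(n, len(self._nodes)):
            node = self._ring[idx % len(self._ring)][1]
            if node not in out:
                out.append(node)
            idx += 1
        return out
```

The point of the scheme is that adding or removing one node only remaps the
keys adjacent to that node's ring positions, rather than reshuffling
everything -- which is exactly where the "resilvering" pain above comes
from when the cluster size changes.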
Other major advantages are:

a) Service impact from the loss of datanodes is minimized (non-existent in
   the case of losing only replicas); using middleware requires an
   orchestrated failover.
b) Time to recovery (in terms of read performance) is reduced considerably,
   because XC is able to implement a distributed recovery of out-of-date
   nodes.
c) Per-table replication management (XC already has this, but it would be
   even more valuable with composite partitioning).
d) Increased read performance, where replicas can be used to speed up
   read-heavy workloads and lessen the impact of read hotspots.
e) In-band heartbeats can be used to determine failover requirements; no
   scripting or other points of failure.
f) The components required to facilitate recovery could also be used to do
   online repartitioning (i.e. increasing the size of the cluster).
g) Probably the world's first real distributed RDBMS.

Obvious disadvantages are:

a) A lot of work; difficult, hard, etc. (This is actually the biggest
   barrier; there are lots of very difficult challenges in partitioning
   data.)
b) Making use of most of the features of such composite table partitioning
   is quite difficult; it would take a long time to optimize the query
   planner to make good use of them.

There are probably more, but revealing them would most probably require a
proper devil's advocate (I am human and set in my opinions, unfortunately).

> --
> Michael Paquier
> http://michael.otacoo.com

Joseph.

[1] - http://www.slideshare.net/benjaminblack/introduction-to-cassandra-replication-and-consistency
[2] - http://wiki.basho.com/Replication.html

--
CTO | Orion Virtualisation Solutions | www.orionvm.com.au
Phone: 1300 56 99 52 | Mobile: 0428 754 846