Re: [Postgres-xc-general] Composite table types, replicate + distribute.


2012/7/4 Michael Paquier <mic...@gm...>:
>
>
> On Wed, Jul 4, 2012 at 2:31 PM, Joseph Glanville
> <jos...@or...> wrote:
>>
>> Hey guys,
>>
>> This is more of a feature request/question regarding how HA could be
>> implemented with PostgreXC in the future.
>>
>> Could it be possible to have a composite table type which could
>> replicate to X nodes and distribute to Y nodes in such a way that
>> at least X copies of every row are maintained but the table is sharded
>> across Y data nodes?
>
> The answer is yes. It is possible.
>>
>>
>> For example, in a cluster of 6 nodes one would be able to configure a
>> table with REPLICATION 2, DISTRIBUTE 3 BY HASH etc (I can't remember
>> what the table definitions look like) such that the table would be
>> replicated to 2 sets of 3 nodes.
>
> As you seem to be aware, XC currently supports only horizontal partitioning,
> meaning that tuples are present on each node in complete form with all the
> column data.
> So let's call your feature partial horizontal partitioning... Or
> something like this.

Maybe "multiple distribution", for example:

CREATE TABLE T ... DISTRIBUTE BY HASH(a) TO (node1, node2), REPLICATE TO (node3);
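
To make the intended row placement concrete, here is a rough sketch in Python. It assumes the proposed syntax means "hash each row on a across node1 and node2, and also keep a full copy on node3". The function name placement and the crc32 hash are only for illustration; this is not the hash function or any code from XC.

```python
import zlib


def placement(key, hash_nodes, replica_nodes):
    """Return the nodes that would hold the row whose distribution key is `key`."""
    # crc32 is a stand-in hash chosen for illustration only.
    shard = hash_nodes[zlib.crc32(str(key).encode()) % len(hash_nodes)]
    # One hashed copy plus a full copy on every replica node.
    return [shard] + list(replica_nodes)


if __name__ == "__main__":
    for a in (1, 2, 3, 4):
        print(a, placement(a, ["node1", "node2"], ["node3"]))
```

With this rule every row lives on exactly one of the hashed nodes and on node3, so losing any single node still leaves one copy of each row.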

This has another application, like

CREATE TABLE T ... DISTRIBUTE BY HASH(a), HASH(b);

In this case, we can choose whichever distribution is more suitable for a
SELECT statement. If WHERE T.a = xxx, we can choose the HASH(a)
distribution, and if WHERE T.b = yyy, the HASH(b) one.
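
As a rough sketch of that choice, assuming the table is stored twice over the same three nodes (once hashed on a, once hashed on b): the planner looks at the equality predicates in the WHERE clause and picks the copy whose hash key is constrained, so the query goes to a single node. The names NODES, DISTRIBUTIONS, node_for and plan are made up for this example, and crc32 only stands in for the real hash function.

```python
import zlib

NODES = ["node1", "node2", "node3"]   # hypothetical datanodes
DISTRIBUTIONS = ["a", "b"]            # one copy hashed on a, one hashed on b


def node_for(value, nodes=NODES):
    """Map a key value to a node; crc32 is only a stand-in hash."""
    return nodes[zlib.crc32(str(value).encode()) % len(nodes)]


def plan(equality_predicates):
    """Pick a distribution for a SELECT.

    equality_predicates maps a column name to the constant it is compared
    with, e.g. {"a": 10} for WHERE T.a = 10.
    Returns (chosen distribution column or None, nodes to query).
    """
    for col in DISTRIBUTIONS:
        if col in equality_predicates:
            # The hash key is pinned, so a single node holds the rows.
            return col, [node_for(equality_predicates[col])]
    # No hash key is constrained: scan every node of one copy.
    return None, list(NODES)


if __name__ == "__main__":
    print(plan({"a": 10}))
    print(plan({"b": "xyz"}))
    print(plan({"c": 1}))
```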

This is useful not only for an HA arrangement; it can also enable more
sophisticated query planning.

Vertical partitioning is another issue and could be very challenging.

>
>>
>> This is interesting because it can provide a flexible tradeoff between
>> full write scalability (current PostgresXC distribute) and full read
>> scalability (PostgresXC replicate or other slave solutions).
>> What is most useful about this setup is that with PostgresXC it can be
>> maintained transparently without middleware and configured to be fully
>> sync multi-master etc.
>
> Do you have some examples of applications that may require that?
>
>>
>>
>> Are there significant technical challenges to the above and is this
>> something the PostgresXC team would be interested in?
>
> The code would need to be changed in many places and might require some
> effort, especially for cursors and join determination on the planner side.
>
> Another critical choice I see here is related to the preferential strategy
> for node choice.
> For example, in your case, the table is replicated on 3 nodes, and
> distributed on 3 nodes by hash.
> When a simple read query arrives at XC level, we need to make XC aware of
> which set of nodes to choose in priority.
> A simple table-based session parameter could manage that,
> but is it user-friendly?
> A way to choose the set of nodes automatically would be to evaluate, with a
> global system of statistics, the read/write load on each table
> for each set of nodes, and to choose the least loaded set of nodes at the
> moment the query is fired, when planning it. This is considerably more
> complicated, however.
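
The automatic choice described above could be sketched roughly as follows, assuming some global statistics collector already provides a load figure per node. No such collector exists in XC today. The name pick_node_set and the use of a plain sum as the load of a node set are assumptions made only for this example.

```python
def pick_node_set(node_sets, load):
    """Return the name of the least loaded node set.

    node_sets: maps a set name to the list of nodes holding one full copy.
    load: maps a node name to its current load figure (assumed to be given).
    """
    def total(name):
        # Nodes with no statistics yet count as idle.
        return sum(load.get(node, 0) for node in node_sets[name])

    return min(node_sets, key=total)


if __name__ == "__main__":
    node_sets = {
        "set1": ["node1", "node2", "node3"],
        "set2": ["node4", "node5", "node6"],
    }
    load = {"node1": 40, "node2": 35, "node3": 50,
            "node4": 10, "node5": 20, "node6": 15}
    print(pick_node_set(node_sets, load))
```

The hard part is not this comparison but gathering the statistics cheaply and keeping them fresh enough to be useful at planning time.
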
> --
> Michael Paquier
> http://michael.otacoo.com
>