From: Aris S. <ari...@gm...> - 2012-07-04 12:16:08
Hi Koichi,

> Maybe multiple distribution, for example, CREATE TABLE T ...
> DISTRIBUTE BY HASH(a) TO (node1, node2), REPLICATE TO (node3);

Can we really declare nodes explicitly here? In the future we must
support nodes joining the cluster online, on the fly. This is my
suggestion:

CREATE TABLE T ... DISTRIBUTE BY HASH(a), HASH(b), K-SAFETY 1;

With K-SAFETY=1, it means that we have 1 replica of each partition.
With K-SAFETY=2, it means that we have 2 replicas of each partition.
With K-SAFETY=3, it means that we have 3 replicas of each partition.

This terminology comes from H-Store: an in-memory, ACID, clustered
database. H-Store achieves durability not with disk writes, but with
replication. With K-SAFETY=1, a row is considered durable once it has
been written (in memory) to at least 2 nodes.

Maybe we can get some input from the H-Store (or VoltDB) design.
http://hstore.cs.brown.edu/publications/

What do you think?

On 7/4/12, Koichi Suzuki <koi...@gm...> wrote:
> 2012/7/4 Michael Paquier <mic...@gm...>:
>>
>> On Wed, Jul 4, 2012 at 2:31 PM, Joseph Glanville
>> <jos...@or...> wrote:
>>>
>>> Hey guys,
>>>
>>> This is more of a feature request/question regarding how HA could be
>>> implemented with Postgres-XC in the future.
>>>
>>> Could it be possible to have a composite table type which could
>>> replicate to X nodes and distribute to Y nodes, in such a way that
>>> at least X copies of every row are maintained but the table is
>>> sharded across Y data nodes?
>>
>> The answer is yes. It is possible.
>>>
>>> For example, in a cluster of 6 nodes one would be able to configure
>>> a table with REPLICATION 2, DISTRIBUTE 3 BY HASH etc. (I can't
>>> remember what the table definitions look like) such that the table
>>> would be replicated to 2 sets of 3 nodes.
>>
>> As you seem to be aware, XC currently only supports horizontal
>> partitioning, meaning that tuples are present on each node in a
>> complete form with all the column data.
>> So let's call your feature partial horizontal partitioning... or
>> something like this.
>
> Maybe multiple distribution, for example, CREATE TABLE T ...
> DISTRIBUTE BY HASH(a) TO (node1, node2), REPLICATE TO (node3);
>
> This has another application, like
>
> CREATE TABLE T ... DISTRIBUTE BY HASH(a), HASH(b);
>
> In this case, we can choose whichever distribution is more suitable
> for a SELECT statement. If WHERE T.a = xxx, then we can choose the
> HASH(a) distribution, and if WHERE T.b = yyy, then choose HASH(b).
>
> This is not only for an HA arrangement but can enable more
> sophisticated query planning.
>
> Vertical partitioning is another issue and could be very challenging.
>
>>
>>> This is interesting because it can provide a flexible tradeoff
>>> between full write scalability (the current Postgres-XC distribute)
>>> and full read scalability (Postgres-XC replicate or other slave
>>> solutions).
>>> What is most useful about this setup is that with Postgres-XC it can
>>> be maintained transparently without middleware, and configured to be
>>> fully sync multi-master etc.
>>
>> Do you have some examples of applications that may require that?
>>
>>>
>>> Are there significant technical challenges to the above, and is this
>>> something the Postgres-XC team would be interested in?
>>
>> The code would need to be changed in many places and might require
>> some effort, especially for cursors and join determination on the
>> planner side.
>>
>> Another critical choice I see here is related to the preferential
>> strategy for node choice.
>> For example, in your case, the table is replicated on 3 nodes, and
>> distributed on 3 nodes by hash.
>> When a simple read query arrives at the XC level, we need to make XC
>> aware of which set of nodes to choose in priority.
>> A simple session parameter which is table-based could manage that,
>> but is it user-friendly?
>> A way to choose the set of nodes automatically would be to evaluate,
>> with a global system of statistics, the load of read/write operations
>> on each table for each set of nodes, and choose the least loaded set
>> of nodes at the moment the query is fired, when planning it. This is
>> far more complicated, however.
>> --
>> Michael Paquier
>> http://michael.otacoo.com
>>
>> ------------------------------------------------------------------------------
>> Live Security Virtual Conference
>> Exclusive live event will cover all the ways today's security and
>> threat landscape has changed and how IT managers can respond.
>> Discussions will include endpoint security, mobile security and the
>> latest in malware threats.
>> http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>> _______________________________________________
>> Postgres-xc-general mailing list
>> Pos...@li...
>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-general
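The K-SAFETY proposal above can be sketched in a few lines. This is a
minimal illustration only, not Postgres-XC or H-Store code: the
`placement` function, the CRC32-modulo hashing, and the ring-successor
replica placement are all assumptions made for the example.

```python
import zlib

def placement(key, num_nodes, k_safety):
    """Nodes holding a copy of the row with this key (sketch).

    The key hashes to a primary node, and K-SAFETY=k places k extra
    replicas on the next nodes in ring order, so every row exists on
    k + 1 nodes and the cluster survives k simultaneous node failures.
    """
    primary = zlib.crc32(str(key).encode()) % num_nodes
    return [(primary + i) % num_nodes for i in range(k_safety + 1)]

# With K-SAFETY=1 in a 6-node cluster, each row lives on 2 distinct
# nodes, matching the durability rule quoted above (a row is durable
# once written to at least 2 nodes).
print(placement("order-42", num_nodes=6, k_safety=1))
```

Note that no node list appears in the table definition: the placement is
derived from the hash and the current node count, which is what lets
nodes join the cluster later without changing the DDL.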
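Koichi's multiple-distribution idea (DISTRIBUTE BY HASH(a), HASH(b)) can
be sketched the same way: when the WHERE clause hits one of the table's
distribution columns, the query can be routed to a single node instead
of being broadcast. Again an illustrative sketch under assumed names
(`route`, `node_for`, the column set), not the XC planner API.

```python
import zlib

# Two hash-distributed copies of the same table: one by column "a",
# one by column "b", spread over a hypothetical 6-node cluster.
DISTRIBUTION_COLUMNS = ("a", "b")
NUM_NODES = 6

def node_for(value):
    # Same placement rule for every distribution column (sketch).
    return zlib.crc32(str(value).encode()) % NUM_NODES

def route(where_column, value):
    """Nodes that must be queried for: WHERE where_column = value."""
    if where_column in DISTRIBUTION_COLUMNS:
        # The predicate matches a distribution column, so the copy
        # distributed by that column answers from a single node.
        return [node_for(value)]
    # No matching distribution: the query must visit every node.
    return list(range(NUM_NODES))

print(route("a", 123))  # one node
print(route("c", 123))  # all six nodes
```

This is why the idea helps query planning and not only HA: each extra
distribution turns one more class of point queries into single-node
reads, at the cost of storing and updating another copy of the table.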