From: Koichi S. <koi...@gm...> - 2014-05-24 23:03:06
2014-05-24 17:10 GMT-04:00 Josh Berkus <jo...@ag...>:
> Koichi,
>
>> 1. To allow async., when a node fails, fall back the whole cluster status
>> to the latest consistent state, such as one pointed to by a barrier. I can
>> provide some detailed thoughts on this if it is interesting.
>
> This is not interesting to me. If I have to accept major data loss for
> a single node failure, then I can use solutions which do not require a GTM.
>
>> 2. Allow a copy of shards to be kept on another node at the planner/executor level.
>
> Yes. This should be at the executor level, in my opinion. All writes
> go to all shards and do not complete until they all succeed or the shard
> times out (and is then marked disabled).
>
> What to do with reads is more nuanced. If we load-balance reads, then
> we are increasing the throughput of the cluster. If we send each read to
> all duplicate shards, then we are improving response times while
> decreasing throughput. I think that deserves some testing.

The planner needs more work to choose which pushdown is the best path.
Also, to handle conflicting writes from different coordinators, we may need
to define a node priority that determines which copy is written first
(a rough sketch of this write/read behaviour is at the end of this mail).

>
>> 3. Implement another replication scheme better suited to XC, using BDR,
>> just for distributed tables, for example.
>
> This has the same problems as solution #1.

We can implement better synchronization suited to XC's needs. Also, only
the shards need to be replicated, which reduces the overhead. I think this
has better potential than streaming replication.

Regards;
---
Koichi Suzuki

>
>> At present, XC uses a hash value of the node name to determine each row's
>> location for distributed tables. For ideas 2 and 3, we need to add
>> some infrastructure to make this allocation more flexible.
>
> Yes. We would need a shard ID which is separate from the node name.
>
> --
> Josh Berkus
> PostgreSQL Experts Inc.
> http://pgexperts.com
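
P.S. To make the executor-level behaviour Josh describes above more concrete,
here is a very rough sketch in plain Python (not XC code; Node, run_on_node()
and the timeout value are invented for illustration). Writes fan out to every
enabled copy of a shard and complete only when all of them succeed or a copy
times out and is marked disabled; reads are either load-balanced across
copies or tried on each copy in turn:

    # Rough sketch only -- not XC code.  Node, run_on_node() and the
    # timeout value are invented for illustration.
    import random

    class Node:
        def __init__(self, name):
            self.name = name
            self.enabled = True     # set to False when the copy is dropped

    def run_on_node(node, sql, timeout):
        """Placeholder: send one statement to one datanode and return its
        result; raise TimeoutError if the node does not answer in time."""
        raise NotImplementedError

    def write_shard(copies, sql, timeout=5.0):
        """Fan a write out to every enabled copy of the shard.  A copy that
        times out is marked disabled; fail only if no copy is left."""
        for node in copies:
            if not node.enabled:
                continue
            try:
                run_on_node(node, sql, timeout)
            except TimeoutError:
                node.enabled = False
        if not any(n.enabled for n in copies):
            raise RuntimeError("no live copy of the shard remains")

    def read_shard(copies, sql, timeout=5.0, load_balance=True):
        """Load-balanced read (better throughput), or try each copy in turn
        (a real executor would issue these in parallel and take the first
        answer, trading throughput for response time)."""
        live = [n for n in copies if n.enabled]
        if load_balance:
            return run_on_node(random.choice(live), sql, timeout)
        for node in live:
            try:
                return run_on_node(node, sql, timeout)
            except TimeoutError:
                continue
        raise RuntimeError("all copies of the shard timed out")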
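And a similarly rough sketch of the shard-ID point at the end of Josh's mail:
hash each row's distribution key to a shard ID instead of to a node name, and
keep a separate map from shard ID to the nodes holding copies of that shard.
The shard count and datanode names here are made up; the real infrastructure
would live in the catalogs:

    # Rough sketch only -- not XC code.  NUM_SHARDS and the datanode names
    # are invented for illustration.
    import hashlib

    NUM_SHARDS = 64

    # shard ID -> nodes holding a copy of that shard (two copies each here)
    shard_map = {
        sid: ["datanode%d" % (sid % 4 + 1), "datanode%d" % ((sid + 2) % 4 + 1)]
        for sid in range(NUM_SHARDS)
    }

    def shard_of(distribution_key):
        """Hash a row's distribution key to a shard ID, not to a node name."""
        digest = hashlib.sha1(str(distribution_key).encode()).hexdigest()
        return int(digest, 16) % NUM_SHARDS

    def nodes_for(distribution_key):
        """All nodes that must see a write (or may serve a read) for this row."""
        return shard_map[shard_of(distribution_key)]

    # e.g. nodes_for(12345) -> the two datanodes holding that row's shard

With a map like this, moving a shard or adding another copy only means
editing the map entry; the hash of the rows themselves does not change.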