From: Josh B. <jo...@ag...> - 2014-05-26 18:03:47
On 05/24/2014 04:02 PM, Koichi Suzuki wrote:
> 2014-05-24 17:10 GMT-04:00 Josh Berkus <jo...@ag...>:
>> Koichi,
>>
>>> 1. To allow async., when a node fails, fall back whole cluster status
>>> to the latest consistent state, such as pointed by a barrier. I can
>>> provide some detailed thought on this if interesting.
>>
>> This is not interesting to me. If I have to accept major data loss for
>> a single node failure, then I can use solutions which do not require a GTM.
>>
>>> 2. Allow to have a copy of shards to another node at planner/executor level.
>>
>> Yes. This should be at the executor level, in my opinion. All writes
>> go to all shards and do not complete until they all succeed or the shard
>> times out (and then is marked disabled).
>>
>> What to do with reads is more nuanced. If we load-balance reads, then
>> we are increasing throughput of the cluster. If we send each read to
>> all duplicate shards, then we are improving response times while
>> decreasing throughput. I think that deserves some testing.
>
> Planner needs some more to choose the best one which pushdown is the
> best path to do. Also, to handle conflicting writes in different
> coordinators, we may need to define node priority where to go first.

I guess I'm not clear on how we could have a conflict in the first place?

As far as reads are concerned, I can only see two options:

1) Push read down to one random shard -- maximizes throughput

2) Push read down to all shards, take first response -- minimizes
response time

The choice of (1) or (2) is application-specific, so ultimately I think
we will need to implement both and allow the user to choose, maybe as a
config option.

The best functionality, of course, would be to provide the user an
option to choose as a userset GUC, so that they could switch on a
per-query basis.

>>
>>> 3. Implement another replication better for XC using BDR, just for
>>> distributed tables, for example.
>>
>> This has the same problems as solution #1.
>
> We can implement better synchronization suitable for XC need. Also,
> only shards can be replicated to reduce the overhead. I think this
> has better potential than streaming replication.

Ah, ok. When you get back home, maybe you can sketch out what you're
thinking with BDR?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
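
For illustration only, here is a rough Python sketch of the two read options discussed above. It is not XC executor code. Every name in it (`Replica`, `read_one_random`, `read_all_first`, `run_read`, and the `strategy` values) is hypothetical, and the `strategy` argument merely stands in for whatever per-query setting might eventually select the behavior.

```python
import random
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Sequence

# Hypothetical stand-in: a replica is anything that runs a query on one
# copy of a shard and returns its result.
Replica = Callable[[str], str]


def read_one_random(replicas: Sequence[Replica], query: str,
                    rng: random.Random = random.Random()) -> str:
    """Option 1: send the read to a single randomly chosen copy."""
    return rng.choice(list(replicas))(query)


def read_all_first(replicas: Sequence[Replica], query: str) -> str:
    """Option 2: send the read to every copy, return the first success."""
    pool = ThreadPoolExecutor(max_workers=len(replicas))
    try:
        futures = [pool.submit(replica, query) for replica in replicas]
        last_error = None
        for future in as_completed(futures):
            try:
                return future.result()
            except Exception as exc:  # this copy failed, try the next one
                last_error = exc
        raise RuntimeError("all replicas failed") from last_error
    finally:
        # Do not block on the slower copies; their results are discarded.
        pool.shutdown(wait=False)


def run_read(replicas: Sequence[Replica], query: str,
             strategy: str = "one_random") -> str:
    """Pick a read path; `strategy` is an assumed per-query switch."""
    if strategy == "one_random":
        return read_one_random(replicas, query)
    if strategy == "all_first":
        return read_all_first(replicas, query)
    raise ValueError(f"unknown read strategy: {strategy!r}")
```

The trade-off shows up directly in the sketch: `read_one_random` does one unit of work per read, while `read_all_first` does one per copy and throws away all but the fastest answer.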