From: 鈴木 幸市 <ko...@in...> - 2014-05-29 04:59:32
2014/05/29 12:29, Ashutosh Bapat <ash...@en...> wrote:

On Thu, May 29, 2014 at 6:41 AM, 鈴木 幸市 <ko...@in...> wrote:

2014/05/29 1:27, Josh Berkus <jo...@ag...> wrote:

> On 05/27/2014 07:33 PM, Koichi Suzuki wrote:
>> We can reduce the additional latency by performing prepare and commit
>> in parallel, I mean, sending the command to all the target remote nodes
>> first and then receiving their responses afterwards.
>>
>> As I supposed, the alternative is to use BDR. This has very small
>> overhead. We can detect conflicting writes among transactions and, if
>> the transactions do not conflict, we can apply these writes in
>> parallel, not in a single thread as we're doing in streaming
>> replication.
>
> Mind you, like BDR itself, we'll still need to figure out how to handle DDL.

Thank you for the info. I believe this issue is shared with other use cases such as SLONY. Is there any other discussion on how to handle this?

>> This needs some more work and I think it is worth spending some time on.
>
> Yes. Otherwise we have two unpalatable choices:
>
> - Massive data loss (roll back to barrier) every time we lose a node, or
> - Doubling write latency (at least)

In the case of statement-based redundancy, we need to determine which node to write to first (at least), and this should be the same on all the nodes. This is needed to handle conflicting writes consistently. It means the first write has to be done synchronously and the rest can be done asynchronously. Because most of the I/O work is done at prepare/commit, I hope this does not badly impact overall throughput or latency.

I am doubtful as to whether the eventual consistency scheme will work in XC. If we commit a change to two nodes without waiting for the third node, and the third node is not able to apply the change, we will have an inconsistency on the third node. This inconsistency may not be repairable, depending on why the write to that node failed. If the two nodes fail, the data that we retrieve from the third node would be inconsistent. So, probably there is no point in keeping the third copy. That means that the cluster can tolerate N failures, where N is the number of synchronous copies made.

The above is synchronous because prepares and commits are done in a synchronous way: all the responses are received at the end of each phase. No prepare/commit response is left outstanding. All prepare responses will be handled before the subsequent commit/abort. All commits/aborts will be handled before beginning the next transaction. The idea is to do things in parallel only when it makes sense.

When an error occurs in prepare, we can abort the transaction and no inconsistency will be left. If one of the following commits fails (this is not expected, but may happen when a system-level failure occurs), that node should be detached from the cluster and repaired. If an abort fails while aborting a failed prepare, this is also a failure of the node, and that node should be detached from the cluster.

One thing we should think about here is that we don't have a good means to simply repair such a failed node. It seems to me that we have to copy everything from a model node. This takes time. Maybe we need something like pg_rewind for statement-based replication, or another tool to make a replica quickly.

Regards;
---
Koichi Suzuki

>
> --
> Josh Berkus
> PostgreSQL Experts Inc.
> http://pgexperts.com
>
> _______________________________________________
> Postgres-xc-general mailing list
> Pos...@li...
> https://lists.sourceforge.net/lists/listinfo/postgres-xc-general

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
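
For readers following the thread, the parallel prepare/commit flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Postgres-XC code: the `Node` class, `broadcast`, and `run_transaction` are hypothetical names, the remote calls are simulated in-process, and threads stand in for whatever asynchronous send/receive the coordinator would really use. It shows the three rules from the message: send a phase to every node before waiting on any response, leave no response outstanding before moving to the next phase, and detach a node whose commit or abort fails.

```python
from concurrent.futures import ThreadPoolExecutor


class Node:
    """Hypothetical stand-in for a remote node; not a Postgres-XC API."""

    def __init__(self, name, fail_on=None):
        self.name = name
        self.fail_on = fail_on  # "prepare", "commit", "abort", or None
        self.state = "idle"

    def _run(self, phase, new_state):
        # Simulate a remote call that either fails or changes node state.
        if self.fail_on == phase:
            raise RuntimeError(f"{self.name}: {phase} failed")
        self.state = new_state

    def prepare(self):
        self._run("prepare", "prepared")

    def commit(self):
        self._run("commit", "committed")

    def abort(self):
        self._run("abort", "aborted")


def broadcast(pool, nodes, phase):
    """Send one phase to every node first, then collect every response.

    Returns the nodes whose call raised an error. No response is left
    outstanding when this function returns.
    """
    futures = [(node, pool.submit(getattr(node, phase))) for node in nodes]
    failed = []
    for node, future in futures:
        # future.exception() blocks until that node's response is available.
        if future.exception() is not None:
            failed.append(node)
    return failed


def run_transaction(nodes):
    """Return (outcome, nodes_to_detach) for one transaction."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        if broadcast(pool, nodes, "prepare"):
            # A prepare failed: abort everywhere. A node that cannot even
            # abort is treated as failed and must be detached.
            return "aborted", broadcast(pool, nodes, "abort")
        # All prepares succeeded: commit everywhere. A node that fails to
        # commit is detached and needs repair.
        return "committed", broadcast(pool, nodes, "commit")
```

For example, `run_transaction([Node("dn1"), Node("dn2", fail_on="commit")])` reports the transaction as committed and returns `dn2` as the node to detach. The sketch deliberately says nothing about how a detached node is repaired, which is the open question raised in the message.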