From: Josh B. <jo...@ag...> - 2014-05-24 18:53:13
All:

So, in addition to the stability issues raised at the Postgres-XC summit, I need to raise something which is a deficiency of both XC and XL and should be (in my opinion) our #2 priority after stability: node/shard redundancy.

Right now, if a single node fails, the cluster is frozen for writes ... and fails some reads ... until the node is replaced by the user from a replica. It's also not clear that we *can* actually replace a node from a replica, because the replica will be on async replication and thus not at exactly the same GXID as the rest of the cluster. This makes XC a low-availability solution.

The answer is to do the same thing every other clustering system has done: write each shard to multiple locations, with a default of two. If each shard is present on two different nodes, then losing a node is just a performance problem, not a downtime event.

Thoughts?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
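For concreteness, here is a minimal sketch (Python, not XC code; the shard map, node names, and routing helpers are all invented for illustration) of the kind of placement being proposed: every shard lives on two datanodes, so any single node failure still leaves a live copy of every shard:

```python
# Hypothetical sketch: place each shard on two datanodes so that any
# single node failure still leaves one live copy of every shard.
from itertools import combinations


def build_shard_map(nodes, num_shards, copies=2):
    """Assign each shard to `copies` distinct nodes, round-robin over node pairs."""
    placements = list(combinations(nodes, copies))
    return {shard: list(placements[shard % len(placements)])
            for shard in range(num_shards)}


def nodes_for_row(shard_map, dist_value, num_shards):
    """Route a row (by hash of its distribution value) to all copies of its shard."""
    shard = hash(dist_value) % num_shards
    return shard, shard_map[shard]


if __name__ == "__main__":
    shard_map = build_shard_map(["node1", "node2", "node3"], num_shards=6)
    shard, targets = nodes_for_row(shard_map, dist_value=42, num_shards=6)
    print(f"row with key 42 -> shard {shard} on {targets}")
    # Losing any single node still leaves a reachable copy of every shard.
    assert all(len(set(copies) - {"node1"}) >= 1 for copies in shard_map.values())
```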
From: Koichi S. <koi...@gm...> - 2014-05-24 20:04:56
At present, XC advises making a replica with synchronous replication; pgxc_ctl configures slaves this way. I understand this is not good for performance, and we may need some other solution for this.

To begin with, there are a couple of ideas:

1. Allow async replication; when a node fails, fall back the whole cluster to the latest consistent state, such as one pointed to by a barrier. I can provide some detailed thoughts on this if there is interest.

2. Allow a copy of each shard on another node, handled at the planner/executor level.

3. Implement another replication mechanism better suited to XC using BDR, just for distributed tables, for example.

At present, XC uses a hash value of the node name to determine each row's location for distributed tables. For ideas 2 and 3, we need to add some infrastructure to make this allocation more flexible.

Further input is welcome.

Thank you.
---
Koichi Suzuki
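One way to picture the extra infrastructure mentioned for ideas 2 and 3 (purely hypothetical; none of this reflects XC internals): today the row hash effectively resolves to a single node, whereas a shard-ID indirection would let the same hash resolve to several nodes without changing the hash function:

```python
# Illustrative only: contrast direct hash-to-node placement with a
# shard-ID indirection that can resolve to more than one node.
NODES = ["dn1", "dn2", "dn3"]


def node_direct(dist_value):
    """Today (conceptually): the hash picks exactly one node."""
    return NODES[hash(dist_value) % len(NODES)]


# With indirection: the hash picks a shard ID, and a catalog maps the
# shard ID to every node holding a copy.  Re-homing a shard only
# changes the catalog entry, not the hash function.
SHARD_CATALOG = {
    0: ["dn1", "dn2"],
    1: ["dn2", "dn3"],
    2: ["dn3", "dn1"],
}


def nodes_via_shard(dist_value):
    shard_id = hash(dist_value) % len(SHARD_CATALOG)
    return shard_id, SHARD_CATALOG[shard_id]


if __name__ == "__main__":
    print("direct placement:", node_direct("some key"))
    print("via shard ID:    ", nodes_via_shard("some key"))
```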
From: Josh B. <jo...@ag...> - 2014-05-24 21:10:47
Koichi,

> 1. Allow async replication; when a node fails, fall back the whole
> cluster to the latest consistent state, such as one pointed to by a
> barrier. I can provide some detailed thoughts on this if there is
> interest.

This is not interesting to me. If I have to accept major data loss for a single node failure, then I can use solutions which do not require a GTM.

> 2. Allow a copy of each shard on another node, handled at the
> planner/executor level.

Yes. This should be at the executor level, in my opinion. All writes go to all copies of a shard and do not complete until they all succeed or a copy times out (and is then marked disabled).

What to do with reads is more nuanced. If we load-balance reads, we increase the throughput of the cluster. If we send each read to all duplicate shards, we improve response times while decreasing throughput. I think that deserves some testing.

> 3. Implement another replication mechanism better suited to XC using
> BDR, just for distributed tables, for example.

This has the same problems as solution #1.

> At present, XC uses a hash value of the node name to determine each
> row's location for distributed tables. For ideas 2 and 3, we need to
> add some infrastructure to make this allocation more flexible.

Yes. We would need a shard ID which is separate from the node name.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
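A rough executor-level sketch of that write rule (illustrative only; the timeout value, the catalog of disabled copies, and the connection handling are invented): ship the statement to every enabled copy, complete the write once every copy that has not been disabled acknowledges, and mark a copy disabled when it times out instead of freezing the cluster:

```python
# Illustrative sketch of the proposed write rule: a write completes only
# when every enabled copy of the shard acknowledges it, and a copy that
# times out is marked disabled rather than blocking the cluster.
from concurrent.futures import ThreadPoolExecutor

WRITE_TIMEOUT_S = 2.0
disabled_copies = set()          # copies awaiting repair


def send_write(node, statement):
    """Stand-in for shipping a statement to one datanode copy."""
    # A real implementation would go through the pooler / libpq here.
    return f"{node}: OK"


def write_to_all_copies(copies, statement):
    enabled = [n for n in copies if n not in disabled_copies]
    with ThreadPoolExecutor(max_workers=max(len(enabled), 1)) as pool:
        futures = {pool.submit(send_write, n, statement): n for n in enabled}
        for fut, node in futures.items():
            try:
                fut.result(timeout=WRITE_TIMEOUT_S)
            except Exception:
                disabled_copies.add(node)   # unreachable copy: disable, don't block
    if not (set(enabled) - disabled_copies):
        raise RuntimeError("no live copy acknowledged the write")
    return True


if __name__ == "__main__":
    print(write_to_all_copies(["dn1", "dn2"], "INSERT INTO t VALUES (1)"))
```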
From: Koichi S. <koi...@gm...> - 2014-05-24 23:03:06
2014-05-24 17:10 GMT-04:00 Josh Berkus <jo...@ag...>:

>> 2. Allow a copy of each shard on another node, handled at the
>> planner/executor level.
>
> Yes. This should be at the executor level, in my opinion. All writes
> go to all copies of a shard and do not complete until they all succeed
> or a copy times out (and is then marked disabled).
>
> What to do with reads is more nuanced. If we load-balance reads, we
> increase the throughput of the cluster. If we send each read to all
> duplicate shards, we improve response times while decreasing
> throughput. I think that deserves some testing.

The planner needs some more work to choose which copy is the best one to push down to. Also, to handle conflicting writes from different coordinators, we may need to define a node priority that determines which copy is written first.

>> 3. Implement another replication mechanism better suited to XC using
>> BDR, just for distributed tables, for example.
>
> This has the same problems as solution #1.

We can implement better synchronization, suitable for XC's needs. Also, only the shards need to be replicated, which reduces the overhead. I think this has better potential than streaming replication.

Regards;
---
Koichi Suzuki
From: Josh B. <jo...@ag...> - 2014-05-26 18:03:47
On 05/24/2014 04:02 PM, Koichi Suzuki wrote:

> The planner needs some more work to choose which copy is the best one
> to push down to. Also, to handle conflicting writes from different
> coordinators, we may need to define a node priority that determines
> which copy is written first.

I guess I'm not clear on how we could have a conflict in the first place?

As far as reads are concerned, I can only see two options:

1) Push the read down to one random shard copy -- maximizes throughput
2) Push the read down to all copies and take the first response -- minimizes response time

The choice of (1) or (2) is application-specific, so ultimately I think we will need to implement both and allow the user to choose, maybe as a config option. The best functionality, of course, would be to let the user choose via a userset GUC, so that they could switch on a per-query basis.

> We can implement better synchronization, suitable for XC's needs.
> Also, only the shards need to be replicated, which reduces the
> overhead. I think this has better potential than streaming
> replication.

Ah, OK. When you get back home, maybe you can sketch out what you're thinking with BDR?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
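For what it's worth, the two read policies are only a few lines each; here is a hypothetical sketch (the setting name and node helpers are invented, and the GUC machinery is only mimicked by a module-level variable) of switching between them per query:

```python
# Illustrative sketch of the two read policies discussed above, switched
# by a hypothetical per-session setting (standing in for a userset GUC).
import random
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

read_policy = "load_balance"      # or "fastest_response"


def run_on_copy(node, query):
    """Stand-in for executing a read-only query on one copy."""
    return f"{node}: result of {query}"


def read(copies, query):
    if read_policy == "load_balance":
        # Policy 1: one random copy per query -> maximizes cluster throughput.
        return run_on_copy(random.choice(copies), query)
    # Policy 2: query every copy and take the first answer -> minimizes
    # response time at the cost of extra load on all copies.
    with ThreadPoolExecutor(max_workers=len(copies)) as pool:
        futures = [pool.submit(run_on_copy, n, query) for n in copies]
        done, _ = wait(futures, return_when=FIRST_COMPLETED)
        return next(iter(done)).result()


if __name__ == "__main__":
    print(read(["dn1", "dn2"], "SELECT count(*) FROM t"))
```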
From: Tim U. <tim...@gm...> - 2014-05-27 00:10:10
Would it be possible to keep the shards on multiple data nodes, so that if one data node failed you'd just replace it whenever you can get around to it?

Elasticsearch uses this strategy.
From: Koichi S. <koi...@gm...> - 2014-05-27 01:59:03
Yeah, this is what's needed now. I think we should have some more discussion to find the best approach.

I understand that streaming replication is not the right fit for this use case and that we need something better. As Josh suggested, I'd like to gather many more ideas and requirements on this.

Thank you;
---
Koichi Suzuki
From: Josh B. <jo...@ag...> - 2014-05-27 23:42:38
On 05/26/2014 05:10 PM, Tim Uckun wrote:

> Would it be possible to keep the shards on multiple data nodes, so
> that if one data node failed you'd just replace it whenever you can
> get around to it?

Right, that's what I was describing. It's the standard way to deal with redundancy in any parallel sharded database.

The drawback is that it increases write times, because you can't count a write as saved until every copy returns. This is why some databases (e.g. Cassandra) go one step further and keep, say, each shard in three places, with only two needing to return. That gets way more complicated, though.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
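For comparison, a toy sketch of the Cassandra-style variant (again hypothetical, not something XC has): keep three copies and treat the write as durable once any two acknowledge, leaving the straggler to catch up or be repaired later:

```python
# Toy sketch of a quorum write: N copies, the write counts as durable as
# soon as `quorum` of them acknowledge; the stragglers catch up later.
from concurrent.futures import ThreadPoolExecutor, as_completed


def send_write(node, statement):
    """Stand-in for shipping a statement to one copy."""
    return f"{node}: OK"


def quorum_write(copies, statement, quorum=2):
    acks = 0
    with ThreadPoolExecutor(max_workers=len(copies)) as pool:
        futures = [pool.submit(send_write, n, statement) for n in copies]
        for fut in as_completed(futures):
            try:
                fut.result()
                acks += 1
            except Exception:
                pass                      # a failed copy needs repair later
            if acks >= quorum:
                return True               # durable: quorum reached
    return False


if __name__ == "__main__":
    ok = quorum_write(["dn1", "dn2", "dn3"], "INSERT INTO t VALUES (1)")
    print("durable:", ok)
```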
From: Michael P. <mic...@gm...> - 2014-05-28 00:50:38
On Wed, May 28, 2014 at 8:42 AM, Josh Berkus <jo...@ag...> wrote:

> The drawback is that it increases write times, because you can't count
> a write as saved until every copy returns. This is why some databases
> (e.g. Cassandra) go one step further and keep, say, each shard in
> three places, with only two needing to return. That gets way more
> complicated, though.

That's additional overhead for 2PC, as you need to wait for all the nodes where a write occurred to have executed PREPARE. It also drops any hope of having autocommit write transactions, sinking performance, particularly for the OLTP apps that XC is designed to play well with.
--
Michael
From: Koichi S. <koi...@gm...> - 2014-05-28 02:33:43
We can reduce the additional latency by performing prepare and commit in parallel; I mean, sending the command to all the target remote nodes first and then receiving their responses afterwards.

As I suggested, the alternative is to use BDR, which has very small overhead. We can detect conflicting writes among transactions, and if transactions do not conflict, we can apply their writes in parallel rather than in a single thread as we do in streaming replication.

This needs some more work, and I think it is worth spending some time on.

Regards;
---
Koichi Suzuki
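A minimal sketch of that pipelining, with invented send/receive helpers standing in for the real connection handling: issue PREPARE TRANSACTION to every write node before collecting any response, then do the same for COMMIT PREPARED, so the added latency is roughly one round trip per phase rather than one per node:

```python
# Illustrative sketch of pipelined two-phase commit: send each phase to
# all target nodes first, then collect the responses, so latency grows
# with the slowest node rather than with the number of nodes.
def send_async(node, command):
    """Stand-in for a non-blocking command dispatch to one datanode."""
    return (node, command)                 # pretend this is an in-flight request


def wait_result(handle):
    """Stand-in for collecting the response of an in-flight request."""
    node, command = handle
    return f"{node}: {command} ok"


def commit_on_all(nodes, gxid):
    # Phase 1: fire all PREPAREs, then gather the responses.
    handles = [send_async(n, f"PREPARE TRANSACTION '{gxid}'") for n in nodes]
    prepare_results = [wait_result(h) for h in handles]

    # Phase 2: fire all COMMIT PREPAREDs, then gather the responses.
    handles = [send_async(n, f"COMMIT PREPARED '{gxid}'") for n in nodes]
    commit_results = [wait_result(h) for h in handles]
    return prepare_results + commit_results


if __name__ == "__main__":
    for line in commit_on_all(["dn1", "dn2"], gxid="txn_12345"):
        print(line)
```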
From: Josh B. <jo...@ag...> - 2014-05-28 16:27:25
On 05/27/2014 07:33 PM, Koichi Suzuki wrote:

> We can reduce the additional latency by performing prepare and commit
> in parallel; I mean, sending the command to all the target remote
> nodes first and then receiving their responses afterwards.
>
> As I suggested, the alternative is to use BDR, which has very small
> overhead. We can detect conflicting writes among transactions, and if
> transactions do not conflict, we can apply their writes in parallel
> rather than in a single thread as we do in streaming replication.

Mind you, like BDR itself, we'll still need to figure out how to handle DDL.

> This needs some more work, and I think it is worth spending some time on.

Yes. Otherwise we have two unpalatable choices:

- Massive data loss (roll back to a barrier) every time we lose a node, or
- Doubling write latency (at least)

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
From: 鈴木 幸市 <ko...@in...> - 2014-05-29 01:12:12
On 2014/05/29 1:27, Josh Berkus <jo...@ag...> wrote:

> Mind you, like BDR itself, we'll still need to figure out how to
> handle DDL.

Thank you for the info. I believe this issue is shared with other use cases such as Slony. Is there any other discussion of how to handle this?

> Yes. Otherwise we have two unpalatable choices:
>
> - Massive data loss (roll back to a barrier) every time we lose a node, or
> - Doubling write latency (at least)

In the case of statement-based redundancy, we need to determine which node to write first (at least), and this has to be the same on all the nodes; it is needed to handle conflicting writes consistently. This means the first write has to be done synchronously, while the rest can be done asynchronously. Because most of the I/O work is done at prepare/commit, I hope this does not impact overall throughput or latency badly.

Regards;
---
Koichi Suzuki
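A sketch of that ordering rule under the stated assumptions (the priority catalog and write helpers are invented): every coordinator writes the highest-priority copy of a shard synchronously, so conflicting writes are serialized in one agreed place, and ships the remaining copies asynchronously:

```python
# Illustrative sketch: every coordinator writes the highest-priority copy
# of a shard synchronously (so conflicting writes are decided in one
# agreed place) and propagates to the remaining copies asynchronously.
from concurrent.futures import ThreadPoolExecutor

# The same priority order on every coordinator, e.g. from a shared catalog.
COPY_PRIORITY = {"shard_0": ["dn1", "dn2", "dn3"]}
background = ThreadPoolExecutor(max_workers=4)


def write_sync(node, statement):
    """Stand-in for a synchronous write to one copy."""
    return f"{node}: committed {statement}"


def write_shard(shard, statement):
    first, *rest = COPY_PRIORITY[shard]
    result = write_sync(first, statement)      # conflicts resolved here, in order
    for node in rest:                           # remaining copies lag slightly
        background.submit(write_sync, node, statement)
    return result


if __name__ == "__main__":
    print(write_shard("shard_0", "UPDATE t SET v = v + 1 WHERE k = 7"))
    background.shutdown(wait=True)
```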
From: Ashutosh B. <ash...@en...> - 2014-05-29 03:53:17
On Thu, May 29, 2014 at 6:41 AM, 鈴木 幸市 <ko...@in...> wrote:

> In the case of statement-based redundancy, we need to determine which
> node to write first (at least), and this has to be the same on all the
> nodes; it is needed to handle conflicting writes consistently. This
> means the first write has to be done synchronously, while the rest can
> be done asynchronously. Because most of the I/O work is done at
> prepare/commit, I hope this does not impact overall throughput or
> latency badly.

I am doubtful as to whether an eventual-consistency scheme will work in XC. If we commit a change to two nodes and do not wait for the third node, and the third node is not able to apply the change, we will have an inconsistency on the third node. That inconsistency may not be repairable, depending upon the nature of the failure to write to that node. If the two nodes then fail, the data we retrieve from the third node will be inconsistent, so there is probably no point in keeping the third copy. That means the cluster can only tolerate as many failures as the number of synchronous copies made.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
From: 鈴木 幸市 <ko...@in...> - 2014-05-29 04:59:32
On 2014/05/29 12:29, Ashutosh Bapat <ash...@en...> wrote:

> I am doubtful as to whether an eventual-consistency scheme will work
> in XC. If we commit a change to two nodes and do not wait for the
> third node, and the third node is not able to apply the change, we
> will have an inconsistency on the third node. That inconsistency may
> not be repairable, depending upon the nature of the failure to write
> to that node.

The above is synchronous, because the prepares and commits are done in a synchronous way, with all the responses collected at the end. No prepare/commit response is left outstanding: all prepare responses are handled before the subsequent commit/abort, and all commit/abort responses are handled before the next transaction begins. The idea is to do things in parallel only where it makes sense.

When an error occurs at prepare, we can abort the transaction and no inconsistency is left behind. If one of the following commits fails (this is not expected, but may happen when a system-level failure occurs), that node should be detached from the cluster and repaired. If an abort fails while aborting a failed prepare, that is also a failure of the node, and it should likewise be detached from the cluster.

One thing we should think about here: we don't have a good means to simply repair such a failed node. It seems to me that we have to copy everything from a healthy node, which takes time. Maybe we need something like pg_rewind for statement-based replication, or some other tool to make a replica quickly.

Regards;
---
Koichi Suzuki
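Restated as a sketch (the command helpers and detach bookkeeping are invented for illustration): a failure at PREPARE rolls the transaction back everywhere and leaves no inconsistency, while a failure during COMMIT PREPARED, or during the rollback itself, detaches the node for offline repair:

```python
# Illustrative sketch of the failure handling described above: a failed
# PREPARE rolls the transaction back everywhere; a failure during COMMIT
# PREPARED (or during the rollback itself) detaches the node for repair.
detached_nodes = set()


def run(node, command):
    """Stand-in for executing one command on one datanode."""
    return f"{node}: {command} ok"


def commit_everywhere(nodes, gxid):
    prepared = []
    try:
        for node in nodes:
            run(node, f"PREPARE TRANSACTION '{gxid}'")
            prepared.append(node)
    except Exception:
        # A failed PREPARE is recoverable: roll back whatever was prepared.
        for node in prepared:
            try:
                run(node, f"ROLLBACK PREPARED '{gxid}'")
            except Exception:
                detached_nodes.add(node)   # node needs offline repair
        return False

    for node in nodes:
        try:
            run(node, f"COMMIT PREPARED '{gxid}'")
        except Exception:
            # The transaction is already decided; the node is detached and
            # must be rebuilt from a healthy copy before rejoining.
            detached_nodes.add(node)
    return True


if __name__ == "__main__":
    print("committed:", commit_everywhere(["dn1", "dn2"], "txn_67890"))
    print("detached:", detached_nodes)
```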