#436 Availability improvement needed

Milestone: Development Queue
Status: open
Owner: nobody
Priority: 6
Updated: 2013-07-16
Created: 2013-07-16
Private: No

Pavan Deolasee reported the following issue:

I wonder how we currently handle node failures in an XC setup. I understand that we recommend setting up replicated slaves for all the datanodes and, in case of failover, adjusting the node information at the coordinator, say by using ALTER NODE. What I am more curious about is operating at reduced availability. Let me give an example:

Say I have a table distributed by HASH on 4 nodes. If one of those nodes goes down, I would still like to use the table by somehow adjusting the metadata at the coordinator. The data stored on the failed node will not be accessible, but I should be able to query the data on the remaining nodes. So SELECTs should not fail, or at the least a SELECT with a qualification that filters out data from the failed node should not fail. Also, INSERT/UPDATE/DELETE should succeed as long as the target node is up and running.
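
To make the requested behaviour concrete, here is a small toy model in Python. It is only a sketch of the semantics described above, not Postgres-XC code: the class ToyCluster, the NodeDownError exception, the node names dn1 to dn4, and the modulo placement rule are all assumptions made for illustration, and the real coordinator uses its own hash function and catalog.

```python
from dataclasses import dataclass, field


class NodeDownError(Exception):
    """Raised when a statement needs a node that is marked as down."""


@dataclass
class ToyCluster:
    # Hypothetical stand-in for the coordinator's view of a HASH-distributed table.
    nodes: list                                   # datanode names, in bucket order
    down: set = field(default_factory=set)        # nodes currently marked as failed
    rows: dict = field(default_factory=dict)      # node name -> {key: value}

    def __post_init__(self):
        for name in self.nodes:
            self.rows.setdefault(name, {})

    def node_for(self, key: int) -> str:
        # Assumed placement rule: integer key modulo the number of nodes.
        return self.nodes[key % len(self.nodes)]

    def insert(self, key: int, value: str) -> None:
        # A write succeeds as long as its single target node is up.
        node = self.node_for(key)
        if node in self.down:
            raise NodeDownError(node)
        self.rows[node][key] = value

    def select_by_key(self, key: int):
        # A qualified read touches only the node that owns the key.
        node = self.node_for(key)
        if node in self.down:
            raise NodeDownError(node)
        return self.rows[node].get(key)

    def select_all(self, allow_partial: bool = False) -> dict:
        # An unqualified read needs every node unless partial results are accepted.
        missing = sorted(self.down & set(self.nodes))
        if missing and not allow_partial:
            raise NodeDownError(", ".join(missing))
        result = {}
        for name in self.nodes:
            if name not in self.down:
                result.update(self.rows[name])
        return result


if __name__ == "__main__":
    cluster = ToyCluster(["dn1", "dn2", "dn3", "dn4"])
    for k in range(8):
        cluster.insert(k, f"row-{k}")
    cluster.down.add("dn3")                       # simulate one failed datanode
    print(cluster.select_by_key(1))               # owned by dn2, still readable
    print(cluster.select_all(allow_partial=True)) # rows from the surviving nodes
```

In this model, statements that resolve to a single surviving node keep working, statements that need the failed node raise an error, and a full scan returns partial data only when the caller explicitly opts in. Whether partial results should be opt-in is a design question the sketch assumes rather than answers.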

I tried running a few queries to see if this is possible, but could not get the desired result. I had only limited success with replicated tables, in the sense that I could query the table after dropping the failed node at the coordinator. But even that looks fragile.
