From: Yonik S. <yo...@ap...> - 2008-02-27 23:04:48
On Wed, Feb 27, 2008 at 3:58 PM, Ning Li <nin...@gm...> wrote:
> At the same time, should we also discuss what the update
> model should be:
> 1 One updatable replica vs. all updatable replicas. The former
> is simple. The latter is powerful. Is there sufficient need for
> the latter to justify its complexity?

We should always be able to update (so if the "updateable" replica is
down, we need to be able to update another replica). Given that, what
is the extra complexity of those two choices?

> 2 The atomicity of an insert/delete/update operation. When
> an insert/delete/update operation is done, does it mean:
> 1) the new doc is indexed in the memory of the node
> 2) the new doc is indexed on the local disk of the node
> 3) the new doc is logged on the local disk of the node
> 4) the new doc is logged in some fault-tolerant shared FS
>    (e.g. HDFS)
> 5) the new doc is indexed in the memory of at least X nodes
> The probability of the operation getting lost is from high
> to low: 1), then 2) and 3), then 4) and 5).

Very good questions... and the answers are probably tied into how we
solve question #1. #5 would seem to be higher performance than #3, but
is it a strong enough guarantee? The power could still go out to a
whole rack of systems at once.

I think the decision partially depends on how much of a document
storage system this is, vs. just an index that can be rebuilt. Of
course, the bigger an index gets, the more infeasible a complete
rebuild becomes.

I'm tempted to lean toward #3, since logs are needed to sync up nodes
(back to question #1).

-Yonik
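
[Editor's note: as a rough sketch of what option #3 might look like, the
building block is a write-ahead transaction log: each operation is
appended and fsynced to local disk before the update is acknowledged,
and the log's sequence numbers give a lagging replica something to sync
from. The Java class and record format below are hypothetical
illustrations, not Lucene or Solr code.]

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    /** Hypothetical sketch of an append-only, fsynced operation log. */
    public class TransactionLog implements AutoCloseable {
        private final FileChannel channel;
        // Monotonically increasing op id, used to sync replicas.
        // (A real log would recover the last seq from disk on startup.)
        private long nextSeq = 0;

        public TransactionLog(Path path) throws IOException {
            this.channel = FileChannel.open(path,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND);
        }

        /** Appends one record and forces it to disk before returning. */
        public synchronized long append(String op, String docId,
                                        String payload) throws IOException {
            long seq = nextSeq++;
            String record = seq + "\t" + op + "\t" + docId + "\t"
                + payload + "\n";
            channel.write(ByteBuffer.wrap(
                record.getBytes(StandardCharsets.UTF_8)));
            channel.force(false);  // the ack is only as durable as this fsync
            return seq;
        }

        @Override
        public void close() throws IOException {
            channel.close();
        }
    }

A node would call append() before acknowledging the update to the
client, and a replica that fell behind would request all records with
a sequence number greater than the last one it applied (which also
bears on question #1: any replica with the log can be caught up and
promoted).]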