From: Yonik S. <yo...@ap...> - 2008-02-27 23:04:48
On Wed, Feb 27, 2008 at 3:58 PM, Ning Li <nin...@gm...> wrote:
> At the same time, should we also discuss what the update
> model should be:
> 1 One updatable replica vs. all updatable replicas. The former
> is simple. The latter is powerful. Is there sufficient need for
> the latter to justify its complexity?

We should always be able to update (so if the "updateable" replica is
down, we need to be able to update another replica). Given that, what
is the extra complexity of those two choices?

> 2 The atomicity of an insert/delete/update operation. When
> an insert/delete/update operation is done, does it mean:
> 1) the new doc is indexed in the memory of the node
> 2) the new doc is indexed on the local disk of the node
> 3) the new doc is logged on the local disk of the node
> 4) the new doc is logged in some fault-tolerant shared FS
>    (e.g. HDFS)
> 5) the new doc is indexed in the memory of at least X nodes
> The probability of the operation getting lost is from high
> to low: 1), then 2) and 3), then 4) and 5).

Very good questions... and the answers are probably tied into how we
solve question #1. #5 would seem to be higher performance than #3, but
is it a strong enough guarantee? The power could still go out to a
whole rack of systems at once.

I think the decision partially depends on how much of a document
storage system this is, vs. just an index that can be rebuilt. Of
course, the bigger an index gets, the more infeasible a complete
rebuild becomes.

I'm tempted to lean toward #3, since logs are needed to sync up nodes
(back to question #1).

-Yonik
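
[Editor's note: as a rough sketch of what option #3 might look like, the
building block is a write-ahead transaction log: each operation is
appended and fsynced to local disk before the update is acknowledged,
and the log's sequence numbers give a lagging replica something to sync
from. The Java class and record format below are hypothetical
illustrations, not Lucene or Solr code.]

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    /** Hypothetical sketch of an append-only, fsynced operation log. */
    public class TransactionLog implements AutoCloseable {
        private final FileChannel channel;
        // Monotonically increasing op id, used to sync replicas.
        // (A real log would recover the last seq from disk on startup.)
        private long nextSeq = 0;

        public TransactionLog(Path path) throws IOException {
            this.channel = FileChannel.open(path,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND);
        }

        /** Appends one record and forces it to disk before returning. */
        public synchronized long append(String op, String docId,
                                        String payload) throws IOException {
            long seq = nextSeq++;
            String record = seq + "\t" + op + "\t" + docId + "\t"
                + payload + "\n";
            channel.write(ByteBuffer.wrap(
                record.getBytes(StandardCharsets.UTF_8)));
            channel.force(false);  // the ack is only as durable as this fsync
            return seq;
        }

        @Override
        public void close() throws IOException {
            channel.close();
        }
    }

A node would call append() before acknowledging the update to the
client, and a replica that fell behind would request all records with
a sequence number greater than the last one it applied (which also
bears on question #1: any replica with the log can be caught up and
promoted).]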