From: Doug C. <cu...@ap...> - 2008-02-28 22:01:17
Yonik Seeley wrote:
> On Wed, Feb 27, 2008 at 3:58 PM, Ning Li <nin...@gm...> wrote:
>> At the same time, should we also discuss what the update
>> model should be:
>> 1 One updatable replica vs. all updatable replicas. The former
>> is simple. The latter is powerful. Is there sufficient need for
>> the latter to justify its complexity?
>
> We should always be able to update (so if the "updateable" replica is
> down, we need to be able to update another replica).

I agree.  To support network partitions, all nodes must accept updates
for documents in their range.

>> 2 The atomicity of an insert/delete/update operation. When
>> an insert/delete/update operation is done, does it mean:
>> 1) the new doc is indexed in the memory of the node
>> 2) the new doc is indexed on the local disk of the node
>> 3) the new doc is logged on the local disk of the node
>> 4) the new doc is logged in some fault-tolerant shared FS
>> (e.g. HDFS)
>> 5) the new doc is indexed in the memory of at least X nodes
>> The probability of the operation getting lost is from high
>> to low: 1), then 2) and 3), then 4) and 5).
>
> [ ... ] I think the decision partially depends on how much of
> a document storage system this is, vs just an index that can be
> rebuilt.

I'd like to position this against document databases, so I'm hoping it
can be used as primary storage.

> I'm tempted to lean toward #3 since logs are needed to sync up nodes
> (back to question #1).

It would be a nice feature if we could arrange so that, in most cases,
the client that adds a document sees it in search results immediately.
We cannot guarantee that all other clients will see it.  Some sort of
immediate indexing of the document is required to support this feature,
but in-memory indexing is sufficient.  We may not implement this
feature right off, but we should keep it in mind.

Logging is attractive, since it permits easy replaying of logs when
shipping updates between nodes.
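To make option #3 concrete, here is a minimal sketch of an append-only
operation log whose tail can be replayed to bring a lagging node up to
date.  All names and the record format are hypothetical, not part of
any existing API:

```python
import json


class OpLog:
    """Append-only log of index operations (durability option #3).

    Every insert/delete is written to the local log before it is
    acknowledged; replay(since_seq) ships the missing suffix of the
    log to a node that has fallen behind.
    """

    def __init__(self, path):
        self.path = path
        self.next_seq = 0

    def append(self, op, doc):
        """Record one operation and return its sequence number."""
        record = {"seq": self.next_seq, "op": op, "doc": doc}
        with open(self.path, "a") as f:
            f.write(json.dumps(record) + "\n")
            # Note: no f.flush()/os.fsync() here -- whether to force
            # the record to disk on every add is exactly the sync
            # tradeoff discussed in this thread.
        self.next_seq += 1
        return record["seq"]

    def replay(self, since_seq=0):
        """Yield all operations with seq >= since_seq, in order."""
        with open(self.path) as f:
            for line in f:
                record = json.loads(line)
                if record["seq"] >= since_seq:
                    yield record
```

A node that was down could then report its last applied sequence
number and receive only the operations it missed, rather than a full
index copy.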
Perhaps we can instead use queries to enumerate changes, but that
requires more thought.

As for disk versus memory: if we only send updates to a single node in
a document's range, then we should sync them to disk.  If we instead
send updates to multiple nodes in the range, then it's probably okay
not to sync, since we already assume that not all nodes in a range
will fail at once.  The downside of this is that documents could be
lost in the case of a datacenter-wide power failure, but I think
that's acceptable.  Performance will suffer considerably if we have to
sync on each add.

So my inclination is to attempt to add documents to several nodes in
the range and not require a sync per add, buffering things in memory
as required for good performance.  Replication in memory provides
fault-tolerance.

Doug
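A sketch of that inclination -- add to every reachable node in the
range, buffer in memory, and treat the add as successful once enough
copies exist.  The class and the min_copies parameter are illustrative
assumptions, not a proposed API:

```python
class InMemoryReplica:
    """A node that buffers newly added documents in RAM (no fsync)."""

    def __init__(self, name):
        self.name = name
        self.buffered = []   # docs indexed in memory, not yet flushed
        self.alive = True

    def add(self, doc):
        if not self.alive:
            raise ConnectionError(self.name + " is down")
        self.buffered.append(doc)


def replicated_add(doc, replicas, min_copies=2):
    """Add doc to all reachable replicas in the document's range.

    The add succeeds once at least min_copies nodes hold the document
    in memory: fault tolerance comes from having multiple copies, not
    from a disk sync on each add.  A down node does not block the
    update, matching the "always able to update" requirement above.
    """
    copies = 0
    for replica in replicas:
        try:
            replica.add(doc)
            copies += 1
        except ConnectionError:
            continue  # skip unreachable nodes, keep trying the rest
    if copies < min_copies:
        raise RuntimeError("too few live replicas; add not durable")
    return copies
```

Under this scheme a single node failure loses nothing, and only a
simultaneous failure of all buffering nodes (e.g. a datacenter-wide
power failure) can lose unflushed documents.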