From: Ning L. <nin...@gm...> - 2008-02-27 20:58:14
On Tue, Feb 26, 2008 at 1:21 PM, Yonik Seeley <yo...@ap...> wrote:
> I think one problem we are running into is due to using the same hash
> ring for both partitioning the key space and selecting placement of
> those keys. Putting many virtual nodes on the hash ring to make the
> hash function fair, in conjunction with using that hash ring to do
> replication, causes all the indices to be mixed together.
What we have discussed addresses both partitioning and replication:
1 We use consistent hashing to partition keys into ranges.
We may call a range a "shard". The important point is that
there is a one-to-one mapping between ranges and virtual nodes.
2 We described a replication scheme in which a virtual node
replicates its range/shard onto N-1 other virtual nodes when
the replication level is set to N (a rough code sketch of the
ring follows below).
Your proposal covers how to partition keys into ranges/shards,
but how ranges/shards map to virtual nodes still needs to be
worked out.
I think what we have discussed is a reasonable scheme. But
I'm always open to new ideas. :)
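
To make points 1 and 2 concrete, here is a minimal Java sketch
of such a ring. HashRing, replicasFor, and the MD5-based hash are
illustrative names and choices of mine, not anything from an
existing codebase; in particular, skipping virtual nodes that land
on an already-chosen physical node is my assumption about how the
N-1 replicas are picked.

  import java.nio.charset.StandardCharsets;
  import java.security.MessageDigest;
  import java.security.NoSuchAlgorithmException;
  import java.util.ArrayList;
  import java.util.List;
  import java.util.TreeMap;

  // Minimal consistent-hash ring: each physical node owns several
  // virtual nodes; a key's shard is the first virtual node at or
  // after the key's hash, and replicas are the next distinct
  // physical nodes clockwise around the ring.
  class HashRing {
      private final TreeMap<Long, String> ring = new TreeMap<>(); // point -> physical node

      void addNode(String node, int virtualNodes) {
          for (int i = 0; i < virtualNodes; i++)
              ring.put(hash(node + "#" + i), node); // one ring point per virtual node
      }

      // The N physical nodes responsible for a key: the owner of the
      // range covering hash(key), then the next N-1 distinct nodes
      // clockwise (wrapping around).
      List<String> replicasFor(String key, int n) {
          List<String> owners = new ArrayList<>();
          addDistinct(owners, ring.tailMap(hash(key)).values(), n);
          addDistinct(owners, ring.values(), n); // wrap around the ring
          return owners;
      }

      private static void addDistinct(List<String> owners, Iterable<String> nodes, int n) {
          for (String node : nodes) {
              if (owners.size() == n) return;
              if (!owners.contains(node)) owners.add(node); // skip same physical node
          }
      }

      private static long hash(String s) {
          try { // first 8 bytes of MD5 as a long; any uniform hash works
              byte[] d = MessageDigest.getInstance("MD5")
                                      .digest(s.getBytes(StandardCharsets.UTF_8));
              long h = 0;
              for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xffL);
              return h;
          } catch (NoSuchAlgorithmException e) { throw new AssertionError(e); }
      }
  }

For example, after addNode("nodeA", 100) and the same for nodeB
and nodeC, replicasFor("doc42", 2) returns the primary physical
node for doc42's shard plus one distinct replica.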
At the same time, should we also discuss what the update
model should be:
1 One updatable replica vs. all updatable replicas. The former
is simple. The latter is powerful. Is there sufficient need for
the latter to justify its complexity?
2 The atomicity of an insert/delete/update operation. When
such an operation is reported as done, does it mean (the
levels are sketched as code after this list):
1) the new doc is indexed in the memory of the node
2) the new doc is indexed on the local disk of the node
3) the new doc is logged on the local disk of the node
4) the new doc is logged in some fault-tolerant shared FS
(e.g. HDFS)
5) the new doc is indexed in the memory of at least X
nodes
The probability of the operation being lost decreases from
1) (highest), through 2) and 3), down to 4) and 5) (lowest).
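
To illustrate point 2, those five completion levels could be
expressed as an acknowledgement parameter on the update call. The
enum below is purely a hypothetical sketch; none of these names
come from an existing API.

  // Hypothetical acknowledgement levels for an insert/delete/update,
  // roughly ordered from weakest to strongest durability guarantee.
  // Names are illustrative only.
  enum AckLevel {
      INDEXED_IN_MEMORY,     // 1) in the RAM buffer of one node; lost on crash
      INDEXED_ON_LOCAL_DISK, // 2) committed to the node's local index
      LOGGED_ON_LOCAL_DISK,  // 3) appended to a local write-ahead log
      LOGGED_ON_SHARED_FS,   // 4) logged to a fault-tolerant FS such as HDFS
      INDEXED_ON_X_NODES     // 5) indexed in memory on at least X nodes
  }

A client might then call something like
update(doc, AckLevel.LOGGED_ON_SHARED_FS) to trade latency for
durability.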
Regards,
Ning