From: Doug C. <cu...@ap...> - 2008-02-28 21:31:18
Getting back to this at long last! Sorry for the delays...

Ning Li wrote:
> First, let's sync a couple of index-related terms:

Good idea. I think "node" and "shard" are confusing, and perhaps can be avoided in this context.

> 1) Shard: The distinct starting positions of all the virtual nodes
> divide the ring into shards. For example, starting positions
> (A B C D E) divide the ring into 5 shards: AB, BC, CD, DE, DA.

I prefer to call these "ranges". "Shard" to me sounds like something physical, and ranges are not physical.

> 2) Index on a virtual node (suggest a name?): A virtual node
> serves a number of continuous shards. For example, with
> 3-way replication, the indexes on the virtual nodes are:
> AB-BC-CD, BC-CD-DE, CD-DE-EA, DE-EA-AB, EA-AB-BC.

I was just using "index" for this. An index is a set of files on a host that corresponds to a range of ids. Let's also use "node" for "virtual node" and "host" for a set of virtual nodes running on the same host. A node corresponds to a point on the ring and may be assigned a range and maintain an index for that range. The range assigned to a node may change over time and it will have to adjust its index accordingly.

> Now, should an index on a virtual node be implemented as
> one Lucene index or N Lucene indexes (one per shard)?

My hunch is one index per range. That way we can search a set of indexes that completes the ring, and search maximally large segments.

Doug
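The ring geometry Doug and Ning settle on above can be sketched as follows. This is an editor's illustration, not code from the thread; the sorted `positions` array and the method names are invented. With N-way replication, each node's single index covers the contiguous arc from its own starting position to the position N steps clockwise, matching Ning's AB-BC-CD example for positions (A B C D E) with 3-way replication.

```java
import java.util.Arrays;

// Editor's sketch (assumptions noted above): nodes sit at sorted positions
// on a hash ring; with replication factor n, the node at index i keeps one
// index covering the half-open arc [positions[i], positions[(i+n) % count]),
// i.e. the n consecutive ranges starting at its own point on the ring.
public class RingIndex {
    public static int[] indexArc(int[] positions, int nodeIdx, int n) {
        int count = positions.length;
        int start = positions[nodeIdx];
        int end = positions[(nodeIdx + n) % count];   // may wrap past zero
        return new int[] { start, end };
    }

    public static void main(String[] args) {
        int[] ring = { 0, 20, 40, 60, 80 };           // five nodes, A..E
        System.out.println(Arrays.toString(indexArc(ring, 0, 3))); // [0, 60]
        System.out.println(Arrays.toString(indexArc(ring, 3, 3))); // [60, 20] (wraps)
    }
}
```

One contiguous arc per node is what makes "one index per range" workable: a covering set of such arcs tiles the whole ring.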
From: Ning L. <nin...@gm...> - 2008-02-28 20:11:45
On Wed, Feb 27, 2008 at 6:04 PM, Yonik Seeley <yo...@ap...> wrote:
> On Wed, Feb 27, 2008 at 3:58 PM, Ning Li <nin...@gm...> wrote:
> > At the same time, should we also discuss what the update
> > model should be:
> > 1 One updatable replica vs. all updatable replicas. The former
> > is simple. The latter is powerful. Is there sufficient need for
> > the latter to justify its complexity?
>
> We should always be able to update (so if the "updateable" replica is
> down, we need to be able to update another replica). Given that, what
> is the extra complexity of those two choices?

In the case of one updateable replica, if the updateable replica is down, we need to make another replica updateable. The down side is that the shard is not updateable during the switch. But it should be short and thus be fine for some applications. The complexity involved with all updateable replicas is conflict resolution...

> > 2 The atomicity of an insert/delete/update operation. When
> > an insert/delete/update operation is done, does it mean:
> > 1) the new doc is indexed in the memory of the node
> > 2) the new doc is indexed on the local disk of the node
> > 3) the new doc is logged on the local disk of the node
> > 4) the new doc is logged in some fault-tolerant shared FS
> > (e.g. HDFS)
> > 5) the new doc is indexed in the memory of at least X
> > nodes
> > The probability of the operation getting lost is from high
> > to low: 1), then 2) and 3), then 4) and 5).
>
> Very good questions...and the answers are probably tied into how we
> solve question #1.
> #5 would seem to be higher performance than #3, but is it a strong
> enough guarantee?

I think you implied and I agree, that #4 is a stronger guarantee than #5. But you think #3 is a stronger guarantee than #5? What if one replica in #5 must be from a different rack? In such a system, if a node is down, the log on the local disk is gone with it - we don't know when the node will be up and the log be available again.

If we choose to support all updateable replicas, the problem with simply indexing and no logging (#1, #2 and #5) is what changes to propagate? In case of one updateable replica, we simply propagate delta index files.

> I think the decision partially depends on how much of
> a document storage system this is, vs just an index that can be
> rebuilt.

This is a very good point and I have been thinking about this. If we choose to support all updateable replicas and provide a relatively strong atomicity support, this can easily be a data storage system.

Ning
From: Mike K. <mik...@gm...> - 2008-02-28 00:24:05
Update: Yonik answered my questions via private email, just in case anyone else was dying to jump in there. Sounds exciting,

-Mike

On 27-Feb-08, at 3:01 PM, Mike Klaas wrote:
> This list is still small enough that I feel that I should
> introduce myself. I'm a Solr committer and CTO of a small internet
> search startup that is still in stealth mode. Yonik pointed out the
> project to me. It sounds quite interesting, not least of which
> because our company has built a large Solr-based search cluster
> running on EC2 with at least some of the properties you are aiming
> for.
>
> The one thing we don't have is dynamic automatic failover and
> replication. One reason is that it's hard, and the second is that
> for us, it has always been better to use more boxes for a larger
> corpus rather than replicating a smaller one. Instead, we store the
> indices in S3 and restore from backup when a machine fails (which is
> not rarely).
>
> I do have a couple questions about the project:
> - it isn't clear to me how the goals are substantially different
> from those of nutch. Is it mostly in the relaxation of its
> application to web search?
> - why sourceforge rather than an apache project?
> - it sounds like the intent is to build upon Solr, which I think is
> a great idea, but isn't mentioned in the top-level goals section.
> Is Solr an optional component?
>
> cheers,
> -Mike
From: Yonik S. <yo...@ap...> - 2008-02-27 23:04:48
On Wed, Feb 27, 2008 at 3:58 PM, Ning Li <nin...@gm...> wrote:
> At the same time, should we also discuss what the update
> model should be:
> 1 One updatable replica vs. all updatable replicas. The former
> is simple. The latter is powerful. Is there sufficient need for
> the latter to justify its complexity?

We should always be able to update (so if the "updateable" replica is down, we need to be able to update another replica). Given that, what is the extra complexity of those two choices?

> 2 The atomicity of an insert/delete/update operation. When
> an insert/delete/update operation is done, does it mean:
> 1) the new doc is indexed in the memory of the node
> 2) the new doc is indexed on the local disk of the node
> 3) the new doc is logged on the local disk of the node
> 4) the new doc is logged in some fault-tolerant shared FS
> (e.g. HDFS)
> 5) the new doc is indexed in the memory of at least X
> nodes
> The probability of the operation getting lost is from high
> to low: 1), then 2) and 3), then 4) and 5).

Very good questions...and the answers are probably tied into how we solve question #1. #5 would seem to be higher performance than #3, but is it a strong enough guarantee? The power could still go out to a whole rack of systems at once. I think the decision partially depends on how much of a document storage system this is, vs just an index that can be rebuilt. Of course, the bigger an index gets, the more infeasible it becomes to do a complete rebuild. I'm tempted to lean toward #3 since logs are needed to sync up nodes (back to question #1).

-Yonik
From: Mike K. <mik...@gm...> - 2008-02-27 23:01:48
This list is still small enough that I feel that I should introduce myself. I'm a Solr committer and CTO of a small internet search startup that is still in stealth mode. Yonik pointed out the project to me. It sounds quite interesting, not least of which because our company has built a large Solr-based search cluster running on EC2 with at least some of the properties you are aiming for.

The one thing we don't have is dynamic automatic failover and replication. One reason is that it's hard, and the second is that for us, it has always been better to use more boxes for a larger corpus rather than replicating a smaller one. Instead, we store the indices in S3 and restore from backup when a machine fails (which is not rarely).

I do have a couple questions about the project:
- it isn't clear to me how the goals are substantially different from those of nutch. Is it mostly in the relaxation of its application to web search?
- why sourceforge rather than an apache project?
- it sounds like the intent is to build upon Solr, which I think is a great idea, but isn't mentioned in the top-level goals section. Is Solr an optional component?

cheers,
-Mike
From: Yonik S. <yo...@ap...> - 2008-02-27 22:03:33
On Wed, Feb 27, 2008 at 3:58 PM, Ning Li <nin...@gm...> wrote:
> On Tue, Feb 26, 2008 at 1:21 PM, Yonik Seeley <yo...@ap...> wrote:
> > I think one problem we are running into is due to using the same hash
> > ring for both partitioning the key space and selecting placement of
> > those keys. Putting many virtual nodes on the hash ring to make the
> > hash function fair, in conjunction with using that hash ring to do
> > replication, causes all the indices to be mixed together.
>
> What we have discussed addressed partitioning and replication:
> 1 We used consistent hashing to partition keys into ranges.
> We may call a range "shard". The important thing is, it is
> a one-to-one mapping between ranges and virtual nodes.
> 2 We described one replication scheme on how a virtual node
> replicates its range/shard on N-1 virtual nodes with replication
> level set to N.

As I read the previous messages, it seems like both #1 and #2 are using the same hash circle. It seems like using virtual nodes (as opposed to real nodes) to determine replication means that you can't query every N real nodes and cover the entire index, right? It seems like the replica for real node A is distributed over all other nodes, regardless of what N is.

-Yonik
From: Ning L. <nin...@gm...> - 2008-02-27 20:58:14
On Tue, Feb 26, 2008 at 1:21 PM, Yonik Seeley <yo...@ap...> wrote:
> I think one problem we are running into is due to using the same hash
> ring for both partitioning the key space and selecting placement of
> those keys. Putting many virtual nodes on the hash ring to make the
> hash function fair, in conjunction with using that hash ring to do
> replication, causes all the indices to be mixed together.
What we have discussed addressed partitioning and replication:
1 We used consistent hashing to partition keys into ranges.
We may call a range "shard". The important thing is, it is
a one-to-one mapping between ranges and virtual nodes.
2 We described one replication scheme on how a virtual node
replicates its range/shard on N-1 virtual nodes with replication
level set to N.
You described how to partition keys into ranges/shards. And
how ranges/shards map to virtual nodes is to be figured out.
I think what we have discussed is a reasonable scheme. But
I'm always open to new ideas. :)
At the same time, should we also discuss what the update
model should be:
1 One updatable replica vs. all updatable replicas. The former
is simple. The latter is powerful. Is there sufficient need for
the latter to justify its complexity?
2 The atomicity of an insert/delete/update operation. When
an insert/delete/update operation is done, does it mean:
1) the new doc is indexed in the memory of the node
2) the new doc is indexed on the local disk of the node
3) the new doc is logged on the local disk of the node
4) the new doc is logged in some fault-tolerant shared FS
(e.g. HDFS)
5) the new doc is indexed in the memory of at least X
nodes
The probability of the operation getting lost is from high
to low: 1), then 2) and 3), then 4) and 5).
Regards,
Ning
From: Yonik S. <yo...@ap...> - 2008-02-26 18:21:49
I think one problem we are running into is due to using the same hash
ring for both partitioning the key space and selecting placement of
those keys. Putting many virtual nodes on the hash ring to make the
hash function fair, in conjunction with using that hash ring to do
replication, causes all the indices to be mixed together.
We may need to separate partitioning of the key space into shards, and
the decision of where to put those shards.
One example using consistent hashing:
When a new system ("newhost") is added to the cluster, one
consistent hash ring could be used to define a new shard (many virtual
nodes to make all shards about the same size... newhost_shard.1
newhost_shard.2 newhost_shard.3...), and a different consistent hash
ring used to select what other shards to replicate in addition to its
own (add "newhost_shard" to the hash ring, and if N=3 then newhost
will also mirror the previous two shards on the ring.)
I'm no longer sure of the benefit of using consistent hashing though
(at least for both parts).
A simpler way to split up the key space would be to just divide it up
into Q equal sized pieces, and just keep Q sufficiently larger than S
(the number of systems). If one didn't want to pick a maximum Q
beforehand, then Q could always be doubled to keep it sufficiently
large.
A shard could always be uniquely identified by a single integer... the
top 5 bits being how many times the key space was divided by 2, and
the bottom 27 bits being the slice number.
So, to calculate what shard a key belongs in:
sliceNumber = key.hashCode >>> (32 - nSplits)
shardId = (nSplits << 27) | sliceNumber
No global state necessary to define what a shard is, doubling the
number of shards is easy, anyone can easily determine if a key is in a
shard or not, etc.
That still leaves the problem of allocating shards to nodes unsolved
for the time being... but I'm just brainstorming out loud at this
point.
-Yonik
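Yonik's shard-numbering idea above can be sketched as follows. This is the editor's reading of the scheme, not code from the thread: the shift is written as `32 - nSplits` so the top `nSplits` bits of the hash pick the slice, which is exactly what lets the number of shards be doubled without any global coordination.

```java
// Editor's sketch of the shard-id encoding described above. Assumptions:
// nSplits (at most 27, stored in 5 bits) is how many times the key space
// was halved, and the slice number comes from the top bits of the hash so
// that going from nSplits to nSplits+1 splits each slice exactly in two.
public class ShardId {
    public static int encode(int nSplits, int slice) {
        return (nSplits << 27) | slice;
    }

    // which slice a key's hash falls in, given the current number of splits
    public static int sliceOf(int keyHash, int nSplits) {
        return nSplits == 0 ? 0 : keyHash >>> (32 - nSplits);
    }

    // anyone can test shard membership locally, with no global state
    public static boolean contains(int shardId, int keyHash) {
        int nSplits = shardId >>> 27;
        int slice = shardId & ((1 << 27) - 1);
        return sliceOf(keyHash, nSplits) == slice;
    }
}
```

Doubling the shard count is then just re-encoding: a key in slice s at nSplits lands in slice 2s or 2s+1 at nSplits+1.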
From: Ning L. <nin...@gm...> - 2008-02-15 23:15:33
On Thu, Feb 14, 2008 at 7:00 PM, Doug Cutting <cu...@ap...> wrote:
> I've attached up what I hope is a minimally sufficient user-level API to
> start implementing. The idea is to include only what's required, like
> facet counts, but otherwise keep it simple.

Client implements Indexer and Searcher interfaces, right? But who calls those methods of Master?

Regards,
Ning
From: Ning L. <nin...@gm...> - 2008-02-15 21:35:21
First, let's sync a couple of index-related terms:

1) Shard: The distinct starting positions of all the virtual nodes divide the ring into shards. For example, starting positions (A B C D E) divide the ring into 5 shards: AB, BC, CD, DE, DA.

2) Index on a virtual node (suggest a name?): A virtual node serves a number of continuous shards. For example, with 3-way replication, the indexes on the virtual nodes are: AB-BC-CD, BC-CD-DE, CD-DE-EA, DE-EA-AB, EA-AB-BC.

Now, should an index on a virtual node be implemented as one Lucene index or N Lucene indexes (one per shard)? Using N Lucene indexes has its up side. It's less expensive when a virtual node is added or removed - often a shard is shipped or deleted instead of shipping a virtual node index or splitting and shipping... And often we don't have to filter when querying a sub-range of a virtual node index. The down side is that a virtual node has to manage multiple Lucene indexes... What do you think?

Regards,
Ning
From: Ning L. <nin...@gm...> - 2008-02-15 04:51:30
On Thu, Feb 14, 2008 at 5:23 PM, Doug Cutting <cu...@ap...> wrote:
> If each document is replicated in, say, the two clockwise indexes from
> the index serving its range, then one need only query every third index
> to achieve complete coverage, right? Things get tricky when indexes are
> added or removed from the ring, when the number of nodes in the ring
> isn't divisible by three, etc. Some range filtering will be required in
> these cases, but not in most. Hopefully we could find a way so that any
> range-filtering that's required is spread around the ring, to avoid
> hot-spots. Having nodes serve multiple indexes, at different points on
> the ring will help some with hot-spots too.

Agree.

> If N=4, neighbors would get
> 33% more queries, etc. If each node serves M indexes, then this impact
> would be diminished. So if N=3 and M=4 (each node serves four indexes)
> then a neighbor node's load would increase by just 12.5%, which is
> pretty manageable.

I think Yonik's interpretation is correct, right?

> Another approach might be to query overlapping ranges and filter in the
> client. With N-way replication you'd query every N/2th index. Search
> results would include facet counts for sub-ranges, so they could be
> correctly merged. If N=4, querying every other index, then, when a node
> fails, no other indexes need be re-queried. Similarly for N=6 querying
> every third index. Here you take a big hit up front, always searching
> twice as big of an index as you need to, but avoid the latencies of
> re-querying.

Whether this is worthwhile depends on the expected percentage of queries that need to be re-queried on some indexes...

> > The above requires that a Lucene index can efficiently support
> > query on a sub-range of docids - application/system docids,
> > not Lucene docids.
>
> If simply implemented with a filter based on a FieldCache, this is fast,
> but the expense is still that of searching the entire index.

Yes, this should be good enough.

> > ... so that within a segment, Lucene docids are assigned
> > in the same order as their corresponding application/system
> > docids during build/merge...
>
> I don't see how that's easy. Lucene assumes that newly indexed ids are
> always greater than previously indexed ids, and that assumption is
> fairly deep. Segments could be re-sorted I guess, and postings merged
> rather than appended. But that'd be a substantial change to Lucene. Is
> that what you had in mind?

I was thinking of keeping Lucene docids in the same order as their application/system docids within a segment, not across segments. The merge algorithm will be different but segments don't have to be re-sorted during merge. To take advantage of this, however, we'll have to query on each segment individually and then merge results. (Query on SegmentReader and merge results instead of querying on MultiSegmentReader.) We'll need the same result merge algorithm on clients. But until we implement that algorithm, let's use filtering.

Regards,
Ning
From: Yonik S. <yo...@ap...> - 2008-02-15 02:38:58
On Thu, Feb 14, 2008 at 5:23 PM, Doug Cutting <cu...@ap...> wrote:
> Ning Li wrote:
> > So the master becomes the manager of a ring in this case. Given
> > the ring, a search client sends a query to a (hopefully
> > minimal) set of nodes whose ranges cover the ring. When sending
> > the query to a node in the set, the search client also
> > specifies the sub-range of docids on which the query should be
> > executed. This is to make sure that any range of docids is
> > queried and only queried once.
>
> I'd been thinking of something that was like consistent hashing, but
> that tried to keep indexes mostly disjoint. But perhaps consistent
> hashing itself will work well here.
>
> If each document is replicated in, say, the two clockwise indexes from
> the index serving its range, then one need only query every third index
> to achieve complete coverage, right? Things get tricky when indexes are
> added or removed from the ring, when the number of nodes in the ring
> isn't divisible by three, etc. Some range filtering will be required in
> these cases, but not in most. Hopefully we could find a way so that any
> range-filtering that's required is spread around the ring, to avoid
> hot-spots.

Starting at a random node and querying every Nth node, and then filtering the last overlapping parts seems like it should spread that evenly.

> Having nodes serve multiple indexes, at different points on
> the ring will help some with hot-spots too.
>
> With N-way replication, and only querying every Nth node, when a node
> fails, 2 other indexes, one on each side, must be queried to cover the
> missing range, since no other single index covers that range. But,
> there are N-1 different pairs of indexes that do cover the range. So
> the load on neighbors will increase by 1/(N-1) on average. If N=3,
> neighbor nodes would get 50% more queries. If N=4, neighbors would get
> 33% more queries, etc.

OK this makes sense so far... for a 9 node system with N=3, you query 3 nodes with indices each 1/3 of the size of the complete index.

> If each node serves M indexes, then this impact
> would be diminished. So if N=3 and M=4 (each node serves four indexes)
> then a neighbor node's load would increase by just 12.5%, which is
> pretty manageable.

I got lost at "M" a little... What's an index in this context? Does it mean additional virtual nodes on the hash circle that map to the same physical node (an index being the arc between two adjacent points)? That's normally needed to make consistent hashing fair anyway, right?

> > The above requires that a Lucene index can efficiently support
> > query on a sub-range of docids - application/system docids,
> > not Lucene docids.
>
> If simply implemented with a filter based on a FieldCache, this is fast,
> but the expense is still that of searching the entire index.

Just thinking about the static case, one could keep separate docs in separate indices. A multireader over 3 indices wouldn't be too bad. But then if you start adding and removing nodes, it gets messy. If the load increase is low enough (12.5% isn't too bad) straightforward filtering is probably the best. It seems like we need filtering capabilities anyway in the cases when nodes are being rebalanced due to the number of nodes changing.

-Yonik
From: Doug C. <cu...@ap...> - 2008-02-15 00:12:47
Doug Cutting wrote:
> First I'll implement a completely synchronous version, then write the
> simulator/tester. Once that's working, we can start designing a
> threaded, pseudo-distributed version [ ... ]

The synchronous implementation will also be useful as the "model" when testing: each change and query can be made to both the static model and to the "distributed" implementation, comparing the results. The model may sometimes return slightly different results, e.g., when updates are not yet fully synchronized, so testing will log differences from the model rather than fail entirely.

Doug
From: Doug C. <cu...@ap...> - 2008-02-15 00:06:37
Ning Li wrote:
> Doug Cutting wrote:
> > Does this sound like a good strategy?
>
> Sounds good!

I've attached up what I hope is a minimally sufficient user-level API to start implementing. The idea is to include only what's required, like facet counts, but otherwise keep it simple.

First I'll implement a completely synchronous version, then write the simulator/tester. Once that's working, we can start designing a threaded, pseudo-distributed version, that implements the interfaces in two layers: a "client-side" layer that transparently retries, etc., and a "server-side" layer that might be flaky, throwing exceptions, hanging, etc. and where servers chat amongst themselves.

Doug
From: Ning L. <nin...@gm...> - 2008-02-14 23:31:14
Doug Cutting wrote:
> Does this sound like a good strategy?

Sounds good!

> I should add: sorry I've been AWOL most of this week. I've been down
> with a bad cold. I really am eager to get started on this project!

Hope you'll recover soon!
From: Doug C. <cu...@ap...> - 2008-02-14 22:56:41
Doug Cutting wrote:
> I'm on the road tomorrow, but I can start proposing APIs early next
> week [ ...]

I should add: sorry I've been AWOL most of this week. I've been down with a bad cold. I really am eager to get started on this project!

Doug
From: Doug C. <cu...@ap...> - 2008-02-14 22:49:18
We might get started by using threads and method calls instead of RPC. This should help us get our design straight before we invest in a "real" implementation. So we might:

1. Write a simple client API: addDoc(), removeDoc(), updateDoc(), query().
2. Write a multi-threaded test program that uses this, simulating a large, active index. It randomly adds, removes and updates randomly created documents, and periodically queries and checks that results are correct.
3. Implement the "distributed" system using threads per node.
4. Randomly kill nodes during the simulation.

We might even avoid using Lucene at this point, but simply use Java Collections for indexes. Documents would have a few fields with atomic values (no full text). Does this sound like a good strategy?

I'm on the road tomorrow, but I can start proposing APIs early next week, and then maybe get some help coding the simulation. Once we have things working, then we can start replacing method calls with RPCs and putting Lucene or Solr underneath and see if it still holds up, incrementally moving towards a complete system, always with a test suite.

Doug
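The client API in step 1 above could start as small as this. An editor's sketch: only the four method names come from the message; the id parameter, the field maps, and the in-memory backing are assumptions, using plain Java Collections instead of Lucene as suggested.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Editor's sketch of step 1: the four client calls backed by Java
// Collections, no Lucene. Documents are id -> field map; query() does a
// linear scan, which is fine for a correctness-only simulation.
public class InMemoryIndex {
    private final Map<String, Map<String, String>> docs = new HashMap<>();

    public synchronized void addDoc(String id, Map<String, String> fields) {
        docs.put(id, new HashMap<>(fields));
    }

    public synchronized void removeDoc(String id) {
        docs.remove(id);
    }

    public synchronized void updateDoc(String id, Map<String, String> fields) {
        docs.put(id, new HashMap<>(fields));   // last-writer-wins
    }

    public synchronized List<String> query(String field, String value) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : docs.entrySet()) {
            if (value.equals(e.getValue().get(field))) {
                hits.add(e.getKey());
            }
        }
        Collections.sort(hits);   // deterministic order for result checking
        return hits;
    }
}
```

A multi-threaded tester can then hammer one instance per simulated node and compare query results against a single "model" instance.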
From: Doug C. <cu...@ap...> - 2008-02-14 22:23:42
Ning Li wrote:
> So the master becomes the manager of a ring in this case. Given
> the ring, a search client sends a query to a (hopefully
> minimal) set of nodes whose ranges cover the ring. When sending
> the query to a node in the set, the search client also
> specifies the sub-range of docids on which the query should be
> executed. This is to make sure that any range of docids is
> queried and only queried once.

I'd been thinking of something that was like consistent hashing, but that tried to keep indexes mostly disjoint. But perhaps consistent hashing itself will work well here.

If each document is replicated in, say, the two clockwise indexes from the index serving its range, then one need only query every third index to achieve complete coverage, right? Things get tricky when indexes are added or removed from the ring, when the number of nodes in the ring isn't divisible by three, etc. Some range filtering will be required in these cases, but not in most. Hopefully we could find a way so that any range-filtering that's required is spread around the ring, to avoid hot-spots. Having nodes serve multiple indexes, at different points on the ring will help some with hot-spots too.

With N-way replication, and only querying every Nth node, when a node fails, 2 other indexes, one on each side, must be queried to cover the missing range, since no other single index covers that range. But, there are N-1 different pairs of indexes that do cover the range. So the load on neighbors will increase by 1/(N-1) on average. If N=3, neighbor nodes would get 50% more queries. If N=4, neighbors would get 33% more queries, etc. If each node serves M indexes, then this impact would be diminished. So if N=3 and M=4 (each node serves four indexes) then a neighbor node's load would increase by just 12.5%, which is pretty manageable.

Another approach might be to query overlapping ranges and filter in the client. With N-way replication you'd query every N/2th index. Search results would include facet counts for sub-ranges, so they could be correctly merged. If N=4, querying every other index, then, when a node fails, no other indexes need be re-queried. Similarly for N=6 querying every third index. Here you take a big hit up front, always searching twice as big of an index as you need to, but avoid the latencies of re-querying. Does any of this make sense?

> The above requires that a Lucene index can efficiently support
> query on a sub-range of docids - application/system docids,
> not Lucene docids.

If simply implemented with a filter based on a FieldCache, this is fast, but the expense is still that of searching the entire index.

> That can be achieved by a bit modification
> to Lucene so that within a segment, Lucene docids are assigned
> in the same order as their corresponding application/system
> docids during build/merge. A nice side-effect of this is that
> it becomes efficient to delete a sub-range of docids from an
> index as well.

I don't see how that's easy. Lucene assumes that newly indexed ids are always greater than previously indexed ids, and that assumption is fairly deep. Segments could be re-sorted I guess, and postings merged rather than appended. But that'd be a substantial change to Lucene. Is that what you had in mind?

Doug
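The failover arithmetic above reduces to a one-liner. Editor's sketch; the formula is inferred from the worked examples in the message (50% for N=3, 33% for N=4, 12.5% for N=3 with M=4) rather than stated there in closed form.

```java
// Editor's sketch of the load arithmetic discussed above: with N-way
// replication the extra queries for a failed node's range spread over the
// N-1 covering pairs of neighbor indexes, and serving M indexes per node
// dilutes the per-node impact by another factor of M.
public class FailoverLoad {
    public static double neighborIncrease(int n, int m) {
        return 1.0 / ((n - 1) * m);
    }

    public static void main(String[] args) {
        System.out.println(neighborIncrease(3, 1)); // 0.5, i.e. 50% more queries
        System.out.println(neighborIncrease(4, 1)); // about 0.33, 33% more
        System.out.println(neighborIncrease(3, 4)); // 0.125, 12.5% more
    }
}
```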
From: Ning L. <nin...@gm...> - 2008-02-11 02:33:36
We can have a concept of a shard. I meant to say it'd be different from the shard in my postings on Lucene general. Here, we can say the distinct starting positions of all the nodes divide the ring into shards. A node serves one or multiple continuous shards. So, a node serves a range of docids.

Regards,
Ning

On Feb 10, 2008 12:14 PM, Yonik Seeley <yo...@ap...> wrote:
> On Feb 10, 2008 2:07 PM, Ning Li <nin...@gm...> wrote:
> > As in consistent hashing, the entire range of docids (or their hashes) is
> > treated as a ring. There is no clear concept of a shard here.
>
> It seems to me like consistent hashing could happily live with the
> concept of a shard.
> All docids that belong to a single physical node form a shard.
> All replicas of a shard could be the same.
> I've not thought through all the implications though... I'm still
> reading the Dynamo paper myself (and after that will be looking into
> zookeeper I suppose).
>
> -Yonik
From: Yonik S. <yo...@ap...> - 2008-02-10 20:14:47
On Feb 10, 2008 2:07 PM, Ning Li <nin...@gm...> wrote:
> As in consistent hashing, the entire range of docids (or their hashes) is
> treated as a ring. There is no clear concept of a shard here.

It seems to me like consistent hashing could happily live with the concept of a shard. All docids that belong to a single physical node form a shard. All replicas of a shard could be the same. I've not thought through all the implications though... I'm still reading the Dynamo paper myself (and after that will be looking into zookeeper I suppose).

-Yonik
From: Ning L. <nin...@gm...> - 2008-02-10 19:07:15
On Feb 8, 2008 3:37 PM, Doug Cutting <cu...@ap...> wrote:
> So the master is the manager of a lattice

While I'm trying to figure out completely how it would work, would consistent hashing and making the master the manager of a ring work and be simpler? (Reference consistent hashing and Dynamo's partitioning algorithm.)

As in consistent hashing, the entire range of docids (or their hashes) is treated as a ring. There is no clear concept of a shard here. A node is assigned a value (starting position) on the ring and it hosts an index for a range of docids starting from the starting position. The system cannot start until any range of docids is covered by at least one node. During normal state, any range of docids is covered by multiple nodes, depending on the level of replication.

So the master becomes the manager of a ring in this case. Given the ring, a search client sends a query to a (hopefully minimal) set of nodes whose ranges cover the ring. When sending the query to a node in the set, the search client also specifies the sub-range of docids on which the query should be executed. This is to make sure that any range of docids is queried and only queried once.

The above requires that a Lucene index can efficiently support query on a sub-range of docids - application/system docids, not Lucene docids. That can be achieved by a bit modification to Lucene so that within a segment, Lucene docids are assigned in the same order as their corresponding application/system docids during build/merge. A nice side-effect of this is that it becomes efficient to delete a sub-range of docids from an index as well.

Thoughts?

Regards,
Ning
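The covering-set selection the search client needs can be sketched like this. Editor's illustration; the start node, node count, and modular stepping are assumptions not spelled out in the thread: with N-way replication, querying every Nth node around the ring covers it, and when the node count isn't divisible by N the last node is queried with a sub-range filter.

```java
import java.util.ArrayList;
import java.util.List;

// Editor's sketch of picking a (hopefully minimal) covering set: starting
// anywhere and stepping N nodes at a time tiles the ring. ceil(count/n)
// nodes are needed; any wrap-around overlap on the last node would be
// trimmed by the sub-range filtering described in the message above.
public class CoveringSet {
    public static List<Integer> chooseNodes(int nodeCount, int n, int start) {
        int needed = (nodeCount + n - 1) / n;   // ceil(nodeCount / n)
        List<Integer> chosen = new ArrayList<>();
        int i = start % nodeCount;
        while (chosen.size() < needed) {
            chosen.add(i);
            i = (i + n) % nodeCount;
        }
        return chosen;
    }
}
```

For 9 nodes with N=3 this picks 3 nodes and no filtering is needed; for 10 nodes it picks 4, and the fourth overlaps the first by two ranges, which the sub-range filter removes so no docid range is queried twice.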
From: Yonik S. <yo...@ap...> - 2008-02-08 23:47:52
On Feb 8, 2008 6:37 PM, Doug Cutting <cu...@ap...> wrote:
> It would be good to have a system that supports online updates, so that
> we can support the widest variety of users. We don't want to force Solr
> to reinvent things to become scalable.
>
> That said, there will be some compromises. The compromise I'm proposing
> is that we permit conflicting changes in some situations (i.e., network
> partitioning) and that we silently resolve those conflicts in a
> predictable way. This isn't perfect, but I don't see another way to
> both meet the scalability and reliability goals and to support online
> updates. We can hopefully structure things so that conflicts are rare,
> but when they occur, some kinds of updates may be lost.
>
> Do others think this is a reasonable starting point for a design?

+1

-Yonik
From: Doug C. <cu...@ap...> - 2008-02-08 23:37:37
It would be good to have a system that supports online updates, so that we can support the widest variety of users. We don't want to force Solr to reinvent things to become scalable.

That said, there will be some compromises. The compromise I'm proposing is that we permit conflicting changes in some situations (i.e., network partitioning) and that we silently resolve those conflicts in a predictable way. This isn't perfect, but I don't see another way to both meet the scalability and reliability goals and to support online updates. We can hopefully structure things so that conflicts are rare, but when they occur, some kinds of updates may be lost.

Do others think this is a reasonable starting point for a design?

I added some initial thoughts about sharding to the wiki at:

http://bailey.wiki.sourceforge.net/
http://bailey.wiki.sourceforge.net/HowToShard

Some things I'm still not clear on:

- Can we get away with a master that has minimal persistent state?
- How do we split and merge shards? A shard is identified by a range of docids. As we split and merge, at some point, some nodes will have fragmentary shards (sub-ranges of a shard) and/or super-shards (ranges spanning multiple shards). The master needs to somehow make sense of this. If the system is restarted when some nodes have split a shard and some have not, then some ranges may be spanned by either a single node or multiple nodes. The system cannot start until there's a continuous path through the lattice, spanning all docids.

So the master is the manager of a lattice, asking nodes to make changes to the lattice, reflecting those changes in the lattice once they're complete, and presenting the lattice to search clients. Nodes in the lattice are points on the docid circle. Arcs are nodes that maintain an index for that range of document ids. Each node must know about other nodes whose ranges intersect its ranges, so that it can propagate changes. Does this make sense?

How do we keep track of which changes have been propagated? If there were never splits or merges, we might get away with using node identity: if node X has sent its changes A through C to node Y, then the next time it sends changes to Y, it could start with change D. But once things start splitting, merging and otherwise migrating, this gets harder. Any ideas? Am I barking up the wrong tree altogether?

Thanks!

Doug
From: Ning L. <nin...@gm...> - 2008-02-08 23:01:00
On Feb 7, 2008 3:16 PM, Doug Cutting <cu...@ap...> wrote:
> What do you think? Should we just direct interested folks on
> ge...@lu... to this site?

+1
From: Doug C. <cu...@ap...> - 2008-02-07 23:17:39
Thanks for adding yourselves to this list! I'm now wondering whether it's worth trying to keep under the radar. We don't want to exclude folks who might be interested, but I also don't want to get too much attention too soon. On balance however, I suspect more openness is better than less.

What do you think? Should we just direct interested folks on ge...@lu... to this site?

Cheers,
Doug