From: Doug C. <cu...@ap...> - 2008-02-28 21:31:18
Getting back to this at long last! Sorry for the delays...

Ning Li wrote:
> First, let's sync a couple of index-related terms:

Good idea. I think "node" and "shard" are confusing, and perhaps can be
avoided in this context.

> 1) Shard: The distinct starting positions of all the virtual nodes
> divide the ring into shards. For example, starting positions
> (A B C D E) divide the ring into 5 shards: AB, BC, CD, DE, EA.

I prefer to call these "ranges". "Shard" to me sounds like something
physical, and ranges are not physical.

> 2) Index on a virtual node (suggest a name?): A virtual node
> serves a number of continuous shards. For example, with
> 3-way replication, the indexes on the virtual nodes are:
> AB-BC-CD, BC-CD-DE, CD-DE-EA, DE-EA-AB, EA-AB-BC.

I was just using "index" for this. An index is a set of files on a host
that corresponds to a range of ids.

Let's also use "node" for "virtual node" and "host" for the machine that
runs a set of virtual nodes. A node corresponds to a point on the ring
and may be assigned a range and maintain an index for that range. The
range assigned to a node may change over time, and the node will have to
adjust its index accordingly.

> Now, should an index on a virtual node be implemented as
> one Lucene index or N Lucene indexes (one per shard)?

My hunch is one index per range. That way we can search a set of indexes
that completes the ring, and search maximally large segments.

Doug
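
The terminology above can be sketched in a few lines of Python. This is
only an illustration under assumptions: the function names (`ranges`,
`node_indexes`, `search_plan`) are hypothetical, ring positions are
single letters, and each node serves `replication` contiguous ranges
starting at its own position, as in the 3-way replication example. It
does not touch Lucene at all; it only shows how a set of per-range
indexes that completes the ring might be chosen.

```python
def ranges(positions):
    """Split the ring into ranges between consecutive starting positions."""
    n = len(positions)
    return [positions[i] + positions[(i + 1) % n] for i in range(n)]


def node_indexes(positions, replication=3):
    """Map each node (a point on the ring) to the contiguous ranges it serves."""
    rs = ranges(positions)
    n = len(rs)
    return {
        positions[i]: [rs[(i + k) % n] for k in range(replication)]
        for i in range(n)
    }


def search_plan(positions, replication=3):
    """Pick, per node, the range indexes to search so that every range
    on the ring is searched exactly once (assumes one index per range)."""
    assignment = node_indexes(positions, replication)
    covered = set()
    plan = {}
    # Visiting every `replication`-th node is enough to cover the ring.
    for i in range(0, len(positions), replication):
        node = positions[i]
        picked = [r for r in assignment[node] if r not in covered]
        covered.update(picked)
        plan[node] = picked
    return plan


if __name__ == "__main__":
    ring = ["A", "B", "C", "D", "E"]
    print(ranges(ring))        # the five ranges on the ring
    print(node_indexes(ring))  # the ranges each node serves
    print(search_plan(ring))   # a set of indexes that completes the ring
```

With one index per range, node D can contribute only DE and EA to the
search even though it also holds AB. With a single index per node, the
overlap with node A would presumably have to be filtered out some other
way.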