From: Ning L. <nin...@gm...> - 2008-02-15 21:35:21
First, let's sync on a couple of index-related terms:

1) Shard: The distinct starting positions of all the virtual nodes divide the ring into shards. For example, starting positions (A B C D E) divide the ring into 5 shards: AB, BC, CD, DE, EA.

2) Index on a virtual node (suggest a name?): A virtual node serves a number of contiguous shards. For example, with 3-way replication, the indexes on the virtual nodes are: AB-BC-CD, BC-CD-DE, CD-DE-EA, DE-EA-AB, EA-AB-BC.

Now, should an index on a virtual node be implemented as one Lucene index or N Lucene indexes (one per shard)?

Using N Lucene indexes has its upside. It's less expensive when a virtual node is added or removed: often a whole shard is shipped or deleted, instead of shipping a virtual node index or splitting and shipping... And often we don't have to filter when querying a sub-range of a virtual node index.

The downside is that a virtual node has to manage multiple Lucene indexes...

What do you think?

Regards,
Ning
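
To make the two terms concrete, here is a minimal sketch in Python. It is only an illustration of the naming used in this message, not code from any existing implementation: the function names `ring_shards` and `virtual_node_indexes` are hypothetical, and it assumes each virtual node serves the shard starting at its own position plus the next shards clockwise, with the replication factor no larger than the number of virtual nodes.

```python
from typing import List


def ring_shards(starts: List[str]) -> List[str]:
    """Name each shard by its start position and the next position on the ring.

    The last shard wraps around from the final position back to the first.
    """
    n = len(starts)
    return [starts[i] + starts[(i + 1) % n] for i in range(n)]


def virtual_node_indexes(starts: List[str], replication: int) -> List[List[str]]:
    """For each virtual node, list the contiguous shards it serves.

    Assumption (hypothetical layout): node i serves shards i, i+1, ...,
    i+replication-1, wrapping around the ring.
    """
    shards = ring_shards(starts)
    n = len(shards)
    return [[shards[(i + k) % n] for k in range(replication)] for i in range(n)]


positions = ["A", "B", "C", "D", "E"]
shards = ring_shards(positions)
indexes = ["-".join(group) for group in virtual_node_indexes(positions, 3)]

print(shards)   # ['AB', 'BC', 'CD', 'DE', 'EA']
print(indexes)  # ['AB-BC-CD', 'BC-CD-DE', 'CD-DE-EA', 'DE-EA-AB', 'EA-AB-BC']
```

With one Lucene index per shard, each inner list would correspond to N separate indexes on a virtual node; with a single index per virtual node, each inner list would be merged into one.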