|
From: Ning L. <nin...@gm...> - 2008-02-10 19:07:15
|
On Feb 8, 2008 3:37 PM, Doug Cutting <cu...@ap...> wrote: > So the master is the manager of a lattice While I'm trying to figure out completely how it would work, would consistent hashing and making the master the manager of a ring work and be simpler? (Reference consistent hashing and Dynamo's partitioning algorithm.) As in consistent hashing, the entire range of docids (or their hashes) is treated as a ring. There is no clear concept of a shard here. A node is assigned a value (starting position) on the ring and it hosts an index for a range of docids starting from the starting position. The system cannot start until any range of docids is covered by at least one node. During normal state, any range of docids is covered by multiple nodes, depending on the level of replication. So the master becomes the manager of a ring in this case. Given the ring, a search client sends a query to a (hopefully minimal) set of nodes whose ranges cover the ring. When sending the query to a node in the set, the search client also specifies the sub-range of docids on which the query should be executed. This is to make sure that any range of docids is queried and only queried once. The above requires that a Lucene index can efficiently support query on a sub- range of docids - application/system docids, not Lucene docids. That can be achieved by a bit modification to Lucene so that within a segment, Lucene docids are assigned in the same order as their corresponding application/system docids during build/merge. A nice side-effect of this is that it becomes efficient to delete a sub-range of docids from an index as well. Thoughts? Regards, Ning |