Re: [bailey-developers] lattice master

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

On Feb 8, 2008 3:37 PM, Doug Cutting <cu...@ap...> wrote:

> So the master is the manager of a lattice

While I'm trying to figure out completely how it would work, would
consistent
hashing and making the master the manager of a ring work and be simpler?
(Reference consistent hashing and Dynamo's partitioning algorithm.)

As in consistent hashing, the entire range of docids (or their hashes) is
treated
as a ring. There is no clear concept of a shard here. A node is assigned a
value
(starting position) on the ring and it hosts an index for a range of docids
starting
from the starting position. The system cannot start until any range of
docids is
covered by at least one node. During normal state, any range of docids is
covered
by multiple nodes, depending on the level of replication.

So the master becomes the manager of a ring in this case. Given the ring, a
search client sends a query to a (hopefully minimal) set of nodes whose
ranges
cover the ring. When sending the query to a node in the set, the search
client
also specifies the sub-range of docids on which the query should be
executed.
This is to make sure that any range of docids is queried and only queried
once.

The above requires that a Lucene index can efficiently support query on a
sub-
range of docids - application/system docids, not Lucene docids. That can be
achieved by a bit modification to Lucene so that within a segment, Lucene
docids
are assigned in the same order as their corresponding application/system
docids
during build/merge. A nice side-effect of this is that it becomes efficient
to delete
a sub-range of docids from an index as well.

Thoughts?

Regards,
Ning