From: Doug C. <cu...@ap...> - 2008-02-29 00:08:36

Yonik Seeley wrote:
>> I'm assuming a 1:1 mapping between index and node. So if a host has
>> 100 nodes on it, then it would have 100 indexes. I'm also imagining
>> more like 4 nodes/indexes per host, not 100. Does that address your
>> concern?
>
> Then the issue would be that there are not enough nodes per host
> to evenly partition the keys.

We'll see... I'm hoping we can get the master to help out with
allocations to keep things fairly even. The master could even
periodically reassign underloaded nodes to overloaded areas of the
ring.

> I'm still missing what we gain by having them coupled.

I don't understand the alternative. We want to divide things evenly,
but also be able to gracefully incorporate new hosts and dead hosts
without re-allocating everything. For the same reason, we must arrange
things so that the ranges served by a node remain disjoint, so that a
host failure doesn't increase the load of any other single host by
more than around 25%.

Are you arguing that non-random range boundaries would work better
than random ones? (Maybe I should go re-read your message...) I don't
think that's at odds with anything I've proposed. I think the master
needs to be in charge of placing nodes on the ring, determining how
many nodes a host serves, etc.

> This isn't consistent hashing, in my understanding of it though,
> since placement of some nodes depends on others.

Okay, it's not consistent hashing, whatever. I don't see how to do
this without a master coordinating things (e.g., telling clients what
to search). The master shouldn't be involved in individual searches or
document additions, nor should it ideally have much persistent state,
but beyond that, I don't see a reason not to let it arrange things
optimally.

> With just a list of nodes, it doesn't seem like a different host
> would be able to reconstruct the ring.

Right, the master would need to give clients the list of nodes, each
with the range of documents that it's currently able to search, and
(perhaps separately) the range of documents it's currently able to
index.

I guess, if we wanted to diverge more from consistent hashing, we
could have each node serve a set of ranges, to give the master more
freedom to shuffle things when a node fails. That'd be okay with me.
If we don't use consistent hashing, then we have to develop some other
scheme for allocating and deallocating ids to nodes that has all of
the good properties we need from consistent hashing, plus the
additional improvements we want. So it seems reasonable to start with
consistent hashing and diverge only as needed.

Doug
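For concreteness, here is a minimal Java sketch of the ring placement
discussed above, assuming SHA-1 ring positions and a TreeMap for
successor lookup; the Ring class, nodesPerHost parameter, and
"host:n" node naming are all illustrative, not from any actual code:

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.util.SortedMap;
    import java.util.TreeMap;

    // Sketch: consistent hashing with several nodes (indexes) per host.
    public class Ring {
      // Ring position -> node id ("host:n"); TreeMap gives successor lookup.
      private final TreeMap<Long, String> ring = new TreeMap<>();

      // Place nodesPerHost nodes for a host at pseudo-random positions.
      public void addHost(String host, int nodesPerHost) {
        for (int i = 0; i < nodesPerHost; i++) {
          String node = host + ":" + i;
          ring.put(hash(node), node);
        }
      }

      // A document maps to the first node clockwise from its hash.
      public String nodeFor(String docId) {
        SortedMap<Long, String> tail = ring.tailMap(hash(docId));
        return tail.isEmpty() ? ring.firstEntry().getValue()
                              : tail.get(tail.firstKey());
      }

      // First 8 bytes of SHA-1, packed into a long ring position.
      private static long hash(String key) {
        try {
          byte[] d = MessageDigest.getInstance("SHA-1")
              .digest(key.getBytes(StandardCharsets.UTF_8));
          long h = 0;
          for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xff);
          return h;
        } catch (java.security.NoSuchAlgorithmException e) {
          throw new RuntimeException(e);
        }
      }
    }

With several nodes per host, a failed host's ranges fall over to
several distinct successors rather than to one, which is what bounds
the extra load on any other single host to roughly 1/nodesPerHost of
the failed host's load (the ~25% figure above, with 4 nodes per host).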
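Likewise, a rough sketch of the per-node metadata the master might
hand to clients, with made-up names; a comment marks where the
"set of ranges" variant mentioned above would differ:

    import java.util.List;

    // Hypothetical shape of what the master gives clients: each node,
    // the range it can currently search, and (separately) the range it
    // can currently index. The set-of-ranges variant would replace the
    // single Range fields with List<Range>.
    public class NodeList {
      public static class Range {
        public final long lo, hi; // half-open [lo, hi) span of the ring
        public Range(long lo, long hi) { this.lo = lo; this.hi = hi; }
      }
      public static class NodeInfo {
        public final String node;       // e.g. "host:3"
        public final Range searchRange; // documents searchable now
        public final Range indexRange;  // documents indexable now
        public NodeInfo(String node, Range search, Range index) {
          this.node = node;
          this.searchRange = search;
          this.indexRange = index;
        }
      }
      public final List<NodeInfo> nodes;
      public NodeList(List<NodeInfo> nodes) { this.nodes = nodes; }
    }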