From: Doug C. <cu...@ap...> - 2008-02-29 00:08:36

Yonik Seeley wrote:
>> I'm assuming a 1:1 mapping between index and node. So if a host has
>> 100 nodes on it, then it would have 100 indexes. I'm also imagining
>> more like 4 nodes/indexes per host, not 100. Does that address your
>> concern?
>
> Then the issue would be that there are not enough nodes per host
> to evenly partition the keys.

We'll see... I'm hoping we can get the master to help out with
allocations to keep things fairly even. The master could even
periodically reassign underloaded nodes to overloaded areas of the
ring.

> I'm still missing what we gain by having them coupled.

I don't understand the alternative. We want to divide things evenly,
but also be able to gracefully incorporate new hosts and dead hosts
without re-allocating everything. For the same reason, we must arrange
things so that the ranges served by a node remain disjoint, so that a
host failure doesn't increase the load of any other single host by
more than around 25%.

Are you arguing that non-random range boundaries would work better
than random ones? (Maybe I should go re-read your message...) I don't
think that's at odds with anything I've proposed. I think the master
needs to be in charge of placing nodes on the ring, determining how
many nodes a host serves, etc.

> This isn't consistent hashing, in my understanding of it though,
> since placement of some nodes depends on others.

Okay, it's not consistent hashing, whatever. I don't see how to do
this without a master coordinating things (e.g., telling clients what
to search). The master shouldn't be involved in individual searches or
document additions, nor should it ideally have much persistent state,
but beyond that, I don't see a reason not to let it arrange things
optimally.

> With just a list of nodes, it doesn't seem like a different host
> would be able to reconstruct the ring.

Right, the master would need to give clients the list of nodes, each
with the range of documents that it's currently able to search, and
(perhaps separately) the range of documents it's currently able to
index.

I guess, if we wanted to diverge more from consistent hashing, we
could have each node serve a set of ranges, to give the master more
freedom to shuffle things when a node fails. That'd be okay with me.
If we don't use consistent hashing, then we have to develop some other
scheme for allocating and deallocating ids to nodes that has all of
the good properties we need from consistent hashing, plus the
additional improvements we want. So it seems reasonable to start with
consistent hashing and diverge only as needed.

Doug
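For concreteness, here is a minimal Java sketch of the ring placement
discussed above, assuming SHA-1 ring positions and a TreeMap for
successor lookup; the Ring class, nodesPerHost parameter, and
"host:n" node naming are all illustrative, not from any actual code:

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.util.SortedMap;
    import java.util.TreeMap;

    // Sketch: consistent hashing with several nodes (indexes) per host.
    public class Ring {
      // Ring position -> node id ("host:n"); TreeMap gives successor lookup.
      private final TreeMap<Long, String> ring = new TreeMap<>();

      // Place nodesPerHost nodes for a host at pseudo-random positions.
      public void addHost(String host, int nodesPerHost) {
        for (int i = 0; i < nodesPerHost; i++) {
          String node = host + ":" + i;
          ring.put(hash(node), node);
        }
      }

      // A document maps to the first node clockwise from its hash.
      public String nodeFor(String docId) {
        SortedMap<Long, String> tail = ring.tailMap(hash(docId));
        return tail.isEmpty() ? ring.firstEntry().getValue()
                              : tail.get(tail.firstKey());
      }

      // First 8 bytes of SHA-1, packed into a long ring position.
      private static long hash(String key) {
        try {
          byte[] d = MessageDigest.getInstance("SHA-1")
              .digest(key.getBytes(StandardCharsets.UTF_8));
          long h = 0;
          for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xff);
          return h;
        } catch (java.security.NoSuchAlgorithmException e) {
          throw new RuntimeException(e);
        }
      }
    }

With several nodes per host, a failed host's ranges fall over to
several distinct successors rather than to one, which is what bounds
the extra load on any other single host to roughly 1/nodesPerHost of
the failed host's load (the ~25% figure above, with 4 nodes per host).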
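Likewise, a rough sketch of the per-node metadata the master might
hand to clients, with made-up names; a comment marks where the
"set of ranges" variant mentioned above would differ:

    import java.util.List;

    // Hypothetical shape of what the master gives clients: each node,
    // the range it can currently search, and (separately) the range it
    // can currently index. The set-of-ranges variant would replace the
    // single Range fields with List<Range>.
    public class NodeList {
      public static class Range {
        public final long lo, hi; // half-open [lo, hi) span of the ring
        public Range(long lo, long hi) { this.lo = lo; this.hi = hi; }
      }
      public static class NodeInfo {
        public final String node;       // e.g. "host:3"
        public final Range searchRange; // documents searchable now
        public final Range indexRange;  // documents indexable now
        public NodeInfo(String node, Range search, Range index) {
          this.node = node;
          this.searchRange = search;
          this.indexRange = index;
        }
      }
      public final List<NodeInfo> nodes;
      public NodeList(List<NodeInfo> nodes) { this.nodes = nodes; }
    }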