From: Yonik S. <yo...@ap...> - 2008-02-29 13:27:43
On Thu, Feb 28, 2008 at 7:08 PM, Doug Cutting <cu...@ap...> wrote:
> Yonik Seeley wrote:
> > I'm still missing what we gain by having them coupled.
>
> I don't understand the alternative.  We want to divide things evenly,
> but also be able to gracefully incorporate new hosts and dead hosts
> without re-allocating everything.  We must arrange things so that the
> ranges served by a node remain disjoint for this reason too, so that a
> host failure doesn't increase the load of any other single host by more
> than around 25%.

Here is an example of how to separate key partitioning from replication
(single node per host to make things easier for now):

Key partitioning: when hostX is added to the cluster, add 100 nodes
(vnodeX1, vnodeX2, vnodeX3...) to the hash ring for partitioning keys.
This steals a little bit from every other node.  There is no replication
concept in this ring.

Replication: add a single nodeX to the replication hash ring.  The two
clockwise nodes replicate this node, and the two nodes in the other
direction should be replicated by this node.

This still works if each node serves M indexes (4 is what you suggested,
I think).  Add 4 nodes to the replication ring, and for each of those
nodes, determine its key-space with the key ring.

With these concerns separated, either could be changed or adjusted
independently.  I'm not advocating that specific method of key
partitioning, though... while it is more fair (a new node steals
documents from all other nodes), that could also be a drawback depending
on exactly how the system (rebalancing) works.

> > This isn't consistent hashing, in my understanding of it though, since
> > placement of some nodes depends on others.
>
> Okay, it's not consistent hashing, whatever.
>
> I don't see how to do this without a master coordinating things (e.g.,
> telling clients what to search).
> The master shouldn't be involved in
> individual searches or document additions, nor should it ideally have
> much persistent state, but beyond that, I don't see a reason not to let
> it arrange things optimally.

OK, that changes things... I wasn't clear on exactly why consistent
hashing was used.  In this case it sounds like more of an implementation
detail rather than part of the interface (which it is in some other
systems).

> > With just a list of nodes, it doesn't seem like a different host would
> > be able to reconstruct the ring.
>
> Right, the master would need to give clients the list of nodes, each
> with the range of documents that it's currently able to search, and
> (perhaps separately) the range of documents it's currently able to index.

Would this info be obtainable from any node?  It seems like this should
be possible.

> I guess, if we wanted to diverge more from consistent hashing, we
> could have each node serve a set of ranges, to give the master more
> freedom to shuffle things when a node fails.

That'd be okay with me.  When a new master comes up, it seems like nodes
would need to report their actual ranges anyway.

> If we don't use consistent hashing then we have to develop some other
> scheme for allocation and deallocation of ids to nodes that has all of
> the good properties we need from consistent hashing and additional
> improvements.  So it seems reasonable to start with consistent hashing
> and diverge only as needed.

That's fine... it sounds like from everything above that this won't be
exposed to anyone but the master anyway.

-Yonik
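
[Editor's note: the two-ring separation described in this message (a key
ring with 100 virtual nodes per host, and a distinct one-point-per-host
replication ring) can be sketched as below.  This is only an illustrative
sketch: the class names, the MD5-based ring placement, and the vnode
naming are assumptions of this sketch, not anything proposed in the
thread.]

```python
# Illustrative sketch of separating key partitioning from replication.
# All names (KeyRing, ReplicationRing, vnode labels) and MD5 placement
# are assumptions of this sketch, not part of the original proposal.
import bisect
import hashlib

def _point(s: str) -> int:
    """Map a string to a position on the ring (0 .. 2**128 - 1)."""
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class KeyRing:
    """Partitions document keys across hosts with 100 virtual nodes per
    host; adding a host steals a little key-space from every existing
    host.  No replication concept lives in this ring."""
    VNODES = 100

    def __init__(self):
        self._points = []   # sorted ring positions
        self._owner = {}    # ring position -> host

    def add_host(self, host: str) -> None:
        for i in range(self.VNODES):
            p = _point(f"{host}-vnode{i}")
            bisect.insort(self._points, p)
            self._owner[p] = host

    def host_for(self, doc_key: str) -> str:
        """The first vnode clockwise from the key's position owns it."""
        i = bisect.bisect(self._points, _point(doc_key)) % len(self._points)
        return self._owner[self._points[i]]

class ReplicationRing:
    """One point per host; the two hosts clockwise from a host hold its
    replicas (symmetrically, it replicates the two hosts behind it)."""

    def __init__(self):
        self._points = []   # sorted ring positions
        self._host = {}     # ring position -> host

    def add_host(self, host: str) -> None:
        p = _point(host)
        bisect.insort(self._points, p)
        self._host[p] = host

    def replicas_of(self, host: str) -> list:
        """Return the next two hosts clockwise from this host."""
        i = self._points.index(_point(host))
        n = len(self._points)
        return [self._host[self._points[(i + k) % n]] for k in (1, 2)]
```

With the concerns separated, either ring can change independently:

```python
keys, repl = KeyRing(), ReplicationRing()
for h in ("hostA", "hostB", "hostC", "hostD"):
    keys.add_host(h)
    repl.add_host(h)
owner = keys.host_for("doc-123")    # host whose key-space holds this doc
copies = repl.replicas_of(owner)    # the two hosts replicating that host
```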