From: Doug C. <cu...@ap...> - 2008-03-31 21:41:34
Ning Li wrote:
> There are two types of load balancing: moving a node from one
> host to another, and adding/removing a node. Let's see if we
> can agree on the simpler one first.

I'd thought mostly in terms of adding/removing. If a hot spot is
identified on the ring, then it should be split by adding a new node in
its range. The node can be allocated by finding a cool spot and removing
a node there.

But moving a node from a hot host to a cooler host might be faster,
since the index could be copied rather than reconstructed from logs.
However this might not be enough to handle uneven loading of the ring.
For that we really need to be able to change the partitioning by placing
new nodes in hot spots.

> Protocol for moving a node (N) from one host (A) to another (B):
> Note: No other load balancing can be started for the nodes
> overlapping with node N.
> 1 Copy the latest checkpoint C of node N to host B.
> 2 Node N on host B has an empty log. It starts to process logs
>   of the nodes with overlapping ranges, including node N on
>   host A.
> 3 The master commits the change to the ring - officially node N
>   is on host B, not on host A.
> 4 Node N on host A stops processing logs of the nodes with
>   overlapping ranges.
> 5 After all the overlapping nodes finish processing node N's
>   log on host A, node N is removed from host A.
>
> Step 3 is the commit point. If host A or host B goes down
> before that, the move fails. Otherwise, the move succeeds.
> Should the move also be considered failed if any overlapping
> node goes down before the commit point, to reduce complexity?

That sounds right.
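The commit-point rule in the quoted protocol can be sketched in a few lines. This is a hypothetical illustration, not code from the project; the class and method names (`NodeMove`, `run`, `hosts_down`) are invented for the sketch. The key property it models: any failure of A, B, or an overlapping node's host before step 3 aborts the move, while after step 3 the move stands.

```python
# Hypothetical sketch of the node-move protocol above; names are
# illustrative, not from the actual codebase.

class MoveFailed(Exception):
    pass

class NodeMove:
    """Tracks a move of node N from host A to host B."""

    def __init__(self, node, src, dst, overlapping_hosts):
        self.node = node                        # N
        self.src = src                          # host A
        self.dst = dst                          # host B
        self.overlapping_hosts = set(overlapping_hosts)
        self.committed = False                  # step 3 is the commit point

    def run(self, hosts_down):
        # Steps 1-2: copy checkpoint C to B; B replays the logs of the
        # overlapping nodes, including N's log on A.
        self._check_liveness(hosts_down)
        # Step 3: the master commits the ring change. Before this point
        # any relevant failure aborts the move; after it, the move holds.
        self.committed = True
        # Steps 4-5: A stops processing overlapping logs; once the
        # overlapping nodes drain N's log on A, N is removed from A.
        return "moved %s: %s -> %s" % (self.node, self.src, self.dst)

    def _check_liveness(self, hosts_down):
        # The simpler failure rule from the quoted message: the move
        # also fails if any host of an overlapping node goes down
        # before the commit point.
        involved = {self.src, self.dst} | self.overlapping_hosts
        if involved & set(hosts_down):
            raise MoveFailed("pre-commit failure for move of " + self.node)
```

A successful run commits and returns; a pre-commit host failure raises `MoveFailed` with `committed` still false, so the master can simply forget the move.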
In terms of the host-to-master protocol, there are four messages:

- Master tells B to start copying N (in response to heartbeat)
- B tells Master that it now serves N
- Master tells A to remove N (in heartbeat response)
- A tells Master it no longer serves N

The master will have to keep track of what moves are in flight, so that
it doesn't schedule too many in one region and risk losing data, right?

I think I'd rather see us implement add/remove balancing rather than
copying first, since, as mentioned above, I think it is more general. It
also means that there's much less chance that the same range/node will
exist on two hosts at once. The communication with the master would be
much the same. Assume that N is an underloaded node, and that M is an
overloaded point on the ring.

1. Master tells N's neighbors to expand to cover N
2. N's neighbors tell Master that N is now redundant
3. Master tells A to stop serving N
4. Master tells A to start serving M
5. A tells Master it's now serving M

The master must keep track of what nodes it is trying to decommission,
and avoid trying to decommission more than one in any overlapping region
at a time. It also has to keep track of new-node requests that it has
made, so that it doesn't issue them to multiple nodes. Heartbeats should
refresh this state, i.e., a host should list its pending ranges as well
as its complete ranges. I think that's sufficient state for the master.
Does that sound right?

The master might try to keep the cluster less than fully occupied so
that it can more quickly respond to hotspots and failures. For example,
if each host can serve five nodes, it might normally use only four, so
that each always has a spare slot. Then the above steps would happen as
4, 5, 1, 2, 3, first addressing the hot spot, then the cool spot.

Doug
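The master-side bookkeeping Doug describes (one decommission per overlapping region, new-node requests issued to a single host, state refreshed from heartbeats) can be sketched as follows. This is a hypothetical illustration: the `Master` class and its method names are invented, and ring ranges are simplified to half-open `(start, end)` intervals rather than positions on a hash ring.

```python
# Hypothetical sketch of the master-side state described above; names
# are illustrative. Ranges are modeled as half-open (start, end)
# intervals, and "overlap" means non-empty interval intersection.

class Master:
    def __init__(self):
        self.decommissioning = {}   # node -> range being retired
        self.pending_new = set()    # ranges requested but not yet served

    @staticmethod
    def overlaps(a, b):
        # Half-open intervals intersect iff each starts before the
        # other ends.
        return a[0] < b[1] and b[0] < a[1]

    def may_decommission(self, rng):
        # Avoid more than one in-flight decommission in any overlapping
        # region, so neighbors aren't expanding over shifting ground.
        return not any(self.overlaps(rng, r)
                       for r in self.decommissioning.values())

    def request_new_node(self, rng):
        # Issue each new-node request to only one host: refuse a
        # duplicate request for a range that is already pending.
        if rng in self.pending_new:
            return False
        self.pending_new.add(rng)
        return True

    def heartbeat(self, host, complete, pending):
        # Hosts report pending as well as complete ranges, letting the
        # master refresh its in-flight state from heartbeats alone.
        for rng in complete:
            self.pending_new.discard(rng)
```

The point of keeping all in-flight state refreshable from heartbeats is that the master can rebuild it after a restart without a separate persistent log of balancing decisions.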