From: Ning L. <nin...@gm...> - 2008-03-31 20:39:50
First, to summarize the terms used:

1. A node consists of a key/hash range, a database, and a log.
2. A node periodically processes the logs of the nodes with overlapping ranges. A node remembers the entry number Ei it has processed for an overlapping node Ni and starts from (Ei)+1 next time.
3. A node checkpoint consists of the node's range, a database checkpoint, a log entry number E, and the log entry numbers Ei for the overlapping nodes. A checkpoint provides a consistent view of the node.

There are two types of load balancing: moving a node from one host to another, and adding/removing a node. Let's see if we can agree on the simpler one first.

Protocol for moving a node (N) from one host (A) to another (B):

Note: No other load balancing can be started for the nodes overlapping with node N during the move.

1. Copy the latest checkpoint C of node N to host B.
2. Node N on host B starts with an empty log. It begins processing the logs of the nodes with overlapping ranges, including node N on host A.
3. The master commits the change to the ring: officially, node N is now on host B, not on host A.
4. Node N on host A stops processing the logs of the nodes with overlapping ranges.
5. After all the overlapping nodes finish processing node N's log on host A, node N is removed from host A.

Step 3 is the commit point. If host A or host B goes down before that, the move fails. Otherwise, the move succeeds. To reduce complexity, should the move also be considered failed if any overlapping node goes down before the commit point?

Sounds right?

Ning
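For illustration, the move protocol above could be sketched in Python roughly as follows. This is not code from the project. The names `Node`, `Checkpoint`, `move_node`, the `name@host` identifiers, the dict-based `ring`, and the `hosts_up` callback are all assumptions made for the example. It also assumes log entries are simple (key, value) writes and that the database checkpoint already reflects the node's own log up to entry E. The sketch is single-threaded, so it ignores writes that arrive while the move is in progress.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    key_range: tuple   # the node's key/hash range as (start, end)
    db: dict           # database checkpoint
    entry: int         # E: last entry of the node's own log reflected in db
    processed: dict    # Ei for each overlapping node, keyed by "name@host"


@dataclass
class Node:
    name: str
    host: str
    key_range: tuple
    db: dict = field(default_factory=dict)
    log: list = field(default_factory=list)        # entries are (key, value)
    processed: dict = field(default_factory=dict)  # Ei per overlapping node
    active: bool = True

    @property
    def ident(self):
        # Hypothetical identifier that tells apart copies of N on two hosts.
        return f"{self.name}@{self.host}"

    def process_log(self, other):
        """Apply entries (Ei)+1 onward of an overlapping node's log."""
        start = self.processed.get(other.ident, 0)
        lo, hi = self.key_range
        for key, value in other.log[start:]:
            if lo <= key < hi:          # only keys inside our own range
                self.db[key] = value
        self.processed[other.ident] = len(other.log)  # remember the new Ei


def move_node(ring, old, checkpoint, host_b, overlapping, hosts_up=lambda: True):
    """Move node `old` to host_b; return the new copy, or None on failure."""
    # Step 1: copy the latest checkpoint C to host B.
    new = Node(old.name, host_b, checkpoint.key_range,
               dict(checkpoint.db), [], dict(checkpoint.processed))
    # The copy resumes the old copy's own log right after entry E.
    new.processed[old.ident] = checkpoint.entry
    # Step 2: with an empty log, catch up on overlapping logs, including N on A.
    for peer in overlapping + [old]:
        new.process_log(peer)
    # A host failure before the commit point fails the move; ring is untouched.
    if not hosts_up():
        return None
    # Step 3 (commit point): the master records that N now lives on host B.
    ring[old.name] = host_b
    # Step 4: N on host A stops processing overlapping logs.
    old.active = False
    # Step 5: overlapping nodes drain N's log on A; then A's copy can go.
    for peer in overlapping:
        peer.process_log(old)
    return new
```

In this sketch the ring is only modified at step 3, so a failure detected earlier leaves node N officially on host A, which matches the idea of step 3 being the commit point.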