From: Ning L. <nin...@gm...> - 2008-04-17 21:34:46
On Thu, Apr 10, 2008 at 7:42 PM, Doug Cutting <cu...@ap...> wrote:

> So, with these two simple properties, lightweight master failover and
> ability to run the master on a host, means that we can choose to either
> dedicate a machine to the master, or run the master daemon on every
> host, switching frequently. I'd certainly like to preserve the latter
> option, so lets keep it in mind as we code.

Sounds good. I like the design. :)

> From A's perspective, when B comes online, B's data looks fine, since
> B's log is complete. But B discovers, when it talks to A, that B's data
> is obsolete. If A retrieves B's data before B discovers that it is
> obsolete, then A would get stale adds. So B must block retrieval of its
> data until it has determined whether it is valid to all of its
> neighbors. At system startup this means that all nodes must wait for
> their neighbors to come online so that they can determine whether their
> own state is valid before permitting any synchronization.
>
> Does that make any sense?

Yep. My point was also that a node needs to check its neighbors to decide validity.

> I think each node can determine that on its own at startup. It
>
> 1. Posts its range and log start number.
> 2. Waits a bit, so all other nodes have had a chance to post their data.
> 3. Decides if its index is valid, by checking all overlapping nodes' log
> start numbers & comparing them with the last sync'd log number to see if
> they've compacted their log.
> 4. Starts syncing with its neighbors.

Data in ZooKeeper is persistent, right? So ZooKeeper holds records of each node's range, its log start number, and the log start numbers of its neighbors. What does it mean if, at startup, a node's posted numbers are newer or older than those already in ZooKeeper? Can the data in ZooKeeper help during startup, or do we discard those records?
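The startup check in steps 1-4 above could be sketched roughly like this. This is a minimal illustration only, not an actual implementation: the `NodePost` type, flat-integer key ranges, and the `last_synced` map are all assumptions made up for the example.

```python
# Hypothetical sketch of the step-3 validity check: a node's index is
# stale if any overlapping peer has compacted its log past the point
# this node last synced to. All names here are illustrative.
from dataclasses import dataclass

@dataclass
class NodePost:
    """What each node posts at startup (step 1)."""
    node_id: str
    range_start: int   # inclusive start of the node's key range
    range_end: int     # exclusive end of the node's key range
    log_start: int     # oldest log entry still retained after compaction

def overlaps(a: NodePost, b: NodePost) -> bool:
    """Half-open interval overlap between two nodes' ranges."""
    return a.range_start < b.range_end and b.range_start < a.range_end

def index_is_valid(me: NodePost, posts: list, last_synced: dict) -> bool:
    """Step 3: valid only if every overlapping peer's retained log still
    covers everything past the last log number we synced from it."""
    for peer in posts:
        if peer.node_id == me.node_id or not overlaps(me, peer):
            continue
        # Peer compacted beyond our last sync point: we may have missed
        # adds, so our index must be considered stale.
        if peer.log_start > last_synced.get(peer.node_id, 0):
            return False
    return True
```

Step 2 (waiting for all posts to appear) and step 4 (syncing) are elided; the point is only that the decision needs nothing beyond the posted (range, log start) pairs and the node's own sync bookkeeping.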
> I worry that there might be pathological cases where different nodes
> were offline and/or compacted at different times, causing all replicas
> of a range to be discarded. :)

Hopefully this won't happen, because of replication. In addition, we only throw away and re-build a node's data when we know we can re-build its range from some other nodes, right?

Ning
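The "only discard when the range can be re-built from other nodes" condition mentioned above amounts to a coverage check: the union of the surviving replicas' ranges must cover the whole range being discarded. A toy sketch (half-open `(start, end)` integer intervals are an assumption for illustration):

```python
def range_is_recoverable(my_range, peer_ranges):
    """Return True if the union of peer (start, end) half-open ranges
    covers my_range entirely, so discarding my copy cannot lose data."""
    start, end = my_range
    covered = start
    for s, e in sorted(peer_ranges):
        if s > covered:
            break  # a gap no peer covers
        covered = max(covered, e)
        if covered >= end:
            return True
    return covered >= end
```

The pathological case Doug worries about is exactly when this check would fail for every replica at once, which is what the startup blocking protocol is meant to prevent.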