From: Doug C. <cu...@ap...> - 2008-04-10 23:42:15
Ning Li wrote:
> Yes, this would work. We simply put the code for master into host
> and make sure only one host is executing the master code at a time.
>
> Interesting. :) But let's use a master for now since it's easier to
> debug this way?

Keeping all the state in ZooKeeper may also make it easy to debug, since it can be read from any client.

I actually don't think it should be hard to have it both ways. We should write the master as a fail-over loop in any case, so that multiple master-daemon processes can be running on a cluster at once, with only one acting as master. Switching masters frequently should be as simple as adding a counter to the top-level daemon loop to decide when to give up the lock.

The master should be a different class from the host, with its own main() method, so that it can be run independently, but it should also be possible to run the master in the same JVM as a host. (We'll want this for testing anyway.) These two simple properties, lightweight master failover and the ability to run the master on a host, mean that we can choose either to dedicate a machine to the master or to run the master daemon on every host, switching frequently. I'd certainly like to preserve the latter option, so let's keep it in mind as we code.

>> How does this work at system startup? How does B know not to go online,
>> providing its stale adds, yet A is permitted to provide its data?
>> Perhaps, at startup a node can respond to requests for its first log
>> entry number, but refuse requests for the logs themselves. Then B can
>> see that it must reload, and refuse requests from A to sync until reload
>> is complete. Does that work?
>
> What do you mean by "a node can respond to requests for its first log
> entry number, but refuse requests for the logs themselves"?

From A's perspective, when B comes online, B's data looks fine, since B's log is complete. But B discovers, when it talks to A, that B's data is obsolete.
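The fail-over loop I have in mind might look something like the sketch below. The DistributedLock class here is a hypothetical in-memory stand-in for a ZooKeeper-based lock (the tryAcquire/release names are my own, not a real ZooKeeper API); the point is only the shape of the loop: try for the lock, act as master until the counter expires, then release it so another daemon can rotate in.

```java
import java.util.concurrent.atomic.AtomicReference;

public class MasterDaemon {

  /** Stand-in for a ZooKeeper lock: at most one owner at a time. */
  static class DistributedLock {
    private final AtomicReference<String> owner = new AtomicReference<>();
    boolean tryAcquire(String id) { return owner.compareAndSet(null, id); }
    void release(String id) { owner.compareAndSet(id, null); }
  }

  private final String id;
  private final DistributedLock lock;
  private final int maxRounds;  // counter deciding when to give up the lock

  MasterDaemon(String id, DistributedLock lock, int maxRounds) {
    this.id = id; this.lock = lock; this.maxRounds = maxRounds;
  }

  /** One pass of the top-level daemon loop; returns rounds served as master. */
  int runOnce() {
    if (!lock.tryAcquire(id)) {
      return 0;                      // someone else is master; stand by
    }
    int rounds = 0;
    try {
      while (rounds < maxRounds) {   // act as master until the counter expires
        // ... master duties: assign ranges, monitor hosts, etc. ...
        rounds++;
      }
    } finally {
      lock.release(id);              // give up mastership so others can take over
    }
    return rounds;
  }

  // The master is its own class with its own main(), so it can run
  // standalone or be started inside a host JVM for testing.
  public static void main(String[] args) {
    DistributedLock lock = new DistributedLock();
    MasterDaemon a = new MasterDaemon("daemon-a", lock, 3);
    MasterDaemon b = new MasterDaemon("daemon-b", lock, 3);
    System.out.println("a served " + a.runOnce() + " rounds");
    System.out.println("b served " + b.runOnce() + " rounds");
  }
}
```

With the lock factored out this way, the same class works whether we dedicate a machine to the master or run the daemon on every host.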
If A retrieves B's data before B discovers that it is obsolete, then A would get stale adds. So B must block retrieval of its data until it has determined that it is valid with respect to all of its neighbors. At system startup this means that all nodes must wait for their neighbors to come online, so that they can determine whether their own state is valid, before permitting any synchronization. Does that make any sense?

> Maybe at system startup, master (or a host) analyzes, for each node,
> its first log entry number and the log entry numbers it has synced with
> its overlapping nodes. Then it is derived from this analysis which nodes
> need complete reloads? Is this what you meant?

I think each node can determine that on its own at startup. It:

1. Posts its range and log start number.
2. Waits a bit, so all other nodes have had a chance to post their data.
3. Decides whether its index is valid, by comparing each overlapping node's log start number with the last log entry it synced from that node, to see whether any of them has compacted its log past that point.
4. Starts syncing with its neighbors.

I worry that there might be pathological cases where different nodes were offline and/or compacted at different times, causing all replicas of a range to be discarded. Does this make any sense?

Doug
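Step 3 of the startup procedure above could look roughly like this. The board map is a hypothetical stand-in for wherever nodes post their state (e.g. ZooKeeper znodes), and all names here are illustrative assumptions, not proposed APIs.

```java
import java.util.Map;

public class StartupCheck {

  /** What each node posts at startup: the first entry still in its log. */
  static class Posting {
    final long logStart;
    Posting(long logStart) { this.logStart = logStart; }
  }

  /**
   * Step 3: after waiting for neighbors to post (step 2), decide whether
   * this node's index is valid. For each overlapping node, compare its
   * posted log start with the last entry we synced from it; if a neighbor
   * has compacted away entries we never received, our index is stale and
   * needs a complete reload.
   */
  static boolean indexValid(Map<String, Posting> board,
                            Map<String, Long> lastSynced) {
    for (Map.Entry<String, Long> e : lastSynced.entrySet()) {
      Posting p = board.get(e.getKey());
      if (p == null) {
        return false;              // neighbor not posted yet: keep waiting
      }
      if (p.logStart > e.getValue() + 1) {
        return false;              // neighbor compacted past our sync point
      }
    }
    return true;                   // safe to start syncing (step 4)
  }
}
```

A node whose check fails would refuse sync requests and reload, which is exactly the behavior we want from B in the scenario above.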