From: Doug C. <cu...@ap...> - 2008-04-04 23:26:18
Ning Li wrote:
> We were thinking storing the "metadata" in Zookeeper, right?
> All the nodes (ever) created, their ranges, their current log
> entry number, the starting entry numbers of their overlapping
> nodes, the current ring... So when a master is restarted,
> it starts from exactly where it was before it went down by
> retrieving the "metadata" from Zookeeper.

I've been going back-and-forth in my head to try to decide how we want to use Zookeeper and whether we should store the current ring there. This is certainly an argument for that. It makes for a more stateful master than I'd hoped, but that might not be bad.

An extreme idea would be to make Zookeeper *be* the master. There would be no XxxToMaster protocols: instead everything would be coordinated through Zookeeper files. Letting nodes directly access Zookeeper makes the master less of a bottleneck, since Zookeeper itself replicates data.

> When an offline node comes back up, we verify with Zookeeper
> that it is a valid node in a valid state. Then we check if we
> can patch up its data by checking whether the overlapping
> nodes and their logs still have the entries back to the
> starting entry numbers of the node. If not, we sync from
> scratch.

I don't see it yet. Log entry numbers are per-node, and can't be compared across nodes, right?

Node A syncs from B log entries 1-10.
Document X is added to C.
A and B both sync X from C.
B goes offline.
X is deleted from C.
A syncs X's deletion from C.
A expunges its logs.
B comes back online.
A tries to sync events after 10 from B.

How does A know to ignore the addition of X?

Doug
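
The A/B/C sequence above can be replayed as a small simulation. This is only a sketch under stated assumptions, not the design discussed in the thread: the `Node` class, its `apply`, `sync_from` and `expunge` methods, and the rule that every synced change is re-logged under the receiving node's own counter are all hypothetical. With those assumptions, A ends up with X again even though C has deleted it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical index node; names and behavior are illustrative only."""
    name: str
    docs: set = field(default_factory=set)
    log: dict = field(default_factory=dict)     # entry number -> (op, doc)
    next_entry: int = 1                         # per-node counter, not global
    synced: dict = field(default_factory=dict)  # peer name -> last entry pulled

    def apply(self, op, doc):
        # Apply a change locally and record it under this node's own number
        # (assumption: synced changes are re-logged by the receiver).
        if op == "add":
            self.docs.add(doc)
        else:
            self.docs.discard(doc)
        self.log[self.next_entry] = (op, doc)
        self.next_entry += 1

    def sync_from(self, peer):
        # Naively pull every peer entry numbered after the last one we saw.
        last = self.synced.get(peer.name, 0)
        for n in sorted(k for k in peer.log if k > last):
            self.apply(*peer.log[n])
        self.synced[peer.name] = peer.next_entry - 1

    def expunge(self):
        # Drop the local log; the per-peer sync positions are kept.
        self.log.clear()


def replay_scenario():
    a, b, c = Node("A"), Node("B"), Node("C")
    for i in range(1, 11):
        b.apply("add", f"doc{i}")   # B's log entries 1-10
    a.sync_from(b)                  # A syncs entries 1-10 from B
    c.apply("add", "X")             # document X is added to C
    a.sync_from(c)                  # A syncs X from C
    b.sync_from(c)                  # B syncs X from C (B's entry 11)
    # B goes offline here.
    c.apply("delete", "X")          # X is deleted from C
    a.sync_from(c)                  # A syncs X's deletion from C
    a.expunge()                     # A expunges its logs
    # B comes back online.
    a.sync_from(b)                  # A pulls B's entries after 10
    return a, b, c


if __name__ == "__main__":
    a, b, c = replay_scenario()
    print("X in A:", "X" in a.docs)
    print("X in C:", "X" in c.docs)
```

In this sketch nothing in B's entry 11 tells A that it is the same addition A already received from C and later saw deleted, which is the gap the question points at.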