|
From: Ning L. <nin...@gm...> - 2008-02-29 19:17:15
|
On Fri, Feb 29, 2008 at 11:23 AM, Ning Li <nin...@gm...> wrote: > On Fri, Feb 29, 2008 at 9:08 AM, Yonik Seeley <yo...@ap...> wrote: > > On Thu, Feb 28, 2008 at 5:37 PM, Doug Cutting <cu...@ap...> wrote: > > > I'd imagined each node periodically querying its neighbors for changes > > > in the range they share. We shouldn't rely on clock synchronization, so > > > each node would keep the last revision of each neighbor that it has > > > sync'd with. So, the first time they connect, they pass revision zero > > > and receive all updates for their overlap. The next time they only need > > > to retrieve updates since the last. > > > > Sounds good! > > > > We'll need to make sure that it handles this case: > > Say C syncs from B, then B syncs from A and get some new docs. Now A > > goes away for a while... it seems like C will need to get the new docs > > from B next time it syncs. If the docs that B got from A are treated > > like any other new additions, this should work right? > > I think this is how it works with replication=3: > Node A serves range A-D, node B serves range B-E, node C serves range C-F... > Node C syncs with nodes A & B on range C-D, syncs with nodes B and D > on range D-E, and syncs with nodes D & E on range E-F. > > Now node D goes away. Node C continues to serve range C-F until it gets > range F-G from node E or node F and starts to serve range C-G. Then, > node C syncs with nodes A & B on range C-D (same as before), syncs with > nodes B and E on range D-F, and syncs with nodes E & F on range F-G. I see the following as a possible implementation: Each node logs its changes in a light-weight log. A log is a sequence of log entries, with the entry number monotonically increasing. Let's say node X periodically syncs with node Y. A sync works as follows: Node X starts with entry p in node Y's log where node X synced with node Y last time. After node X finishes syncing (node X puts actual changes into its own log), it sets its sync entry w/ node Y to entry q. When node D goes away, node C copies range F-G from node E or node F, let's say node E. For the copy, node E's own log is at entry s and node E is synced with node F at node F's entry t. Node C does the copy but does not log the copy. After node C finishes the copy and before it switches from serving range C-F to serving range C-G, it sets its sync entry with node E to min(its current sync entry with node E, entry s) and sets its sync entry with node F to entry t. Is this efficient enough? Ning |