From: Doug C. <cu...@ap...> - 2008-02-28 22:37:31
Ning Li wrote:
> In the case of one updateable replica, if the updateable replica is down,
> we need to make another replica updateable. The down side is that
> the shard is not updateable during the switch. But it should be short
> and thus be fine for some applications. The complexity involved with
> all updateable replicas is conflict resolution...
A unique updateable index per document would be nice, but I'm not yet
entirely convinced it is practical.
Say your client fetches the ring of nodes from the master, then starts
adding documents. In the meantime, the master updates the list of
nodes, and a second client gets the new, modified ring. In this case,
two clients could update different nodes for the same document.
Another scenario is that a switch is congested, and some clients are
unable to connect to an index to update it, while other clients can
connect to it. The clients that cannot connect update a different node
for the same documents.
Perhaps we could guard against these by having clients "lease" the ring
from the master. Then the master could make sure that it doesn't issue
a new version of the ring until all leases to the old version have
expired. The master could choose to republish the ring every five
minutes. A client that leases the ring four minutes into a lease window
would only get a one-minute lease. But all clients would then hammer the
master for a new version at the same time, every five minutes. Can you
think of a better way?
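
To make the lease idea concrete, here is a rough Python sketch. The names (RingMaster, Lease, propose_ring) are made up for illustration, and it assumes the master only swaps in a new ring at a window boundary, so every lease on the old version has expired by then. It does nothing about the thundering-herd problem.

```python
from dataclasses import dataclass

PUBLISH_INTERVAL = 300.0  # seconds; the five-minute window described above


@dataclass
class Lease:
    ring_version: int
    expires_at: float  # absolute time, in seconds


class RingMaster:
    """Hypothetical master that only changes the ring at window boundaries."""

    def __init__(self, ring, start_time=0.0):
        self.ring = list(ring)
        self.version = 1
        self.window_start = start_time
        self.pending_ring = None  # a change waiting for the next window

    def _advance(self, now):
        # Roll forward to the window containing `now`. All leases on the old
        # version expired at the boundary, so publishing there is safe.
        while now >= self.window_start + PUBLISH_INTERVAL:
            self.window_start += PUBLISH_INTERVAL
            if self.pending_ring is not None:
                self.ring = self.pending_ring
                self.pending_ring = None
                self.version += 1

    def propose_ring(self, new_ring):
        # Hold the new ring back until the current window ends.
        self.pending_ring = list(new_ring)

    def lease(self, now):
        # A lease always ends at the window boundary, so a client that asks
        # four minutes in gets only one minute.
        self._advance(now)
        expires = self.window_start + PUBLISH_INTERVAL
        return list(self.ring), Lease(self.version, expires)
```

A client asking at t=240 gets a lease that expires at t=300, and a ring proposed at t=245 only becomes visible to leases taken at t=300 or later.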
Unless we can come up with a foolproof mechanism or somesuch, I think
we'll have to handle the case where multiple indexes are updated in
their overlap and must reconcile these updates.
I'd imagined each node periodically querying its neighbors for changes
in the range they share. We shouldn't rely on clock synchronization, so
each node would keep the last revision of each neighbor that it has
sync'd with. So, the first time they connect, they pass revision zero
and receive all updates for their overlap. The next time they only need
to retrieve updates since the last.
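
Roughly, the bookkeeping might look like the Python sketch below. Node, record, updates_since and pull_from are hypothetical names, and the in_range predicate stands in for whatever defines the range two neighbors share. Each node's revision is just a local counter, so no clocks are compared.

```python
class Node:
    """Hypothetical index node with a local, monotonically growing revision."""

    def __init__(self, name):
        self.name = name
        self.revision = 0      # local counter, bumped on every change
        self.log = []          # (local_revision, op, doc_id), in order
        self.last_synced = {}  # neighbor name -> last neighbor revision seen

    def record(self, op, doc_id):
        # Note a local ADD or DEL under the next local revision.
        self.revision += 1
        self.log.append((self.revision, op, doc_id))

    def updates_since(self, revision, in_range):
        # Return our current revision plus every change newer than
        # `revision` that falls in the shared range.
        changes = [
            (op, doc_id)
            for rev, op, doc_id in self.log
            if rev > revision and in_range(doc_id)
        ]
        return self.revision, changes

    def pull_from(self, neighbor, in_range):
        # The first call passes revision zero and so receives everything
        # in the overlap; later calls receive only what is new.
        since = self.last_synced.get(neighbor.name, 0)
        latest, changes = neighbor.updates_since(since, in_range)
        self.last_synced[neighbor.name] = latest
        return changes
```

Applying the returned changes to the local index is left out here; the sketch only shows how the per-neighbor revision avoids re-sending old updates.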
Documents could also have an application-specified revision. This would
greatly simplify reconciliation, since we could use these to resolve all
disputes in a predictable way: higher revision wins.
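
As a small Python illustration of "higher revision wins": the reconcile function and the {doc_id: (revision, body)} layout are assumptions made for the sketch, and so is keeping the local copy on a tie, which the rule above does not cover.

```python
def reconcile(local, incoming):
    """Merge two {doc_id: (revision, body)} maps; the higher revision wins."""
    merged = dict(local)
    for doc_id, (rev, body) in incoming.items():
        # Assumption: on equal revisions the local copy is kept.
        if doc_id not in merged or rev > merged[doc_id][0]:
            merged[doc_id] = (rev, body)
    return merged
```

Because the outcome depends only on the revisions, two nodes that exchange the same updates in either order end up agreeing on every document whose competing copies carry different revisions.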
Perhaps we'd want two formats for updates sent between nodes: outline
and full, where outline just contains a sequence of <{ADD|DEL}, id,
revision>. Then the retrieving node can process these and determine
which revisions of which ids it needs, then retrieve those as a second step.
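
The first step of that two-step exchange could look something like this in Python. The function name plan_from_outline and the tuple layout are hypothetical; it assumes revisions are non-negative integers and that a DEL carries a revision just like an ADD.

```python
def plan_from_outline(outline, local_revisions):
    """Decide what to fetch and what to delete from an outline.

    outline: list of (op, doc_id, revision), with op "ADD" or "DEL".
    local_revisions: {doc_id: revision} for documents already held.
    Returns (to_fetch, to_delete), each a list of (doc_id, revision).
    """
    # Keep only the highest-revision entry seen for each id.
    newest = {}
    for op, doc_id, rev in outline:
        if doc_id not in newest or rev > newest[doc_id][1]:
            newest[doc_id] = (op, rev)

    to_fetch, to_delete = [], []
    for doc_id, (op, rev) in newest.items():
        # Skip anything we already hold at this revision or newer.
        if rev <= local_revisions.get(doc_id, -1):
            continue
        if op == "ADD":
            to_fetch.append((doc_id, rev))  # retrieve in the second step
        else:
            to_delete.append((doc_id, rev))  # no body is needed for a DEL
    return to_fetch, to_delete
```

Only the ids in to_fetch need the full format, so a node that is mostly up to date transfers very little.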
This approach is tolerant of network partitioning, and not too
complicated. What do you think?
Doug