Vesta Configuration Management System / Bugs / #142 Replication race => "destination object exists" error

#142 Replication race => "destination object exists" error

Status: open

Owner: nobody

Labels: Repository (41)

Priority: 5

Updated: 2007-08-15

Created: 2007-08-15

Creator: Kenneth C. Schalk

Private: No

If two clients simultaneously request replication of the same object (or if a second request is made at any time while the first request is being processed), one of the replications will fail with the error "destination object exists" from the repository. One way this can happen is when two parallel evaluations try to replicate a shared model import.

Handling this should be transparent to users of the higher-level replication API (used by vrepl, the evaluator, and others). The server will need to keep a table of in-flight replications and have subsequent requests for the same object block until the preceding one completes. That probably means the server side ("Replicate" in Replication.C) should block and return only after the replication attempt that started first completes. Unfortunately, this ties up a server thread for as long as the request is blocked. It's tempting to think that the server could instead return an indication to the client that a parallel replication of the same object is in progress and have the client poll for its completion, but I think that would make it harder to handle the case where a replication attempt fails.
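A minimal sketch of such a table of in-flight replications, assuming C++11 std::mutex and std::condition_variable rather than whatever thread primitives the repository server actually uses, and a plain string as a stand-in for the object key. "InFlightReplications", "Begin", "End", and "IsInFlight" are hypothetical names:

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for however the repository names an object.
using ObjectKey = std::string;

// Hypothetical sketch: server-side set of objects currently being replicated.
class InFlightReplications {
 public:
  // Called at the start of a replication request. If another request for
  // the same object is already running, block until it finishes, then
  // record this request as in flight. Returns true if the caller waited.
  bool Begin(const ObjectKey& key) {
    std::unique_lock<std::mutex> lock(mu_);
    bool waited = false;
    while (in_flight_.count(key) != 0) {
      waited = true;
      done_.wait(lock);  // woken by End(); loop rechecks this key
    }
    in_flight_.insert(key);
    return waited;
  }

  // Called when a replication attempt finishes, whether it succeeded or not.
  void End(const ObjectKey& key) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      std::size_t erased = in_flight_.erase(key);
      assert(erased == 1);  // End must pair with a completed Begin
      (void)erased;
    }
    done_.notify_all();  // wake every waiter; each rechecks its own key
  }

  bool IsInFlight(const ObjectKey& key) const {
    std::lock_guard<std::mutex> lock(mu_);
    return in_flight_.count(key) != 0;
  }

 private:
  mutable std::mutex mu_;
  std::condition_variable done_;
  std::unordered_set<ObjectKey> in_flight_;
};
```

A caller that gets true back from "Begin" waited on an earlier request, so before copying anything it should check whether the object has since appeared at the destination; otherwise it would hit the same "destination object exists" error.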

What should we do if the first replication attempt fails with an error (e.g. the repository we were replicating from reboots) while subsequent attempts to replicate the same object are blocked waiting on it? It wouldn't be correct to have them all fail just because the first request failed: a later request might have a different source repository that is still on the network. So if the first replication attempt fails and another attempt for the same object has been blocked, that attempt should wake up and try to replicate the object itself.
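One way to get that behavior, sketched under the assumption of C++11 threading primitives and string object keys: after waking, a request first checks whether the object is now present at the destination, and only if it is not does it claim the object and try its own source. "ReplicationServer", "DestinationHas", and "CopyFrom" are hypothetical stand-ins, not the interfaces in Replication.C:

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <set>
#include <string>

// Hypothetical sketch. Each request carries its own source repository, so a
// request that waited on a failed attempt can retry with a different source.
struct ReplicationServer {
  // Assumed callbacks: "is the object already at the destination?" and
  // "copy the object from src; return true on success".
  std::function<bool(const std::string& obj)> DestinationHas;
  std::function<bool(const std::string& obj, const std::string& src)> CopyFrom;

  // Returns true if the object is present at the destination afterwards.
  bool Replicate(const std::string& obj, const std::string& src) {
    std::unique_lock<std::mutex> lock(mu_);
    // Block while another request is replicating the same object.
    while (in_flight_.count(obj) != 0) done_.wait(lock);
    // An earlier request may already have succeeded. (Checked with the lock
    // held so the check and the claim below are atomic.)
    if (DestinationHas(obj)) return true;
    in_flight_.insert(obj);
    lock.unlock();

    // Long-running copy, done without the lock. A real version would also
    // clear the in-flight entry if this threw.
    bool ok = CopyFrom(obj, src);

    lock.lock();
    std::size_t erased = in_flight_.erase(obj);
    assert(erased == 1);
    (void)erased;
    lock.unlock();
    // On success every waiter sees the object and returns true; on failure
    // the first waiter to reacquire the lock tries its own source.
    done_.notify_all();
    return ok;
  }

 private:
  std::mutex mu_;
  std::condition_variable done_;
  std::set<std::string> in_flight_;
};
```

Waking every waiter with notify_all is simpler than handing the object off to one particular waiter: the ones that lose the race to reacquire the lock just go back to sleep until the retry finishes.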

Discussion

Kenneth C. Schalk - 2007-08-16

Logged In: YES
user_id=304837
Originator: YES

Scott Venier pointed out that this is very similar to the
problem of suppressing multiple attempts to evaluate the
same function. The evaluator has some code which, with a
little rework, we could reuse for the replication case.
Specifically, in ApplyCache.C: the "rpt_cond" class and the
"WaitForDuplicateEval" and "WakeWaitingEvals" functions.

The first step would be to make a template class which could
do the same thing for any key type. It would need a pair of
virtual functions to be called when it's about to block and
after it stops blocking. The evaluator would use the
"before blocking" one to print the "waiting on possibly
identical ..." message. In the future the evaluator might
use both of them to let another parallel worker thread
start while this one is blocked, and then to wait for the
number of parallel threads to drop low enough before
proceeding (which would fix a minor limitation of the
duplicate call suppression).

Tim Mann - 2007-08-17

Logged In: YES
user_id=95236
Originator: NO

That's a good observation. It also sounds somewhat similar to the evaluator code that does copy-on-write (COW) of sid file contents. However, that code also limits the number of simultaneous COWs (or used to, anyway; didn't you change that somehow?), which doesn't apply here.
