#142 Replication race => "destination object exists" error

open
nobody
Repository (41)
5
2007-08-15
2007-08-15
No

If two clients simultaneously request replication of the same object (or if a second request is made any time while the first request is being processed), one of the replications will fail with the error "destination object exists" from the repository. One way this can happen is two parallel evaluations trying to replicate a shared model import.

Handling this should be transparent to users of the higher-level replication API (used by vrepl, the evaluator, and others). The server will need to keep a table of in-flight replications and have subsequent requests for the same object block until the preceding one completes. We probably need to have the server side ("Replicate" in Replication.C) block and return only after the replication attempt that started first completes. Unfortunately, this will take up a thread on the server. It's tempting to think that the server could instead return an indication to the client that a parallel replication of the same object is in progress and then have the client poll for the completion of the replication, but I think that will make it harder to handle the case where a replication attempt fails.

What should we do if the first replication attempt fails with an error (e.g. the repository we were replicating from reboots) and there are blocked subsequent attempts to replicate the same object? It wouldn't be correct to have them all fail just because the first request failed. A later request might have a different source repository that is still on the network. So if the first replication attempt fails and another attempt for the same object is blocked behind it, the blocked attempt should wake up and try to replicate the object itself.
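
A minimal sketch of the block-then-retry behaviour described above, assuming standard C++11 threading primitives rather than Vesta's own thread wrappers. The class name `InFlightReplications`, its `replicate` method, and the `completed_` set (a stand-in for asking the repository whether the destination object already exists) are hypothetical and not taken from Replication.C.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <set>
#include <string>

// Hypothetical table of in-flight replications, keyed by object name.
class InFlightReplications {
public:
  // `attempt` copies the object from this caller's source repository and
  // returns true on success. Returns true if the object is present at the
  // destination afterwards, whether this caller or an earlier one copied it.
  bool replicate(const std::string &object,
                 const std::function<bool()> &attempt) {
    std::unique_lock<std::mutex> lock(mu_);
    for (;;) {
      if (completed_.count(object) != 0) return true;  // already replicated
      if (in_flight_.count(object) == 0) break;        // nobody working on it
      cv_.wait(lock);  // block until some in-flight attempt finishes
    }
    in_flight_.insert(object);
    lock.unlock();  // do the slow copy without holding the table lock

    bool ok = false;
    try {
      ok = attempt();
    } catch (...) {
      ok = false;  // assumption: any exception counts as a failed attempt
    }

    lock.lock();
    assert(in_flight_.count(object) == 1);
    in_flight_.erase(object);
    if (ok) completed_.insert(object);
    // Wake every waiter: after a success they return true; after a failure
    // the first one to re-acquire the lock retries with its own source.
    cv_.notify_all();
    return ok;
  }

private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::set<std::string> in_flight_;
  std::set<std::string> completed_;  // stand-in for "destination object exists"
};
```

Each blocked caller still ties up a server thread while it waits, as noted above. In exchange, a failure of the first attempt is handled entirely on the server: a waiter simply becomes the next attempt.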

Discussion

  • Kenneth C. Schalk

    Logged In: YES
    user_id=304837
    Originator: YES

    Scott Venier pointed out that this is very similar to the
    problem of suppressing multiple attempts to
    evaluate the same function. In the evaluator there is some
    code which, with a little re-work, we could re-use for the
    replication case. Specifically, in ApplyCache.C: the
    "rpt_cond" class and the "WaitForDuplicateEval" and
    "WakeWaitingEvals" functions.

    The first step would be to make a template class which could
    do the same thing for any key type. It would need a pair of
    virtual functions to be called when it's about to block and
    after it stops blocking. The evaluator would use the
    "before blocking" one to print the "waiting on possibly
    identical ..." message. In the future the evaluator might
    use both of them for allowing another parallel worker thread
    to start while blocking and then waiting for the number of
    parallel threads to drop low enough before proceeding (which
    would fix a minor limitation of the duplicate call
    suppression).
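
    One possible shape for such a template, sketched with standard C++11 primitives. The names `DuplicateSuppressor`, `begin`, `end`, `before_block`, and `after_block` are hypothetical; this is not the existing "rpt_cond" code from ApplyCache.C. `CountingSuppressor` and `demo_blocked_count` are only there to show the hooks firing.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <set>
#include <string>
#include <thread>

// Hypothetical key-generic duplicate suppressor with blocking hooks.
template <typename Key>
class DuplicateSuppressor {
public:
  virtual ~DuplicateSuppressor() = default;

  // Claims `key`, blocking while another thread holds it.
  void begin(const Key &key) {
    std::unique_lock<std::mutex> lock(mu_);
    while (active_.count(key) != 0) {
      lock.unlock();
      before_block(key);  // hooks run without the lock held
      lock.lock();
      cv_.wait(lock, [&] { return active_.count(key) == 0; });
      lock.unlock();
      after_block(key);
      lock.lock();
      // Loop again: another waiter may have claimed the key meanwhile,
      // in which case the hooks are called once more.
    }
    active_.insert(key);
  }

  // Releases `key` and wakes any threads waiting on it.
  void end(const Key &key) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      active_.erase(key);
    }
    cv_.notify_all();
  }

protected:
  virtual void before_block(const Key &) {}  // e.g. print "waiting on ..."
  virtual void after_block(const Key &) {}   // e.g. wait for a worker slot

private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::set<Key> active_;
};

// Example subclass that just counts how often the hooks fire.
struct CountingSuppressor : DuplicateSuppressor<std::string> {
  std::atomic<int> blocked{0};
  std::atomic<int> resumed{0};
protected:
  void before_block(const std::string &) override { ++blocked; }
  void after_block(const std::string &) override { ++resumed; }
};

// Usage sketch: a second thread asks for a key the first still holds.
int demo_blocked_count() {
  CountingSuppressor s;
  s.begin("shared/model");
  std::thread other([&s] { s.begin("shared/model"); s.end("shared/model"); });
  while (s.blocked.load() == 0) std::this_thread::yield();
  s.end("shared/model");
  other.join();
  assert(s.resumed.load() == 1);
  return s.blocked.load();
}
```

    The hooks are deliberately called with the internal lock released, so an `after_block` that itself waits (for instance until the number of parallel threads drops) would not stall unrelated keys.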

     
  • Tim Mann

    Tim Mann - 2007-08-17

    Logged In: YES
    user_id=95236
    Originator: NO

    That's a good observation. It also sounds somewhat similar to the code in the evaluator to COW sid file contents. Well, that code also limits the number of simultaneous COWs (or used to anyway; didn't you change that somehow?), which doesn't apply here.

     
