From: Ning L. <nin...@gm...> - 2008-04-22 22:52:16

It's been quiet for a while. Any comments/thoughts?

Cheers,
Ning

On Wed, Apr 16, 2008 at 7:12 PM, Ning Li <nin...@gm...> wrote:
> > I'm coding the add-node process with the overlapping replication
> > right now. It's taking some time. After it's finished, we can review
> > the code and have a complete picture of the complexity and the
> > performance implications.
>
> Finally checked in the first version for adding a node, with log
> propagation and cleanup... :)
>
>
> On Thu, Apr 10, 2008 at 6:38 PM, Doug Cutting <cu...@ap...> wrote:
> > Yes, that's one way to do it. Perhaps it is even the best way. But if
> > C reports it is ready to start serving [2, 4) before D is, then this
> > might be revealed to clients, to more rapidly reduce the load on A and
> > B. Might this also permit a less stateful master? My hunch is that the
> > more queues of pending splits and such that the master has to maintain,
> > the greater the chances of confusion in the face of failure.
> >
> > Perhaps the master could simply issue step (1), asking C and D to start
> > serving [2, 4), and remember only that it had issued this request so
> > that it doesn't issue it again too soon. Then, when C & D tell the
> > master they're ready, the master can notice that [2, 4) is
> > over-replicated, and issue requests to A and B to stop serving it. In
> > this model, the master would not wait for both C & D to report, nor
> > would it wait for A and B to report their narrowed ranges before
> > republishing the ring.
> >
> > In this approach, if D fails while replicating [2, 4) then the master
> > could still trim A to [0, 2), but not trim B yet. Also, if
> > previously-offline E comes back online already serving [0, 2) or
> > [2, 3), or some other overlapping range from some prior merge/split
> > state, it can be more easily re-integrated into the ring, without
> > discarding all of its data.
> >
> > So, even in the case of a mostly-disjoint partitioning, there may be
> > benefits to permitting overlap.
>
> You have a number of good points.
>
> I was worried about the complexity of a state change "transaction"
> (i.e. adding/removing a node, plus the changes to neighbors' ranges
> and to more neighbors' log propagation that come with it). As you
> said, "the more queues of pending splits" (or pending range changes)
> "and such that the master has to maintain, the greater the chances of
> confusion in the face of failure."
>
> Your email made me think that we can make the complexity manageable
> by breaking a state change "transaction" into smaller ones.
>
> Here is how I think of a state change "transaction": a new node X' is
> added at position X', between X and X+1. The affected nodes switch
> from:
>
>   node X-2 serves range [X-2, X+1)
>   node X-1 serves range [X-1, X+2)
>   node X   serves range [X, X+3)
>
> to:
>
>   node X-2 serves range [X-2, X')
>   node X-1 serves range [X-1, X+1)
>   node X   serves range [X, X+2)
>   node X'  serves range [X', X+3)
>
> And the log propagation of even more nodes is affected.
>
> We break this "transaction" into four smaller ones:
>
>   1. node X' serves range [X', X+3)
>      Now range [X', X+3) is replicated 4 times - over-replicated.
>   2. node X-2 goes from [X-2, X+1) to [X-2, X')
>      Now only range [X+1, X+3) is still over-replicated.
>   3. node X-1 goes from [X-1, X+2) to [X-1, X+1)
>      Now only range [X+2, X+3) is still over-replicated.
>   4. node X goes from [X, X+3) to [X, X+2)
>      Now no range is over-replicated.
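[Editor's note: for concreteness, the quoted four-step decomposition can be checked mechanically. Below is a small, self-contained Java sketch, not code from the project, that maps the ring positions to integers (X-2 = 0, X-1 = 1, X = 2, X' = 3, X+1 = 4, X+2 = 5, X+3 = 6), assumes a replication level of 3, applies the four smaller "transactions" in order, and prints the replication level of each key in [X, X+3) after each step. All class and method names are made up for illustration.]

    import java.util.ArrayList;
    import java.util.List;

    public class AddNodeSketch {

        // One node's assignment: a half-open range [start, end) on the ring.
        static final class Assignment {
            final String node;
            int start, end;          // end is exclusive
            Assignment(String node, int start, int end) {
                this.node = node; this.start = start; this.end = end;
            }
            boolean covers(int key) { return key >= start && key < end; }
            public String toString() { return node + " serves [" + start + ", " + end + ")"; }
        }

        // Replication level of a key = number of nodes currently serving it.
        static int replication(List<Assignment> ring, int key) {
            int n = 0;
            for (Assignment a : ring) if (a.covers(key)) n++;
            return n;
        }

        static void report(String step, List<Assignment> ring) {
            System.out.println("-- " + step);
            for (Assignment a : ring) System.out.println("   " + a);
            for (int key = 2; key < 6; key++)   // keys in [X, X+3)
                System.out.println("   key " + key + ": " + replication(ring, key) + "x");
        }

        public static void main(String[] args) {
            List<Assignment> ring = new ArrayList<>(List.of(
                    new Assignment("X-2", 0, 4),    // [X-2, X+1)
                    new Assignment("X-1", 1, 5),    // [X-1, X+2)
                    new Assignment("X",   2, 6),    // [X,   X+3)
                    new Assignment("X+1", 4, 8),    // [X+1, X+4), unchanged by the join
                    new Assignment("X+2", 5, 9)));  // [X+2, X+5), unchanged by the join
            report("before X' joins", ring);

            // 1. X' starts serving [X', X+3): all of [X', X+3) becomes 4x.
            ring.add(new Assignment("X'", 3, 6));
            report("1: X' serves [X', X+3)", ring);

            // 2. X-2 narrows to [X-2, X'): only [X+1, X+3) is still 4x.
            ring.get(0).end = 3;
            report("2: X-2 -> [X-2, X')", ring);

            // 3. X-1 narrows to [X-1, X+1): only [X+2, X+3) is still 4x.
            ring.get(1).end = 4;
            report("3: X-1 -> [X-1, X+1)", ring);

            // 4. X narrows to [X, X+2): everything is back to exactly 3x.
            ring.get(2).end = 5;
            report("4: X -> [X, X+2)", ring);
        }
    }

At every intermediate step each key in [X, X+3) is served by at least 3 and at most 4 nodes, which appears to be the point of doing step 1 first and the three narrowing steps afterwards.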
> So in each of these smaller "transactions", only one node
> and some neighbors' log propagation are affected. Failure
> is now easier to deal with.
>
> It's worth pointing out that the latter three "transactions"
> are scheduled to balance the level of replication.
>
> Is this a good idea?
>
> Ning
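[Editor's note: Doug's earlier suggestion and Ning's decomposition both point toward a master that keeps little per-transaction state: it asks the new node to start serving, and then, whenever it observes that a range is over-replicated by ready nodes, it asks a single node to narrow its range. The following is a rough Java sketch of one way such a reconciliation pass might look; it is not the project's actual master code, and the interfaces NodeHandle and RingState are hypothetical.]

    import java.util.List;

    // Hypothetical view of a node as seen by the master; not project code.
    interface NodeHandle {
        int rangeStart();                   // inclusive ring position
        int rangeEnd();                     // exclusive ring position
        boolean readyToServe();             // finished copying / replaying logs?
        void serve(int start, int end);     // ask the node to serve [start, end)
        void narrowTo(int start, int end);  // ask the node to shrink its range
    }

    // Hypothetical read-only view of the ring.
    interface RingState {
        List<NodeHandle> nodes();
        // Replication level of the least-replicated key in [start, end),
        // counting only nodes that report readyToServe().
        int minReplication(int start, int end);
    }

    final class MasterSketch {
        private final int targetReplication;

        MasterSketch(int targetReplication) { this.targetReplication = targetReplication; }

        // Step 1: ask the new node to serve its full range. The master keeps no
        // multi-step plan; it only avoids re-issuing the request too soon.
        void admit(NodeHandle newNode, int start, int end) {
            newNode.serve(start, end);
        }

        // Periodic pass: narrow at most one node, and only if the part it gives
        // up would still be fully replicated by ready nodes afterwards. Steps
        // 2-4 of the decomposition fall out of repeated passes of this loop.
        void reconcile(RingState ring, List<int[]> desiredRanges) {
            List<NodeHandle> nodes = ring.nodes();
            for (int i = 0; i < nodes.size(); i++) {
                NodeHandle n = nodes.get(i);
                int wantStart = desiredRanges.get(i)[0];
                int wantEnd = desiredRanges.get(i)[1];
                if (wantEnd >= n.rangeEnd()) continue;   // nothing to give up
                // The part this node would stop serving is [wantEnd, rangeEnd()).
                // Trim only if that part is currently over-replicated, i.e. it
                // keeps targetReplication copies even without this node.
                if (ring.minReplication(wantEnd, n.rangeEnd()) > targetReplication) {
                    n.narrowTo(wantStart, wantEnd);
                    return;                              // one small "transaction" per pass
                }
            }
        }
    }

Because each narrowing request is only issued once the range it gives up is already covered by enough ready nodes, a failure between passes leaves the ring over-replicated rather than under-replicated, which matches the thread's reasoning about making failures easier to handle.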