Re: [bailey-developers] More detailed design on load balancing

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

On Wed, Apr 2, 2008 at 1:55 PM, Doug Cutting <cu...@ap...> wrote:
>  This looks great!  We can implement node copying with this too, right,
>  just by setting X'=X?  And 2.2 could be done as a database checkpoint
>  copy & filter.  In the case where X'=X, this would result in the same
>  algorithm as a node copy, since the filter would be a no-op.  Or am I
>  missing some fundamental difference?

Yes, copy and ranged copy could/should use the same mechanism.

>  Initially we can avoid implementing checkpoint copying, since I don't
>  think it easily fits into an RPC protocol.  Perhaps it will work by
>  specifying a set of URLs to fetch or something?

Actually, the mechanism I want copy and ranged copy to use is
not checkpoint copying. This is because ranged copy may only
want a small sub-range of the checkpoint (in removing a node).
So, what if a RangedDatabase supports getDocs(Range range)
where full documents in the range are retrieved. To be conservative,
the starting entry numbers are set to the ones at the beginning
of the retrieval. Of course we need re-parsing the documents.
But we do that in log propagation as well. So, which approach
do you think is better?

>  Another downside is that load balancing is trickier.  With disjoint
>  ranges, how would we split a hot range?  BTW, I don't think we'll
>  actually randomly assign nodes, but rather try to initially add them
>  evenly around the ring, then use load balancing to split hot spots.

Halfen a hot range? More some more sophisticated algorithm.
We can still have nodes evenly around the ring at the start.
The partitioning algorithm doesn't change. Only the replication
algorithm is different.

>  Also, with disjoint ranges, each search would require searching more
>  indexes.  The search performance impact would thus be significant.
>
>  So the reasons for overlap are: (1) we can't split ranges without
>  permitting it, and (2) it makes search faster.  Do you agree?

What do you mean by "without permitting it"?

As to 2), I'm not sure overlap makes search faster.
Assume the scope of the positions is [0, 400) and the replication
level is 2. Let's say we have 4 nodes with overlap:
  - node 0 serves [0, 200)
  - node 1 serves [100, 300)
  - node 2 serves [200, 400)
  - node 3 serves [300, 100)
Without overlap, it's only fair that nodes serve ranges of the same
size as with overlap, so we only have 2 nodes:
  - node 0 serves [0, 200)
  - node 1 serves [200, 400)
Each node is replicated twice. So it's 4 nodes in total.

I think this is a fair comparison so the search performance should
be comparable. What do you think?

Ning