From: Yonik S. <yo...@ap...> - 2008-03-11 14:11:59
On Mon, Mar 10, 2008 at 11:39 AM, Ning Li <nin...@gm...> wrote:
> On Fri, Mar 7, 2008 at 8:36 PM, Yonik Seeley <yo...@ap...> wrote:
> > On Thu, Mar 6, 2008 at 5:47 PM, Ning Li <nin...@gm...> wrote:
> >
> > > 5 A document database?
> > > - We store documents anyway.
> > > - We don't support sub-document updates.
> >
> > Field updates? We could if we store all the fields. Solr has a patch
> > for this, but it might be more efficient to implement in Lucene. It
> > requires being able to get the *latest* stored fields for a doc, even
> > if they are uncommitted.
>
> Let's not worry about performance for now. As you pointed
> out, if we update one stored field for a doc, we have to figure
> out the "latest" of all the other stored fields for the doc - but
> it's impossible because of distributed update and eventual
> consistency. Well, we can keep a revision number for each
> stored field, but...

Ah, right... I was talking more about just retrieving the latest stored
fields in a particular Lucene index. It's something we will need to do
for replication anyway.

> > > Here are a few comments on the features:
> > > 1 Consistent hashing uses hash values because hash values
> > > distribute uniformly on the ring. Can we support
> > > application-specified keys for the ring?
> >
> > Seems like we could allow the user to specify their own hash value.
> > What's the use case here?
>
> An example application can be an online email system.
> The keys of a user's emails are prefixed by the user name,
> so a user's emails are located together on the ring. When
> a user searches his/her emails, the query is only sent to
> servers which cover that range, instead of the entire ring.

Great example! This could really increase scalability for some systems.

> > > The difference
> > > is that the distribution may not be uniform so we need
> > > to rebalance sometimes (remove a virtual node and insert
> > > it somewhere else).
> >
> > I'll refer back again to my comments on separating replication (the
> > range of node X is replicated on nodes X-1 and X-2) from key
> > partitioning (the range of node X is 0-1000 + 5000-6000 for example).
> > One can change the key partitioning w/o touching the replication configuration.
>
> I think your point is that we need re-balancing in any case?

More about what rebalancing means too... when rebalancing, can you
leave all the nodes in place (the replication configuration) and just
change what keys map to a node?

-Yonik
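
To illustrate the field-update point near the top of the thread: with the
current API, updating one stored field means fetching the latest stored
fields for the doc, changing that one field, and re-adding the whole
document, which is why only fully stored documents can be updated this way.
The sketch below is my own, written against roughly the Lucene 3.x API (it
is not the Solr patch mentioned above); the "id" field name and the
FieldUpdater class are assumptions.

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;

public class FieldUpdater {
    // Replace one stored field of the document whose "id" field equals id.
    // The reader must see the latest (possibly uncommitted) version of the
    // doc, otherwise concurrent updates to other fields are silently lost.
    // Note: only fields that were stored survive this round trip.
    public static void updateField(IndexWriter writer, IndexReader reader,
                                   String id, String field, String newValue)
            throws Exception {
        IndexSearcher searcher = new IndexSearcher(reader);
        TopDocs hits = searcher.search(new TermQuery(new Term("id", id)), 1);
        if (hits.scoreDocs.length == 0) {
            return; // no such document
        }
        Document doc = searcher.doc(hits.scoreDocs[0].doc); // latest stored fields
        doc.removeFields(field);
        doc.add(new Field(field, newValue, Field.Store.YES, Field.Index.ANALYZED));
        writer.updateDocument(new Term("id", id), doc); // delete + re-add
    }
}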
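
To make the online-email example concrete, here is a minimal sketch of a
ring ordered by raw application-specified keys instead of hash values. Keys
sharing a user-name prefix land on a contiguous arc, so a per-user query
only needs the nodes covering that arc; the trade-off, as noted above, is
that the distribution may be skewed and need rebalancing. KeyRing and its
methods are hypothetical names, not Lucene or Solr code.

import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

public class KeyRing {
    // Ring positions are the application keys themselves; values are node names.
    private final TreeMap<String, String> ring = new TreeMap<String, String>();

    // Place a (virtual) node at an explicit position on the ring.
    public void addNode(String position, String node) {
        ring.put(position, node);
    }

    // The node responsible for a key is the one at the first ring position
    // at or after the key, wrapping around to the start of the ring.
    public String nodeFor(String key) {
        SortedMap<String, String> tail = ring.tailMap(key);
        return tail.isEmpty() ? ring.firstEntry().getValue()
                              : tail.get(tail.firstKey());
    }

    // Nodes covering the key range [from, to), e.g. all of one user's mail
    // keyed "alice/...": nodesFor("alice/", "alice0").
    public Set<String> nodesFor(String from, String to) {
        Set<String> nodes = new TreeSet<String>(ring.subMap(from, to).values());
        nodes.add(nodeFor(to)); // the node just past the range owns its tail
        return nodes;
    }
}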
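
Finally, a minimal sketch of the separation Yonik describes between key
partitioning and replication: one table maps key-range starts to logical
shards, a separate map places each shard's replicas on physical nodes, and
rebalancing only rewrites the partition table while the nodes stay in place.
ClusterLayout and its methods are hypothetical.

import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ClusterLayout {
    // Key partitioning: start of each key range -> owning logical shard.
    // e.g. 0 -> "shardX", 1000 -> "shardY", 5000 -> "shardX", 6000 -> "shardY"
    // gives shardX the ranges [0,1000) and [5000,6000).
    private final TreeMap<Long, String> partitions = new TreeMap<Long, String>();

    // Replication: logical shard -> the physical nodes holding its copies,
    // e.g. "shardX" -> [nodeX, nodeX-1, nodeX-2]. Fixed at construction here.
    private final Map<String, List<String>> replicas;

    public ClusterLayout(Map<String, List<String>> replicas) {
        this.replicas = replicas;
    }

    // Rebalancing is just calling this again to point a range at a different
    // shard; the replica placement above is never touched.
    public void assignRange(long rangeStart, String shard) {
        partitions.put(rangeStart, shard);
    }

    // Where does a key live? Find the owning shard, then its replica set.
    public List<String> nodesForKey(long key) {
        // Assumes a range starting at 0 has been assigned, so floorEntry never misses.
        String shard = partitions.floorEntry(key).getValue();
        return replicas.get(shard);
    }
}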