From: Yonik S. <yo...@ap...> - 2008-03-11 14:11:59
On Mon, Mar 10, 2008 at 11:39 AM, Ning Li <nin...@gm...> wrote:
> On Fri, Mar 7, 2008 at 8:36 PM, Yonik Seeley <yo...@ap...> wrote:
> > On Thu, Mar 6, 2008 at 5:47 PM, Ning Li <nin...@gm...> wrote:
> >
> > > 5 A document database?
> > > - We store documents anyway.
> > > - We don't support sub-document updates.
> >
> > Field updates? We could if we store all the fields. Solr has a patch
> > for this, but it might be more efficient to implement in Lucene. It
> > requires being able to get the *latest* stored fields for a doc, even
> > if they are uncommitted.
>
> Let's not worry about performance for now. As you pointed
> out, if we update one stored field for a doc, we have to figure
> out the "latest" of all the other stored fields for the doc - but
> it's impossible because of distributed update and eventual
> consistency. Well, we can keep a revision number for each
> stored field, but...

Ah, right... I was talking more about just retrieving the latest stored
fields in a particular Lucene index. It's something we will need to do
for replication anyway.

> > > Here are a few comments on the features:
> > > 1 Consistent hashing uses hash values because hash values
> > > distribute uniformly on the ring. Can we support
> > > application-specified keys for the ring?
> >
> > Seems like we could allow the user to specify their own hash value.
> > What's the use case here?
>
> An example application can be an online email system.
> The keys of a user's emails are prefixed by the user name,
> so a user's emails are located together on the ring. When
> a user searches his/her emails, the query is only sent to
> servers which cover that range, instead of the entire ring.

Great example! This could really increase scalability for some systems.

> > > The difference
> > > is that the distribution may not be uniform so we need
> > > to rebalance sometimes (remove a virtual node and insert
> > > it somewhere else).
> >
> > I'll refer back again to my comments on separating replication (the
> > range of node X is replicated on nodes X-1 and X-2) from key
> > partitioning (the range of node X is 0-1000 + 5000-6000 for example).
> > One can change the key partitioning w/o touching the replication configuration.
>
> I think your point is that we need re-balancing in any case?

More about what rebalancing means too... when rebalancing, can you
leave all the nodes in place (the replication configuration) and just
change what keys map to a node?

-Yonik
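
To illustrate the field-update point near the top of the thread: with the
current API, updating one stored field means fetching the latest stored
fields for the doc, changing that one field, and re-adding the whole
document, which is why only fully stored documents can be updated this way.
The sketch below is my own, written against roughly the Lucene 3.x API (it
is not the Solr patch mentioned above); the "id" field name and the
FieldUpdater class are assumptions.

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;

public class FieldUpdater {
    // Replace one stored field of the document whose "id" field equals id.
    // The reader must see the latest (possibly uncommitted) version of the
    // doc, otherwise concurrent updates to other fields are silently lost.
    // Note: only fields that were stored survive this round trip.
    public static void updateField(IndexWriter writer, IndexReader reader,
                                   String id, String field, String newValue)
            throws Exception {
        IndexSearcher searcher = new IndexSearcher(reader);
        TopDocs hits = searcher.search(new TermQuery(new Term("id", id)), 1);
        if (hits.scoreDocs.length == 0) {
            return; // no such document
        }
        Document doc = searcher.doc(hits.scoreDocs[0].doc); // latest stored fields
        doc.removeFields(field);
        doc.add(new Field(field, newValue, Field.Store.YES, Field.Index.ANALYZED));
        writer.updateDocument(new Term("id", id), doc); // delete + re-add
    }
}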
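
To make the online-email example concrete, here is a minimal sketch of a
ring ordered by raw application-specified keys instead of hash values. Keys
sharing a user-name prefix land on a contiguous arc, so a per-user query
only needs the nodes covering that arc; the trade-off, as noted above, is
that the distribution may be skewed and need rebalancing. KeyRing and its
methods are hypothetical names, not Lucene or Solr code.

import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

public class KeyRing {
    // Ring positions are the application keys themselves; values are node names.
    private final TreeMap<String, String> ring = new TreeMap<String, String>();

    // Place a (virtual) node at an explicit position on the ring.
    public void addNode(String position, String node) {
        ring.put(position, node);
    }

    // The node responsible for a key is the one at the first ring position
    // at or after the key, wrapping around to the start of the ring.
    public String nodeFor(String key) {
        SortedMap<String, String> tail = ring.tailMap(key);
        return tail.isEmpty() ? ring.firstEntry().getValue()
                              : tail.get(tail.firstKey());
    }

    // Nodes covering the key range [from, to), e.g. all of one user's mail
    // keyed "alice/...": nodesFor("alice/", "alice0").
    public Set<String> nodesFor(String from, String to) {
        Set<String> nodes = new TreeSet<String>(ring.subMap(from, to).values());
        nodes.add(nodeFor(to)); // the node just past the range owns its tail
        return nodes;
    }
}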
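
Finally, a minimal sketch of the separation Yonik describes between key
partitioning and replication: one table maps key-range starts to logical
shards, a separate map places each shard's replicas on physical nodes, and
rebalancing only rewrites the partition table while the nodes stay in place.
ClusterLayout and its methods are hypothetical.

import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ClusterLayout {
    // Key partitioning: start of each key range -> owning logical shard.
    // e.g. 0 -> "shardX", 1000 -> "shardY", 5000 -> "shardX", 6000 -> "shardY"
    // gives shardX the ranges [0,1000) and [5000,6000).
    private final TreeMap<Long, String> partitions = new TreeMap<Long, String>();

    // Replication: logical shard -> the physical nodes holding its copies,
    // e.g. "shardX" -> [nodeX, nodeX-1, nodeX-2]. Fixed at construction here.
    private final Map<String, List<String>> replicas;

    public ClusterLayout(Map<String, List<String>> replicas) {
        this.replicas = replicas;
    }

    // Rebalancing is just calling this again to point a range at a different
    // shard; the replica placement above is never touched.
    public void assignRange(long rangeStart, String shard) {
        partitions.put(rangeStart, shard);
    }

    // Where does a key live? Find the owning shard, then its replica set.
    public List<String> nodesForKey(long key) {
        // Assumes a range starting at 0 has been assigned, so floorEntry never misses.
        String shard = partitions.floorEntry(key).getValue();
        return replicas.get(shard);
    }
}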