Re: [bailey-developers] lattice master

Ning Li wrote:
> On Thu, Feb 28, 2008 at 5:37 PM, Doug Cutting <cu...@ap...> wrote:
>>  A unique updatable index per document would be nice, but I'm not yet
>>  entirely convinced it is practical.
> 
> Not if short glitches are not acceptable. In BigTable, a tablet is served
> by a single tablet server. I wonder if they find it to be a problem.

BigTable points towards a different architecture, where all 
modifications are logged to a shared filesystem, and a single node 
handles both updates and searches for that range of ids.  Perhaps we 
should consider this more seriously.

We want to scale flexibly both in collection size and in search traffic. 
  If search traffic is low, then indexes might be large, and if search 
traffic is high, indexes might be smaller and replication might be 
higher.  But, with no search node replication, system performance tops 
out at the rate that a node can process queries on a tiny index, which is 
not infinite.  So you'd probably want to add read-only replicas onto the 
BigTable model.  But then, when you have lots of writes, you don't fully 
utilize your cluster, and our writes are much more compute-intensive 
than BigTable writes.  I think configuring a cluster in this model would 
be more complicated and less fluid.

Finally, as you observed, there would be hiccups whenever a node fails. 
  Hiccups affect a small percentage of BigTable clients, only those 
touching the tablet on the failed node.  But, in distributed search, 
every query touches a large portion of the nodes.  So, in a 1000 node 
cluster, a failure might delay 0.1% of BigTable users, but might delay 
33% of distributed search users (assuming 3-way replication).  So search 
can be much more sensitive to this.
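
To make the arithmetic behind those two figures explicit, here is a rough 
sketch. The function names are made up for illustration, and it assumes a 
single failed node, uniform traffic, one tablet per BigTable request, and 
a search query that hits exactly one replica of every shard:

```python
def bigtable_affected_fraction(num_nodes: int) -> float:
    """Fraction of requests delayed by one failed node, BigTable style.

    Assumes each request touches a single tablet, tablets are spread
    evenly, and traffic is uniform, so one node sees 1/num_nodes of it.
    """
    return 1.0 / num_nodes


def search_affected_fraction(num_nodes: int, replication: int) -> float:
    """Fraction of queries delayed by one failed node, distributed search.

    Assumes the index is cut into num_nodes // replication shards, each
    stored on `replication` nodes, and every query visits one replica of
    every shard, chosen uniformly. A query therefore touches one node per
    shard, and the chance that the failed node is among them is
    (nodes touched) / (total nodes).
    """
    shards = num_nodes // replication
    return shards / num_nodes


if __name__ == "__main__":
    nodes, replicas = 1000, 3
    print(f"BigTable: {bigtable_affected_fraction(nodes):.1%} of requests")
    print(f"Search:   {search_affected_fraction(nodes, replicas):.1%} of queries")
```

With 1000 nodes and 3-way replication a query touches 333 of the 1000 
nodes, which is where the roughly one-in-three figure comes from.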

So I'm not convinced that the BigTable model is as appropriate for 
distributed full-text search as consistent hashing.  Thoughts?

Doug