[bailey-developers] Introduction

This list is still small enough that I feel I should introduce  
myself.  I'm a Solr committer and CTO of a small internet search  
startup that is still in stealth mode.  Yonik pointed out the project  
to me.  It sounds quite interesting, not least because our  
company has built a large Solr-based search cluster running on EC2  
with at least some of the properties you are aiming for.

The one thing we don't have is dynamic automatic failover and  
replication.  One reason is that it's hard, and the second is that for  
us, it has always been better to use more boxes for a larger corpus  
rather than replicating a smaller one.  Instead, we store the indices  
in S3 and restore from backup when a machine fails (which is not  
rare).

I do have a couple questions about the project:
- it isn't clear to me how the goals are substantially different from  
those of Nutch.  Is it mostly in the relaxation of its application to  
web search?
- why SourceForge rather than an Apache project?
- it sounds like the intent is to build upon Solr, which I think is a  
great idea, but it isn't mentioned in the top-level goals section.  Is  
Solr an optional component?

cheers,
-Mike