From: Mike K. <mik...@gm...> - 2008-02-27 23:01:48
This list is still small enough that I feel I should introduce myself. I'm a Solr committer and CTO of a small internet search startup that is still in stealth mode. Yonik pointed out the project to me. It sounds quite interesting, not least because our company has built a large Solr-based search cluster running on EC2 with at least some of the properties you are aiming for.

The one thing we don't have is dynamic automatic failover and replication. One reason is that it's hard; the second is that, for us, it has always been better to use more boxes for a larger corpus rather than replicating a smaller one. Instead, we store the indices in S3 and restore from backup when a machine fails (which is not rare).

I do have a couple of questions about the project:

- It isn't clear to me how the goals are substantially different from those of Nutch. Is the difference mostly that the focus on web search is relaxed?
- Why SourceForge rather than an Apache project?
- It sounds like the intent is to build upon Solr, which I think is a great idea, but that isn't mentioned in the top-level goals section. Is Solr an optional component?

cheers,
-Mike