From: Yonik S. <yo...@ap...> - 2008-03-20 13:51:41
Performance is certainly critical to many people, and they don't always
have the option to just add more boxes.

From what I know of some Solr users, it is also very important to scale
down well (all the way down to 1-2 nodes). I would expect a 2-node
configuration to be very popular for minimum fault tolerance. Some users
are also very sensitive to RAM consumption (they apparently run on hosted
servers where more memory == more $$$ per month).

-Yonik

On Wed, Mar 19, 2008 at 4:13 PM, Doug Cutting <cu...@ap...> wrote:
> Someone at Y! last week asked why Bailey doesn't use HDFS. I gave the
> following reasons:
>
> - performance: by keeping indexes local search & indexing will be faster
> - reliability: bailey replicates already, so hdfs replication is redundant
> - continuous growth: consistent hashing lets us add and remove nodes
>   without fundamentally changing the way the index is partitioned. a
>   host-independent partitioning in HDFS would be too static.
>
> He countered:
> - for decent search performance, the majority of the index must be in
>   memory anyway. i conceded that much of the benefit of local indexes
>   might come from the filesystem buffer cache, which hdfs lacks.
> - for decent indexing performance, we could persist only logs + index
>   checkpoints to HDFS (once it supports append).
> - even consistent hashing will require the master to be somewhat
>   involved in indexing as nodes are added and removed. is that really
>   inherently more complicated than having the master dole out
>   subdirectories from a central hdfs repository, merging and splitting
>   them as needed?
>
> The advantage of HDFS-based indexes is that nodes have less state. The
> disadvantage is that you have to run HDFS (if you're not already), and
> that performance will probably always be a bit less. I don't see a
> clear advantage either way, and thus tend towards fewer dependencies and
> better performance.
>
> Other thoughts?
>
> Doug
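
For readers unfamiliar with the consistent hashing Doug mentions under
"continuous growth", here is a minimal Python sketch of the general
technique. It is not Bailey's code: the HashRing class, its method names,
the use of MD5, and the 64 virtual points per node are all assumptions made
for illustration. It shows the property the thread relies on: when a node
joins, the only documents that move are the ones the new node takes over.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a point on the ring (a 64-bit integer)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring; an illustration, not Bailey's design."""

    def __init__(self, replicas: int = 64):
        self.replicas = replicas  # virtual points per node (assumed value)
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> node name

    def add_node(self, node: str) -> None:
        """Place `replicas` virtual points for `node` on the ring."""
        for i in range(self.replicas):
            point = _hash(f"{node}#{i}")
            if point in self._owner:
                continue  # position already taken; skip (extremely unlikely)
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove_node(self, node: str) -> None:
        """Drop every virtual point that belongs to `node`."""
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key: str) -> str:
        """Return the node owning the first point clockwise from the key."""
        if not self._points:
            raise LookupError("ring is empty")
        i = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[i]]


# Usage: start with two nodes, then add a third and see which documents move.
ring = HashRing()
for name in ("node-a", "node-b"):
    ring.add_node(name)

docs = [f"doc-{i}" for i in range(1000)]
before = {d: ring.lookup(d) for d in docs}

ring.add_node("node-c")
after = {d: ring.lookup(d) for d in docs}

# Documents whose owner changed; each of them now lives on the new node.
moved = [d for d in docs if before[d] != after[d]]
```

Nothing is reshuffled between node-a and node-b when node-c joins, which is
the sense in which the partitioning is not "fundamentally" changed. Deciding
when nodes join or leave still needs some coordinator, which is the point of
the counter-argument about the master.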