From: Yonik S. <yo...@ap...> - 2008-03-19 22:07:35
On Wed, Mar 19, 2008 at 5:38 PM, Doug Cutting <cu...@ap...> wrote:
> Yonik Seeley wrote:
> > I'm not exactly sure yet either... but it seems like a node does need
> > to identify all documents within certain arbitrary ranges at some
> > point for rebalancing (and perhaps for filtering too).
> > Will the hash be indexed, stored somehow, or calculated on the fly on the node?
>
> The naive way to implement this on Lucene is to make the position be an
> indexed field, then use a RangeFilter to constrain queries, if that's
> what you're asking.

Right. If it's indexed, it seems advantageous to use 32 bits rather than 64 (especially given the effect on the .tii file size). Longer term it might make sense to use a column-stride field: https://issues.apache.org/jira/browse/LUCENE-1231

> Mostly we hope that queries will span the entire
> range of an index & with no need for filtering. But sometimes the
> Filter will be needed.
>
> When replaying the log to a neighbor node we'll also need to filter by
> position range.

Or just log the position along with the id and version.

> So we'll be touching these a lot, but I don't yet see a
> case where we'd, e.g., want to create a bit vector of occupied
> positions. They'll be pretty sparse for that, even if only 32 bit.

Sorry for the confusion; I never meant that. I just meant the ability to map from a position range to the documents in that range on the node.

-Yonik
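To make the "map from a range to the documents in that range" idea concrete, here is a minimal sketch in plain Java, not Lucene code. Assumptions, all hypothetical: positions are unsigned 32-bit values held in a `long`; a `TreeMap` stands in for the indexed position field; a range whose low end is greater than its high end wraps around the ring; and the names `PositionIndex`, `encode`, `docsInRange`, `LogEntry`, and `replayRange` are illustrative only. The `encode` helper shows why fixed width matters: a term-based range filter such as the RangeFilter mentioned above compares indexed terms as strings, so positions must be zero-padded to sort correctly, and a 32-bit position needs 8 hex characters where a 64-bit one needs 16.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: documents keyed by an unsigned 32-bit position stored in a long. */
public class PositionIndex {
    // Sorted position -> document ids; stands in for an indexed position field.
    private final TreeMap<Long, List<String>> byPosition = new TreeMap<>();

    /** Hypothetical log record carrying the position alongside id and version. */
    public record LogEntry(String id, long version, long position) {}

    /** Zero-padded 8-char hex, so string order matches numeric order for 32-bit values. */
    public static String encode(long pos) {
        return String.format("%08x", pos);
    }

    public void add(String id, long pos) {
        byPosition.computeIfAbsent(pos, k -> new ArrayList<>()).add(id);
    }

    /** Ids with position in [lo, hi]; if lo > hi the range is assumed to wrap around the ring. */
    public List<String> docsInRange(long lo, long hi) {
        List<String> out = new ArrayList<>();
        if (lo <= hi) {
            collect(byPosition.subMap(lo, true, hi, true), out);
        } else {
            collect(byPosition.tailMap(lo, true), out);  // lo .. top of the ring
            collect(byPosition.headMap(hi, true), out);  // bottom of the ring .. hi
        }
        return out;
    }

    /** Log entries to replay to a neighbor owning [lo, hi]; no re-hashing of ids needed. */
    public static List<LogEntry> replayRange(List<LogEntry> log, long lo, long hi) {
        List<LogEntry> out = new ArrayList<>();
        for (LogEntry e : log) {
            if (inRange(e.position(), lo, hi)) {
                out.add(e);
            }
        }
        return out;
    }

    // Same wraparound convention as docsInRange.
    static boolean inRange(long p, long lo, long hi) {
        return lo <= hi ? (p >= lo && p <= hi) : (p >= lo || p <= hi);
    }

    private static void collect(Map<Long, List<String>> m, List<String> out) {
        for (List<String> ids : m.values()) {
            out.addAll(ids);
        }
    }
}
```

For example, after adding `"a"` at position 10 and `"c"` at `0xFFFFFFF0L`, a wrapping query `docsInRange(0xFFFFFF00L, 50L)` returns both documents. Carrying the position in each `LogEntry` is what lets `replayRange` filter the log for a neighbor node without recomputing the hash.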