From: Doug C. <cu...@ap...> - 2008-03-19 21:39:10
Yonik Seeley wrote:
> On Wed, Mar 19, 2008 at 4:46 PM, Doug Cutting <cu...@ap...> wrote:
>> A Document is uniquely identified by a String externalId and a long
>> version. The position is not assumed to uniquely identify it. So I'm
>> not sure where the size of the position will be significant.
>
> I'm not exactly sure yet either... but it seems like a node does need
> to identify all documents within certain arbitrary ranges at some
> point for rebalancing (and perhaps for filtering too).

> Will the hash be indexed, stored somehow, or calculated on the fly on the node?

The naive way to implement this on Lucene is to make the position be an
indexed field, then use a RangeFilter to constrain queries, if that's
what you're asking. Mostly we hope that queries will span the entire
range of an index, with no need for filtering. But sometimes the Filter
will be needed. When replaying the log to a neighbor node we'll also
need to filter by position range. So we'll be touching these a lot, but
I don't yet see a case where we'd, e.g., want to create a bit vector of
occupied positions. They'll be pretty sparse for that, even if only 32
bits.

> True... I guess it depends on if those 32 bits will be used for
> anything other than a hash (or position) by the application level.

Well, we've talked of applications encoding the user in the high 29 bits
and the message in the low three. Frankly, I have a hard time seeing
where 32 bits would be a problem here. Typically what you'll want to do
is have a primary field (e.g., user) that can limit the amount of the
ring that must be queried, and trade that against the chances that a
single value of that field will overwhelm that portion of the ring. If
the master rebalances by load, this will be easier.
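
To make the 29/3 split mentioned above concrete, here is a minimal sketch in plain Java. The class and method names (PositionSketch, pack, rangeStart, rangeEnd, inRange) are made up for illustration; nothing here is Lucene API or part of the proposed design, and how the user and message hashes are computed is left open.

```java
// Hypothetical helper, for illustration only: packs a user hash into the
// high 29 bits of a 32-bit position and a message hash into the low 3 bits.
final class PositionSketch {
    static final int MESSAGE_BITS = 3;                        // low three bits
    static final int MESSAGE_MASK = (1 << MESSAGE_BITS) - 1;  // binary 111
    static final int USER_BITS = 32 - MESSAGE_BITS;           // high 29 bits
    static final int USER_MASK = (1 << USER_BITS) - 1;

    // Combine the two hashes into one 32-bit position.
    // Extra high bits of either hash are simply discarded.
    static int pack(int userHash, int messageHash) {
        int user = userHash & USER_MASK;
        int message = messageHash & MESSAGE_MASK;
        return (user << MESSAGE_BITS) | message;
    }

    // First and last position (as unsigned values) that one user can occupy.
    static long rangeStart(int userHash) {
        return Integer.toUnsignedLong(pack(userHash, 0));
    }

    static long rangeEnd(int userHash) {
        return Integer.toUnsignedLong(pack(userHash, MESSAGE_MASK));
    }

    // Inclusive range test, treating the position as an unsigned 32-bit value.
    static boolean inRange(int position, long lo, long hi) {
        long p = Integer.toUnsignedLong(position);
        return lo <= p && p <= hi;
    }

    public static void main(String[] args) {
        int position = pack(5, 3);
        System.out.println(position);                        // prints 43
        System.out.println(inRange(position,
                rangeStart(5), rangeEnd(5)));                // prints true
    }
}
```

With this layout, all of one user's documents fall in a contiguous run of 8 positions, so a query for that user only needs the part of the ring covering that run. The flip side is the trade-off described above: three low bits give only 8 distinct positions per user, so a very heavy user cannot be spread across more than 8 places on the ring.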

A single node should probably never index more than a few million
items, so if you know that you might have, e.g., 100M items with a
given primary field value, then you'd want to make sure that there are
at least 100 distinct position values within that. But I think 32 bits
gives plenty of room for such things.

Hey, did I just switch sides again?

Doug