From: Doug C. <cu...@ap...> - 2008-03-19 20:28:23
Ning Li wrote:
> The ring distribution won't be uniform in this case. But we have
> to deal with this case anyway. So the main downside I see is
> the performance cost with strings - computation, memory...
> That's why I'm fine with a separate 'position' value.

Also, having a well-known place for the application-specified external id is useful too. Lucene lacks this, which makes things like deletion and duplicate detection more complicated than they ought to be. So I think <externalId, version, position, <field>*> is better than Lucene's minimalist <field>*.

> One user may have one document. Another may have a lot.
> Is 29 bits for username enough? Maybe. But is 3 bits for the
> documents of a user enough? That means a user's documents
> cannot span more than 8 nodes.

I only have 50k emails in my archives. Even if I had 500k, one node would be plenty. I've heard that Gmail handles all per-user requests on a single node, and Gmail allows up to around 500k messages.

On the other hand, squeezing the most out of bits is often a premature optimization that's later regretted. A long might be more future-proof.

Doug
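To make the 29+3 bit trade-off concrete, here is a minimal sketch in Python. It assumes a position is packed as a user-name hash in the high bits and a per-user slot in the low bits; the names `pack_position`, `unpack_position` and `Document`, the CRC-32 user hash, and the 32/32 split for a long are all hypothetical illustrations, not the project's actual scheme. It shows why 3 slot bits cap one user at 8 distinct positions (Ning's "8 nodes"), and how a 64-bit layout lifts that ceiling.

```python
import zlib
from dataclasses import dataclass, field


def pack_position(user_hash: int, slot: int, user_bits: int, slot_bits: int) -> int:
    """Combine a user-name hash and a per-user slot into one ring position.

    The hash is truncated to its low `user_bits` bits; `slot` must fit in
    `slot_bits` bits, so one user occupies at most 2**slot_bits positions.
    """
    if not 0 <= slot < (1 << slot_bits):
        raise ValueError(f"slot {slot} does not fit in {slot_bits} bits")
    user = user_hash & ((1 << user_bits) - 1)  # keep only the low user_bits
    return (user << slot_bits) | slot


def unpack_position(position: int, user_bits: int, slot_bits: int) -> tuple[int, int]:
    """Split a packed position back into (truncated user hash, slot)."""
    user = (position >> slot_bits) & ((1 << user_bits) - 1)
    slot = position & ((1 << slot_bits) - 1)
    return user, slot


@dataclass
class Document:
    """Hypothetical record for <externalId, version, position, <field>*>."""
    external_id: str
    version: int
    position: int
    fields: list[tuple[str, str]] = field(default_factory=list)


# 32-bit layout from the thread: 29 user bits + 3 slot bits.
INT_LAYOUT = (29, 3)
# One possible 64-bit ("long") layout; the 32/32 split is an assumption.
LONG_LAYOUT = (32, 32)

if __name__ == "__main__":
    for user_bits, slot_bits in (INT_LAYOUT, LONG_LAYOUT):
        print(f"{user_bits}+{slot_bits} bits: up to {1 << slot_bits} positions per user")

    user_hash = zlib.crc32(b"doug")  # deterministic stand-in for a user-name hash
    doc = Document("msg-0001", 1, pack_position(user_hash, 5, *INT_LAYOUT),
                   [("subject", "ring positions")])
    print(unpack_position(doc.position, *INT_LAYOUT) == (user_hash & ((1 << 29) - 1), 5))  # True
```

Widening the position to 64 bits costs four extra bytes per document but leaves room to grow both the user and slot fields later, which is the future-proofing argument made above.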