bailey-developers Mailing List for Bailey (Page 4)

On Wed, Mar 19, 2008 at 2:06 PM, Doug Cutting <cu...@ap...> wrote:
>  I had a slightly different idea.  I thought that the id string would be
>  the external id provided by the application that we return with hits,
>  e.g., a uri, a filename, etc.  We'd also have a numeric 'position' value
>  that places the document on the ring.  The position would, by default,
>  be the hash of the id, but an application might override that.  It would
>  be a bug for an application to ever provide different positions for the
>  same id.

Originally, I was thinking of simply using the application-specified
external id as its 'position' value on the ring. We'd have one value
instead of two, and no need to check whether different positions are
ever provided for the same id.

The distribution of documents around the ring won't be uniform in this
case, but we have to deal with non-uniform distributions anyway. So the
main downside I see is the performance cost of strings (computation,
memory, etc.). That's why I'm fine with a separate 'position' value.
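
To make the id-plus-position idea concrete, here is a minimal Java sketch.
The class name RingDocument and its accessors are hypothetical and not
part of any Bailey code; the sketch only assumes what is described above:
a string id, an int position that defaults to the hash of the id, and an
optional application override.

```java
import java.util.Objects;

/** Hypothetical holder for an external id and its position on the ring. */
public final class RingDocument {
    private final String id;     // external id returned with hits (uri, filename, ...)
    private final int position;  // where the document sits on the ring

    /** Default case: the position is the hash of the id. */
    public RingDocument(String id) {
        this(id, id.hashCode());
    }

    /**
     * Override case: the application supplies the position itself.
     * The application must always supply the same position for the same id;
     * this sketch does not check that.
     */
    public RingDocument(String id, int position) {
        this.id = Objects.requireNonNull(id, "id");
        this.position = position;
    }

    public String id() { return id; }

    public int position() { return position; }
}
```

Since String.hashCode() is deterministic, the default case can never
produce two positions for the same id; only the override path puts that
burden on the application.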

>  I'd imagined that positions would be longs, but Yonik has argued that
>  they might as well be ints, and I can't think why they couldn't, if
>  we're going to keep the string id too.  That makes the default
>  implementation in Java much easier, since it can be hashCode().

I'm not insisting on longs, but here is my reasoning. :)
I imagine a good number of the applications that use Bailey would
be similar to an email system: the application would provide the
'position' values so that a search over a fraction of all the
documents spans a relatively small number of nodes.
Let's use Yonik's suggestion to assign such 'position' values:

>  Of course, fixing my bug it would be (username.hashCode() << 29) |
>  (id.hashCode() >>> 3)

One user may have one document. Another may have a lot.
Is 29 bits for username enough? Maybe. But is 3 bits for the
documents of a user enough? That means a user's documents
cannot span more than 8 nodes.
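
The bit budget can be checked with a small Java sketch. One caveat when
reading the quoted expression: as written, `<< 29` keeps only the low
3 bits of the username hash (as the top 3 bits of the position) and
`>>> 3` leaves 29 bits for the id hash. The 29/3 split discussed above
corresponds to the shifts swapped, shown here as the hypothetical method
userHeavy. The class and method names are made up for illustration.

```java
public final class PositionBits {
    /** The expression exactly as quoted: 3 username bits on top, 29 id bits below. */
    public static int asQuoted(String username, String id) {
        return (username.hashCode() << 29) | (id.hashCode() >>> 3);
    }

    /** Hypothetical variant: 29 username bits on top, 3 id bits below. */
    public static int userHeavy(String username, String id) {
        return (username.hashCode() << 3) | (id.hashCode() >>> 29);
    }

    public static void main(String[] args) {
        String user = "ning";  // made-up sample values
        String doc = "msg-0001";
        System.out.println(Integer.toBinaryString(asQuoted(user, doc)));
        System.out.println(Integer.toBinaryString(userHeavy(user, doc)));
    }
}
```

With userHeavy, every document of one user shares the same upper 29 bits,
so that user has at most 8 distinct positions on the ring. With asQuoted,
all users collapse into 8 groups by the top 3 bits, and a user's documents
spread over one eighth of the ring.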

Maybe I over-thought the problem. :)

Ning