Re: [bailey-developers] SF.net SVN: bailey: [17] trunk/src

ni...@us... wrote:
> 1 Add the protocol and the classes related to log propagation. Also add a simple implementation in the withlog package and a test case TestSimpleDbWithLog.

This stuff looks great!  You're a tour de force!

A few minor comments:

- RangeResults belongs in the ddb implementation package.  The top-level 
package has the end-user API, and RangeResults are an implementation detail.

- in Hadoop, the way we handle threads is to stop them with 
Thread.interrupt() rather than by setting a flag.  In the thread, always 
treat InterruptedException as a signal to exit.  The run() loop should 
check !this.isInterrupted().

- shouldn't the host get the hostMap from the master?  And shouldn't it 
periodically refresh both the hostMap and the logMap from the master? 
For this, and instead of using ClientToMaster within a host, we need to 
add a method to HostToMaster protocol that returns a Mapper for the 
subset of the ring that concerns the calling node or host.  Initially 
it might return the full mapper, but, eventually, it should only 
transmit the node's neighborhood, or perhaps the host's neighborhoods.

- Should we have a single propagator per host, instead of per node? 
That would conserve calls to the master, and a single propagation thread 
would throttle things, so that indexing doesn't overwhelm search 
performance.  OTOH, we might sometimes want to propagate changes faster 
than a single thread can.  But that's probably better dealt with 
explicitly rather than having a thread per node...

- some possible name improvements:

   Tuple -> NodeState ?
   logMap -> overlappingNodes or neighbors?
   propagator -> retriever? synchronizer?

- the point where we have a new log event from a neighbor and need to 
resolve it against ourselves seems like a good place for a method call.
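
To make the thread-handling comment above concrete, here's a minimal
sketch of the interrupt-based pattern.  The class name, the 10 ms pause
and propagateOnce() are placeholders I made up for illustration, not
anything in the current code:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical worker thread; name and per-pass work are assumptions.
class PropagatorThread extends Thread {
    private final AtomicInteger passes = new AtomicInteger();

    @Override
    public void run() {
        // No "running" flag: the interrupt status is the only stop signal.
        while (!this.isInterrupted()) {
            try {
                propagateOnce();
                Thread.sleep(10);   // pause between passes (arbitrary value)
            } catch (InterruptedException e) {
                // Interrupted while sleeping: treat it as a request to exit.
                return;
            }
        }
    }

    // Placeholder for one round of real work; here it only counts passes.
    void propagateOnce() {
        passes.incrementAndGet();
    }

    int passes() {
        return passes.get();
    }
}
```

The owner stops it with t.interrupt() followed by t.join().  Both the
loop condition and the catch block are needed, since the interrupt can
arrive either while the thread is working or while it is blocked.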

> 2 Add the RangedDatabase class which contains NodeStatus, Database and Log.
> 3 Add "getDocs" to the Database class to retrieve a number of documents. This will be used to improve performance during the log propagation. Q: Should Database be aware of Range to support filtered queries based on Range? Or do we make RangedDatabase add a clause to a query before passing it down to Database?

I think we should add a Range element to Query that narrows it.  But we 
first need to define what it means in terms of other public API 
elements.  I think we define it in terms of the document's "position" 
field, which is the hashCode of its id by default, but can be explicitly 
specified.  Does that sound right?
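
Roughly what I have in mind, as a sketch only.  The class shapes, the
half-open [start, end) convention and the wrap-around handling are all
assumptions on my part, not existing API:

```java
// Hypothetical range over document positions on the ring.
// Assumed convention: start is inclusive, end is exclusive.
final class Range {
    private final int start;
    private final int end;

    Range(int start, int end) {
        this.start = start;
        this.end = end;
    }

    boolean contains(int position) {
        if (start <= end) {
            return position >= start && position < end;
        }
        // Assumed: a range with start > end wraps around the ring.
        return position >= start || position < end;
    }
}

// Hypothetical document carrying only the fields needed here.
final class Document {
    private final String id;
    private final int position;

    // By default the position is the hashCode of the id.
    Document(String id) {
        this(id, id.hashCode());
    }

    // The position can also be specified explicitly.
    Document(String id, int position) {
        this.id = id;
        this.position = position;
    }

    String getId() {
        return id;
    }

    int getPosition() {
        return position;
    }
}
```

A Query narrowed by a Range would then match only documents for which
range.contains(doc.getPosition()) is true.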

Does getDocs() need to be in the top-level application API?  At some 
point we need to distinguish between full documents and "outline" 
documents.  E.g., if we're storing full-text then we don't want to 
transmit that to search clients when they're just displaying hits.  We 
might, e.g., add a list of fields to be retrieved to Query.  But I don't 
yet see a case where an application will need to fetch a set of 
documents by id.  Except for search results, one-at-a-time access will 
be more typical, no?

Sorry I've not been more involved this week...

Doug