#2 pygr db sharing via XMLRPC services

open
Alignment (2)
5
2007-01-10
2007-01-10
No

We want to make it easy for pygr to share resources across a set of machines. We already have an XMLRPC ResourceController framework (coordinator.py) that could easily be adapted to provide services such as
- access to a sequence database
- access to an NLMSA alignment database
etc.

Initial plans:
1. The server base class should have _dispatch=safe_dispatch and serve_forever(resourceID,rc_url). serve_forever just does get_server, register_instance, detach_as_demon_process, and then registers with the RC. It should probably include a heartbeat reporter to stay in touch with the RC, and a ping responder that allows the RC to check that it is running.
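A minimal sketch of the safe-dispatch idea, using Python 3's standard xmlrpc.server rather than coordinator.py. The class name SafeDispatchServer, the set-valued xmlrpc_methods, and the simplified serve_forever signature are assumptions for illustration; daemonizing, heartbeat and RC registration are left out.

```python
from xmlrpc.server import SimpleXMLRPCServer


class SafeDispatchServer:
    """Base class that exposes only the methods named in xmlrpc_methods."""

    # Assumption: a set of method names that may be called over XMLRPC.
    xmlrpc_methods = {"ping"}

    def _dispatch(self, method, params):
        # SimpleXMLRPCServer routes every request through _dispatch
        # when the registered instance defines it.
        if method not in self.xmlrpc_methods:
            raise ValueError("method %r is not exported" % method)
        return getattr(self, method)(*params)

    def ping(self):
        # Lets the ResourceController check that the server is alive.
        return "ok"

    def serve_forever(self, host="localhost", port=0):
        # Simplified: no detach step and no registration with the RC.
        server = SimpleXMLRPCServer((host, port), logRequests=False)
        server.register_instance(self)
        server.serve_forever()
```

The point of routing everything through _dispatch is that serve_forever and any other internal method stay unreachable from remote clients unless they are listed explicitly.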

2. seqdb.BlastDB is probably the easiest first example. Just create a server class as a mix-in with an xmlrpc_methods attribute, a getitem method that provides the XMLRPC interface (it just echoes the string ID if it exists in the database), and a strslice(id,start,stop) method.
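Roughly what the mix-in could look like. ToySeqDB is a hypothetical dict-backed stand-in for seqdb.BlastDB (the real class is not used here), and the names SeqServerMixin and ToySeqDB are made up for this sketch.

```python
class SeqServerMixin:
    """Mix-in adding XMLRPC-facing methods to a dict-like sequence store."""

    xmlrpc_methods = {"getitem", "strslice"}

    def getitem(self, seq_id):
        # Echo the ID back if it exists in the database.
        if seq_id in self:
            return seq_id
        raise KeyError(seq_id)

    def strslice(self, seq_id, start, stop):
        # Return only the requested piece of the sequence as a string.
        return str(self[seq_id])[start:stop]


class ToySeqDB(SeqServerMixin, dict):
    """Hypothetical stand-in for seqdb.BlastDB: maps ID -> sequence string."""


db = ToySeqDB(seq1="ACGTACGT")
print(db.getitem("seq1"), db.strslice("seq1", 2, 6))
```

Over XMLRPC the KeyError would reach the client as a Fault, so the client can treat a Fault from getitem as "no such ID".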

3. Client side: the seq db class creates a ServerProxy and provides __getitem__, __len__, __hash__, __invert__ as in BlastDBbase. Ideally, design it so that it can restart from a different server if the original server dies. The client sequence object class just calls server.strslice(id,start,stop).
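A sketch of the client side with a simple failover loop. RemoteSeqDB, RemoteSequence and the call helper are hypothetical names; each entry in servers is assumed to be something like xmlrpc.client.ServerProxy(url), but any object with getitem and strslice methods works. Only __getitem__ is shown.

```python
class RemoteSeqDB:
    """Client-side sequence database backed by one or more XMLRPC servers."""

    def __init__(self, servers):
        # Assumption: servers are tried in order, e.g. ServerProxy objects.
        self.servers = list(servers)

    def call(self, method, *args):
        last_error = None
        for server in self.servers:
            try:
                return getattr(server, method)(*args)
            except OSError as err:  # connection refused, reset, timed out
                last_error = err
        raise ConnectionError("no sequence server reachable") from last_error

    def __getitem__(self, seq_id):
        # The server echoes the ID back if it exists in its database.
        return RemoteSequence(self, self.call("getitem", seq_id))


class RemoteSequence:
    """Client-side sequence object; fetches slices from the server on demand."""

    def __init__(self, db, seq_id):
        self.db = db
        self.id = seq_id

    def strslice(self, start, stop):
        return self.db.call("strslice", self.id, start, stop)
```

Only connection-level errors trigger the fallback; an XMLRPC Fault (for example an unknown ID) still propagates to the caller.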

4. ResourceController should keep a resource database in a shelve. The basic model is that each resource has a unique string ID. Each resource can be mapped to one or more rules that provide that resource. Each rule consists of two parts: a class for instantiating the resource on the client, and an argument list for its constructor. Keep these in the shelve. The class is designated as a string of the form "pygr/seqdb.BlastDB", which translates as "from pygr import seqdb" and then references the class as "seqdb.BlastDB".
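One way the class string and the shelve could fit together, as a sketch. The helper names resolve_class, add_rule and get_rules are hypothetical, and the example resolves a standard-library class so that it runs without pygr installed.

```python
import importlib
import shelve


def resolve_class(spec):
    """Turn 'pkg/mod.Name' into what 'from pkg import mod; mod.Name' gives."""
    package, _, dotted = spec.partition("/")
    module_name, _, attr = dotted.rpartition(".")
    module = importlib.import_module(package + "." + module_name)
    return getattr(module, attr)


def add_rule(db_path, resource_id, class_spec, args):
    """Append a (class string, constructor args) rule for a resource ID."""
    with shelve.open(db_path) as db:
        rules = db.get(resource_id, [])
        rules.append((class_spec, tuple(args)))
        db[resource_id] = rules  # reassign so the shelve persists the change


def get_rules(db_path, resource_id):
    """Return the list of rules stored for a resource ID."""
    with shelve.open(db_path) as db:
        return db[resource_id]


# 'collections/abc.Mapping' plays the role of 'pygr/seqdb.BlastDB' here.
print(resolve_class("collections/abc.Mapping"))
```

Storing the class as a string rather than a pickled class keeps the shelve readable by clients that have not imported the module yet.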

5. Client side: there should be a single object representing the connection to the ResourceController, which is used as the interface for accessing resources. This could just be a dict interface.
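A sketch of the dict interface. ResourceClient is a hypothetical name, and the controller is assumed to offer a get_rules(resource_id) call returning (class string, args) pairs, which is not an existing coordinator.py method; resolve is any callable that maps a class string to a constructor.

```python
class ResourceClient:
    """Dict-like front end to a ResourceController connection."""

    def __init__(self, controller, resolve):
        self.controller = controller  # e.g. xmlrpc.client.ServerProxy(rc_url)
        self.resolve = resolve        # maps a class string to a callable
        self._cache = {}              # resources already instantiated

    def __getitem__(self, resource_id):
        if resource_id not in self._cache:
            self._cache[resource_id] = self._build(resource_id)
        return self._cache[resource_id]

    def _build(self, resource_id):
        # Try each rule in turn; the first one that works provides the resource.
        for class_spec, args in self.controller.get_rules(resource_id):
            try:
                return self.resolve(class_spec)(*args)
            except Exception:
                continue  # this rule failed; fall through to the next one
        raise KeyError(resource_id)
```

With this, client code would just write something like rc["some resource ID"] and not care which rule or server ended up providing it.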

6. Later we should look carefully at extending the sequence model to handle caching in a really clean way. Then NLMSASlice could provide the key hints for caching.

7. For the NLMSA XMLRPC server, there are two possible places to make the client-server cut: at the level of the NLMSASlice retrieval (efficient, only one XMLRPC call per query) or as a drop-in replacement for the IntervalFileDBIterator in NLMSASlice.__new__. Study this carefully. Refer to my previous plan to use NLMSASlice as a general interface for other alignment sources: all you need to give it is a method that performs an interval query (which could go to a database, whatever...). That can be done either in a single step or via an intermediate LPO step. This actually could be VERY straightforward to implement.

8. Actually making a client-server version of the NLMSA class looks pretty simple. Basically, the server needs a couple of new methods (getSeqInfo, getSlice) that provide the XMLRPC front-end. getSeqInfo would return a tuple of sequence resource IDs for the client to open via the standard getResource. getSlice would implement whatever mechanism we choose for running an interval query and handing back a bunch of result intervals to the NLMSASlice constructor. These server methods can be added by making a Python subclass of the NLMSA extension class. The client-side modifications would probably involve some minor changes to the NLMSA extension class(es). __new__ would have to use the server's getSeqInfo to find out what seq databases to build a PrefixUnionDict from. NLMSASlice.__new__ would also have to know to use the client-side getSlice method to get the list of result intervals, and not to run ns.forceLoad(). Since NLMSASequence is already being opened with mode='onDemand' there is nothing to change there...
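A toy version of the two server methods, just to pin down the shape of the interface. ToyAlignmentServer, the interval tuple layout and the linear scan are all assumptions standing in for the real NLMSA extension class and its interval database; only the method names getSeqInfo and getSlice come from the plan above.

```python
class ToyAlignmentServer:
    """Hypothetical stand-in for a server-side NLMSA subclass."""

    xmlrpc_methods = {"getSeqInfo", "getSlice"}

    def __init__(self, seq_resource_ids, intervals):
        self.seq_resource_ids = tuple(seq_resource_ids)
        # Assumed layout of each interval:
        # (src_id, src_start, src_stop, dest_id, dest_start, dest_stop)
        self.intervals = list(intervals)

    def getSeqInfo(self):
        # Resource IDs the client should open via the standard getResource.
        return self.seq_resource_ids

    def getSlice(self, src_id, start, stop):
        # Interval query: every aligned interval overlapping [start, stop).
        # A linear scan stands in for the real interval database lookup.
        return [iv for iv in self.intervals
                if iv[0] == src_id and iv[1] < stop and iv[2] > start]
```

Note that XMLRPC marshals tuples as arrays, so the client would receive lists and would need to convert them before handing them to the NLMSASlice constructor.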
