[Gusdev-gusdev] Similarity table changes

Sucheta, and All:

If the blast libraries do not overlap, i.e., contain different sets of  
sequences, then there is probably no problem.  You can simply  
distinguish the Similarity rows by the target sequence's  
external_database_release_id.
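
As a rough illustration of the non-overlapping case, here is a minimal
sketch using an in-memory SQLite database. The table and column names
(similarity, na_sequence, subject_id) are simplified stand-ins chosen for
the example, not the actual GUS schema; only
external_database_release_id comes from the discussion above.

```python
import sqlite3


def build_demo():
    """Create a toy database; the schema is a simplified assumption, not GUS."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE na_sequence (
            na_sequence_id INTEGER PRIMARY KEY,
            external_database_release_id INTEGER NOT NULL
        );
        CREATE TABLE similarity (
            similarity_id INTEGER PRIMARY KEY,
            subject_id INTEGER NOT NULL REFERENCES na_sequence
        );
        INSERT INTO na_sequence VALUES (100, 10), (101, 10), (200, 20);
        INSERT INTO similarity VALUES (1, 100), (2, 101), (3, 200);
        """
    )
    return conn


def similarities_by_release(conn, release_id):
    """Return ids of similarity rows whose target sequence is in the release."""
    cur = conn.execute(
        """
        SELECT s.similarity_id
        FROM similarity s
        JOIN na_sequence t ON t.na_sequence_id = s.subject_id
        WHERE t.external_database_release_id = ?
        ORDER BY s.similarity_id
        """,
        (release_id,),
    )
    return [row[0] for row in cur]
```

This only works because each target sequence belongs to exactly one
release; once libraries share sequences, the release id no longer
identifies the library that was searched.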

If the libraries overlap, then the issue is more difficult.  We don't  
have the notion of a library in this sense, and of course the library  
size affects the p-values for matches.  There is a DoTS::Library table  
that holds clone information that could be hacked to provide what you  
want, but I do *not* recommend it as a long-term solution.  You could  
also use the DoTS::DbRefNaSequence or DoTS::AASequenceDbRef as  
appropriate to link to a DoTS::DbRef which links to an  
ExternalDatabase.  These tables could easily be used to gather  
sequences into multiple BLAST database files.
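
A minimal sketch of that gathering step, assuming the rows have already
been fetched as (database name, sequence id, sequence) tuples by some
query over the DbRef links. The function name and the tuple layout are
hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from pathlib import Path


def write_fasta_per_database(rows, out_dir):
    """Write one FASTA file per database name.

    rows: iterable of (database_name, sequence_id, sequence) tuples
    (a hypothetical layout for whatever the DbRef query returns).
    Returns a dict mapping database name to the file written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Group the entries so each database gets its own file.
    grouped = defaultdict(list)
    for db_name, seq_id, sequence in rows:
        grouped[db_name].append((seq_id, sequence))

    paths = {}
    for db_name, entries in grouped.items():
        path = out_dir / f"{db_name}.fasta"
        with path.open("w") as handle:
            for seq_id, sequence in entries:
                handle.write(f">{seq_id}\n{sequence}\n")
        paths[db_name] = path
    return paths
```

Each resulting file could then be formatted as a BLAST database with
whatever tool is in use locally.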

The best solution is probably to create new tables.

I propose, then, the following changes to the Similarity table and the  
addition of new tables to track search libraries:

DoTS::Similarity
   - add search_algorithm_invocation_id link to stably point to  
parameter values for the search.

SRes::SearchLibrary
   - contains a description of the search library including entry count  
etc.

SRes::SearchLibraryMember
   - uses a soft link, i.e., table_id, row_id to indicate membership.
   - link is soft so that SearchLibrary can also be used for motifs,  
etc. that may not be in a sequence table.
   - SearchLibrary might contain a table_id to record what kind of  
entries are in the library.
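
To make the proposal concrete, here is a rough sketch of the two new
tables in SQLite. Only the table names, the soft link (table_id, row_id),
the entry count, and the optional table_id on the library come from the
list above; every other column name, and the helper functions, are
assumptions for illustration.

```python
import sqlite3

# Simplified stand-ins for SRes::SearchLibrary and SRes::SearchLibraryMember.
SCHEMA = """
CREATE TABLE search_library (
    search_library_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    description TEXT,
    table_id INTEGER,                -- kind of entries held, if uniform
    entry_count INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE search_library_member (
    search_library_id INTEGER NOT NULL REFERENCES search_library,
    table_id INTEGER NOT NULL,       -- soft link: which table the entry is in
    row_id INTEGER NOT NULL,         -- soft link: primary key in that table
    PRIMARY KEY (search_library_id, table_id, row_id)
);
"""


def create_library(conn, name, description, table_id, row_ids):
    """Insert a library and its members; return the new library id."""
    row_ids = list(row_ids)
    cur = conn.execute(
        "INSERT INTO search_library (name, description, table_id, entry_count)"
        " VALUES (?, ?, ?, ?)",
        (name, description, table_id, len(row_ids)),
    )
    library_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO search_library_member VALUES (?, ?, ?)",
        [(library_id, table_id, row_id) for row_id in row_ids],
    )
    return library_id


def libraries_containing(conn, table_id, row_id):
    """Return the names of all libraries that include the given entry."""
    cur = conn.execute(
        """
        SELECT l.name
        FROM search_library l
        JOIN search_library_member m USING (search_library_id)
        WHERE m.table_id = ? AND m.row_id = ?
        ORDER BY l.name
        """,
        (table_id, row_id),
    )
    return [row[0] for row in cur]
```

Because membership is keyed on (table_id, row_id) rather than a foreign
key to one sequence table, the same entry can appear in several
overlapping libraries, and a library can hold rows from non-sequence
tables.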

Thoughts?

Jonathan

On Jul 19, 2004, at 9:31 AM, Sucheta Tripathy wrote:

> Hi Jonathan,
>
> Thanks for your reply. I also have a similar concern which I posted  
> some time back. My concern is to do with multiple databases rather  
> than the parameters for blast search. Currently we want to store blast  
> results against 23 different databases. Any suggestions which table  
> may be suitable?
>
> Thanks
>
> Sucheta
>
> At 12:18 AM 7/19/2004 -0400, Jonathan Schug wrote:
>> Josef:
>>
>> The two columns in DoTS::Similarity that might be useful are algorithm
>> and row_alg_invocation_id.  Algorithm is not recommended; it is meant
>> to be used to distinguish, say, BLAST hits from FASTA hits.
>> Row_alg_invocation_id is better and will work.  You can use the
>> AlgorithmInvocation parameters.  However, this will not work if the
>> rows are modified in some way later on.  Later updates will change the
>> row_alg_invocation_id, ruining this scheme.  You could also consider
>> linking Similarity rows to the invocation via an Evidence row.  This
>> is more stable.
>>
>> PlasmoDB faced this when tuning BLAST parameters to avoid the problems
>> with the high AT content of the Pf genome.  They may have tuned the
>> parameters outside of the DB.
>>
>> You might also consider running the BLAST searches with the most
>> lenient parameters, then recreating the more stringent searches with
>> query parameters, if this is possible.
>>
>> Jonathan
>>
>>
>> ---------------------------------------------------------------------------
>> Jonathan Schug            Center for Bioinformatics
>> js...@pc...     Computational Biology and Informatics Lab
>> (215) 573-3113 voice      University of Pennsylvania,
>> (215) 573-3111 fax        1413 Blockley Hall, Philadelphia, PA 19014-6021