From: Jonathan S. <js...@pc...> - 2004-07-19 14:00:49
Sucheta, and All:

If the BLAST libraries do not overlap, i.e., they contain different sets of sequences, then there is probably no problem: you can simply distinguish the Similarity rows by the target sequence's external_database_release_id.

If the libraries overlap, the issue is more difficult. We don't have the notion of a library in this sense, and of course the library size affects the p-values for matches. There is a DoTS::Library table that holds clone information and could be hacked to provide what you want, but I do *not* recommend it as a long-term solution. You could also use DoTS::DbRefNaSequence or DoTS::AASequenceDbRef, as appropriate, to link to a DoTS::DbRef, which links to an ExternalDatabase. These tables could easily be used to gather sequences into multiple BLAST database files.

The best solution is probably to create new tables. I propose, then, the following changes to the Similarity table and the addition of new tables to track search libraries:

  DoTS::Similarity
    - add a search_algorithm_invocation_id link to point stably to the parameter values for the search.

  SRes::SearchLibrary
    - contains a description of the search library, including entry count, etc.

  SRes::SearchLibraryMember
    - uses a soft link, i.e., (table_id, row_id), to indicate membership.
    - the link is soft so that SearchLibrary can also be used for motifs, etc. that may not be in a sequence table.
    - SearchLibrary might contain a table_id to record what kind of entries are in the library.

Thoughts?

Jonathan

On Jul 19, 2004, at 9:31 AM, Sucheta Tripathy wrote:

> Hi Jonathan,
>
> Thanks for your reply. I also have a similar concern, which I posted
> some time back. My concern has to do with multiple databases rather
> than the parameters for the BLAST search. Currently we want to store
> BLAST results against 23 different databases. Any suggestions as to
> which table may be suitable?
>
> Thanks
>
> Sucheta
>
> At 12:18 AM 7/19/2004 -0400, Jonathan Schug wrote:
>> Josef:
>>
>> The two columns in DoTS::Similarity that might be useful are algorithm
>> and row_alg_invocation_id. Algorithm is not recommended; it is meant
>> to be used to distinguish, say, BLAST hits from FASTA hits.
>> Row_alg_invocation_id is better and will work: you can use the
>> AlgorithmInvocation parameters. However, this will not work if the
>> rows are modified in some way later on, because later updates will
>> change the row_alg_invocation_id, ruining this scheme. You could also
>> consider linking Similarity rows to the invocation via an Evidence
>> row, which is more stable.
>>
>> PlasmoDB faced this when tuning BLAST parameters to avoid the problems
>> with the high AT content of the Pf genome. They may have tuned the
>> parameters outside of the DB.
>>
>> You might also consider running the BLAST searches with the most
>> lenient parameters, then recreating the more stringent searches with
>> query parameters, if this is possible.
>>
>> Jonathan
>>
>> --
>> Jonathan Schug                 Center for Bioinformatics
>> js...@pc...                    Computational Biology and Informatics Lab
>> (215) 573-3113 voice           University of Pennsylvania,
>> (215) 573-3111 fax             1413 Blockley Hall, Philadelphia, PA 19014-6021
>>
>> _______________________________________________
>> Gusdev-gusdev mailing list
>> Gus...@li...
>> https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev
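The SearchLibrary proposal in the reply above can be sketched with Python's built-in sqlite3 module. This is a minimal, hypothetical illustration, not the GUS schema: only the table names and the columns named in the email (entry count, table_id, row_id, search_algorithm_invocation_id) come from the proposal. The SearchInvocation table standing in for the invocation's parameter values, every other column name, and the table_id value 89 used for sequences are assumptions, and because SQLite has no schemas, the SRes::/DoTS:: prefixes become name prefixes. The sketch shows how the soft (table_id, row_id) link records membership in overlapping libraries, and how hits can still be grouped per library through the invocation.

```python
import sqlite3

NA_SEQ = 89  # hypothetical table_id for a nucleotide-sequence table

# Simplified, hypothetical schema; not the actual GUS DDL.
SCHEMA = """
CREATE TABLE SRes_SearchLibrary (
    search_library_id INTEGER PRIMARY KEY,
    name              TEXT NOT NULL,
    entry_count       INTEGER,          -- library size, relevant to p-values
    table_id          INTEGER           -- optional: kind of entries in the library
);
CREATE TABLE SRes_SearchLibraryMember (
    search_library_id INTEGER NOT NULL REFERENCES SRes_SearchLibrary,
    table_id          INTEGER NOT NULL, -- soft link: table the member lives in
    row_id            INTEGER NOT NULL, -- soft link: row within that table
    PRIMARY KEY (search_library_id, table_id, row_id)
);
-- Hypothetical stand-in for the invocation and its parameter values.
CREATE TABLE SearchInvocation (
    search_algorithm_invocation_id INTEGER PRIMARY KEY,
    search_library_id              INTEGER NOT NULL,
    parameters                     TEXT
);
CREATE TABLE DoTS_Similarity (
    similarity_id                  INTEGER PRIMARY KEY,
    query_id                       INTEGER NOT NULL,
    subject_table_id               INTEGER NOT NULL,
    subject_id                     INTEGER NOT NULL,
    search_algorithm_invocation_id INTEGER NOT NULL, -- the proposed stable link
    pvalue                         REAL
);
"""

def build_demo() -> sqlite3.Connection:
    """Create an in-memory database with two overlapping libraries."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO SRes_SearchLibrary (search_library_id, name, entry_count, table_id)"
        " VALUES (?, ?, ?, ?)",
        [(1, "libA", 2, NA_SEQ), (2, "libB", 2, NA_SEQ)])
    # Sequence 20 belongs to both libraries: the overlapping case.
    conn.executemany(
        "INSERT INTO SRes_SearchLibraryMember VALUES (?, ?, ?)",
        [(1, NA_SEQ, 10), (1, NA_SEQ, 20), (2, NA_SEQ, 20), (2, NA_SEQ, 30)])
    conn.executemany(
        "INSERT INTO SearchInvocation VALUES (?, ?, ?)",
        [(100, 1, "evalue=1e-5 (hypothetical)"), (200, 2, "evalue=1e-5 (hypothetical)")])
    # Query 5 hits sequence 20 in both searches; the rows stay distinct
    # because each points at a different invocation.
    conn.executemany(
        "INSERT INTO DoTS_Similarity VALUES (?, ?, ?, ?, ?, ?)",
        [(1, 5, NA_SEQ, 10, 100, 1e-20),
         (2, 5, NA_SEQ, 20, 100, 1e-08),
         (3, 5, NA_SEQ, 20, 200, 1e-08)])
    return conn

def hits_by_library(conn, query_id):
    """Return (library name, subject_id, pvalue) for one query, per library."""
    return conn.execute("""
        SELECT l.name, s.subject_id, s.pvalue
        FROM DoTS_Similarity s
        JOIN SearchInvocation i
          ON i.search_algorithm_invocation_id = s.search_algorithm_invocation_id
        JOIN SRes_SearchLibrary l ON l.search_library_id = i.search_library_id
        WHERE s.query_id = ?
        ORDER BY l.name, s.subject_id
    """, (query_id,)).fetchall()

def libraries_containing(conn, table_id, row_id):
    """Resolve the soft link: which libraries contain (table_id, row_id)?"""
    rows = conn.execute("""
        SELECT l.name FROM SRes_SearchLibraryMember m
        JOIN SRes_SearchLibrary l ON l.search_library_id = m.search_library_id
        WHERE m.table_id = ? AND m.row_id = ?
        ORDER BY l.name
    """, (table_id, row_id))
    return [name for (name,) in rows]
```

Usage: `hits_by_library(build_demo(), 5)` returns one libA row each for subjects 10 and 20 plus a libB row for subject 20, so the overlapping hit is attributed to both libraries. Keeping the membership link soft means the same member table could, as suggested, also hold motifs or other entries that are not in a sequence table.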