From: Josef J. <ju...@cs...> - 2006-02-02 00:01:37
In the recent past, some of us needed a way to
distinguish between blast results in DoTS.Similarity by
the blast parameters used. For example, I might
blast the same sets of sequences several times
with different parameters and put the results
of all these blast searches into DoTS.Similarity.
I was, of course, able to jury-rig a way to do
this, but as of GUS 3.5 an officially
sanctioned method has been implemented
using the Core.AnalysisAlgorithm table.
Below is a crude entity-relationship diagram
of how Core.AnalysisAlgorithm fits in with the other
relevant tables (be sure to view it in a
fixed-width font such as Courier):
Core.Algorithm (name)
|
|
Core.AlgorithmImplementation Core.AlgorithmParamKeyType
| | \ / (string, float, int, ...)
| | \ /
Core.TableInfo Core.AlgorithmInvocation Core.AlgorithmParamKey
| \ | \ / (description of parameter)
| \ | \ /
DoTS.Similarity Core.AnalysisAlgorithm Core.AlgorithmParam (parameters as a string)
(individual parameter)
And so, to insert into GUS a list of blast parameters such as:
-p blastp -FD -W2 -G 11 -E 1 -e 0.5 -f 11 -M BLOSUM62 -b 1000000 -v 1000000
one would need to:
have a row for every flag in Core.AlgorithmParamKey
have a row for every value after a flag in Core.AlgorithmParam
which is very complicated to both insert and query. I suppose if one
wrote a GUS plugin that wraps blast, inserting the parameters
into all of these rows and fields could easily be taken care
of, but we at the Preuss lab are just not going to do things
that way. We may get blast data from a collaborator that took
days to run on a multi-node cluster, and we just want to dump
this data into DoTS.Similarity/DoTS.SimilaritySpan and query
it. We can't run this blast search again with a plugin.
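To make the bookkeeping concrete, here is a minimal Python sketch (not GUS code) that splits the parameter string above into flag/value pairs, one pair per row that would be needed. The function name parse_blast_flags is hypothetical, and the sketch assumes every flag is a dash plus a single letter that takes exactly one value, either attached ("-W2") or in the following token ("-G 11").

```python
def parse_blast_flags(param_string):
    """Split a blastall-style parameter string into (flag, value) pairs.

    Assumption: each flag is "-" plus one letter and takes exactly one
    value, attached ("-W2") or given as the next token ("-G 11").
    """
    tokens = param_string.split()
    pairs = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if not token.startswith("-") or len(token) < 2:
            raise ValueError(f"expected a flag, got {token!r}")
        flag = token[:2]
        if len(token) > 2:
            # Value is attached to the flag, e.g. "-FD" -> ("-F", "D").
            value = token[2:]
            i += 1
        else:
            # Value is the next token, e.g. "-G 11" -> ("-G", "11").
            if i + 1 >= len(tokens):
                raise ValueError(f"flag {flag} has no value")
            value = tokens[i + 1]
            i += 2
        pairs.append((flag, value))
    return pairs


params = ("-p blastp -FD -W2 -G 11 -E 1 -e 0.5 -f 11 "
          "-M BLOSUM62 -b 1000000 -v 1000000")
pairs = parse_blast_flags(params)
for flag, value in pairs:
    print(flag, value)
print(len(pairs), "flags, so", len(pairs), "parameter values to store")
```

Even this one search yields ten flag/value pairs, each of which would need its own key and value row under the scheme described above.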
Again, querying blast results for the same set of
sequences by blast parameter looks to be very complex with the
above tables. Imagine writing SQL to distinguish between
blast results based on these three sets of parameters:
-p blastp -FD -W2 -G 11 -E 1 -e 0.5 -f 11 -M BLOSUM62 -b 1000000 -v 1000000
-p blastp -FD -W3 -G 11 -E 1 -e 0.5 -f 11 -M BLOSUM62 -b 1000000 -v 1000000
-p blastp -FD -W2 -G 11 -E 1 -e 0.5 -f 11 -M BLOSUM80 -b 1000000 -v 1000000
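As a rough illustration of why this gets awkward, the sketch below uses SQLite from Python with two simplified stand-in tables. The table and column names (param_key, param, invocation_id, flag, string_value) are hypothetical and are not the actual GUS schema. The three parameter sets above differ only in -W and -M, so only those two flags are loaded; even so, picking out the first set takes one extra pair of joins per parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified stand-ins for a key/value parameter layout.
conn.executescript("""
    CREATE TABLE param_key (
        param_key_id INTEGER PRIMARY KEY,
        flag         TEXT NOT NULL
    );
    CREATE TABLE param (
        invocation_id INTEGER NOT NULL,
        param_key_id  INTEGER NOT NULL REFERENCES param_key (param_key_id),
        string_value  TEXT NOT NULL
    );
""")
conn.executemany("INSERT INTO param_key VALUES (?, ?)",
                 [(1, "-W"), (2, "-M")])
# Invocations 1, 2, 3 correspond to the three parameter sets, in order.
conn.executemany("INSERT INTO param VALUES (?, ?, ?)", [
    (1, 1, "2"), (1, 2, "BLOSUM62"),
    (2, 1, "3"), (2, 2, "BLOSUM62"),
    (3, 1, "2"), (3, 2, "BLOSUM80"),
])

# One self-join of param (plus its key) for every parameter tested.
rows = conn.execute("""
    SELECT w.invocation_id
    FROM param w
    JOIN param_key wk ON wk.param_key_id = w.param_key_id
    JOIN param m      ON m.invocation_id = w.invocation_id
    JOIN param_key mk ON mk.param_key_id = m.param_key_id
    WHERE wk.flag = '-W' AND w.string_value = '2'
      AND mk.flag = '-M' AND m.string_value = 'BLOSUM62'
""").fetchall()
print(rows)
```

Matching on all ten flags in the same way would mean ten such join pairs, before the result is even joined to the similarity rows themselves.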
Perhaps people on the list can let me know if there are
any labs outside of CBIL that are depositing and querying
blast search parameters with the above tables.
What we at the Preuss lab really need is a simple way
to group rows in DoTS.Similarity together, much like the
way one groups rows in DoTS.ExternalNASequence together with
the table SRes.ExternalDatabaseRelease. Then a set of blast
results could be labeled with a convenient name
such as "ME vs Ath, W9" or "Jim's lab, -FF".
I will go ahead and implement something locally to do
this, but I would think such a thing would be not only
valuable but necessary for others too.
Does implementing such a table (perhaps calling it
DoTS.SimilaritySet) in the official distribution make
sense?
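A minimal sketch of what such a grouping table might look like, again using SQLite from Python as a stand-in. Everything here is hypothetical: the table names similarity_set and similarity, their columns, and the dummy hit rows are for illustration only and are not part of GUS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: each similarity row points at one named set.
conn.executescript("""
    CREATE TABLE similarity_set (
        similarity_set_id INTEGER PRIMARY KEY,
        name              TEXT NOT NULL UNIQUE
    );
    CREATE TABLE similarity (
        similarity_id     INTEGER PRIMARY KEY,
        similarity_set_id INTEGER NOT NULL
            REFERENCES similarity_set (similarity_set_id),
        query_id          INTEGER NOT NULL,
        subject_id        INTEGER NOT NULL,
        score             REAL
    );
""")

conn.executemany("INSERT INTO similarity_set (name) VALUES (?)",
                 [("ME vs Ath, W9",), ("Jim's lab, -FF",)])


def set_id(name):
    """Look up the id of a similarity set by its label."""
    row = conn.execute(
        "SELECT similarity_set_id FROM similarity_set WHERE name = ?",
        (name,)).fetchone()
    return row[0]


# Dummy hits (made-up ids and scores) loaded under two different labels.
dummy_hits = [
    (set_id("ME vs Ath, W9"), 1, 101, 250.0),
    (set_id("ME vs Ath, W9"), 1, 102, 180.0),
    (set_id("Jim's lab, -FF"), 1, 101, 240.0),
]
conn.executemany(
    "INSERT INTO similarity (similarity_set_id, query_id, subject_id, score) "
    "VALUES (?, ?, ?, ?)", dummy_hits)


def hits_in_set(name):
    """Return (query_id, subject_id, score) for every hit in the named set."""
    return conn.execute("""
        SELECT s.query_id, s.subject_id, s.score
        FROM similarity s
        JOIN similarity_set ss
          ON ss.similarity_set_id = s.similarity_set_id
        WHERE ss.name = ?
        ORDER BY s.similarity_id
    """, (name,)).fetchall()


w9_hits = hits_in_set("ME vs Ath, W9")
print(w9_hits)
```

With this layout, selecting one batch of blast results is a single join on the set name, regardless of how many blast parameters distinguished the runs.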
Or perhaps I am wrong in my understanding of how
the Core.AnalysisAlgorithm table can be used, and
there is a simpler way to do this. If so, I hope
someone can enlighten me.
Thank you for reading,
Josef