From: Josef J. <ju...@cs...> - 2006-02-02 00:01:37
In the recent past, some of us needed a way to distinguish between BLAST results in DoTS.Similarity by the BLAST parameters used. For example, I might BLAST the same sets of sequences several times with different parameters and put the results of all these searches into DoTS.Similarity. I was, of course, able to jury-rig a way to do this, but now, with GUS 3.5, an officially sanctioned method has been implemented using the Core.AnalysisAlgorithm table.

Below is a crude entity relationship diagram of how Core.AnalysisAlgorithm fits in with the other relevant tables (be sure to view it in a fixed-width font such as Courier):

                      Core.Algorithm
                          (name)
                            |
                            |
               Core.AlgorithmImplementation   Core.AlgorithmParamKeyType
                     |      |       \            /  (string, float, int, ...)
                     |      |        \          /
Core.TableInfo   Core.AlgorithmInvocation   Core.AlgorithmParamKey
      |      \          |        \            /  (description of parameter)
      |       \         |         \          /
DoTS.Similarity  Core.AnalysisAlgorithm   Core.AlgorithmParam
                 (parameters as a string) (individual parameter)

And so, to insert into GUS a list of BLAST parameters such as:

    -p blastp -FD -W2 -G 11 -E 1 -e 0.5 -f 11 -M BLOSUM62 -b 1000000 -v 1000000

one would need to:

    have a row for every flag in Core.AlgorithmParamKey
    have a row for every value after a flag in Core.AlgorithmParam

which is very complicated to both insert and query. I suppose if one wrote a GUS plugin that wraps BLAST, inserting the parameters into all of these rows and fields could easily be taken care of, but we at the Preuss lab are just not going to do things that way. We may get BLAST data from a collaborator that took days to run on a multi-node cluster, and we just want to dump this data into DoTS.Similarity/DoTS.SimilaritySpan and query it. We can't run this search again with a plugin.

And again, querying BLAST results by parameter for the same set of sequences looks to be very complex with the above tables. Imagine writing SQL to distinguish between BLAST results based on these three sets of parameters:
    -p blastp -FD -W2 -G 11 -E 1 -e 0.5 -f 11 -M BLOSUM62 -b 1000000 -v 1000000
    -p blastp -FD -W3 -G 11 -E 1 -e 0.5 -f 11 -M BLOSUM62 -b 1000000 -v 1000000
    -p blastp -FD -W2 -G 11 -E 1 -e 0.5 -f 11 -M BLOSUM80 -b 1000000 -v 1000000

Perhaps people on the list can let me know whether any labs outside of CBIL are depositing and querying BLAST search parameters with the above tables.

What we at the Preuss lab really need is a simple way to group rows in DoTS.Similarity together, much as one groups rows in DoTS.ExternalNASequence together with the table SRes.ExternalDatabaseRelease. Then a set of BLAST results could be labeled with a convenient name such as "ME vs Ath, W9" or "Jim's lab, -FF". I will go ahead and implement something locally to do this, but I would think such a thing would be not only valuable but necessary for others too. Does implementing such a table (perhaps called DoTS.SimilaritySet) in the official distribution make sense?

Or perhaps I am wrong in my understanding of how the Core.AnalysisAlgorithm table can be used, and there is a simpler way to do this. If so, I hope someone can enlighten me.

Thank you for reading,
Josef
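To make the grouping idea above a little more concrete, here is a minimal sketch in Python using an in-memory SQLite database. It is not GUS code and does not use the GUS object layer: the tables similarity_set and similarity, all of their columns, the set names, and the hit rows are hypothetical stand-ins chosen for illustration, and the real DoTS.Similarity has many more columns. Only the three parameter strings come from the message. The point is simply that one label row per BLAST run, referenced by a foreign key, reduces "which run did this hit come from" to a single join.

```python
import sqlite3

# The three parameter strings quoted in the message. The set names (the
# dictionary keys) are invented labels, in the spirit of "ME vs Ath, W9".
PARAMETER_SETS = {
    "W2, BLOSUM62": "-p blastp -FD -W2 -G 11 -E 1 -e 0.5 -f 11 -M BLOSUM62 -b 1000000 -v 1000000",
    "W3, BLOSUM62": "-p blastp -FD -W3 -G 11 -E 1 -e 0.5 -f 11 -M BLOSUM62 -b 1000000 -v 1000000",
    "W2, BLOSUM80": "-p blastp -FD -W2 -G 11 -E 1 -e 0.5 -f 11 -M BLOSUM80 -b 1000000 -v 1000000",
}


def create_schema(conn):
    """Create simplified, hypothetical stand-ins for a SimilaritySet table
    and for DoTS.Similarity (only a few illustrative columns)."""
    conn.executescript("""
        CREATE TABLE similarity_set (
            similarity_set_id INTEGER PRIMARY KEY,
            name              TEXT NOT NULL UNIQUE,  -- convenient label
            parameters        TEXT                   -- free-form parameter string
        );
        CREATE TABLE similarity (
            similarity_id     INTEGER PRIMARY KEY,
            similarity_set_id INTEGER NOT NULL
                              REFERENCES similarity_set (similarity_set_id),
            query_id          TEXT NOT NULL,
            subject_id        TEXT NOT NULL,
            score             REAL
        );
    """)


def load_set(conn, name, parameters, hits):
    """Insert one labeled set and its hits; return the new set id."""
    cur = conn.execute(
        "INSERT INTO similarity_set (name, parameters) VALUES (?, ?)",
        (name, parameters),
    )
    set_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO similarity (similarity_set_id, query_id, subject_id, score) "
        "VALUES (?, ?, ?, ?)",
        [(set_id, query, subject, score) for query, subject, score in hits],
    )
    return set_id


def hits_for_set(conn, name):
    """Return (query_id, subject_id, score) rows for the set with this name."""
    return conn.execute(
        "SELECT s.query_id, s.subject_id, s.score "
        "FROM similarity s "
        "JOIN similarity_set ss ON ss.similarity_set_id = s.similarity_set_id "
        "WHERE ss.name = ? "
        "ORDER BY s.similarity_id",
        (name,),
    ).fetchall()


def build_demo():
    """Build an in-memory database with made-up hits for each parameter set."""
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    toy_hits = {  # invented values, for illustration only
        "W2, BLOSUM62": [("seqA", "seqB", 90.0), ("seqA", "seqC", 41.0)],
        "W3, BLOSUM62": [("seqA", "seqB", 87.5)],
        "W2, BLOSUM80": [("seqA", "seqB", 95.0)],
    }
    for name, parameters in PARAMETER_SETS.items():
        load_set(conn, name, parameters, toy_hits[name])
    return conn


if __name__ == "__main__":
    demo = build_demo()
    for set_name in PARAMETER_SETS:
        print(set_name, hits_for_set(demo, set_name))
```

Storing the parameters as one free-form string on the set row keeps loading trivial for results that arrive from a collaborator, at the cost of not being able to query on an individual flag without parsing the string.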