From: Josef J. <ju...@cs...> - 2006-02-02 00:01:37
In the recent past, some of us needed a way to distinguish between blast results in DoTS.Similarity by the blast parameters used. For example, I might blast the same sets of sequences several times with different parameters and put the results of all these blast searches into DoTS.Similarity.

I was, of course, able to jury-rig a way to do this, though now with GUS 3.5, an officially sanctioned method has been implemented by using the Core.AnalysisAlgorithm table.

Below is a crude entity relationship diagram of how Core.AnalysisAlgorithm fits in with other relevant tables (be sure to view this file with the courier font):

                    Core.Algorithm (name)
                          |
                          |
                 Core.AlgorithmImplementation     Core.AlgorithmParamKeyType
                    |     |      \                 /   (string, float, int, ...)
                    |     |       \               /
Core.TableInfo   Core.AlgorithmInvocation    Core.AlgorithmParamKey
   |     \                |      \             /   (description of parameter)
   |      \               |       \           /
DoTS.Similarity   Core.AnalysisAlgorithm   Core.AlgorithmParam   (parameters as a string)
                                           (individual parameter)

And so, to insert into GUS a list of blast parameters such as:

  -p blastp -FD -W2 -G 11 -E 1 -e 0.5 -f 11 -M BLOSUM62 -b 1000000 -v 1000000

one would need to:

  have a row for every flag in Core.AlgorithmParamKey
  have a row for every value after a flag in Core.AlgorithmParam

which is very complicated to both insert and query. I suppose if one wrote a GUS plugin which is a wrapper around blast, inserting parameters into all of these rows/fields could be easily taken care of, though we at the Preuss lab just are not going to do things that way. We may get blast data from a collaborator that took days to run on a multi-node cluster, and we just want to dump this data into DoTS.Similarity/DoTS.SimilaritySpan and query it. We can't run this blast search again with a plugin.

And again, querying blast results by blast parameter between the same set of sequences looks to be very complex with the above tables. Imagine writing SQL trying to distinguish between blast results based on these three sets of parameters:

  -p blastp -FD -W2 -G 11 -E 1 -e 0.5 -f 11 -M BLOSUM62 -b 1000000 -v 1000000
  -p blastp -FD -W3 -G 11 -E 1 -e 0.5 -f 11 -M BLOSUM62 -b 1000000 -v 1000000
  -p blastp -FD -W2 -G 11 -E 1 -e 0.5 -f 11 -M BLOSUM80 -b 1000000 -v 1000000

Perhaps people on the list can let me know if there are any labs outside of CBIL that are depositing and querying blast search parameters with the above tables.

What we at the Preuss lab really need is a simple way to group rows in DoTS.Similarity together, much like the way one groups rows in DoTS.ExternalNASequence together with the table SRes.ExternalDatabaseRelease. Then a set of blast results could be labeled with a convenient name such as "ME vs Ath, W9" or "Jim's lab, -FF".

I will go ahead and implement something locally to do this, but I would think such a thing would be not only valuable, but necessary for others too. Does implementing such a table (perhaps calling it DoTS.SimilaritySet) in the official distribution make sense?

Or perhaps I am wrong in my understanding of how the Core.AnalysisAlgorithm table can be used, and there is a simpler way to do this. If so, I hope someone can enlighten me.

Thank you for reading;

Josef
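To make the "one row per flag, one row per value" layout concrete, here is a minimal sketch using Python's built-in sqlite3 module. The three tables are heavily simplified, hypothetical stand-ins for Core.AlgorithmInvocation, Core.AlgorithmParamKey and Core.AlgorithmParam; the column names, the parser, and the helper functions are assumptions made for illustration and are not the real GUS DDL or API. It shows why the query grows with the number of parameters: each flag to be matched costs another pair of joins.

```python
import sqlite3

# Hypothetical, simplified stand-ins for Core.AlgorithmInvocation,
# Core.AlgorithmParamKey and Core.AlgorithmParam (not the real GUS DDL).
SCHEMA = """
CREATE TABLE algorithm_invocation (invocation_id INTEGER PRIMARY KEY);
CREATE TABLE algorithm_param_key (
    key_id INTEGER PRIMARY KEY,
    flag   TEXT NOT NULL UNIQUE
);
CREATE TABLE algorithm_param (
    invocation_id INTEGER NOT NULL REFERENCES algorithm_invocation(invocation_id),
    key_id        INTEGER NOT NULL REFERENCES algorithm_param_key(key_id),
    value         TEXT NOT NULL
);
"""

# The three parameter sets quoted in the message above.
RUNS = [
    "-p blastp -FD -W2 -G 11 -E 1 -e 0.5 -f 11 -M BLOSUM62 -b 1000000 -v 1000000",
    "-p blastp -FD -W3 -G 11 -E 1 -e 0.5 -f 11 -M BLOSUM62 -b 1000000 -v 1000000",
    "-p blastp -FD -W2 -G 11 -E 1 -e 0.5 -f 11 -M BLOSUM80 -b 1000000 -v 1000000",
]


def parse_blast_flags(command_line):
    """Split an option string into (flag, value) pairs.

    Accepts attached values ('-W2') and separate ones ('-G 11').
    Assumes single-letter flags and no negative values.
    """
    tokens = command_line.split()
    pairs = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if not token.startswith("-") or len(token) < 2:
            raise ValueError(f"expected a flag, got {token!r}")
        if len(token) > 2:  # '-W2' becomes ('-W', '2')
            pairs.append((token[:2], token[2:]))
            i += 1
        elif i + 1 < len(tokens) and not tokens[i + 1].startswith("-"):
            pairs.append((token, tokens[i + 1]))  # '-G 11'
            i += 2
        else:  # bare flag with no value
            pairs.append((token, ""))
            i += 1
    return pairs


def load_run(conn, command_line):
    """Store one run: one invocation row plus one param row per flag."""
    invocation_id = conn.execute(
        "INSERT INTO algorithm_invocation DEFAULT VALUES").lastrowid
    for flag, value in parse_blast_flags(command_line):
        conn.execute(
            "INSERT OR IGNORE INTO algorithm_param_key (flag) VALUES (?)", (flag,))
        key_id = conn.execute(
            "SELECT key_id FROM algorithm_param_key WHERE flag = ?", (flag,)
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO algorithm_param VALUES (?, ?, ?)",
            (invocation_id, key_id, value))
    return invocation_id


def invocations_matching(conn, wanted):
    """Return ids of invocations that carry every flag/value in `wanted`."""
    joins, args = [], []
    for i, (flag, value) in enumerate(wanted.items()):
        # Each additional parameter to match adds two more joins.
        joins.append(
            f"JOIN algorithm_param p{i} ON p{i}.invocation_id = inv.invocation_id "
            f"JOIN algorithm_param_key k{i} ON k{i}.key_id = p{i}.key_id "
            f"AND k{i}.flag = ? AND p{i}.value = ?")
        args += [flag, value]
    sql = ("SELECT inv.invocation_id FROM algorithm_invocation inv "
           + " ".join(joins) + " ORDER BY inv.invocation_id")
    return [row[0] for row in conn.execute(sql, args)]


def demo():
    """Load the three runs into an in-memory database and return it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for run in RUNS:
        load_run(conn, run)
    return conn
```

With the three runs loaded, `invocations_matching(demo(), {"-W": "2", "-M": "BLOSUM62"})` picks out only the first run, but the generated statement already needs four joins for two parameters, which is the complexity the message is objecting to.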
From: <ju...@cs...> - 2006-02-06 17:13:44
Any thoughts on this?

Does implementing a "SRes.ExternalDatabaseRelease"-like way to group
rows in DoTS.Similarity together sound useful to anybody,
or do people find the use of Core.AnalysisAlgorithm to accomplish
such a thing completely satisfactory?

Thanks, Josef


Daphne Preuss Laboratory
Molecular Genetics and Cell Biology
The University of Chicago
ju...@cs...

voice: (773) 834-3985
fax: (773) 702-6648


I wrote:
>
> In the recent past, some of us needed a way to
> distinguish between blast results in DoTS.Similarity by
> the blast parameters used.
> [...]
From: Chris S. <sto...@pc...> - 2006-02-07 02:39:10
Hi Josef,

AlgorithmInvocation has a comment_string attribute that can be used to put your convenient name. AnalysisAlgorithm can be used to tie that to Similarity rows through the softlinks table_id, row_id. Note that there is also a record-keeping attribute, row_alg_invocation_id, that could be used as well; this typically stores a record of the plugin used to load the data.

The semantics of an ExternalDatabaseRelease have been broadened to include data files, so that would be OK too if you wanted to record details of who, when, where, what, etc. in a more structured way. What you would want, I guess, is a linking table that says these similarities came from this external database release. Evidence could be used (target = similarity; fact = external database release).

We are looking into altering Similarity or adding a table to better capture alignments (an attribute to indicate gaps). We can also consider providing a link to external database release as part of this if it makes sense.

Cheers.
Chris

On Feb 6, 2006, at 12:13 PM, Josef Jurek wrote:
>
> Any thoughts on this?
>
> Does implementing a "SRes.ExternalDatabaseRelease"-like way to group
> rows in DoTS.Similarity together sound useful to anybody,
> or do people find the use of Core.AnalysisAlgorithm to accomplish
> such a thing completely satisfactory?
>
> Thanks, Josef
>
> [...]
>
> _______________________________________________
> Gusdev-gusdev mailing list
> Gus...@li...
> https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev
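The soft-link suggestion in the reply above can be sketched roughly as follows, again with Python's sqlite3 and simplified stand-in tables. Only comment_string, table_id and row_id are named in the thread; every other column (including the algorithm_invocation_id link on the AnalysisAlgorithm stand-in), the table layouts, and the helper functions are assumptions for illustration, not the real GUS schema.

```python
import sqlite3

# Simplified stand-ins for Core.TableInfo, DoTS.Similarity,
# Core.AlgorithmInvocation and Core.AnalysisAlgorithm.
# Only comment_string, table_id and row_id come from the thread;
# the remaining columns are assumed for this sketch.
SCHEMA = """
CREATE TABLE table_info (
    table_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL UNIQUE
);
CREATE TABLE similarity (
    similarity_id INTEGER PRIMARY KEY,
    query_id      TEXT NOT NULL,
    subject_id    TEXT NOT NULL,
    score         REAL NOT NULL
);
CREATE TABLE algorithm_invocation (
    algorithm_invocation_id INTEGER PRIMARY KEY,
    comment_string          TEXT
);
CREATE TABLE analysis_algorithm (
    analysis_algorithm_id   INTEGER PRIMARY KEY,
    algorithm_invocation_id INTEGER NOT NULL
        REFERENCES algorithm_invocation(algorithm_invocation_id),
    table_id                INTEGER NOT NULL REFERENCES table_info(table_id),
    row_id                  INTEGER NOT NULL  -- soft link: no foreign key
);
"""


def open_demo_db():
    """Create the in-memory schema and register the Similarity table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO table_info (name) VALUES ('DoTS.Similarity')")
    return conn


def load_labelled_run(conn, label, hits):
    """Insert hits and soft-link each one to a single labelled invocation.

    hits: iterable of (query_id, subject_id, score) tuples.
    """
    table_id = conn.execute(
        "SELECT table_id FROM table_info WHERE name = 'DoTS.Similarity'"
    ).fetchone()[0]
    invocation_id = conn.execute(
        "INSERT INTO algorithm_invocation (comment_string) VALUES (?)",
        (label,)).lastrowid
    for query_id, subject_id, score in hits:
        similarity_id = conn.execute(
            "INSERT INTO similarity (query_id, subject_id, score) VALUES (?, ?, ?)",
            (query_id, subject_id, score)).lastrowid
        conn.execute(
            "INSERT INTO analysis_algorithm "
            "(algorithm_invocation_id, table_id, row_id) VALUES (?, ?, ?)",
            (invocation_id, table_id, similarity_id))
    return invocation_id


def similarities_for_label(conn, label):
    """Return (query_id, subject_id, score) rows linked to the given label."""
    sql = """
        SELECT s.query_id, s.subject_id, s.score
        FROM algorithm_invocation ai
        JOIN analysis_algorithm aa
          ON aa.algorithm_invocation_id = ai.algorithm_invocation_id
        JOIN table_info ti
          ON ti.table_id = aa.table_id AND ti.name = 'DoTS.Similarity'
        JOIN similarity s
          ON s.similarity_id = aa.row_id
        WHERE ai.comment_string = ?
        ORDER BY s.similarity_id
    """
    return conn.execute(sql, (label,)).fetchall()
```

A call such as `similarities_for_label(conn, "ME vs Ath, W9")` then retrieves one labelled set without touching the per-parameter tables. Because row_id is a soft link, nothing in the database enforces that it points at an existing Similarity row, so the loader has to keep the two in step.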
From: <ju...@cs...> - 2006-02-07 23:32:51
Chris Stoeckert <sto...@pc...> writes:
>
> AlgorithmInvocation has a comment_string attribute that can be used
> to put your convenient name. AnalysisAlgorithm can be used to tie
> that to Similarity rows through the softlinks table_id, row_id.

OK, for a set of blast results I can put one row in Core.AlgorithmInvocation (with the blast parameters), and Core.AnalysisAlgorithm is the linking table to multiple rows in DoTS.Similarity. For this purpose, one can then just ignore:

  Core.AlgorithmParamKey
  Core.AlgorithmParam
  Core.AlgorithmParamKeyType

I did make it overly complicated.

[...]

> Evidence could
> be used (target = similarity; fact = external database release).

OK, this could work too.

> We are looking into altering Similarity or adding a table to better
> capture alignments (an attribute to indicate gaps).

Cool;

> We can also
> consider providing a link to external database release as part of
> this if it makes sense.

That's pretty much what I went ahead and implemented in our local installation.

Thanks for taking a look;

Josef

> On Feb 6, 2006, at 12:13 PM, Josef Jurek wrote:
> >
> > Any thoughts on this?
> >
> > Does implementing a "SRes.ExternalDatabaseRelease"-like way to group
> > rows in DoTS.Similarity together sound useful to anybody,
> > or do people find the use of Core.AnalysisAlgorithm to accomplish
> > such a thing completely satisfactory?
> >
> > Thanks, Josef
> > [...]
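The thread does not show the local implementation mentioned above, so the following is only a guess at its general shape: a direct link from each similarity row to a release-like grouping row, sketched with Python's sqlite3. The external_database_release_id column on the similarity stand-in, the use of a description column as the label, and both helper functions are hypothetical and are not part of the GUS distribution discussed here.

```python
import sqlite3

# Hypothetical layout: each similarity row points directly at one
# release-like grouping row. Column names are assumed for this sketch.
SCHEMA = """
CREATE TABLE external_database_release (
    external_database_release_id INTEGER PRIMARY KEY,
    description                  TEXT NOT NULL UNIQUE
);
CREATE TABLE similarity (
    similarity_id                INTEGER PRIMARY KEY,
    query_id                     TEXT NOT NULL,
    subject_id                   TEXT NOT NULL,
    score                        REAL NOT NULL,
    external_database_release_id INTEGER NOT NULL
        REFERENCES external_database_release(external_database_release_id)
);
"""


def open_demo_db():
    """Create the in-memory schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def load_release(conn, description, hits):
    """Insert one grouping row and attach every hit to it.

    hits: iterable of (query_id, subject_id, score) tuples.
    """
    release_id = conn.execute(
        "INSERT INTO external_database_release (description) VALUES (?)",
        (description,)).lastrowid
    conn.executemany(
        "INSERT INTO similarity "
        "(query_id, subject_id, score, external_database_release_id) "
        "VALUES (?, ?, ?, ?)",
        [(q, s, score, release_id) for q, s, score in hits])
    return release_id


def compare_releases(conn, label_a, label_b):
    """Pair up hits on the same sequences from two labelled runs.

    Returns (query_id, subject_id, score_a, score_b) for every
    query/subject pair present in both runs.
    """
    sql = """
        SELECT a.query_id, a.subject_id, a.score, b.score
        FROM similarity a
        JOIN external_database_release ra
          ON ra.external_database_release_id = a.external_database_release_id
        JOIN similarity b
          ON b.query_id = a.query_id AND b.subject_id = a.subject_id
        JOIN external_database_release rb
          ON rb.external_database_release_id = b.external_database_release_id
        WHERE ra.description = ? AND rb.description = ?
        ORDER BY a.query_id, a.subject_id
    """
    return conn.execute(sql, (label_a, label_b)).fetchall()
```

Compared with the soft-link route, a real foreign key lets the database reject orphaned rows, at the cost of a schema change to the similarity table.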