From: Steve F. <sfi...@pc...> - 2005-02-03 12:53:07
Aaron encouraged me to take a second look at SO. (My first look came up dry, and I surmised that it was more "feature" oriented than "sequence" oriented.) The results are below.

But first, here are the terms in CBIL's SequenceType table:

  DNA
  RNA
  ds-DNA
  ss-DNA
  ss-RNA
  ds-RNA
  mRNA
  EST
  tRNA
  rRNA
  unknown
  predicted_mRNA
  virtual
  GSS
  oligonucleotide

To me, this is conflating multiple concepts: polymer type, strandedness, molecule.

But I am now thinking that if we replaced that list with the following attributes and values, we would probably be just fine. SequenceType here *is* conflating multiple concepts, but in a way that I think will satisfy intuition and reasonable querying needs:

  Singlestranded
    true
    false
  SequenceType:
    chromosomal
    mRNA
    rRNA
    tRNA
    EST
    oligo
  HasPieces (is virtual)
    true
    false

Now for the SO survey:

  Polymer Type - no
    - DNA - no
    - RNA - no
  Molecule - no
    - chromosome - SO:0000340
    - mRNA - SO:0000234
    - tRNA - SO:0000253
    - rRNA - SO:0000252
    - oligo - SO:0000696
  Strandedness - no
    - single - no
    - double - no
  Sequencing process - derived_from
    - Genomic - no
    - EST - SO:0000345
    - predicted - no
    - transcribed - no
    - what else?
  Source - no
    - nucleus - no
    - mitochondria - no
    - plastid - no
    - plasmid - no
    - episome - no

Guess what, all the sequence types in my proposed list above are found in the SO:

  - chromosome - SO:0000340
  - mRNA - SO:0000234
  - tRNA - SO:0000253
  - rRNA - SO:0000252
  - oligo - SO:0000696
  - EST - SO:0000345

But does that mean we should abolish the SequenceType table? If we do, then a sequence would point to the SO for its type. The advantage is that we will be out of the business of inventing yet another CV. The disadvantage is that now users have to wade through 400+ terms to find the 6 that we think are relevant ????
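To make the proposal above concrete, here is a rough Python sketch of a sequence record carrying the three proposed attributes and pointing at SO for its type. The class name, field names, and the idea of a short whitelist are assumptions made for illustration, not anything in GUS. The accessions are copied from the survey above, not looked up live.

```python
from dataclasses import dataclass

# SO accessions exactly as listed in the survey above (assumed correct as quoted).
SO_ACCESSIONS = {
    "chromosome": "SO:0000340",
    "mRNA": "SO:0000234",
    "tRNA": "SO:0000253",
    "rRNA": "SO:0000252",
    "oligo": "SO:0000696",
    "EST": "SO:0000345",
}


@dataclass(frozen=True)
class SequenceDescriptor:
    """Hypothetical record holding the three proposed attributes."""

    sequence_type: str     # one of the six names in SO_ACCESSIONS
    single_stranded: bool  # the proposed Singlestranded attribute
    has_pieces: bool       # the proposed HasPieces ("is virtual") attribute

    def __post_init__(self):
        # Reject anything outside the short list we consider relevant.
        if self.sequence_type not in SO_ACCESSIONS:
            raise ValueError(f"unknown sequence type: {self.sequence_type!r}")

    def so_accession(self) -> str:
        """Return the SO accession this sequence type points to."""
        return SO_ACCESSIONS[self.sequence_type]


if __name__ == "__main__":
    est = SequenceDescriptor("EST", single_stranded=True, has_pieces=False)
    print(est.sequence_type, est.so_accession())
```

A short whitelist like this might be one way to point at SO without making users wade through the full term list, though whether it replaces the SequenceType table is exactly the open question.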
steve

Steve Fischer wrote:

> folks-
>
> Having looked at SO and MGED, I am not sure they are capturing what I
> have in mind, or, what we have captured in our SequenceType table
>
> Here is the way I am thinking about breaking down "sequence type."
> (If somebody can show me how these map into either of the ontologies
> Chris has mentioned that would be great).
>
> For NA sequences:
>
> Polymer Type
>   - DNA
>   - RNA
> Molecule
>   - chromosome
>   - mRNA
>   - tRNA
>   - rRNA
>   - oligo
> Strandedness
>   - single
>   - double
> Sequencing process
>   - Genomic
>   - EST
>   - predicted
>   - transcribed
>   - what else?
> Source
>   - nucleus
>   - mitochondria
>   - plastid
>   - plasmid
>   - episome
>
> Steve
>
> Chris Stoeckert wrote:
>
>> Steve,
>> There are two complementary standards for sequence type. One comes
>> from the MGED Ontology.
>> see
>> http://mged.sourceforge.net/ontologies/MGEDontology.php#BioSequenceType
>> The other is SO http://song.sourceforge.net/
>> Chris
>>
>> On Feb 2, 2005, at 5:14 PM, Steve Fischer wrote:
>>
>>> folks-
>>>
>>> in gus we have a Dots.SequenceType table.
>>>
>>> here are the columns:
>>>   nucleotide_type
>>>   sub_type
>>>   strand
>>>   hierarchy [should be hierarchy_depth]
>>>   parent_sequence_type_id
>>>   name
>>>   description
>>>
>>> First question: does anybody know of an "emerging standard" for this?
>>>
>>> If there is one, then we should include it in the Controlled Vocabs
>>> that we package with GUS.
>>>
>>> Otherwise, we have, I think, two candidate SequenceTypeCVs:
>>>   - the one provided by Sanger on the wiki:
>>> http://www.gusdb.org/wiki/index.php/Bootstrap%20data#ExternalDatabase
>>>   - the one currently housed in CBIL's GUS instance
>>>
>>> As part of the GUS 3.5 install, we are getting serious about making
>>> the loading of CVs much easier. A central part of that is making
>>> the CVs available from CBIL's download site (eg, the CBIL anatomy CV).
>>>
>>> So, i am thinking that CBIL should choose one (or more) sequence type
>>> CVs to provide as downloads. They could be offered in GUS XML format.
>>>
>>> Then, the automated GUS CV installer would find them from CBIL just
>>> like it will find GO from the GO Consortium.
>>>
>>> Any plugin that uses SequenceTypes should *not* hard code the
>>> transform, but, instead, take a SequenceTypeMapping file. The file
>>> specifies the mapping from input sequence type to that stored in gus
>>> (by name). The plugin should pre-scan the input file to detect if
>>> there are any illegal sequence types, and warn the user before
>>> loading any data.
>>>
>>> If users find sequence types that the CBIL CV is missing, they can
>>> propose them via the mailing list.
>>>
>>> The objective is to:
>>>   1. work with the fact that different input files for a plugin may
>>>      use different sequence types
>>>   2. get out of the business of ad hoc changes to the sequence types
>>>      stored in the db
>>>
>>> comments?
>>>
>>> steve
>>>
>>> as a candidate CV the Sequence the SequenceTypesCV as developed by
>>>
>>> If not, then, how about this. Plugins that depend on sequence type
>>> use a standard config file for sequence type. (this might apply to
>>> other loose CVs). The config file specifies the
>>>
>>>
>>> -------------------------------------------------------
>>> This SF.Net email is sponsored by: IntelliVIEW -- Interactive Reporting
>>> Tool for open source databases. Create drag-&-drop reports. Save time
>>> by over 75%! Publish reports on the web. Export to DOC, XLS, RTF, etc.
>>> Download a FREE copy at http://www.intelliview.com/go/osdn_nl
>>> _______________________________________________
>>> Gusdev-gusdev mailing list
>>> Gus...@li...
>>> https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev
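The SequenceTypeMapping idea in the quoted message (map input sequence types to GUS names, pre-scan, warn before loading) could look roughly like the Python sketch below. It is only an illustration under stated assumptions: the tab-separated file layout, the `#` comment convention, and the function names `load_mapping` and `prescan` are all invented here, and none of this is GUS plugin code.

```python
def load_mapping(lines):
    """Parse mapping lines of the assumed form 'input_type<TAB>gus_name'."""
    mapping = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments (assumed convention)
        parts = line.split("\t")
        if len(parts) != 2:
            raise ValueError(f"malformed mapping line: {raw!r}")
        mapping[parts[0]] = parts[1]
    return mapping


def prescan(input_types, mapping, cv_names):
    """Return the sorted input types that cannot be loaded.

    A type is illegal if it has no entry in the mapping, or if it maps
    to a name that is not present in the sequence type CV.
    """
    illegal = set()
    for seq_type in input_types:
        target = mapping.get(seq_type)
        if target is None or target not in cv_names:
            illegal.add(seq_type)
    return sorted(illegal)


if __name__ == "__main__":
    # Hypothetical CV contents and mapping file, for illustration only.
    cv = {"mRNA", "EST", "tRNA"}
    mapping = load_mapping(["# input\tgus", "messenger RNA\tmRNA", "est\tEST"])
    bad = prescan(["messenger RNA", "est", "genomic"], mapping, cv)
    if bad:
        print("WARNING: illegal sequence types, nothing loaded:", ", ".join(bad))
```

Running the whole scan before any insert means the user sees every illegal type in one pass instead of fixing them one failed load at a time.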