Re: [Gusdev-gusdev] Sequence Type controlled vocab


Aaron encouraged me to take a second look at SO.  (My first look came 
up dry, and I surmised that it was more "feature" oriented than 
"sequence" oriented.)

the results are below

But first, here are the terms in CBIL's SequenceType table:
DNA
RNA
ds-DNA
ss-DNA
ss-RNA
ds-RNA
mRNA
EST
tRNA
rRNA
unknown
predicted_mRNA
virtual
GSS
oligonucleotide

To me, this is conflating multiple concepts: polymer type, strandedness, 
molecule.

But I am now thinking that if we replaced that list with the following 
attributes and values, we would probably be just fine.  SequenceType 
here *is* still conflating multiple concepts, but in a way that I think 
will satisfy intuition and reasonable querying needs:

  Singlestranded:
     true
     false
  SequenceType:
     chromosomal
     mRNA
     rRNA
     tRNA
     EST
     oligo
  HasPieces   (is virtual):
     true
     false
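
To make the three attributes concrete, here is a rough sketch of how one sequence's type information might look as a record. This is only an illustration, not a schema proposal: the names SequenceDescriptor, single_stranded, sequence_type and has_pieces are made up for the example, and the two sample records use illustrative values.

```python
from dataclasses import dataclass
from enum import Enum


class SequenceType(Enum):
    # The six values from the proposed SequenceType list above.
    CHROMOSOMAL = "chromosomal"
    MRNA = "mRNA"
    RRNA = "rRNA"
    TRNA = "tRNA"
    EST = "EST"
    OLIGO = "oligo"


@dataclass(frozen=True)
class SequenceDescriptor:
    # Hypothetical record; one field per proposed attribute.
    single_stranded: bool         # Singlestranded
    sequence_type: SequenceType   # SequenceType
    has_pieces: bool              # HasPieces (is virtual)


# Illustrative values only: each combination replaces one flat term
# from the old list.
messenger = SequenceDescriptor(
    single_stranded=True,
    sequence_type=SequenceType.MRNA,
    has_pieces=False,
)
virtual_chromosome = SequenceDescriptor(
    single_stranded=False,
    sequence_type=SequenceType.CHROMOSOMAL,
    has_pieces=True,
)
```

The point of the split is that "virtual" and strandedness stop being sequence types and become independent flags that can be queried on their own.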

Now for the SO survey:

Polymer Type  - no
 - DNA  - no
 - RNA  - no

Molecule  - no
 - chromosome  - SO:0000340
 - mRNA  - SO:0000234
 - tRNA  - SO:0000253
 - rRNA  - SO:0000252
 - oligo  - SO:0000696

Strandedness  - no
 - single  - no
 - double  - no

Sequencing process  - derived_from
 - Genomic  - no
 - EST  - SO:0000345
 - predicted  - no
 - transcribed  - no
 - what else?

Source  - no
 - nucleus  - no
 - mitochondria  - no
 - plastid  - no
 - plasmid  - no
 - episome  - no

Guess what, all the sequence types in my proposed list above are found 
in the SO:
 - chromosome  - SO:0000340
 - mRNA  - SO:0000234
 - tRNA  - SO:0000253
 - rRNA  - SO:0000252
 - oligo  - SO:0000696
 - EST  - SO:0000345
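
For illustration, the six hits can be written down as a small lookup table. This is just a sketch: the accessions are copied from the list above, while the names SO_ACCESSION and so_accession are invented for the example and are not part of GUS or SO.

```python
# Accessions copied from the survey above; names are hypothetical.
SO_ACCESSION = {
    "chromosome": "SO:0000340",
    "mRNA": "SO:0000234",
    "tRNA": "SO:0000253",
    "rRNA": "SO:0000252",
    "oligo": "SO:0000696",
    "EST": "SO:0000345",
}


def so_accession(sequence_type: str) -> str:
    """Return the SO accession recorded for a sequence type name."""
    try:
        return SO_ACCESSION[sequence_type]
    except KeyError:
        # Terms such as "virtual" or "ds-DNA" had no SO match in the survey.
        raise ValueError(
            f"no SO term recorded for sequence type {sequence_type!r}"
        ) from None
```

A call such as so_accession("EST") returns "SO:0000345", while a name outside the six raises ValueError.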

But does that mean we should abolish the SequenceType table?  If we 
do, then a sequence would point to the SO for its type.  The advantage 
is that we will be out of the business of inventing yet another CV.  
The disadvantage is that now users have to wade through 400+ terms to 
find the 6 that we think are relevant.

????

steve

Steve Fischer wrote:

> folks-
>
> Having looked at SO and MGED, I am not sure they are capturing what I 
> have in mind, or, what we have captured in our SequenceType table
>
> Here is the way I am thinking about breaking down "sequence type."   
> (If somebody can show me how these map into either of the ontologies 
> Chris has mentioned that would be great).
>
> For NA sequences:
>
> Polymer Type
>  - DNA
>  - RNA
> Molecule
>  - chromosome
>  - mRNA
>  - tRNA
>  - rRNA
>  - oligo
> Strandedness
>  - single
>  - double
> Sequencing process
>  - Genomic
>  - EST
>  - predicted
>  - transcribed
>  - what else?
> Source
>  - nucleus
>  - mitochondria
>  - plastid
>  - plasmid
>  - episome
>
> Steve
>
> Chris Stoeckert wrote:
>
>> Steve,
>> There are two complementary standards for sequence type. One comes 
>> from the MGED Ontology.
>> see 
>> http://mged.sourceforge.net/ontologies/MGEDontology.php#BioSequenceType
>> The other is SO  http://song.sourceforge.net/
>> Chris
>>
>> On Feb 2, 2005, at 5:14 PM, Steve Fischer wrote:
>>
>>> folks-
>>>
>>> in gus we have a Dots.SequenceType table.
>>>
>>> here are the columns:
>>> nucleotide_type
>>> sub_type
>>> strand
>>> hierarchy    [should be hierarchy_depth]
>>> parent_sequence_type_id
>>> name
>>> description
>>>
>>> First question:  does anybody know of an "emerging standard" for this?
>>>
>>> If there is one, then we should include it in the Controlled Vocabs 
>>> that we package with GUS.
>>>
>>> Otherwise, we have, I think, two candidate SequenceTypeCVs:
>>>   - the one provided by Sanger on the wiki:  
>>> http://www.gusdb.org/wiki/index.php/Bootstrap%20data#ExternalDatabase
>>>   - the one currently housed in CBIL's GUS instance
>>>
>>> As part of the GUS 3.5 install, we are getting serious about making 
>>> the loading of CVs much easier.   A central part of that is making 
>>> the CVs available from CBIL's download site (eg, the CBIL anatomy CV).
>>>
>>> So, i am thinking that CBIL should choose one (or more) sequence type 
>>> CVs to provide as downloads.  They could be offered in GUS XML format.
>>>
>>> Then, the automated GUS CV installer would find them from CBIL just 
>>> like it will find GO from the GO Consortium.
>>>
>>> Any plugin that uses SequenceTypes should *not* hard code the 
>>> transform, but, instead, take a SequenceTypeMapping file.  The file 
>>> specifies the mapping from input sequence type to that stored in gus 
>>> (by name).  The plugin should pre-scan the input file to detect if 
>>> there are any illegal sequence types, and warn the user before 
>>> loading any data
>>>
>>> If users find sequence types that the CBIL CV is missing, they can 
>>> propose them via the mailing list.
>>>
>>> The objective is to:
>>> 1. work with the fact that different input files for a plugin may 
>>> use different sequence types
>>> 2. get out of the business of ad hoc changes to the sequence types 
>>> stored in the db
>>>
>>> comments?
>>>
>>> steve
>>>
>>> as a candidate CV the Sequence the SequenceTypesCV as developed by
>>>
>>> If not, then, how about this.   Plugins that depend on sequence type 
>>> use a standard config file for sequence type.  (this might apply to 
>>> other loose CVs).  The config file specifies the
>>>
>>>
>>> -------------------------------------------------------
>>> This SF.Net email is sponsored by: IntelliVIEW -- Interactive Reporting
>>> Tool for open source databases. Create drag-&-drop reports. Save time
>>> by over 75%! Publish reports on the web. Export to DOC, XLS, RTF, etc.
>>> Download a FREE copy at http://www.intelliview.com/go/osdn_nl
>>> _______________________________________________
>>> Gusdev-gusdev mailing list
>>> Gus...@li...
>>> https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev
>>
>
>
>