|
From: Murphy, T. (NIH/NLM/N. [C] <mur...@nc...> - 2015-10-02 15:37:06
|
Yep, I was just using this as an example where the INSDC spec, which only supports the feature type "centromere", isn't as fine-grained as SO, which has centromere_DNA_element_I/II/III. So this is a case where there would be a little bit of value in providing the SO IDs on the features, but there aren't any examples in GenBank where it's been done. I didn't ask the GenBank submission staff if anyone had ever asked to include SO IDs or terms, or if it's just never come up.

But from my review of existing data I found very few cases like this where it wasn't a simple 1:1 mapping between INSDC feature type (or feature type plus class attribute, like with the ncRNAs) and a SO term. That is, the extra granularity isn't in common use for the types of annotations that are submitted to GenBank. I did find some cases where our GFF3 writer wasn't using as precise a SO term as it could for converting an INSDC feature, but those are still 1:1 mappings and we can fix that.

-Terence

From: Stacia R Engel [mailto:st...@st...]
Sent: Friday, September 11, 2015 6:27 AM
To: SO developers <son...@li...>
Subject: Re: [SO-devel] SO in Genbank/INSDC feature table files?

SGD works directly with NCBI regarding which terms are allowed. There are currently constraints in place that do not permit more granular terms for the centromere example below (as well as other examples). Currently allowed feature_types are listed here; this is what we follow:
http://www.insdc.org/documents/feature-table

Regards,
Stacia Engel

On Sep 9, 2015, at 10:59 AM, Murphy, Terence (NIH/NLM/NCBI) [C] <mur...@nc...> wrote:

Yep, I can think of a few places where the INSDC features aren't as specific as one would like, but in general annotation data submitted to GenBank isn't very feature-rich. Most submissions don't extend beyond CDS features. There are a few where additional features were provided that could benefit from SO terms, but I'm not aware of a submitter offering to provide SO terms.
For example, I could see adding specific SO db_xrefs to the centromere feature and sub-features for this example from S. cerevisiae:
http://www.ncbi.nlm.nih.gov/nuccore/BK006935.2

     centromere      151465..151582
                     /note="CEN1; Chromosome I centromere"
                     /db_xref="SGD:S000006463"
     centromere      151465..151474
                     /note="CEN1_CDEI of CEN1"
     centromere      151475..151557
                     /note="CEN1_CDEII of CEN1"
     centromere      151558..151582
                     /note="CEN1_CDEIII of CEN1"

As far as how it could be done, the main and really only option would be using db_xrefs, providing the SO:<integer> and registering the SO database as a db_xref tag. Also listing the text description for the SO term would have to go in the /note. But as of right now the SO tag isn't registered, so I don't think anyone has ever tried to submit such db_xrefs. If you're interested in what db_xrefs have been registered, the current list is here:
http://www.ncbi.nlm.nih.gov/genbank/collab/db_xref

Best regards,
-Terence

From: Jim Hu [mailto:ji...@ta...]
Sent: Wednesday, September 09, 2015 12:51 PM
To: SO developers <son...@li...>
Subject: Re: [SO-devel] SO in Genbank/INSDC feature table files?

Hi Terence,

Thanks for the reply. Interesting point on the validation issues. I was under the impression that while there is a mapping to INSDC feature keys, that SO annotation would provide value added based on the much higher granularity of SO. So the mapping is "lossy" in one direction.
Which led me to wonder what the appropriate mechanism would be for adding that information to a Genbank submission.... I was hoping others had already done this!

Jim

On Sep 8, 2015, at 7:04 AM, Murphy, Terence (NIH/NLM/NCBI) [C] <mur...@nc...> wrote:

Hi Jim,

It doesn't look like there are any examples of SO terms submitted to INSDC, at least not as db_xrefs. Most INSDC features map 1:1 with a SO term, and NCBI uses internal mapping tables to convert between INSDC and SO feature types when reading and writing GFF3 files, so providing SO terms as db_xrefs would typically be redundant (or potentially conflict with our mapping table, so we'd need to add validation if anyone ever did submit annotation to INSDC with SO terms). There are a few areas where the INSDC feature options are a bit lacking and SO has something more specific, but those are rarely seen in INSDC submissions so I think the issue just hasn't come up.

-Terence

From: Jim Hu [mailto:ji...@ta...]
Sent: Monday, September 07, 2015 2:22 PM
To: SO developers <son...@li...>
Subject: [SO-devel] SO in Genbank/INSDC feature table files?

Are there examples I can use for teaching where SO terms are incorporated in Genbank records? Would they be xrefs?

Thanks,
Jim

=====================================
Jim Hu
Professor
Dept. of Biochemistry and Biophysics
2128 TAMU
Texas A&M Univ.
College Station, TX 77843-2128
979-862-4054

------------------------------------------------------------------------------
_______________________________________________
SOng-devel mailing list
SOn...@li...
https://lists.sourceforge.net/lists/listinfo/song-devel

==============================
Jim Hu
Dept. of Biochemistry and Biophysics
Texas A&M Univ.
Jam...@ag...

-----------------------------------------------------
Stacia R. Engel, Ph.D.
Senior Biocuration Scientist
Group Leader, Curation
Saccharomyces Genome Database
Department of Genetics
Stanford University
Stanford, CA 94305 USA
www.yeastgenome.org
st...@st...
-----------------------------------------------------
|
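
For anyone using this thread for teaching, here is a rough Python sketch of the two ideas discussed: the SO-to-INSDC mapping being "lossy" in one direction, and what SO db_xrefs on the CEN1 sub-features might look like. Everything here is an assumption made for illustration. The mapping dictionary is a hand-picked excerpt and not NCBI's internal table, the function names are made up, and the `/db_xref="SO:..."` qualifier is hypothetical because, as noted in the thread, SO is not a registered db_xref tag. The SO IDs should be checked against the current SO release before reuse.

```python
from collections import Counter

# Hypothetical excerpt of an SO term -> INSDC feature key mapping.
# Not NCBI's internal mapping table; IDs should be verified against SO.
SO_TO_INSDC = {
    "SO:0000704": "gene",        # gene
    "SO:0000316": "CDS",         # CDS
    "SO:0000577": "centromere",  # centromere
    "SO:0001493": "centromere",  # centromere_DNA_Element_I
    "SO:0001494": "centromere",  # centromere_DNA_Element_II
    "SO:0001495": "centromere",  # centromere_DNA_Element_III
}


def lossy_keys(so_to_insdc):
    """Return INSDC keys onto which more than one SO term collapses."""
    counts = Counter(so_to_insdc.values())
    return sorted(key for key, n in counts.items() if n > 1)


def so_db_xref(so_id):
    """Format a hypothetical SO db_xref qualifier of the form SO:<integer>."""
    prefix, _, number = so_id.partition(":")
    if prefix != "SO" or not number.isdigit():
        raise ValueError(f"not an SO identifier: {so_id!r}")
    return f'/db_xref="{so_id}"'


PAD = " " * 21  # qualifier indentation used in the flat-file feature table


def render_feature(key, start, end, note, so_id=None):
    """Render one feature in flat-file style, optionally with an SO db_xref."""
    lines = [f"     {key:<16}{start}..{end}", f'{PAD}/note="{note}"']
    if so_id is not None:
        lines.append(PAD + so_db_xref(so_id))
    return "\n".join(lines)


if __name__ == "__main__":
    print("Lossy INSDC keys:", lossy_keys(SO_TO_INSDC))
    cen1_parts = [
        (151465, 151474, "CEN1_CDEI of CEN1", "SO:0001493"),
        (151475, 151557, "CEN1_CDEII of CEN1", "SO:0001494"),
        (151558, 151582, "CEN1_CDEIII of CEN1", "SO:0001495"),
    ]
    for start, end, note, so_id in cen1_parts:
        print(render_feature("centromere", start, end, note, so_id))
```

In this excerpt only "centromere" comes back from `lossy_keys`, which matches the observation in the thread that most INSDC features map 1:1 to an SO term and the extra SO granularity shows up in just a few places.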