From: Barry M. <bar...@ge...> - 2013-02-26 19:17:36
Thanks Peter for the useful feedback. The reference to the Order tag is a mistake - it's text that snuck in somehow from another version of the document where we were exploring the possible solutions. I'll take that out now.

As for the multiple ##sequence-region directives for the same seqid: that was a clarification that I added thinking that it added clarity to the text without breaking backwards compatibility, but perhaps I was wrong. In the example you describe, why wouldn't you just pull out the features that you want, but still allow the ##sequence-region to specify the full range?

This:

##sequence-region chr1 10000 52000

Instead of this:

> ##sequence-region chr1 10000 12000
> ##sequence-region chr1 50000 52000

To me, the second form seems to add significant complexity to the parser without obvious benefit. To validate, you now have to check every feature for overlap against all ##sequence-region directives for a given seqid.

However, if you broke the chromosome into parts (in the fasta file or ##FASTA section), you would have to use different seqids for the separate sequences, and thus you'd have to adapt the sequence-region directives accordingly.

For example, if you don't split the actual sequences, then having the sequence-region span the full range of the sequence (or at least the full range of the annotated features) seems simpler and sufficient.

##sequence-region chr1 1 .. 52000
…
##FASTA
>chr1
ACGT…TGCA

----------------------

If you are splitting out the sequences themselves, then you have to rename them and the coordinate ranges change.

##sequence-region chr1_partA 1 .. 12000
# The next one is really 50000-52000
##sequence-region chr1_partB 1 .. 2000
…
##FASTA
>chr1_partA
ACGT…TGCA
>chr1_partB
ACGT…TGCA

On Feb 26, 2013, at 11:32 AM, Peter Cock wrote:

> On Tue, Feb 26, 2013 at 5:02 PM, Barry Moore <bar...@gm...> wrote:
>> Hi All,
>>
>> The GFF3 specification has been updated to version 1.21.
>> Changes are minor - mostly involving clarification to the wording.
>> Changes include the following:
>>
>> • Clarification of escaping conventions.
>> • Explicit requirement that the value of start and end be one-based
>>   positive integers.
>> • Clarification to the use of quotes in attribute values.
>> • Clarification of lines beginning with # and exclusion of inline comments.
>> • Clarification that the ##gff-version pragma only appears once in a file.
>> • Clarification to the ##sequence-region pragma.
>>
>> The current specification can be found at:
>> http://www.sequenceontology.org/resources/gff3.html
>>
>> If you would like to review the changes in version 1.21 they are
>> highlighted in the document:
>> http://www.sequenceontology.org/resources/gff3_1.21.html
>>
>> The first meeting of the GFF3 working group was held on February 7th and
>> discussed how to annotate partial features and circular genomes. This work
>> is ongoing and will ultimately result in proposed updates to the
>> specification and/or best practices guidelines.
>
> I noticed two things I wanted to bring up.
>
> There appears to be a problem in the current text on features spanning
> the origin of a circular genome referencing the new proposed "Order" tag,
> yet that tag is not listed (yet). As an aside, I personally prefer the name
> "Part" as in Part=1/3, Part=2/3 and Part=3/3, but the term isn't critical.
>
> Separately, the new text forbids multiple ##sequence-region lines
> for the same seqid. Was the use-case of describing two (or more)
> sub-regions of a large reference rejected? For example, extracting
> a reduced GFF3 file to include just the locus of two genes of interest,
> say:
>
> ##sequence-region chr1 10000 12000
> ##sequence-region chr1 50000 52000
>
> Regards,
>
> Peter
>
> _______________________________________________
> SOng-devel mailing list
> SOn...@li...
> https://lists.sourceforge.net/lists/listinfo/song-devel

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543
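
To make the validation point in the reply concrete, here is a minimal Python sketch of the bounds check a parser might perform. It is only an illustration, not anything taken from the specification or from an existing parser: the function names parse_sequence_regions and feature_in_bounds are hypothetical, and it assumes the plain "##sequence-region seqid start end" form of the directive, with features required to lie entirely inside a declared region.

```python
from collections import defaultdict


def parse_sequence_regions(lines):
    """Collect ##sequence-region directives as {seqid: [(start, end), ...]}.

    Hypothetical helper: assumes the 'seqid start end' form of the directive.
    """
    regions = defaultdict(list)
    for line in lines:
        if line.startswith("##sequence-region"):
            _, seqid, start, end = line.split()
            regions[seqid].append((int(start), int(end)))
    return regions


def feature_in_bounds(regions, seqid, start, end):
    """Return True if the feature lies entirely within a declared region.

    With one directive per seqid this is a single comparison. With several
    directives, each feature is compared against every region for its seqid.
    """
    return any(r_start <= start and end <= r_end
               for r_start, r_end in regions.get(seqid, []))


# One directive spanning the full annotated range.
single = parse_sequence_regions(["##sequence-region chr1 10000 52000"])

# Two directives for the same seqid, as in the quoted example.
multi = parse_sequence_regions([
    "##sequence-region chr1 10000 12000",
    "##sequence-region chr1 50000 52000",
])
```

Under the single-directive form a feature at chr1 20000-21000 passes the check, while under the two-directive form it falls in the gap between the declared regions and fails, which is the extra bookkeeping the reply is weighing.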