From: David B. <dav...@gm...> - 2010-07-27 06:14:49

Hi again guys,

Here is an example: NC_006578.gbk from
ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Bacillus_thuringiensis_konkukian/NC_006578.gbk
has a gene that spans join(77090..77112,1..586), and it produces an error:

$ bp_genbank2gff3.pl -y NC_006578.gbk
# Input: NC_006578.gbk
# working on region:NC_006578, Bacillus thuringiensis serovar konkukian str. 97-27, 23-JUL-2008, Bacillus thuringiensis serovar konkukian str. 97-27 plasmid pBT9727, complete sequence.
NC_006578 Unflattening error:
Details:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: PROBLEM, SEVERITY==2
Ranges not in correct order. Strange ensembl genbank entry? Range: [77090,77112] [1,586]
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:473
STACK: Bio::SeqFeature::Tools::Unflattener::problem /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:952
STACK: Bio::SeqFeature::Tools::Unflattener::_check_order_is_consistent /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2842
STACK: Bio::SeqFeature::Tools::Unflattener::infer_mRNA_from_CDS /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2713
STACK: Bio::SeqFeature::Tools::Unflattener::unflatten_seq /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:1532
STACK: main::unflatten_seq /usr/local/bin/bp_genbank2gff3.pl:1023
STACK: /usr/local/bin/bp_genbank2gff3.pl:506
-----------------------------------------------------------
# Possible gene unflattening error withNC_006578: consult STDERR
# GFF3 saved to ./NC_006578.gff; DNA saved to ./NC_006578.fa

Another similar example: NC_008785.gbk from
ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Burkholderia_mallei_SAVP1/NC_008785.gbk

Best,
Dave

On Mon, Jul 26, 2010 at 6:20 PM, Scott Cain <sc...@sc...> wrote:
> Hi Dave,
>
> Please keep your responses on the list so they can be archived.
>
> I'm also cc'ing Nathan Liles, who did the work on the genbank2gff3
> script to deal with bacterial genomes.
> Perhaps Nathan can take a look
> at this genbank entry and see more quickly what the problem is.
>
> Thanks,
> Scott
>
>
> On Sun, Jul 25, 2010 at 8:26 AM, David Breimann <dav...@gm...> wrote:
> > Scott,
> >
> > I cloned the latest version of bioperl from github (I'm not sure what you
> > mean by developers version; I thought the dev branch is obsolete but I'm not
> > sure; anyway - I got the version from bioperl-live).
> > bp_genbank2gff3.pl fails exactly on features which are on the margin, e.g.
> > "Ranges not in correct order. Strange ensembl genbank entry? Range:
> > [207497,208369] [1,687]".
> >
> > Thanks,
> > Dave
> >
> > On Fri, Jul 23, 2010 at 6:10 PM, Scott Cain <sc...@sc...> wrote:
> >> Hi David,
> >>
> >> The NCBI GFF3 is notoriously bad and doesn't pass validation at the
> >> GFF3 validator:
> >>
> >> http://modencode.oicr.on.ca/cgi-bin/validate_gff3_online
> >>
> >> The most notable problems actually have to do with the relationships
> >> between features. For example, in the first few lines:
> >>
> >> NC_007777.1 RefSeq gene 35 1723 . + . locus_tag=Francci3_0001;db_xref=GeneID:3902947
> >> NC_007777.1 RefSeq CDS 35 1720 . + 0 locus_tag=Francci3_0001;transl_table=11;product=chromosomal replication initiator protein DnaA;protein_id=YP_479125.1;db_xref=GI:86738725;db_xref=InterPro:IPR001957;db_xref=InterPro:IPR003593;db_xref=InterPro:IPR013159;db_xref=InterPro:IPR013317;db_xref=GeneID:3902947;exon_number=1
> >>
> >> While there is not anything technically wrong with these two lines,
> >> there is what you might call a logic error: the CDS should have the
> >> gene as a parent. Without that information, a genome browser is going
> >> to have a difficult time displaying the data appropriately. Feel free
> >> to complain to the folks at NCBI that their GFF3 is really bad (I've
> >> done that a few times, but I think they are ignoring me :-)
> >>
> >> So, the question is, what should you use?
> >> The best option I can
> >> suggest to you is the genbank2gff3 script that comes with BioPerl,
> >> called bp_genbank2gff3.pl. If you get the developers version from
> >> github, you can use a version of that script that has been fixed to
> >> work appropriately with bacterial/circular genomes.
> >>
> >> Scott
> >>
> >>
> >> On Fri, Jul 23, 2010 at 10:54 AM, David Breimann <dav...@gm...> wrote:
> >>> I am trying to set up my first genome, after successfully playing with
> >>> the tutorial examples, and I ran into some problems.
> >>>
> >>> I use a fasta and a gff file from NCBI:
> >>> ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Frankia_CcI3/NC_007777.fna
> >>> ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Frankia_CcI3/NC_007777.gff
> >>>
> >>> Setting up the sequence file seems to pass OK, but when I run
> >>> flatfile-to-json.pl with the GFF I get an error:
> >>>
> >>> ../../../jbrowse/bin/flatfile-to-json.pl --gff NC_007777.gff --tracklabel test -key test
> >>>
> >>> working on seq gi|86738724|ref|NC_007777.1|
> >>> Use of uninitialized value in string eq at ../../../jbrowse/bin/flatfile-to-json.pl line 179, <GEN2> line 24.
> >>>
> >>> What's wrong?
> >>>
> >>> Thank you,
> >>> David
> >>>
> >>> _______________________________________________
> >>> Gmod-ajax mailing list
> >>> Gmo...@li...
> >>> https://lists.sourceforge.net/lists/listinfo/gmod-ajax
> >>>
> >>
> >> --
> >> ------------------------------------------------------------------------
> >> Scott Cain, Ph. D.
> >> scott at scottcain dot net
> >> GMOD Coordinator (http://gmod.org/) 216-392-3087
> >> Ontario Institute for Cancer Research
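
For anyone reading this thread later, here is a minimal Python sketch of why a location like join(77090..77112,1..586) trips a "ranges not in correct order" check, and of one way to represent such a feature with ascending coordinates. It is not the BioPerl code and not the fix that went into bp_genbank2gff3.pl; the function names are made up for illustration, it only handles simple forward-strand locations, and the sequence length of 77112 is an assumption (taken from the end of the first segment, not confirmed anywhere in the thread). The unwrapping step follows the convention described in the GFF3 specification for circular landmarks, where a feature crossing the origin gets an end coordinate larger than the landmark length.

```python
import re


def parse_join(location):
    """Parse a simple forward-strand 'a..b' or 'join(a..b,c..d)' string
    into a list of (start, end) tuples. Hypothetical helper."""
    return [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", location)]


def wraps_origin(segments):
    """Return True if any segment starts at or before the end of the
    preceding one, which is what an origin-spanning join looks like."""
    return any(nxt[0] <= prev[1] for prev, nxt in zip(segments, segments[1:]))


def unwrap(segments, seq_length):
    """Shift every segment after the wrap point by seq_length so that the
    coordinates ascend. seq_length must be supplied by the caller."""
    out = []
    offset = 0
    for i, (start, end) in enumerate(segments):
        if i > 0 and start + offset <= out[-1][1]:
            offset += seq_length
        out.append((start + offset, end + offset))
    return out


segments = parse_join("join(77090..77112,1..586)")
print(segments)                 # [(77090, 77112), (1, 586)]
print(wraps_origin(segments))   # True
# 77112 is an assumed plasmid length, used here only for illustration.
print(unwrap(segments, 77112))  # [(77090, 77112), (77113, 77698)]
```

A converter could use a check like wraps_origin to treat these features specially instead of raising an error, but whether that is appropriate depends on the record really being circular.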