Re: [Gmod-ajax] flatfile-to-json.pl error with GFF

Hi again guys,

Here is an example:
NC_006578.gbk from
ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Bacillus_thuringiensis_konkukian/NC_006578.gbk
has a gene that spans join(77090..77112,1..586), and it produces an error:

$ bp_genbank2gff3.pl -y NC_006578.gbk
# Input: NC_006578.gbk
# working on region:NC_006578, Bacillus thuringiensis serovar konkukian str.
97-27, 23-JUL-2008, Bacillus thuringiensis serovar konkukian str. 97-27
plasmid pBT9727, complete sequence.
NC_006578 Unflattening error:
Details:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: PROBLEM, SEVERITY==2
Ranges not in correct order. Strange ensembl genbank entry? Range:
[77090,77112] [1,586]
STACK: Error::throw
STACK: Bio::Root::Root::throw
/usr/local/share/perl/5.10.1/Bio/Root/Root.pm:473
STACK: Bio::SeqFeature::Tools::Unflattener::problem
/usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:952
STACK: Bio::SeqFeature::Tools::Unflattener::_check_order_is_consistent
/usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2842
STACK: Bio::SeqFeature::Tools::Unflattener::infer_mRNA_from_CDS
/usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2713
STACK: Bio::SeqFeature::Tools::Unflattener::unflatten_seq
/usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:1532
STACK: main::unflatten_seq /usr/local/bin/bp_genbank2gff3.pl:1023
STACK: /usr/local/bin/bp_genbank2gff3.pl:506
-----------------------------------------------------------

# Possible gene unflattening error withNC_006578: consult STDERR
# GFF3 saved to ./NC_006578.gff; DNA saved to ./NC_006578.fa

Another similar example: NC_008785.gbk from
ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Burkholderia_mallei_SAVP1/NC_008785.gbk
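
To make the failing case concrete, here is a minimal Python sketch that flags a location whose segments run backwards across the origin, as in join(77090..77112,1..586). It is not part of BioPerl or bp_genbank2gff3.pl, and the function names parse_join and wraps_origin are made up for illustration. It assumes a forward-strand feature and only looks at plain start..end segments:

```python
import re


def parse_join(location):
    """Return [(start, end), ...] for the start..end segments in a location string."""
    return [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", location)]


def wraps_origin(segments):
    """True if any segment starts before the segment that precedes it.

    Assumption: forward-strand feature, so segments are expected in
    ascending order unless the feature wraps around a circular sequence.
    """
    return any(nxt[0] < prev[0] for prev, nxt in zip(segments, segments[1:]))


segments = parse_join("join(77090..77112,1..586)")
print(segments, wraps_origin(segments))
# prints: [(77090, 77112), (1, 586)] True
```

Running something like this over the CDS locations in a .gbk file should show which features are likely to trigger the "Ranges not in correct order" exception.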

Best,
Dave

On Mon, Jul 26, 2010 at 6:20 PM, Scott Cain <sc...@sc...> wrote:

> Hi Dave,
>
> Please keep your responses on the list so they can be archived.
>
> I'm also cc'ing Nathan Liles, who did the work on the genbank2gff3
> script to deal with bacterial genomes.  Perhaps Nathan can take a look
> at this genbank entry and see more quickly what the problem is.
>
> Thanks,
> Scott
>
>
>
>
> On Sun, Jul 25, 2010 at 8:26 AM, David Breimann
> <dav...@gm...> wrote:
> > Scott,
> >
> > I cloned the latest version of bioperl from github (I'm not sure what you
> > mean by developers version; I thought the dev branch is obsolete but I'm
> > not sure; anyway - I got the version from bioperl-live).
> > bp_genbank2gff3.pl fails exactly on features which are on the margin, e.g.
> > "Ranges not in correct order. Strange ensembl genbank entry? Range:
> > [207497,208369] [1,687]".
> >
> > Thanks,
> > Dave
> >
> > On Fri, Jul 23, 2010 at 6:10 PM, Scott Cain <sc...@sc...> wrote:
> >> Hi David,
> >>
> >> The NCBI GFF3 is notoriously bad and doesn't pass validation at the
> >> GFF3 validator:
> >>
> >>  http://modencode.oicr.on.ca/cgi-bin/validate_gff3_online
> >>
> >> The most notable problems actually have to do with the relationships
> >> between features.  For example, in the first few lines:
> >>
> >> NC_007777.1     RefSeq  gene    35      1723    .       +       .       locus_tag=Francci3_0001;db_xref=GeneID:3902947
> >> NC_007777.1     RefSeq  CDS     35      1720    .       +       0       locus_tag=Francci3_0001;transl_table=11;product=chromosomal replication initiator protein DnaA;protein_id=YP_479125.1;db_xref=GI:86738725;db_xref=InterPro:IPR001957;db_xref=InterPro:IPR003593;db_xref=InterPro:IPR013159;db_xref=InterPro:IPR013317;db_xref=GeneID:3902947;exon_number=1
> >>
> >> While there is not anything technically wrong with these two lines,
> >> there is what you might call a logic error: the CDS should have the
> >> gene as a parent.  Without that information, a genome browser is going
> >> to have a difficult time displaying the data appropriately.  Feel free
> >> to complain to the folks at NCBI that their GFF3 is really bad (I've
> >> done that a few times, but I think they are ignoring me :-)
> >>
> >> So, the question is, what should you use?  The best option I can
> >> suggest to you is the genbank2gff3 script that comes with BioPerl,
> >> called bp_genbank2gff3.pl. If you get the developers version from
> >> github, you can use a version of that script that has been fixed to
> >> work appropriately with bacterial/circular genomes.
> >>
> >> Scott
> >>
> >>
> >> On Fri, Jul 23, 2010 at 10:54 AM, David Breimann
> >> <dav...@gm...> wrote:
> >>> I am trying to set up my first genome, after successfully playing with
> >>> the tutorial examples, and I have run into some problems.
> >>>
> >>> I use a fasta and a gff file from NCBI:
> >>> ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Frankia_CcI3/NC_007777.fna
> >>> ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Frankia_CcI3/NC_007777.gff
> >>>
> >>> Setting up the sequence file seems to pass OK, but when I run
> >>> flatfile-to-json.pl with the GFF I get an error:
> >>>
> >>>
> >>> ../../../jbrowse/bin/flatfile-to-json.pl --gff NC_007777.gff
> >>> --tracklabel test -key test
> >>>
> >>> working on seq gi|86738724|ref|NC_007777.1|
> >>> Use of uninitialized value in string eq at
> >>> ../../../jbrowse/bin/flatfile-to-json.pl line 179, <GEN2> line 24.
> >>>
> >>> What's wrong?
> >>>
> >>> Thank you,
> >>> David
> >>>
> >>>
> >>>
> >>>
> >>
> >>
> >>
> >> --
> >> ------------------------------------------------------------------------
> >> Scott Cain, Ph. D.                          scott at scottcain dot net
> >> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> >> Ontario Institute for Cancer Research
> >>
> >
> >
>
>
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                          scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> Ontario Institute for Cancer Research
>