From: Scott C. <sc...@sc...> - 2010-07-23 15:10:44
Hi David,

The NCBI GFF3 is notoriously bad and doesn't pass validation at the GFF3 validator:

http://modencode.oicr.on.ca/cgi-bin/validate_gff3_online

The most notable problems actually have to do with the relationships between features. For example, take the first two features:

NC_007777.1	RefSeq	gene	35	1723	.	+	.	locus_tag=Francci3_0001;db_xref=GeneID:3902947
NC_007777.1	RefSeq	CDS	35	1720	.	+	0	locus_tag=Francci3_0001;transl_table=11;product=chromosomal replication initiator protein DnaA;protein_id=YP_479125.1;db_xref=GI:86738725;db_xref=InterPro:IPR001957;db_xref=InterPro:IPR003593;db_xref=InterPro:IPR013159;db_xref=InterPro:IPR013317;db_xref=GeneID:3902947;exon_number=1

While there is nothing technically wrong with these two lines, there is what you might call a logic error: the CDS should have the gene as its parent (in GFF3, an ID attribute on the gene and a matching Parent attribute on the CDS), but nothing links the two. Without that information, a genome browser is going to have a difficult time displaying the data appropriately.

Feel free to complain to the folks at NCBI that their GFF3 is really bad (I've done that a few times, but I think they are ignoring me :-)

So, the question is, what should you use? The best option I can suggest is the genbank2gff3 script that comes with BioPerl, bp_genbank2gff3.pl. If you get the developer version from GitHub, you can use a version of that script that has been fixed to work appropriately with bacterial/circular genomes.

Scott

On Fri, Jul 23, 2010 at 10:54 AM, David Breimann <dav...@gm...> wrote:
> I am trying to set up my first genome, after successfully playing with
> the tutorial examples, and I run into some problems.
>
> I use a fasta and a gff file from NCBI:
> ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Frankia_CcI3/NC_007777.fna
> ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Frankia_CcI3/NC_007777.gff
>
> Setting up the sequence file seems to pass OK, but when I run
> flatfile-to-json.pl with the GFF I get an error:
>
> > ../../../jbrowse/bin/flatfile-to-json.pl --gff NC_007777.gff --tracklabel test -key test
>
> working on seq gi|86738724|ref|NC_007777.1|
> Use of uninitialized value in string eq at
> ../../../jbrowse/bin/flatfile-to-json.pl line 179, <GEN2> line 24.
>
> What's wrong?
>
> Thank you,
> David
>
> _______________________________________________
> Gmod-ajax mailing list
> Gmo...@li...
> https://lists.sourceforge.net/lists/listinfo/gmod-ajax

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                 scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                            216-392-3087
Ontario Institute for Cancer Research
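For illustration only: the missing gene-to-CDS link described above can be patched with a small script. The Python below is a hypothetical sketch (the function names are made up). It assumes that each gene and its CDS share a locus_tag, as in the two NC_007777.1 lines above, and that there is at most one gene per locus_tag. It gives each gene an ID and adds a matching Parent to each CDS. It is not a substitute for regenerating the GFF3 with bp_genbank2gff3.pl.

```python
# Hypothetical sketch: link NCBI-style CDS features to their genes via locus_tag.
# Assumes tab-separated GFF3 with locus_tag on both gene and CDS lines,
# and at most one gene per locus_tag. Not part of BioPerl or JBrowse.


def parse_attributes(field):
    """Split a GFF3 column-9 string ('a=1;b=2') into ordered (key, value) pairs."""
    pairs = []
    for item in field.split(";"):
        if item:
            key, _, value = item.partition("=")
            pairs.append((key, value))
    return pairs


def format_attributes(pairs):
    """Join (key, value) pairs back into a column-9 string."""
    return ";".join(f"{key}={value}" for key, value in pairs)


def get_attr(pairs, key):
    """Return the first value for key, or None if the key is absent."""
    return next((v for k, v in pairs if k == key), None)


def link_cds_to_genes(lines):
    """Add ID=gene-<locus_tag> to genes and a matching Parent to their CDS features."""
    rows = [line.rstrip("\r\n") for line in lines]

    # Pass 1: record which locus_tags actually have a gene feature.
    gene_tags = set()
    for row in rows:
        cols = row.split("\t")
        if not row.startswith("#") and len(cols) == 9 and cols[2] == "gene":
            tag = get_attr(parse_attributes(cols[8]), "locus_tag")
            if tag is not None:
                gene_tags.add(tag)

    # Pass 2: only rows that gain an ID or Parent are rewritten;
    # comments, blank or malformed rows, and CDS without a gene pass through unchanged.
    out = []
    for row in rows:
        cols = row.split("\t")
        if row.startswith("#") or len(cols) != 9:
            out.append(row)
            continue
        attrs = parse_attributes(cols[8])
        keys = {key for key, _ in attrs}
        tag = get_attr(attrs, "locus_tag")
        if tag in gene_tags:
            new_key = {"gene": "ID", "CDS": "Parent"}.get(cols[2])
            if new_key is not None and new_key not in keys:
                attrs.insert(0, (new_key, "gene-" + tag))
                cols[8] = format_attributes(attrs)
        out.append("\t".join(cols))
    return out
```

Running the rewritten lines back through the validator linked above would be the way to check the result.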