From: Scott C. <sc...@sc...> - 2020-01-16 21:16:52
Hi Miharimamy,

Thanks for sending this report. Generally, loading GFF into Chado can be difficult: the Perl-based loader that you are using is quite particular about the format of the GFF, and producers of GFF generally aren't so particular. Since the loader makes the (in my view, correct) decision not to load anything if it can't load everything in a file, it quits.

So, taking each of the problems you found in order:

1. "dbxref=NCBI:" isn't a valid Dbxref value (presumably there should be an accession after the "NCBI:"), so the loader won't continue. Two options here are to contact SGD and point out that there is a problem with their GFF, or to delete the offending item and try again, which it looks like you did for item 2.

2. If the item Note is missing from the cvterm table, I think that probably means that you didn't install the schema using the "make" procedure, which would have installed some necessary items in the cv and cvterm tables.

3 and 4. Messages that srcfeatures can't be found mean that the feature on which the features in your GFF reside hasn't been defined yet (that is, the thing referred to in column 1 of the GFF doesn't exist). Frequently, creators of GFF don't define the reference sequence in the GFF for whatever reason (it's not required by the GFF3 spec, since it might be credibly defined elsewhere). To define it in the GFF you have, add a line before anything else that looks something like this (the nine columns are separated by tabs):

NZ_CP027390.1 . chromosome 1 1234566 . . . ID=NZ_CP027390.1;Name=NZ_CP027390.1

Here "NZ_CP027390.1" is the name of the reference sequence, "chromosome" is the Sequence Ontology term for the type of thing (other options would include "contig" and "supercontig"), and "1234566" is the length of the sequence.
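If you have several files to fix, the reference line can be generated rather than typed by hand. What follows is only a rough sketch in Python, not part of the Chado loader: the function names reference_line and add_reference_line are made up for illustration, and the length 1234566 is a placeholder that you would replace with the real sequence length. It also assumes that any leading "#" lines (such as "##gff-version 3") should stay at the top of the file, so the new line goes just after them and before the first feature.

```python
def reference_line(seq_id, length, so_type="chromosome"):
    """Build one tab-separated GFF3 line that defines a reference sequence."""
    columns = [
        seq_id,        # column 1: name of the reference sequence
        ".",           # column 2: source (unknown here)
        so_type,       # column 3: Sequence Ontology type, e.g. chromosome or contig
        "1",           # column 4: start
        str(length),   # column 5: end, i.e. the length of the sequence
        ".",           # column 6: score
        ".",           # column 7: strand
        ".",           # column 8: phase
        f"ID={seq_id};Name={seq_id}",  # column 9: attributes
    ]
    return "\t".join(columns)


def add_reference_line(gff_text, seq_id, length, so_type="chromosome"):
    """Insert the reference line ahead of the first feature line.

    Assumption: leading '#' lines (directives and comments) are kept on top.
    """
    lines = gff_text.splitlines()
    insert_at = 0
    while insert_at < len(lines) and lines[insert_at].startswith("#"):
        insert_at += 1
    lines.insert(insert_at, reference_line(seq_id, length, so_type))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Tiny made-up GFF3 fragment; 1234566 is a placeholder length.
    example = (
        "##gff-version 3\n"
        "NZ_CP027390.1\tRefSeq\tgene\t10\t50\t.\t+\t.\tID=gene0\n"
    )
    print(add_reference_line(example, "NZ_CP027390.1", 1234566), end="")
```

A file with more than one reference sequence (several contigs, say) would need one such line per distinct value in column 1.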
Finally, I would add that using Chado through Tripal frequently makes life easier, though I think all of these issues would still have been problems (with the probable exception of item 2: I think Tripal would have initialized the "Note" item in cvterm).

Scott

On Thu, Jan 16, 2020 at 11:15 AM RAJAONISON Andriamiharimamy via Gmod-schema <gmo...@li...> wrote:

> Hi,
>
> I hope this mail finds you well. I send you my best wishes for this new year.
>
> I am reaching out to you because I have issues loading GFF files into your Chado. I tried several files but none of them seems to work.
>
> Thank you in advance for your help
>
> Miharimamy
>
> 1. The S.cerevisiae example in gmod.org
>
> *Command:*
>
> chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism yeast --gfffile saccharomyces/saccharomyces_cerevisiae.gff.sorted --fastafile saccharomyces/saccharomyces_cerevisiae.gff.sorted.fasta
>
> *Output:*
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Error in line:
> chrIII SGD chromosome 1 316620 . . . ID=chrIII;dbxref=NCBI:;Name=chrIII
>
> Dbxref value 'NCBI:' did not conform to GFF3 specification
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
> STACK: Bio::FeatureIO::gff::_handle_feature /usr/local/share/perl5/Bio/FeatureIO/gff.pm:659
> STACK: Bio::FeatureIO::gff::next_feature /usr/local/share/perl5/Bio/FeatureIO/gff.pm:187
> STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:782
> -----------------------------------------------------------
>
> 2. S.cerevisiae without dbxref
>
> *Command:*
>
> chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism yeast --gfffile saccharomyces/test_saccharomyces_ncbi --fastafile saccharomyces/saccharomyces_cerevisiae.gff.sorted.fasta
>
> *Output:*
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: I couldn't find the 'Note' cvterm in the database;
> Did you load the feature property controlled vocabulary?
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
> STACK: Bio::GMOD::DB::Adapter::handle_note /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm:3901
> STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:974
>
> 3. A GFF from NCBI
>
> *Command:*
>
> chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism bacteria --gfffile GCF_000008865.2/GCF_003018035.1_ASM301803v1_genomic.gff.sorted
>
> *Output:*
>
> Unable to find srcfeature NZ_CP027390.1 in the database.
> Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 2.
> Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x241a8d0)', 'Bio::SeqFeature::Annotated=HASH(0x26c61d0)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851
>
> 4. A GFF3 from prokka
>
> *Command:*
>
> chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism bacteria -gfffile GFF3/GCF_000455285.gff.sorted --fastafile GFF3/GCF_000455285.gff.sorted.fasta
>
> *Output:*
>
> Unable to find srcfeature gnl|Prokka|GCF_000455285_1 in the database.
> Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 2.
> Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x33008f0)', 'Bio::SeqFeature::Annotated=HASH(0x35ac0a0)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851
>
> _______________________________________________
> Gmod-schema mailing list
> Gmo...@li...
> https://lists.sourceforge.net/lists/listinfo/gmod-schema

--
------------------------------------------------------------------------
Scott Cain, Ph. D.
scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)
216-392-3087
Ontario Institute for Cancer Research