From: RAJAONISON A. <mir...@ya...> - 2020-01-17 13:57:46
Hi Scott,

Thank you again for your help: adding the parent feature removed the srcfeature-related error. However, I now get another error that I don't understand:

    --------------------- WARNING ---------------------
    MSG: Can not set Bio::Location::Simple::end() equal to start; start not set
    ---------------------------------------------------
    Use of uninitialized value $featuretype in pattern match (m//) at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 819, <GEN0> line 2.
    Use of uninitialized value $featuretype in pattern match (m//) at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 820, <GEN0> line 2.

    ------------- EXCEPTION: Bio::Root::Exception -------------
    MSG: no cvterm for
    STACK: Error::throw
    STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
    STACK: Bio::GMOD::DB::Adapter::get_type /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm:4629
    STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:848
    -----------------------------------------------------------
    Abnormal termination, trying to clean up...

    Attempting to clean up the loader temp table (so that --recreate_cache won't be needed)...
    Trying to remove the run lock (so that --remove_lock won't be needed)...
    Exiting...

Thank you,
Miharimamy

From: Scott Cain <sc...@sc...>
Sent: Thursday, 16 January 2020 22:31
To: RAJAONISON Andriamiharimamy <mir...@ya...>
Cc: GMOD Schema/Chado List <gmo...@li...>
Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado

Hi Miharimamy,

Thanks for sending this report. Generally, loading GFF into Chado can be difficult, as the Perl-based loader that you are using can be quite particular about the format of the GFF, and producers of GFF generally aren't so particular. Since the loader makes the (in my view, correct) decision to not load anything if it can't load everything in a file, it quits.

So, taking each of the problems you found in order:

1. "dbxref=NCBI:" isn't a valid accession (presumably there should be something after the "NCBI:") and so the loader won't continue. Two options here are to contact SGD and point out that there is a problem with their GFF, or delete the offending item and try again, which it looks like you did for item 2.

2. If the item Note is missing from the cvterm table, I think that probably means that you didn't install the schema using the "make" procedure that would have installed some necessary items in the cv and cvterm tables.

3 and 4: Messages that srcfeatures can't be found mean that the feature on which the features in your GFF reside hasn't been defined yet (that is, the thing referred to in column 1 of the GFF doesn't exist). Frequently, creators of GFF don't define the reference sequence in the GFF for whatever reason (it's not required by the GFF3 spec, since it might be credibly defined elsewhere). To define it in the GFF you have, add a line before anything else that looks something like this:

    NZ_CP027390.1 . chromosome 1 1234566 . . . ID=NZ_CP027390.1;Name=NZ_CP027390.1

Where "NZ_CP027390.1" is the name of the reference sequence, "chromosome" is the Sequence Ontology term for the type of thing (other options would include "contig" and "supercontig"), and "1234566" is the length of the sequence.

Finally, I would add that using Chado through Tripal frequently makes life easier, though I think all of these issues would still have been problems (with the probable exception of item 2: Tripal would have initialized the "Note" item in cvterm, I think).

Scott

On Thu, Jan 16, 2020 at 11:15 AM RAJAONISON Andriamiharimamy via Gmod-schema <gmo...@li...> wrote:

Hi,

I hope this mail finds you well. I send you my best wishes for the new year.

I am reaching out to you because I have issues loading GFF files into Chado. I tried several files but none of them seem to work.

Thank you in advance for your help,
Miharimamy

1. The S.cerevisiae example in gmod.org

Command:

    chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism yeast --gfffile saccharomyces/saccharomyces_cerevisiae.gff.sorted --fastafile saccharomyces/saccharomyces_cerevisiae.gff.sorted.fasta

Output:

    ------------- EXCEPTION: Bio::Root::Exception -------------
    MSG: Error in line:
    chrIII SGD chromosome 1 316620 . . . ID=chrIII;dbxref=NCBI:;Name=chrIII
    Dbxref value 'NCBI:' did not conform to GFF3 specification
    STACK: Error::throw
    STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
    STACK: Bio::FeatureIO::gff::_handle_feature /usr/local/share/perl5/Bio/FeatureIO/gff.pm:659
    STACK: Bio::FeatureIO::gff::next_feature /usr/local/share/perl5/Bio/FeatureIO/gff.pm:187
    STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:782
    -----------------------------------------------------------

2. S.cerevisiae without dbxref

Command:

    chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism yeast --gfffile saccharomyces/test_saccharomyces_ncbi --fastafile saccharomyces/saccharomyces_cerevisiae.gff.sorted.fasta

Output:

    ------------- EXCEPTION: Bio::Root::Exception -------------
    MSG: I couldn't find the 'Note' cvterm in the database;
    Did you load the feature property controlled vocabulary?
    STACK: Error::throw
    STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
    STACK: Bio::GMOD::DB::Adapter::handle_note /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm:3901
    STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:974

3. A GFF from NCBI

Command:

    chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism bacteria --gfffile GCF_000008865.2/GCF_003018035.1_ASM301803v1_genomic.gff.sorted

Output:

    Unable to find srcfeature NZ_CP027390.1 in the database.
    Perhaps you need to rerun your data load with the '--recreate_cache' option.
     at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 2.
    Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x241a8d0)', 'Bio::SeqFeature::Annotated=HASH(0x26c61d0)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851

4. A GFF3 from prokka

Command:

    chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism bacteria -gfffile GFF3/GCF_000455285.gff.sorted --fastafile GFF3/GCF_000455285.gff.sorted.fasta

Output:

    Unable to find srcfeature gnl|Prokka|GCF_000455285_1 in the database.
    Perhaps you need to rerun your data load with the '--recreate_cache' option.
     at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 2.
    Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x33008f0)', 'Bio::SeqFeature::Annotated=HASH(0x35ac0a0)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851

_______________________________________________
Gmod-schema mailing list
Gmo...@li...
https://lists.sourceforge.net/lists/listinfo/gmod-schema

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                          scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)         216-392-3087
Ontario Institute for Cancer Research
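
For items 3 and 4 in the thread above, the advice to define the reference sequence in the GFF itself could be sketched in Python roughly as follows. This is only an illustration, not part of the Chado loader: `missing_reference_lines` is a hypothetical helper name, "chromosome" is just a default Sequence Ontology type that may need to be "contig" or "supercontig" for a given file, and a seqid is assumed to be "defined" when some feature carries an ID equal to it. The length comes from a ##sequence-region pragma when one exists; otherwise it falls back to the largest end coordinate seen, which is only a lower bound on the true sequence length.

```python
def missing_reference_lines(gff_lines, so_type="chromosome"):
    """Build GFF3 lines for column-1 seqids that no feature in the file defines.

    Assumption: a seqid is "defined" if some feature has ID=<seqid>.
    """
    declared = {}        # seqid -> length taken from ##sequence-region pragmas
    max_end = {}         # seqid -> largest end coordinate seen (fallback only)
    defined_ids = set()  # every ID= value found in column 9
    order = []           # seqids in first-seen order

    for raw in gff_lines:
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence data follows; no more features
        if line.startswith("##sequence-region"):
            parts = line.split()
            if len(parts) == 4 and parts[3].isdigit():
                declared[parts[1]] = int(parts[3])
            continue
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or not cols[4].isdigit():
            continue  # not a well-formed feature line; skip it
        seqid, end = cols[0], int(cols[4])
        if seqid not in max_end:
            order.append(seqid)
            max_end[seqid] = end
        else:
            max_end[seqid] = max(max_end[seqid], end)
        for attr in cols[8].split(";"):
            key, _, value = attr.partition("=")
            if key.strip() == "ID":
                defined_ids.add(value.strip())

    new_lines = []
    for seqid in order:
        if seqid in defined_ids:
            continue
        length = declared.get(seqid, max_end[seqid])
        new_lines.append("\t".join(
            [seqid, ".", so_type, "1", str(length), ".", ".", ".",
             f"ID={seqid};Name={seqid}"]
        ))
    return new_lines


# Small in-memory example; a real file would be read with open(path).
example = [
    "##gff-version 3\n",
    "NZ_CP027390.1\tRefSeq\tgene\t10\t900\t.\t+\t.\tID=gene0\n",
]
for ref_line in missing_reference_lines(example):
    print(ref_line)
```

The returned lines would then be placed ahead of the other feature lines in the GFF before rerunning the loader. When the fallback length is used, it is worth replacing it with the real sequence length, for example from the accompanying FASTA file.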