From: RAJAONISON A. <mir...@ya...> - 2020-01-15 11:30:32
|
Hi,

I hope this mail finds you well. Best wishes for the new year.

I am reaching out because I have issues loading GFF files into Chado. I tried several files, but none of them seems to work.

Thank you in advance for your help,
Miharimamy

1. The S.cerevisiae example in gmod.org

Command :

    chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism yeast --gfffile saccharomyces/saccharomyces_cerevisiae.gff.sorted --fastafile saccharomyces/saccharomyces_cerevisiae.gff.sorted.fasta

Output :

    ------------- EXCEPTION: Bio::Root::Exception -------------
    MSG: Error in line:
    chrIII SGD chromosome 1 316620 . . . ID=chrIII;dbxref=NCBI:;Name=chrIII
    Dbxref value 'NCBI:' did not conform to GFF3 specification
    STACK: Error::throw
    STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
    STACK: Bio::FeatureIO::gff::_handle_feature /usr/local/share/perl5/Bio/FeatureIO/gff.pm:659
    STACK: Bio::FeatureIO::gff::next_feature /usr/local/share/perl5/Bio/FeatureIO/gff.pm:187
    STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:782
    -----------------------------------------------------------

2. S.cerevisiae without dbxref

Command :

    chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism yeast --gfffile saccharomyces/test_saccharomyces_ncbi --fastafile saccharomyces/saccharomyces_cerevisiae.gff.sorted.fasta

Output :

    ------------- EXCEPTION: Bio::Root::Exception -------------
    MSG: I couldn't find the 'Note' cvterm in the database;
    Did you load the feature property controlled vocabulary?
    STACK: Error::throw
    STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
    STACK: Bio::GMOD::DB::Adapter::handle_note /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm:3901
    STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:974

3. A GFF from NCBI

Command :

    chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism bacteria --gfffile GCF_000008865.2/GCF_003018035.1_ASM301803v1_genomic.gff.sorted

Output :

    Unable to find srcfeature NZ_CP027390.1 in the database.
    Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 2.
    Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x241a8d0)', 'Bio::SeqFeature::Annotated=HASH(0x26c61d0)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851

4. A GFF3 from prokka

Command :

    chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism bacteria -gfffile GFF3/GCF_000455285.gff.sorted --fastafile GFF3/GCF_000455285.gff.sorted.fasta

Output :

    Unable to find srcfeature gnl|Prokka|GCF_000455285_1 in the database.
    Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 2.
    Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x33008f0)', 'Bio::SeqFeature::Annotated=HASH(0x35ac0a0)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851
|
From: Scott C. <sc...@sc...> - 2020-01-16 21:16:52
|
Hi Miharimamy,

Thanks for sending this report. Generally, loading GFF into Chado can be difficult, as the Perl-based loader that you are using can be quite particular about the format of the GFF, and producers of GFF generally aren't so particular. Since the loader makes the (in my view, correct) decision to not load anything if it can't load everything in a file, it quits. So, taking each of the problems you found in order:

1. "dbxref=NCBI:" isn't a valid accession (presumably there should be something after the "NCBI:"), and so the loader won't continue. Two options here are to contact SGD and point out that there is a problem with their GFF, or to delete the offending item and try again, which it looks like you did for item 2.

2. If the item Note is missing from the cvterm table, I think that probably means that you didn't install the schema using the "make" procedure that would have installed some necessary items in the cv and cvterm tables.

3 and 4: Messages that srcfeatures can't be found mean that the feature on which the features in your GFF reside hasn't been defined yet (that is, the thing referred to in column 1 of the GFF doesn't exist). Frequently, creators of GFF don't define the reference sequence in the GFF for whatever reason (it's not required by the GFF3 spec, since it might be credibly defined elsewhere). To define it in the GFF you have, add a line before anything else that looks something like this:

    NZ_CP027390.1 . chromosome 1 1234566 . . . ID=NZ_CP027390.1;Name=NZ_CP027390.1

Here "NZ_CP027390.1" is the name of the reference sequence, "chromosome" is the Sequence Ontology term for the type of thing (other options would include "contig" and "supercontig"), and "1234566" is the length of the sequence.

Finally, I would add that using Chado through Tripal frequently makes life easier, though I think all of these issues would still have been problems (with the probable exception of item 2: Tripal would have initialized the "Note" item in cvterm, I think).

Scott

--
Scott Cain, Ph. D.    scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)    216-392-3087
Ontario Institute for Cancer Research
|
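Editorial note: the reference-sequence line described for items 3 and 4 can be generated rather than typed by hand. The sketch below is only an illustration, not part of the Chado tooling; the function names `fasta_lengths` and `reference_line` are hypothetical, and it assumes the sequence length can be taken from the matching FASTA file. It joins the nine columns with tab characters, which is what the GFF3 format expects.

```python
def fasta_lengths(fasta_text):
    """Return {sequence_id: length}; the ID is the first word after '>'."""
    lengths = {}
    current = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            # A bare '>' has no usable ID, so its residues are ignored.
            current = header[0] if header else None
            if current is not None:
                lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def reference_line(seqid, length, so_type="chromosome"):
    """Build one GFF3 line (nine tab-separated columns) defining a reference."""
    attributes = f"ID={seqid};Name={seqid}"
    columns = [seqid, ".", so_type, "1", str(length), ".", ".", ".", attributes]
    return "\t".join(columns)


if __name__ == "__main__":
    # Made-up toy sequence, only to show the shape of the output.
    demo_fasta = ">NZ_CP027390.1 example record\nACGTACGT\nACGT\n"
    for seqid, length in fasta_lengths(demo_fasta).items():
        print(reference_line(seqid, length))
```

The printed lines would then be placed directly after the `##gff-version 3` header, before any feature that uses the sequence in column 1.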
From: RAJAONISON A. <mir...@ya...> - 2020-01-17 13:57:46
|
Hi Scott,

Thank you again for your help: adding the parent feature removes the srcfeature-related error. However, I now get another issue that I don’t understand:

    --------------------- WARNING ---------------------
    MSG: Can not set Bio::Location::Simple::end() equal to start; start not set
    ---------------------------------------------------
    Use of uninitialized value $featuretype in pattern match (m//) at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 819, <GEN0> line 2.
    Use of uninitialized value $featuretype in pattern match (m//) at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 820, <GEN0> line 2.

    ------------- EXCEPTION: Bio::Root::Exception -------------
    MSG: no cvterm for
    STACK: Error::throw
    STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
    STACK: Bio::GMOD::DB::Adapter::get_type /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm:4629
    STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:848
    -----------------------------------------------------------

    Abnormal termination, trying to clean up...

    Attempting to clean up the loader temp table (so that --recreate_cache
    won't be needed)...
    Trying to remove the run lock (so that --remove_lock won't be needed)...
    Exiting...

Thank you,
Miharimamy
|
From: Scott C. <sc...@sc...> - 2020-01-17 18:49:42
|
Hi Miharimamy,

I'm pretty sure the "line 2" at the end of the error message means the problem is in line 2 of your GFF file. Is that the line you added? What does it look like?

Scott
|
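Editorial note: when an error points at a specific GFF line, a quick check of the column count on each feature line can narrow things down. The following is a minimal sketch under the assumption that every feature line should have exactly nine tab-separated columns; `check_columns` is a hypothetical helper, not something shipped with the loader.

```python
def check_columns(gff_text):
    """Return (line_number, column_count) for feature lines without nine columns."""
    problems = []
    for number, line in enumerate(gff_text.splitlines(), start=1):
        # Stop at an embedded FASTA section; sequence lines are not features.
        if line.startswith("##FASTA") or line.startswith(">"):
            break
        # Skip blank lines, directives and comments.
        if not line.strip() or line.startswith("#"):
            continue
        count = len(line.split("\t"))
        if count != 9:
            problems.append((number, count))
    return problems


if __name__ == "__main__":
    # Line 2 uses runs of spaces instead of tabs, so it parses as one column.
    demo = (
        "##gff-version 3\n"
        "chr1    .    chromosome    1    100    .    .    .    ID=chr1\n"
        "chr1\tRefSeq\tgene\t5\t20\t.\t+\t.\tID=gene0\n"
    )
    for number, count in check_columns(demo):
        print(f"line {number}: {count} tab-separated column(s), expected 9")
```

A line reported with a single column is a strong hint that spaces were used as the separator.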
From: RAJAONISON A. <mir...@ya...> - 2020-01-20 14:54:53
|
Hi Scott, You were right, the issue lied in the line 2. The field separator was 4 spaces instead of tabulation. I converted it and tried to load the file back but went back to the first error. Thank you for your time, Miharimamy Preparing data for inserting into the chado database (This may take a while ...) Unable to find srcfeature NZ_CP027391.1 in the database. Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 5939. Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x297c120)', 'Bio::SeqFeature::Annotated=HASH(0x2bcd420)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851 Abnormal termination, trying to clean up... Attempting to clean up the loader temp table (so that --recreate_cache won't be needed)... Trying to remove the run lock (so that --remove_lock won't be needed)... Exiting... ##gff-version 3 NZ_CP027390.1 . chromosome 1 5901472 . . . ID=NZ_CP027390.1;Name=NZ_CP027390.1 NZ_CP027390.1 RefSeq gene 931 1197 . + . ID=gene0;Name=AX062_RS00010;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00010 NZ_CP027390.1 RefSeq gene 1191 1577 . - . ID=gene1;Name=AX062_RS00015;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00015 NZ_CP027390.1 RefSeq gene 10194 10736 . + . ID=gene10;Name=AX062_RS00060;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00060 NZ_CP027390.1 RefSeq gene 80547 81860 . + . ID=gene100;Name=AX062_RS00510;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00510 NZ_CP027390.1 RefSeq gene 918692 918997 . - . ID=gene1000;Name=AX062_RS05010;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05010 NZ_CP027390.1 RefSeq gene 919105 919815 . + . ID=gene1001;Name=AX062_RS05015;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05015 NZ_CP027390.1 RefSeq gene 919818 920378 . - . 
ID=gene1002;Name=AX062_RS05020;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05020 NZ_CP027390.1 RefSeq gene 920413 920754 . - . ID=gene1003;Name=AX062_RS05025;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05025 NZ_CP027390.1 RefSeq gene 920889 921215 . + . ID=gene1004;Name=AX062_RS05030;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05030 NZ_CP027390.1 RefSeq gene 921252 921440 . + . ID=gene1005;Name=AX062_RS05035;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05035 NZ_CP027390.1 RefSeq gene 921421 922635 . + . ID=gene1006;Name=AX062_RS05040;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05040 De : Scott Cain <sc...@sc...> Envoyé : vendredi 17 janvier 2020 21:49 À : RAJAONISON Andriamiharimamy <mir...@ya...> Cc : GMOD Schema/Chado List <gmo...@li...> Objet : Re: [Gmod-schema] [ISSUE] Load gff in chado Hi Miharimamy, The end of the error message where it says "line 2" I'm pretty sure means the problem is in line 2 of your GFF file. Is that the line you added? What does it look like? Scott On Fri, Jan 17, 2020 at 5:57 AM RAJAONISON Andriamiharimamy <mir...@ya... <mailto:mir...@ya...> > wrote: Hi Scott, Thank you again for your help because adding the parent feature removes the srcfeature related error. Yet I get another issue that I don’t understand : --------------------- WARNING --------------------- MSG: Can not set Bio::Location::Simple::end() equal to start; start not set --------------------------------------------------- Use of uninitialized value $featuretype in pattern match (m//) at chado-1.31/load/bin/gmod_bulk_load_gff3.pl <http://gmod_bulk_load_gff3.pl> line 819, <GEN0> line 2. Use of uninitialized value $featuretype in pattern match (m//) at chado-1.31/load/bin/gmod_bulk_load_gff3.pl <http://gmod_bulk_load_gff3.pl> line 820, <GEN0> line 2. 
------------- EXCEPTION: Bio::Root::Exception ------------- MSG: no cvterm for STACK: Error::throw STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447 STACK: Bio::GMOD::DB::Adapter::get_type /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm:4629 STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:848 <http://gmod_bulk_load_gff3.pl:848> ----------------------------------------------------------- Abnormal termination, trying to clean up... Attempting to clean up the loader temp table (so that --recreate_cache won't be needed)... Trying to remove the run lock (so that --remove_lock won't be needed)... Exiting... Thank you Miharimamy De : Scott Cain <sc...@sc... <mailto:sc...@sc...> > Envoyé : jeudi 16 janvier 2020 22:31 À : RAJAONISON Andriamiharimamy <mir...@ya... <mailto:mir...@ya...> > Cc : GMOD Schema/Chado List <gmo...@li... <mailto:gmo...@li...> > Objet : Re: [Gmod-schema] [ISSUE] Load gff in chado Hi Miharimamy, Thanks for sending this report. Generally, loading GFF into Chado can be difficult, as the perl-based loader that you are using can be quite particular about the format of the GFF and producers of GFF generally aren't so particular. Since the loader makes the (in my view, correct) decision to not load anything if it can't load everything in a file, it quits. So, taking each of the problems you found in order: 1. "dbxref=NCBI:" isn't a valid accession (presumably there should be something after the "NCBI:") and so the loader won't continue. Two options here are to contact SGD and point out that there is a problem with their GFF, or delete the offending item and try again, which it looks like you did for item 2. 2. If the item Note is missing from the cvterm table, I think that probably means that you didn't install the schema using the "make" procedure that would have installed some necessary items in the cv and cvterm table. 
3 and 4: Messages that srcfeatures can't be found mean that the feature on which the features in your GFF reside hasn't been defined yet (that is, the thing referred to in column 1 of the GFF doesn't exist). Frequently, creators of GFF don't define the reference sequence in the GFF for whatever reason (it's not required by the GFF3 spec, since it might be credibly defined elsewhere). To define it in the GFF you have, add a line before anything else that looks something like this:

NZ_CP027390.1 . chromosome 1 1234566 . . . ID=NZ_CP027390.1;Name=NZ_CP027390.1

Where "NZ_CP027390.1" is the name of the reference sequence, "chromosome" is the Sequence Ontology term for the type of thing (other options would include "contig" and "supercontig"), and "1234566" stands in for the length of the sequence.

Finally, I would add that using Chado through Tripal frequently makes life easier, though I think all of these issues would still have been problems (with the probable exception of item 2; I think Tripal would have initialized the "Note" item in cvterm).

Scott

--
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research
|
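The reference-sequence line described above can be generated rather than typed by hand. What follows is a minimal Python sketch under stated assumptions, not part of the Chado tools: the function names `fasta_lengths` and `reference_line` are hypothetical, and it assumes that the IDs in the FASTA headers match column 1 of the GFF. It joins the nine columns with tab characters, which is what GFF3 requires.

```python
def fasta_lengths(fasta_text):
    """Return {sequence_id: length} for FASTA-formatted text."""
    lengths = {}
    current = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # The ID is the first whitespace-delimited token after '>'.
            header = line[1:].split()
            current = header[0] if header else None
            if current is not None:
                lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def reference_line(seqid, length, so_type="chromosome"):
    """Build one tab-separated GFF3 line that defines a reference sequence."""
    columns = [seqid, ".", so_type, "1", str(length), ".", ".", ".",
               f"ID={seqid};Name={seqid}"]
    return "\t".join(columns)


if __name__ == "__main__":
    # Tiny made-up FASTA, for illustration only.
    demo = ">NZ_CP027390.1 demo\nACGT\nAC\n>NZ_CP027391.1\nGGG\n"
    for seqid, length in fasta_lengths(demo).items():
        print(reference_line(seqid, length))
```

The printed lines would go directly after the `##gff-version 3` directive, before any feature that refers to those sequences. The type defaults to "chromosome"; pass "contig" or "supercontig" where that describes the sequence better.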
From: Scott C. <sc...@sc...> - 2020-01-20 18:21:41
|
Hi Miharimamy,

Did you try running with the --recreate_cache option as suggested by the error message?

On Mon, Jan 20, 2020 at 6:54 AM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:
> Hi Scott,
>
> You were right, the issue lay in line 2. The field separator was 4 spaces instead of a tab.
> I converted it and tried to load the file again, but I am back to the first error.
>
> Thank you for your time,
> Miharimamy
>
> Preparing data for inserting into the chado database
> (This may take a while ...)
> Unable to find srcfeature NZ_CP027391.1 in the database.
> Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 5939.
> Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x297c120)', 'Bio::SeqFeature::Annotated=HASH(0x2bcd420)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851
>
> Abnormal termination, trying to clean up...
>
> Attempting to clean up the loader temp table (so that --recreate_cache won't be needed)...
> Trying to remove the run lock (so that --remove_lock won't be needed)...
> Exiting...
>
> ##gff-version 3
> NZ_CP027390.1 . chromosome 1 5901472 . . . ID=NZ_CP027390.1;Name=NZ_CP027390.1
> NZ_CP027390.1 RefSeq gene 931 1197 . + . ID=gene0;Name=AX062_RS00010;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00010
> NZ_CP027390.1 RefSeq gene 1191 1577 . - . ID=gene1;Name=AX062_RS00015;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00015
> NZ_CP027390.1 RefSeq gene 10194 10736 . + . ID=gene10;Name=AX062_RS00060;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00060
> NZ_CP027390.1 RefSeq gene 80547 81860 . + . ID=gene100;Name=AX062_RS00510;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00510
> NZ_CP027390.1 RefSeq gene 918692 918997 . - . ID=gene1000;Name=AX062_RS05010;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05010
> NZ_CP027390.1 RefSeq gene 919105 919815 . + . ID=gene1001;Name=AX062_RS05015;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05015
> NZ_CP027390.1 RefSeq gene 919818 920378 . - . ID=gene1002;Name=AX062_RS05020;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05020
> NZ_CP027390.1 RefSeq gene 920413 920754 . - . ID=gene1003;Name=AX062_RS05025;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05025
> NZ_CP027390.1 RefSeq gene 920889 921215 . + . ID=gene1004;Name=AX062_RS05030;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05030
> NZ_CP027390.1 RefSeq gene 921252 921440 . + . ID=gene1005;Name=AX062_RS05035;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05035
> NZ_CP027390.1 RefSeq gene 921421 922635 . + . ID=gene1006;Name=AX062_RS05040;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05040
>
> From: Scott Cain <sc...@sc...>
> Sent: Friday, January 17, 2020 21:49
> To: RAJAONISON Andriamiharimamy <mir...@ya...>
> Cc: GMOD Schema/Chado List <gmo...@li...>
> Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado
>
> Hi Miharimamy,
>
> The end of the error message where it says "line 2" I'm pretty sure means the problem is in line 2 of your GFF file. Is that the line you added? What does it look like?
>
> Scott
>
> On Fri, Jan 17, 2020 at 5:57 AM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:
>
> Hi Scott,
>
> Thank you again for your help; adding the parent feature removes the srcfeature-related error.
> Yet I get another issue that I don't understand:
>
> --------------------- WARNING ---------------------
> MSG: Can not set Bio::Location::Simple::end() equal to start; start not set
> ---------------------------------------------------
> Use of uninitialized value $featuretype in pattern match (m//) at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 819, <GEN0> line 2.
> Use of uninitialized value $featuretype in pattern match (m//) at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 820, <GEN0> line 2.
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: no cvterm for
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
> STACK: Bio::GMOD::DB::Adapter::get_type /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm:4629
> STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:848
> -----------------------------------------------------------
>
> Abnormal termination, trying to clean up...
>
> Attempting to clean up the loader temp table (so that --recreate_cache won't be needed)...
> Trying to remove the run lock (so that --remove_lock won't be needed)...
> Exiting...
>
> Thank you
> Miharimamy

--
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research
|
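The quoted exchange traces the "no cvterm for" failure to a hand-added line 2 whose fields were separated by four spaces instead of tabs. A quick pre-flight check can catch that before running the loader. This is a minimal Python sketch, not part of the Chado tools; `malformed_lines` is a hypothetical helper name, and it only counts tab-separated columns without validating their contents.

```python
def malformed_lines(gff_lines):
    """Return (line_number, column_count) for feature lines without 9 tab-separated columns."""
    problems = []
    for number, line in enumerate(gff_lines, start=1):
        text = line.rstrip("\r\n")
        if text.startswith("##FASTA"):
            break  # sequence data follows; there are no more feature lines
        if not text.strip() or text.startswith("#"):
            continue  # skip blank lines, comments and directives
        count = len(text.split("\t"))
        if count != 9:
            problems.append((number, count))
    return problems


if __name__ == "__main__":
    good = "NZ_CP027390.1\t.\tchromosome\t1\t5901472\t.\t.\t.\tID=NZ_CP027390.1"
    bad = good.replace("\t", "    ")  # same fields, separated by four spaces
    print(malformed_lines(["##gff-version 3\n", bad + "\n", good + "\n"]))
```

A line whose fields are separated by spaces is reported with a column count of 1, because it contains no tab at all.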
From: RAJAONISON A. <mir...@ya...> - 2020-01-20 18:59:08
|
Hi Scott,

Even with --recreate_cache and --remove_lock, I still get the same error.

Miharimamy

(Re)creating the uniquename cache in the database...
Creating table...
Populating table...
Creating indexes...
Adjusting the primary key sequences (if necessary)...Done.
Preparing data for inserting into the chado database
(This may take a while ...)
Unable to find srcfeature NZ_CP027391.1 in the database.
Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 5939.
Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x25623e0)', 'Bio::SeqFeature::Annotated=HASH(0x2680c18)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851

Abnormal termination, trying to clean up...

Attempting to clean up the loader temp table (so that --recreate_cache won't be needed)...
Trying to remove the run lock (so that --remove_lock won't be needed)...
Exiting...
|
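The log above names NZ_CP027391.1 at GFF line 5939, whereas the GFF snippet posted earlier in the thread defines only NZ_CP027390.1. One way to see which column-1 names a file uses without defining them is sketched below in Python. This is an assumption-laden sketch, not the loader's own lookup: `undefined_seqids` is a hypothetical helper, it inspects only the file (a reference sequence already present in the database is not considered), and it does not decode URL-escaped attribute values.

```python
def undefined_seqids(gff_lines):
    """Map each column-1 ID that no feature declares via ID= to its first line number."""
    seen = {}         # seqid -> first line number where it appears in column 1
    declared = set()  # every ID= value found in column 9
    for number, line in enumerate(gff_lines, start=1):
        text = line.rstrip("\r\n")
        if text.startswith("##FASTA"):
            break  # sequence data follows; there are no more feature lines
        if not text.strip() or text.startswith("#"):
            continue
        columns = text.split("\t")
        if len(columns) != 9:
            continue  # malformed line; outside the scope of this check
        seen.setdefault(columns[0], number)
        for attribute in columns[8].split(";"):
            key, _, value = attribute.partition("=")
            if key.strip() == "ID":
                declared.add(value.strip())
    return {seqid: number for seqid, number in seen.items()
            if seqid not in declared}


if __name__ == "__main__":
    # The third feature line uses made-up coordinates, for illustration only.
    demo = [
        "##gff-version 3",
        "NZ_CP027390.1\t.\tchromosome\t1\t5901472\t.\t.\t.\tID=NZ_CP027390.1;Name=NZ_CP027390.1",
        "NZ_CP027390.1\tRefSeq\tgene\t931\t1197\t.\t+\t.\tID=gene0",
        "NZ_CP027391.1\tRefSeq\tgene\t10\t200\t.\t+\t.\tID=gene5000",
    ]
    print(undefined_seqids(demo))
```

Each name reported this way would need its own reference-sequence line of the kind described earlier in the thread, with the matching length.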
From: Scott C. <sc...@sc...> - 2020-01-20 19:10:58
|
Hi Miharimamy,

The snippet of GFF that you posted had a chromosome named "NZ_CP027390.1", which looks fine to me, but the error message is about a reference sequence named "NZ_CP027391.1". Is it defined anywhere in your GFF?

Scott
> > Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x241a8d0)', 'Bio::SeqFeature::Annotated=HASH(0x26c61d0)') called at chado-1.31/load/bin/ gmod_bulk_load_gff3.pl line 851 > > > > A GFF3 from prokka > > Command : > > chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism bacteria -gfffile GFF3/GCF_000455285.gff.sorted --fastafile GFF3/GCF_000455285.gff.sorted.fasta > > Output : > > Unable to find srcfeature gnl|Prokka|GCF_000455285_1 in the database. > > Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 2. > > Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x33008f0)', 'Bio::SeqFeature::Annotated=HASH(0x35ac0a0)') called at chado-1.31/load/bin/ gmod_bulk_load_gff3.pl line 851 > > > > > > > > > > _______________________________________________ > Gmod-schema mailing list > Gmo...@li... > https://lists.sourceforge.net/lists/listinfo/gmod-schema > > > > -- > > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > > > > -- > > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > > > > -- > > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research |
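Two of the format problems diagnosed in this exchange, columns separated by spaces instead of tabs (the "line 2" error) and a dbxref with an empty accession ("NCBI:"), can be spotted before running the loader. The following is only a minimal sketch in Python and is not part of the Chado distribution: `check_gff_lines` and `DBXREF_RE` are hypothetical names, and the `DBTAG:ID` pattern is an assumption about what the parser accepts, not a copy of its actual rule.

```python
import re

# Assumption: a Dbxref value has the form DBTAG:ID with both parts non-empty.
DBXREF_RE = re.compile(r"^[^:\s]+:\S+$")


def check_gff_lines(lines):
    """Return (line_number, problem) pairs for feature lines that look malformed."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        # Stop at an embedded FASTA section; sequence lines are not features.
        if line.startswith("##FASTA") or line.startswith(">"):
            break
        # Skip blank lines, directives and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            problems.append(
                (number, f"expected 9 tab-separated columns, found {len(fields)}")
            )
            continue
        # Column 9 holds tag=value pairs separated by semicolons.
        for attribute in fields[8].split(";"):
            tag, _, value = attribute.partition("=")
            if tag.lower() == "dbxref":
                for xref in value.split(","):
                    if not DBXREF_RE.match(xref):
                        problems.append((number, f"malformed Dbxref value {xref!r}"))
    return problems
```

Passing the lines of a GFF file (for example `check_gff_lines(open(path))`, with `path` supplied by the caller) returns the line numbers to inspect; a line whose columns are separated by spaces is reported as having a single column.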
From: RAJAONISON A. <mir...@ya...> - 2020-01-20 19:09:55
|
Hi Scott,

Yes, it is there. Do I have to declare it like NZ_CP027390.1 too?

Miharimamy

From: Scott Cain <sc...@sc...>
Sent: Monday, 20 January 2020 22:03
To: RAJAONISON Andriamiharimamy <mir...@ya...>
Cc: GMOD Schema/Chado List <gmo...@li...>
Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado

Hi Miharimamy,

The snippet of GFF that you posted had a chromosome named "NZ_CP027390.1", which looks fine to me, but the error message is about a reference sequence named "NZ_CP027391.1". Is it defined anywhere in your GFF?

Scott

On Mon, Jan 20, 2020 at 10:58 AM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:

> Hi Scott,
>
> Even with --recreate_cache and --remove_lock, I still get the same error.
>
> Miharimamy
>
> (Re)creating the uniquename cache in the database...
> Creating table...
> Populating table...
> Creating indexes...
> Adjusting the primary key sequences (if necessary)...Done.
> Preparing data for inserting into the chado database
> (This may take a while ...)
> Unable to find srcfeature NZ_CP027391.1 in the database.
> Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 5939.
> Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x25623e0)', 'Bio::SeqFeature::Annotated=HASH(0x2680c18)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851
>
> Abnormal termination, trying to clean up...
>
> Attempting to clean up the loader temp table (so that --recreate_cache
> won't be needed)...
> Trying to remove the run lock (so that --remove_lock won't be needed)...
> Exiting...

--
------------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research
|
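The question raised here, whether every name used in column 1 also needs its own defining line, can be checked mechanically. The sketch below lists the reference sequences that features sit on but that no line in the file defines. It is an illustration only: `undefined_reference_sequences` is a hypothetical helper, and it assumes a reference counts as defined when some feature line has an ID equal to the column 1 name, as in the example line suggested earlier in the thread. It looks only at the file and does not check what is already in the database.

```python
def undefined_reference_sequences(lines):
    """Return column-1 names, in order of first use, that have no defining line."""
    used = []        # seqids in order of first appearance
    defined = set()  # seqids that some feature line declares through its ID
    for raw in lines:
        line = raw.rstrip("\n")
        # Stop at an embedded FASTA section.
        if line.startswith("##FASTA") or line.startswith(">"):
            break
        # Skip blank lines, directives and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # malformed line; not handled in this sketch
        seqid, attributes = fields[0], fields[8]
        if seqid not in used:
            used.append(seqid)
        for attribute in attributes.split(";"):
            tag, _, value = attribute.partition("=")
            # Assumption: a line whose ID equals its own seqid defines that reference.
            if tag == "ID" and value == seqid:
                defined.add(seqid)
    return [seqid for seqid in used if seqid not in defined]
```

For a file that defines only NZ_CP027390.1 but also places features on NZ_CP027391.1, the function returns the second name, which is the one the loader reported as a missing srcfeature.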
From: RAJAONISON A. <mir...@ya...> - 2020-01-20 19:24:17
|
Hi again Scott,

Here is the modified header of the file:

##gff-version 3
NZ_CP027390.1 . chromosome 1 5901472 . . . ID=NZ_CP027390.1;Name=NZ_CP027390.1
NZ_CP027391.1 . plasmid 1 98724 . . . ID=NZ_CP027391.1;Name=NZ_CP027391.1
NZ_CP027390.1 RefSeq gene 931 1197 . + . ID=gene0;Name=AX062_RS00010;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00010
NZ_CP027390.1 RefSeq gene 1191 1577 . - . ID=gene1;Name=AX062_RS00015;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00015

And here is the output of the script:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: The following tag(s) are illegal and are causing this parser to die: Is_circular
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
STACK: Bio::FeatureIO::gff::_handle_feature /usr/local/share/perl5/Bio/FeatureIO/gff.pm:787
STACK: Bio::FeatureIO::gff::next_feature /usr/local/share/perl5/Bio/FeatureIO/gff.pm:187
STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:782
-----------------------------------------------------------

I checked the file for this tag, and here is the snippet causing the issue:

NZ_CP027390.1 RefSeq region 1 5802748 . + . ID=id0;Dbxref=taxon:562;Is_circular=true;Name=ANONYMOUS;collected-by=CDC;country=USA;gbkey=Src;genome=chromosome;mol_type=genomic DNA;serovar=E. coli O26:Pending;strain=2015C-4944
NZ_CP027390.1 cmsearch riboswitch 4764335 4764483 . + . ID=id113;Dbxref=RFAM:RF00050;Note=FMN riboswitch;bound_moiety=flavin mononucleotide;gbkey=regulatory;inference=COORDINATES: profile:INFERNAL:1.1.1;regulatory_class=riboswitch
NZ_CP027390.1 RefSeq direct_repeat 5085858 5086435 . + . ID=id120;gbkey=repeat_region;inference=COORDINATES: alignment:pilercr:v1.02;rpt_family=CRISPR;rpt_type=direct;rpt_unit_range=5085862..5085885;rpt_unit_seq=tccccgcgccagcggggataaacc
NZ_CP027390.1 RefSeq direct_repeat 5112137 5112653 . + . ID=id121;gbkey=repeat_region;inference=COORDINATES: alignment:pilercr:v1.02;rpt_family=CRISPR;rpt_type=direct;rpt_unit_range=5112137..5112165;rpt_unit_seq=gtgttccccgcgccagcggggataaaccg
NZ_CP027391.1 RefSeq region 1 98724 . + . ID=id147;Dbxref=taxon:562;Is_circular=true;Name=unnamed;collected-by=CDC;country=USA;gbkey=Src;genome=plasmid;mol_type=genomic DNA;plasmid-name=unnamed;serovar=E. coli O26:Pending;strain=2015C-4944

Thank you

Miharimamy

From: Scott Cain <sc...@sc...>
Sent: Monday, 20 January 2020 22:03
To: RAJAONISON Andriamiharimamy <mir...@ya...>
Cc: GMOD Schema/Chado List <gmo...@li...>
Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado

Hi Miharimamy,

The snippet of GFF that you posted had a chromosome named "NZ_CP027390.1", which looks fine to me, but the error message is about a reference sequence named "NZ_CP027391.1". Is it defined anywhere in your GFF?

Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D.
scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research |
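The advice above about defining the reference sequence can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the Chado loader: the function name `reference_lines` is hypothetical, the length is taken from a `##sequence-region` pragma when the file has one and otherwise from the largest feature end on that sequence (which is only a lower bound), and the Sequence Ontology type is supplied by the caller. Fields are joined with tabs, since GFF3 columns are tab-separated.

```python
def reference_lines(gff_lines, so_type="chromosome"):
    """Build one GFF3 line per seqid that no feature in the file defines.

    Hypothetical helper: a seqid counts as defined when some feature on it
    carries ID=<seqid> in column 9.
    """
    declared = {}    # seqid -> length from a ##sequence-region pragma
    max_end = {}     # seqid -> largest feature end seen (fallback length)
    defined = set()  # seqids that already have a defining feature

    for raw in gff_lines:
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # sequence data follows, no more features
        if line.startswith("##sequence-region"):
            parts = line.split()
            if len(parts) == 4:
                declared[parts[1]] = int(parts[3])
            continue
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or not cols[4].isdigit():
            continue  # malformed line; not handled in this sketch
        seqid, end, attrs = cols[0], int(cols[4]), cols[8]
        max_end[seqid] = max(end, max_end.get(seqid, 0))
        if f"ID={seqid}" in attrs.split(";"):
            defined.add(seqid)

    out = []
    for seqid in max_end:  # dicts keep first-seen order
        if seqid in defined:
            continue
        length = declared.get(seqid, max_end[seqid])
        out.append("\t".join(
            [seqid, ".", so_type, "1", str(length), ".", ".", ".",
             f"ID={seqid};Name={seqid}"]))
    return out
```

The returned lines would be placed right after the `##gff-version 3` line. A file that mixes sequence types (for example a chromosome and a plasmid) would need the type chosen per sequence rather than one `so_type` for all of them.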
From: Scott C. <sc...@sc...> - 2020-01-20 19:38:56
|
Is_circular isn't a valid GFF tag. You can change it to is_circular to fix it.

On Mon, Jan 20, 2020 at 11:24 AM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:
> Hi again Scott,
>
> Here is the modified header of the file:
>
> ##gff-version 3
> NZ_CP027390.1 . chromosome 1 5901472 . . . ID=NZ_CP027390.1;Name=NZ_CP027390.1
> NZ_CP027391.1 . plasmid 1 98724 . . . ID=NZ_CP027391.1;Name=NZ_CP027391.1
> NZ_CP027390.1 RefSeq gene 931 1197 . + . ID=gene0;Name=AX062_RS00010;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00010
> NZ_CP027390.1 RefSeq gene 1191 1577 . - . ID=gene1;Name=AX062_RS00015;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00015
>
> And here is the output of the script:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: The following tag(s) are illegal and are causing this parser to die: Is_circular
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
> STACK: Bio::FeatureIO::gff::_handle_feature /usr/local/share/perl5/Bio/FeatureIO/gff.pm:787
> STACK: Bio::FeatureIO::gff::next_feature /usr/local/share/perl5/Bio/FeatureIO/gff.pm:187
> STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:782
> -----------------------------------------------------------
>
> I checked the tag in the file, and here is the snippet causing the issue:
>
> NZ_CP027390.1 RefSeq region 1 5802748 . + . ID=id0;Dbxref=taxon:562;Is_circular=true;Name=ANONYMOUS;collected-by=CDC;country=USA;gbkey=Src;genome=chromosome;mol_type=genomic DNA;serovar=E. coli O26:Pending;strain=2015C-4944
> NZ_CP027390.1 cmsearch riboswitch 4764335 4764483 . + . ID=id113;Dbxref=RFAM:RF00050;Note=FMN riboswitch;bound_moiety=flavin mononucleotide;gbkey=regulatory;inference=COORDINATES: profile:INFERNAL:1.1.1;regulatory_class=riboswitch
> NZ_CP027390.1 RefSeq direct_repeat 5085858 5086435 . + . ID=id120;gbkey=repeat_region;inference=COORDINATES: alignment:pilercr:v1.02;rpt_family=CRISPR;rpt_type=direct;rpt_unit_range=5085862..5085885;rpt_unit_seq=tccccgcgccagcggggataaacc
> NZ_CP027390.1 RefSeq direct_repeat 5112137 5112653 . + . ID=id121;gbkey=repeat_region;inference=COORDINATES: alignment:pilercr:v1.02;rpt_family=CRISPR;rpt_type=direct;rpt_unit_range=5112137..5112165;rpt_unit_seq=gtgttccccgcgccagcggggataaaccg
> NZ_CP027391.1 RefSeq region 1 98724 . + . ID=id147;Dbxref=taxon:562;Is_circular=true;Name=unnamed;collected-by=CDC;country=USA;gbkey=Src;genome=plasmid;mol_type=genomic DNA;plasmid-name=unnamed;serovar=E. coli O26:Pending;strain=2015C-4944
>
> Thank you
> Miharimamy
>
> From: Scott Cain <sc...@sc...>
> Sent: Monday, 20 January 2020 22:03
> To: RAJAONISON Andriamiharimamy <mir...@ya...>
> Cc: GMOD Schema/Chado List <gmo...@li...>
> Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado
>
> Hi Miharimamy,
>
> The snippet of GFF that you posted had a chromosome named "NZ_CP027390.1", which looks fine to me, but the error message is about a reference sequence named "NZ_CP027391.1". Is it defined anywhere in your GFF?
>
> Scott
>
> On Mon, Jan 20, 2020 at 10:58 AM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:
> > Hi Scott,
> >
> > Even with --recreate_cache and --remove_lock, I still get the same error.
> >
> > Miharimamy
> >
> > (Re)creating the uniquename cache in the database...
> > Creating table...
> > Populating table...
> > Creating indexes...
> > Adjusting the primary key sequences (if necessary)...Done.
> > Preparing data for inserting into the chado database
> > (This may take a while ...)
> > Unable to find srcfeature NZ_CP027391.1 in the database.
> > Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 5939.
> > Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x25623e0)', 'Bio::SeqFeature::Annotated=HASH(0x2680c18)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851
> >
> > Abnormal termination, trying to clean up...
> >
> > Attempting to clean up the loader temp table (so that --recreate_cache
> > won't be needed)...
> > Trying to remove the run lock (so that --remove_lock won't be needed)...
> > Exiting...
> >
> > From: Scott Cain <sc...@sc...>
> > Sent: Monday, 20 January 2020 20:56
> > To: RAJAONISON Andriamiharimamy <mir...@ya...>
> > Cc: GMOD Schema/Chado List <gmo...@li...>
> > Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado
> >
> > Hi Miharimamy,
> >
> > Did you try running with the --recreate_cache option as suggested by the error message?
> >
> > On Mon, Jan 20, 2020 at 6:54 AM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:
> > Hi Scott,
> > You were right, the issue lay in line 2. The field separator was four spaces instead of a tab.
> > I converted it and tried to load the file again, but went back to the first error.
> >
> > Thank you for your time,
> > Miharimamy
> >
> > Preparing data for inserting into the chado database
> > (This may take a while ...)
> > Unable to find srcfeature NZ_CP027391.1 in the database.
> > Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 5939.
> > Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x297c120)', 'Bio::SeqFeature::Annotated=HASH(0x2bcd420)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851
> >
> > Abnormal termination, trying to clean up...
> >
> > Attempting to clean up the loader temp table (so that --recreate_cache
> > won't be needed)...
> > Trying to remove the run lock (so that --remove_lock won't be needed)...
> > Exiting...
> >
> > ##gff-version 3
> > NZ_CP027390.1 . chromosome 1 5901472 . . . ID=NZ_CP027390.1;Name=NZ_CP027390.1
> > NZ_CP027390.1 RefSeq gene 931 1197 . + . ID=gene0;Name=AX062_RS00010;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00010
> > NZ_CP027390.1 RefSeq gene 1191 1577 . - . ID=gene1;Name=AX062_RS00015;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00015
> > NZ_CP027390.1 RefSeq gene 10194 10736 . + . ID=gene10;Name=AX062_RS00060;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00060
> > NZ_CP027390.1 RefSeq gene 80547 81860 . + . ID=gene100;Name=AX062_RS00510;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00510
> > NZ_CP027390.1 RefSeq gene 918692 918997 . - . ID=gene1000;Name=AX062_RS05010;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05010
> > NZ_CP027390.1 RefSeq gene 919105 919815 . + . ID=gene1001;Name=AX062_RS05015;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05015
> > NZ_CP027390.1 RefSeq gene 919818 920378 . - . ID=gene1002;Name=AX062_RS05020;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05020
> > NZ_CP027390.1 RefSeq gene 920413 920754 . - . ID=gene1003;Name=AX062_RS05025;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05025
> > NZ_CP027390.1 RefSeq gene 920889 921215 . + . ID=gene1004;Name=AX062_RS05030;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05030
> > NZ_CP027390.1 RefSeq gene 921252 921440 . + . ID=gene1005;Name=AX062_RS05035;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05035
> > NZ_CP027390.1 RefSeq gene 921421 922635 . + . ID=gene1006;Name=AX062_RS05040;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05040
> >
> > From: Scott Cain <sc...@sc...>
> > Sent: Friday, 17 January 2020 21:49
> > To: RAJAONISON Andriamiharimamy <mir...@ya...>
> > Cc: GMOD Schema/Chado List <gmo...@li...>
> > Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado
> >
> > Hi Miharimamy,
> >
> > The end of the error message where it says "line 2" I'm pretty sure means the problem is in line 2 of your GFF file. Is that the line you added? What does it look like?
> >
> > Scott
> >
> > On Fri, Jan 17, 2020 at 5:57 AM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:
> > Hi Scott,
> >
> > Thank you again for your help: adding the parent feature removes the srcfeature-related error.
> > Yet I get another issue that I don't understand:
> >
> > --------------------- WARNING ---------------------
> > MSG: Can not set Bio::Location::Simple::end() equal to start; start not set
> > ---------------------------------------------------
> > Use of uninitialized value $featuretype in pattern match (m//) at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 819, <GEN0> line 2.
> > Use of uninitialized value $featuretype in pattern match (m//) at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 820, <GEN0> line 2.
> >
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> > MSG: no cvterm for
> > STACK: Error::throw
> > STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
> > STACK: Bio::GMOD::DB::Adapter::get_type /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm:4629
> > STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:848
> > -----------------------------------------------------------
> >
> > Abnormal termination, trying to clean up...
> >
> > Attempting to clean up the loader temp table (so that --recreate_cache
> > won't be needed)...
> > Trying to remove the run lock (so that --remove_lock won't be needed)...
> > Exiting...
> >
> > Thank you
> >
> > Miharimamy

--
------------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research
|
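Two of the problems in this exchange (a line separated by four spaces instead of tabs, and the rejected `Is_circular` tag) lend themselves to a quick pre-flight check before running the loader. The following is a minimal sketch, assuming the GFF3 file is read as a list of lines; `check_and_fix` and `RENAMES` are hypothetical names, not part of BioPerl or the Chado tools. One caveat: current versions of the GFF3 specification do list `Is_circular` among the reserved attributes, so the rejection seen here appears to come from the parser version in use; lowercasing the tag is the workaround suggested in the thread.

```python
# Tag renames applied to column 9; the single entry mirrors the fix
# suggested in the thread.
RENAMES = {"Is_circular": "is_circular"}


def check_and_fix(gff_lines, renames=RENAMES):
    """Return (fixed_lines, problems).

    problems holds (line_number, column_count) for every feature line that
    does not split into exactly nine tab-separated columns.
    """
    fixed, problems = [], []
    in_fasta = False
    for number, raw in enumerate(gff_lines, start=1):
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            in_fasta = True  # everything after this is sequence data
        if in_fasta or not line or line.startswith("#"):
            fixed.append(line)
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            # e.g. columns separated by spaces; report and leave untouched
            problems.append((number, len(cols)))
            fixed.append(line)
            continue
        pairs = []
        for pair in cols[8].split(";"):
            tag, sep, value = pair.partition("=")
            pairs.append(renames.get(tag, tag) + sep + value)
        cols[8] = ";".join(pairs)
        fixed.append("\t".join(cols))
    return fixed, problems
```

Reporting the line number matches how the loader itself points at problems (the `<GEN0> line 2` suffix in the earlier error), which makes it easier to compare the two outputs.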
From: RAJAONISON A. <mir...@ya...> - 2020-01-20 20:45:53
|
Thank you, it worked, but now I run into the missing 'Note' cvterm error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: I couldn't find the 'Note' cvterm in the database;
Did you load the feature property controlled vocabulary?
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
STACK: Bio::GMOD::DB::Adapter::handle_note /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm:3901
STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:974
-----------------------------------------------------------

I used "make" to create the database and "make ontologies" to load the ontologies. I installed: Relationship Ontology, Sequence Ontology, Gene Ontology and Chado Feature Properties.

A select on the cv table shows:

1 "null"
2 "local" "Locally created terms"
3 "chado_properties" "Terms that are used in the chadoprop table to describe the state of the database"
4 "relationship"
5 "synonym_type"
6 "cvterm_property_type"
7 "anonymous"
8 "sequence"
9 "biological_process"
10 "molecular_function"
11 "cellular_component"
12 "external"

Thank you
Miharimamy

From: Scott Cain <sc...@sc...>
Sent: Monday, 20 January 2020 22:39
To: RAJAONISON Andriamiharimamy <mir...@ya...>
Cc: GMOD Schema/Chado List <gmo...@li...>
Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado

Is_circular isn't a valid GFF tag. You can change it to is_circular to fix it.

On Thu, Jan 16, 2020 at 11:15 AM RAJAONISON Andriamiharimamy via Gmod-schema <gmo...@li...> wrote:
> > Hi,
> >
> > Hope this mail will find you well. Send you my best wishes for this new year.
> > I am reaching to you because I have issues loading GFF files into you Chado. I tried several files but none of them seems to work.
> >
> > Thank you in advance for you help
> >
> > Miharimamy
> >
> > The S.cerevisiae example in gmod.org
> > Command :
> > chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism yeast --gfffile saccharomyces/saccharomyces_cerevisiae.gff.sorted --fastafile saccharomyces/saccharomyces_cerevisiae.gff.sorted.fasta
> >
> > Output :
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> > MSG: Error in line:
> > chrIII SGD chromosome 1 316620 . . . ID=chrIII;dbxref=NCBI:;Name=chrIII
> >
> > Dbxref value 'NCBI:' did not conform to GFF3 specification
> > STACK: Error::throw
> > STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
> > STACK: Bio::FeatureIO::gff::_handle_feature /usr/local/share/perl5/Bio/FeatureIO/gff.pm:659
> > STACK: Bio::FeatureIO::gff::next_feature /usr/local/share/perl5/Bio/FeatureIO/gff.pm:187
> > STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:782
> > -----------------------------------------------------------
> > S.cerevisiae without dbxref
> > Command :
> > chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism yeast --gfffile saccharomyces/test_saccharomyces_ncbi --fastafile saccharomyces/saccharomyces_cerevisiae.gff.sorted.fasta
> >
> > Output :
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> > MSG: I couldn't find the 'Note' cvterm in the database;
> > Did you load the feature property controlled vocabulary?
> > STACK: Error::throw > > STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447 > > STACK: Bio::GMOD::DB::Adapter::handle_note /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm:3901 > > STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:974 <http://gmod_bulk_load_gff3.pl:974> > > > > A GFF from NCBI > > Command : > > chado-1.31/load/bin/gmod_bulk_load_gff3.pl <http://gmod_bulk_load_gff3.pl> --organism bacteria --gfffile GCF_000008865.2/GCF_003018035.1_ASM301803v1_genomic.gff.sorted > > > > Output : > > Unable to find srcfeature NZ_CP027390.1 in the database. > > Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 2. > > Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x241a8d0)', 'Bio::SeqFeature::Annotated=HASH(0x26c61d0)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl <http://gmod_bulk_load_gff3.pl> line 851 > > > > A GFF3 from prokka > > Command : > > chado-1.31/load/bin/gmod_bulk_load_gff3.pl <http://gmod_bulk_load_gff3.pl> --organism bacteria -gfffile GFF3/GCF_000455285.gff.sorted --fastafile GFF3/GCF_000455285.gff.sorted.fasta > > Output : > > Unable to find srcfeature gnl|Prokka|GCF_000455285_1 in the database. > > Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 2. > > Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x33008f0)', 'Bio::SeqFeature::Annotated=HASH(0x35ac0a0)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl <http://gmod_bulk_load_gff3.pl> line 851 > > > > > > > > > > _______________________________________________ > Gmod-schema mailing list > Gmo...@li... <mailto:Gmo...@li...> > https://lists.sourceforge.net/lists/listinfo/gmod-schema > > > > -- > > ------------------------------------------------------------------------ > Scott Cain, Ph. D. 
scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > > > > -- > > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > > > > -- > > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research |
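Two of the failures quoted above, the "Unable to find srcfeature" message and the "start not set" warning reported for line 2, can be looked for before the loader is run. What follows is only a rough sketch in Python, not part of Chado or of gmod_bulk_load_gff3.pl. The function name check_gff3 is made up for illustration, and it assumes that a reference sequence counts as defined when some feature line carries ID=<name used in column 1>, as in the line Scott suggested adding. It knows nothing about sequences that are already in the database.

```python
def check_gff3(text):
    """Return (bad_lines, undefined_seqids) for GFF3 text.

    bad_lines: 1-based numbers of feature lines that do not have
        exactly 9 tab-separated columns (e.g. spaces used instead of tabs).
    undefined_seqids: names used in column 1 that no feature line
        defines through an ID attribute (assumption: ID=<seqid>).
    """
    bad_lines = []
    used = []        # column-1 names, in order of first appearance
    defined = set()  # every ID value seen in column 9
    for number, line in enumerate(text.splitlines(), start=1):
        if line.startswith("##FASTA"):
            break  # sequence data follows, no more feature lines
        if not line.strip() or line.startswith("#"):
            continue  # blank lines, directives and comments
        columns = line.split("\t")
        if len(columns) != 9:
            bad_lines.append(number)
            continue
        seqid, attributes = columns[0], columns[8]
        if seqid not in used:
            used.append(seqid)
        for pair in attributes.split(";"):
            tag, _, value = pair.partition("=")
            if tag == "ID":
                defined.add(value)
    undefined = [seqid for seqid in used if seqid not in defined]
    return bad_lines, undefined
```

Called on the contents of a GFF file, a non-empty first list points at lines worth opening in an editor that shows whitespace, and a non-empty second list names the reference sequences that still need a defining line.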
From: Scott C. <sc...@sc...> - 2020-01-20 20:47:49
|
Can you also do a "select * from cvterm where cv_id=3" and show us that?

On Mon, Jan 20, 2020 at 12:45 PM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:

> Thank you, it worked but I ran into the missing “'Note' cvterm”.
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: I couldn't find the 'Note' cvterm in the database;
> Did you load the feature property controlled vocabulary?
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
> STACK: Bio::GMOD::DB::Adapter::handle_note /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm:3901
> STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:974
> -----------------------------------------------------------
>
> I used “make” to create the database and “make ontologies” to load the ontologies.
> I installed : Relationship Ontology, Sequence Ontology, Gene Ontology and Chado Feature Properties
>
> A select in the cv table shows :
>
> 1 "null"
> 2 "local" "Locally created terms"
> 3 "chado_properties" "Terms that are used in the chadoprop table to describe the state of the database"
> 4 "relationship"
> 5 "synonym_type"
> 6 "cvterm_property_type"
> 7 "anonymous"
> 8 "sequence"
> 9 "biological_process"
> 10 "molecular_function"
> 11 "cellular_component"
> 12 "external"
>
> Thank you
> Miharimamy
>
> *From:* Scott Cain <sc...@sc...>
> *Sent:* Monday, 20 January 2020 22:39
> *To:* RAJAONISON Andriamiharimamy <mir...@ya...>
> *Cc:* GMOD Schema/Chado List <gmo...@li...>
> *Subject:* Re: [Gmod-schema] [ISSUE] Load gff in chado
>
> Is_circular isn't a valid GFF tag. You can change it to is_circular to fix it.
>
> On Mon, Jan 20, 2020 at 11:24 AM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:
>
> Hi again Scott,
>
> Here is the modified header of the file :
>
> ##gff-version 3
> NZ_CP027390.1 . chromosome 1 5901472 . . . ID=NZ_CP027390.1;Name=NZ_CP027390.1
> NZ_CP027391.1 . plasmid 1 98724 . . . ID=NZ_CP027391.1;Name=NZ_CP027391.1
> NZ_CP027390.1 RefSeq gene 931 1197 . + . ID=gene0;Name=AX062_RS00010;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00010
> NZ_CP027390.1 RefSeq gene 1191 1577 . - . ID=gene1;Name=AX062_RS00015;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00015
>
> And here is the output of the script.
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: The following tag(s) are illegal and are causing this parser to die: Is_circular
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
> STACK: Bio::FeatureIO::gff::_handle_feature /usr/local/share/perl5/Bio/FeatureIO/gff.pm:787
> STACK: Bio::FeatureIO::gff::next_feature /usr/local/share/perl5/Bio/FeatureIO/gff.pm:187
> STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:782
> -----------------------------------------------------------
>
> I checked the tag in the file and here is the snippet causing the issue:
>
> NZ_CP027390.1 RefSeq region 1 5802748 . + . ID=id0;Dbxref=taxon:562;Is_circular=true;Name=ANONYMOUS;collected-by=CDC;country=USA;gbkey=Src;genome=chromosome;mol_type=genomic DNA;serovar=E. coli O26:Pending;strain=2015C-4944
> NZ_CP027390.1 cmsearch riboswitch 4764335 4764483 . + . ID=id113;Dbxref=RFAM:RF00050;Note=FMN riboswitch;bound_moiety=flavin mononucleotide;gbkey=regulatory;inference=COORDINATES: profile:INFERNAL:1.1.1;regulatory_class=riboswitch
> NZ_CP027390.1 RefSeq direct_repeat 5085858 5086435 . + . ID=id120;gbkey=repeat_region;inference=COORDINATES: alignment:pilercr:v1.02;rpt_family=CRISPR;rpt_type=direct;rpt_unit_range=5085862..5085885;rpt_unit_seq=tccccgcgccagcggggataaacc
> NZ_CP027390.1 RefSeq direct_repeat 5112137 5112653 . + . ID=id121;gbkey=repeat_region;inference=COORDINATES: alignment:pilercr:v1.02;rpt_family=CRISPR;rpt_type=direct;rpt_unit_range=5112137..5112165;rpt_unit_seq=gtgttccccgcgccagcggggataaaccg
> NZ_CP027391.1 RefSeq region 1 98724 . + . ID=id147;Dbxref=taxon:562;Is_circular=true;Name=unnamed;collected-by=CDC;country=USA;gbkey=Src;genome=plasmid;mol_type=genomic DNA;plasmid-name=unnamed;serovar=E. coli O26:Pending;strain=2015C-4944
>
> Thank you
> Miharimamy
>
> *From:* Scott Cain <sc...@sc...>
> *Sent:* Monday, 20 January 2020 22:03
> *To:* RAJAONISON Andriamiharimamy <mir...@ya...>
> *Cc:* GMOD Schema/Chado List <gmo...@li...>
> *Subject:* Re: [Gmod-schema] [ISSUE] Load gff in chado
>
> Hi Miharimamy,
>
> The snippet of GFF that you posted had a chromosome named "NZ_CP027390.1" which looks fine to me, but the error message is about a reference sequence named "NZ_CP027391.1". Is it defined anywhere in your GFF?
>
> Scott
>
> On Mon, Jan 20, 2020 at 10:58 AM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:
> >
> > Hi Scott,
> >
> > Even with the --recreate_cache and --remove_lock, I still get the same error.
> >
> > Miharimamy
> >
> > (Re)creating the uniquename cache in the database...
> > Creating table...
> > Populating table...
> > Creating indexes...
> > Adjusting the primary key sequences (if necessary)...Done.
> > Preparing data for inserting into the chado database
> > (This may take a while ...)
> > Unable to find srcfeature NZ_CP027391.1 in the database.
> > Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 5939.
> > Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x25623e0)', 'Bio::SeqFeature::Annotated=HASH(0x2680c18)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851
> >
> > Abnormal termination, trying to clean up...
> >
> > Attempting to clean up the loader temp table (so that --recreate_cache
> > won't be needed)...
> > Trying to remove the run lock (so that --remove_lock won't be needed)...
> > Exiting...
> >
> > From: Scott Cain <sc...@sc...>
> > Sent: Monday, 20 January 2020 20:56
> > To: RAJAONISON Andriamiharimamy <mir...@ya...>
> > Cc: GMOD Schema/Chado List <gmo...@li...>
> > Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado
> >
> > Hi Miharimamy,
> >
> > Did you try running with the --recreate_cache option as suggested by the error message?
> >
> > On Mon, Jan 20, 2020 at 6:54 AM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:
> >
> > Hi Scott,
> >
> > You were right, the issue lay in line 2. The field separator was 4 spaces instead of a tab.
> > I converted it and tried to load the file again but went back to the first error.
> >
> > Thank you for your time,
> > Miharimamy
> >
> > Preparing data for inserting into the chado database
> > (This may take a while ...)
> > Unable to find srcfeature NZ_CP027391.1 in the database.
> > Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 5939.
> > Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x297c120)', 'Bio::SeqFeature::Annotated=HASH(0x2bcd420)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851
> >
> > Abnormal termination, trying to clean up...
> >
> > Attempting to clean up the loader temp table (so that --recreate_cache
> > won't be needed)...
> > Trying to remove the run lock (so that --remove_lock won't be needed)...
> > Exiting...
> >
> > ##gff-version 3
> > NZ_CP027390.1 . chromosome 1 5901472 . . . ID=NZ_CP027390.1;Name=NZ_CP027390.1
> > NZ_CP027390.1 RefSeq gene 931 1197 . + . ID=gene0;Name=AX062_RS00010;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00010
> > NZ_CP027390.1 RefSeq gene 1191 1577 . - . ID=gene1;Name=AX062_RS00015;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00015
> > NZ_CP027390.1 RefSeq gene 10194 10736 . + . ID=gene10;Name=AX062_RS00060;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00060
> > NZ_CP027390.1 RefSeq gene 80547 81860 . + . ID=gene100;Name=AX062_RS00510;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00510
> > NZ_CP027390.1 RefSeq gene 918692 918997 . - . ID=gene1000;Name=AX062_RS05010;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05010
> > NZ_CP027390.1 RefSeq gene 919105 919815 . + . ID=gene1001;Name=AX062_RS05015;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05015
> > NZ_CP027390.1 RefSeq gene 919818 920378 . - . ID=gene1002;Name=AX062_RS05020;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05020
> > NZ_CP027390.1 RefSeq gene 920413 920754 . - . ID=gene1003;Name=AX062_RS05025;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05025
> > NZ_CP027390.1 RefSeq gene 920889 921215 . + . ID=gene1004;Name=AX062_RS05030;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05030
> > NZ_CP027390.1 RefSeq gene 921252 921440 . + . ID=gene1005;Name=AX062_RS05035;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05035
> > NZ_CP027390.1 RefSeq gene 921421 922635 . + . ID=gene1006;Name=AX062_RS05040;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05040

--
------------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research |
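The Is_circular fix quoted in this message (change the tag to is_circular) can be applied to a whole file rather than by hand. The following is a hedged sketch in Python, not something shipped with Chado or BioPerl. The helper name lowercase_unknown_tags and the ACCEPTED_CAPITALIZED set are assumptions made for illustration: the set lists the capitalized tags the parser is presumed to accept, and Is_circular is left out on purpose because the exception in this thread shows this parser rejecting it.

```python
# Capitalized attribute tags the parser is assumed to accept.
# Is_circular is deliberately absent: the parser in this thread dies on it.
ACCEPTED_CAPITALIZED = {
    "ID", "Name", "Alias", "Parent", "Target", "Gap",
    "Derives_from", "Note", "Dbxref", "Ontology_term",
}


def lowercase_unknown_tags(line):
    """Lowercase the first letter of capitalized tags not in the accepted set.

    Comment lines and lines without 9 tab-separated columns are
    returned unchanged (minus the trailing newline).
    """
    stripped = line.rstrip("\n")
    columns = stripped.split("\t")
    if stripped.startswith("#") or len(columns) != 9:
        return stripped
    fixed = []
    for pair in columns[8].split(";"):
        tag, sep, value = pair.partition("=")
        if tag[:1].isupper() and tag not in ACCEPTED_CAPITALIZED:
            tag = tag[0].lower() + tag[1:]
        fixed.append(tag + sep + value)
    columns[8] = ";".join(fixed)
    return "\t".join(columns)
```

Applying the function to each line of the file and writing the results to a new file leaves the original untouched, which makes it easy to compare the two before loading.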
From: RAJAONISON A. <mir...@ya...> - 2020-01-20 21:11:56
|
Here it is

select * from cvterm where cv_id=3

cvterm_id cv_id name definition
3 3 "version" "Chado schema version"

From: Scott Cain <sc...@sc...>
Sent: Monday, 20 January 2020 23:47
To: RAJAONISON Andriamiharimamy <mir...@ya...>
Cc: GMOD Schema/Chado List <gmo...@li...>
Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado

Can you also do a "select * from cvterm where cv_id=3" and show us that?

On Mon, Jan 20, 2020 at 12:45 PM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:

Thank you, it worked but I ran into the missing “'Note' cvterm”.

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: I couldn't find the 'Note' cvterm in the database;
Did you load the feature property controlled vocabulary?
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
STACK: Bio::GMOD::DB::Adapter::handle_note /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm:3901
STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:974
-----------------------------------------------------------

I used “make” to create the database and “make ontologies” to load the ontologies.
I installed : Relationship Ontology, Sequence Ontology, Gene Ontology and Chado Feature Properties

A select in the cv table shows :

1 "null"
2 "local" "Locally created terms"
3 "chado_properties" "Terms that are used in the chadoprop table to describe the state of the database"
4 "relationship"
5 "synonym_type"
6 "cvterm_property_type"
7 "anonymous"
8 "sequence"
9 "biological_process"
10 "molecular_function"
11 "cellular_component"
12 "external"

Thank you
Miharimamy

From: Scott Cain <sc...@sc...>
Sent: Monday, 20 January 2020 22:39
To: RAJAONISON Andriamiharimamy <mir...@ya...>
Cc: GMOD Schema/Chado List <gmo...@li...>
Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado

Is_circular isn't a valid GFF tag. You can change it to is_circular to fix it.
On Mon, Jan 20, 2020 at 11:24 AM RAJAONISON Andriamiharimamy <mir...@ya... <mailto:mir...@ya...> > wrote: Hi again Scott, Here is the modified header of the file : ##gff-version 3 NZ_CP027390.1 . chromosome 1 5901472 . . . ID=NZ_CP027390.1;Name=NZ_CP027390.1 NZ_CP027391.1 . plasmid 1 98724 . . . ID=NZ_CP027391.1;Name=NZ_CP027391.1 NZ_CP027390.1 RefSeq gene 931 1197 . + . ID=gene0;Name=AX062_RS00010;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00010 NZ_CP027390.1 RefSeq gene 1191 1577 . - . ID=gene1;Name=AX062_RS00015;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00015 And here is the output of the script. ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: The following tag(s) are illegal and are causing this parser to die: Is_circular STACK: Error::throw STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447 STACK: Bio::FeatureIO::gff::_handle_feature /usr/local/share/perl5/Bio/FeatureIO/gff.pm:787 <http://gff.pm:787> STACK: Bio::FeatureIO::gff::next_feature /usr/local/share/perl5/Bio/FeatureIO/gff.pm:187 <http://gff.pm:187> STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:782 <http://gmod_bulk_load_gff3.pl:782> ----------------------------------------------------------- I checked the tag in the file and here is the snippet causing issue: NZ_CP027390.1 RefSeq region 1 5802748 . + . ID=id0;Dbxref=taxon:562;Is_circular=true;Name=ANONYMOUS;collected-by=CDC;country=USA;gbkey=Src;genome=chromosome;mol_type=genomic DNA;serov ar=E. coli O26:Pending;strain=2015C-4944 NZ_CP027390.1 cmsearch riboswitch 4764335 4764483 . + . ID=id113;Dbxref=RFAM:RF00050;Note=FMN riboswitch;bound_moiety=flavin mononucleotide;gbkey=regulatory;inference=COORDINATES: profile:INFERNAL:1.1.1;regulatory_class=riboswitch NZ_CP027390.1 RefSeq direct_repeat 5085858 5086435 . + . 
ID=id120;gbkey=repeat_region;inference=COORDINATES: alignment:pilercr:v1.02;rpt_family=CRISPR;rpt_type=direct;rpt_unit_range=5085862..5085885;rpt_unit_seq=tccccgcgccagcggggataaacc
NZ_CP027390.1 RefSeq direct_repeat 5112137 5112653 . + . ID=id121;gbkey=repeat_region;inference=COORDINATES: alignment:pilercr:v1.02;rpt_family=CRISPR;rpt_type=direct;rpt_unit_range=5112137..5112165;rpt_unit_seq=gtgttccccgcgccagcggggataaaccg
NZ_CP027391.1 RefSeq region 1 98724 . + . ID=id147;Dbxref=taxon:562;Is_circular=true;Name=unnamed;collected-by=CDC;country=USA;gbkey=Src;genome=plasmid;mol_type=genomic DNA;plasmid-name=unnamed;serovar=E. coli O26:Pending;strain=2015C-4944

Thank you
Miharimamy

From: Scott Cain <sc...@sc...>
Sent: Monday, 20 January 2020 22:03
To: RAJAONISON Andriamiharimamy <mir...@ya...>
Cc: GMOD Schema/Chado List <gmo...@li...>
Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado

Hi Miharimamy,

The snippet of GFF that you posted had a chromosome named "NZ_CP027390.1", which looks fine to me, but the error message is about a reference sequence named "NZ_CP027391.1". Is it defined anywhere in your GFF?

Scott

On Mon, Jan 20, 2020 at 10:58 AM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:
> Hi Scott,
>
> Even with --recreate_cache and --remove_lock, I still get the same error.
>
> Miharimamy
>
> (Re)creating the uniquename cache in the database...
> Creating table...
> Populating table...
> Creating indexes...
> Adjusting the primary key sequences (if necessary)...Done.
> Preparing data for inserting into the chado database
> (This may take a while ...)
> Unable to find srcfeature NZ_CP027391.1 in the database.
> Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 5939.
> Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x25623e0)', 'Bio::SeqFeature::Annotated=HASH(0x2680c18)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851
>
> Abnormal termination, trying to clean up...
>
> Attempting to clean up the loader temp table (so that --recreate_cache
> won't be needed)...
> Trying to remove the run lock (so that --remove_lock won't be needed)...
> Exiting...
>
> From: Scott Cain <sc...@sc...>
> Sent: Monday, 20 January 2020 20:56
> To: RAJAONISON Andriamiharimamy <mir...@ya...>
> Cc: GMOD Schema/Chado List <gmo...@li...>
> Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado
>
> Hi Miharimamy,
>
> Did you try running with the --recreate_cache option as suggested by the error message?
>
> On Mon, Jan 20, 2020 at 6:54 AM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:
> Hi Scott,
>
> You were right, the issue was in line 2. The field separator was 4 spaces instead of a tab.
> I converted it and tried to load the file again, but I am back to the first error.
>
> Thank you for your time,
> Miharimamy
>
> Preparing data for inserting into the chado database
> (This may take a while ...)
> Unable to find srcfeature NZ_CP027391.1 in the database.
> Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 5939.
> Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x297c120)', 'Bio::SeqFeature::Annotated=HASH(0x2bcd420)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851
>
> Abnormal termination, trying to clean up...
>
> Attempting to clean up the loader temp table (so that --recreate_cache
> won't be needed)...
> Trying to remove the run lock (so that --remove_lock won't be needed)...
> Exiting...
>
> ##gff-version 3
> NZ_CP027390.1 . chromosome 1 5901472 . . . ID=NZ_CP027390.1;Name=NZ_CP027390.1
> NZ_CP027390.1 RefSeq gene 931 1197 . + . ID=gene0;Name=AX062_RS00010;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00010
> NZ_CP027390.1 RefSeq gene 1191 1577 . - . ID=gene1;Name=AX062_RS00015;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00015
> NZ_CP027390.1 RefSeq gene 10194 10736 . + . ID=gene10;Name=AX062_RS00060;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00060
> NZ_CP027390.1 RefSeq gene 80547 81860 . + . ID=gene100;Name=AX062_RS00510;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00510
> NZ_CP027390.1 RefSeq gene 918692 918997 . - . ID=gene1000;Name=AX062_RS05010;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05010
> NZ_CP027390.1 RefSeq gene 919105 919815 . + . ID=gene1001;Name=AX062_RS05015;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05015
> NZ_CP027390.1 RefSeq gene 919818 920378 . - . ID=gene1002;Name=AX062_RS05020;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05020
> NZ_CP027390.1 RefSeq gene 920413 920754 . - . ID=gene1003;Name=AX062_RS05025;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05025
> NZ_CP027390.1 RefSeq gene 920889 921215 . + . ID=gene1004;Name=AX062_RS05030;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05030
> NZ_CP027390.1 RefSeq gene 921252 921440 . + . ID=gene1005;Name=AX062_RS05035;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05035
> NZ_CP027390.1 RefSeq gene 921421 922635 . + . ID=gene1006;Name=AX062_RS05040;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05040
>
> From: Scott Cain <sc...@sc...>
> Sent: Friday, 17 January 2020 21:49
> To: RAJAONISON Andriamiharimamy <mir...@ya...>
> Cc: GMOD Schema/Chado List <gmo...@li...>
> Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado
>
> Hi Miharimamy,
>
> The end of the error message where it says "line 2" I'm pretty sure means the problem is in line 2 of your GFF file. Is that the line you added? What does it look like?
>
> Scott
>
> On Fri, Jan 17, 2020 at 5:57 AM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:
> Hi Scott,
>
> Thank you again for your help: adding the parent feature removes the srcfeature-related error.
> Yet I get another issue that I don't understand:
>
> --------------------- WARNING ---------------------
> MSG: Can not set Bio::Location::Simple::end() equal to start; start not set
> ---------------------------------------------------
> Use of uninitialized value $featuretype in pattern match (m//) at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 819, <GEN0> line 2.
> Use of uninitialized value $featuretype in pattern match (m//) at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 820, <GEN0> line 2.
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: no cvterm for
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
> STACK: Bio::GMOD::DB::Adapter::get_type /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm:4629
> STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:848
> -----------------------------------------------------------
>
> Abnormal termination, trying to clean up...
>
> Attempting to clean up the loader temp table (so that --recreate_cache
> won't be needed)...
> Trying to remove the run lock (so that --remove_lock won't be needed)...
> Exiting...
>
> Thank you
>
> Miharimamy
>
> From: Scott Cain <sc...@sc...>
> Sent: Thursday, 16 January 2020 22:31
> To: RAJAONISON Andriamiharimamy <mir...@ya...>
> Cc: GMOD Schema/Chado List <gmo...@li...>
> Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado
>
> Hi Miharimamy,
>
> Thanks for sending this report. Generally, loading GFF into Chado can be difficult, as the perl-based loader that you are using can be quite particular about the format of the GFF and producers of GFF generally aren't so particular. Since the loader makes the (in my view, correct) decision to not load anything if it can't load everything in a file, it quits. So, taking each of the problems you found in order:
>
> 1. "dbxref=NCBI:" isn't a valid accession (presumably there should be something after the "NCBI:") and so the loader won't continue. Two options here are to contact SGD and point out that there is a problem with their GFF, or delete the offending item and try again, which it looks like you did for item 2.
>
> 2. If the item Note is missing from the cvterm table, I think that probably means that you didn't install the schema using the "make" procedure that would have installed some necessary items in the cv and cvterm table.
>
> 3 and 4: Messages that srcfeatures can't be found mean that the feature on which the features in your GFF reside hasn't been defined yet (that is, the thing referred to in column 1 of the GFF doesn't exist). Frequently, creators of GFF don't define the reference sequence in the GFF for whatever reason (it's not required by the GFF3 spec, since it might be credibly defined elsewhere). To define it in the GFF you have, add a line before anything else that looks something like this:
>
> NZ_CP027390.1 . chromosome 1 1234566 . . . ID=NZ_CP027390.1;Name=NZ_CP027390.1
>
> Where "NZ_CP027390.1" is the name of the reference sequence, "chromosome" is the Sequence Ontology term for the type of thing (other options would include "contig" and "supercontig"), and the "1234566" is the length of the sequence.
>
> Finally, I would add that using Chado through Tripal frequently makes life easier, though I think all of these issues would still have been problems (with the probable exception of item 2--Tripal would have initialized the "Note" item in cvterm I think).
>
> Scott
|
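The three file-level problems in this part of the thread (a feature line separated by spaces instead of tabs, a column-1 sequence with no defining line, and the `Is_circular` tag the parser rejected) can all be spotted before running the loader. The following is a minimal Python sketch, not part of `gmod_bulk_load_gff3.pl`; the function names are hypothetical, and it assumes that a reference sequence counts as "defined" when an earlier (or the same) line carries `ID=<seqid>`, as in Scott's example line. The lower-case `is_circular` replacement is the fix suggested later in this thread.

```python
def preflight_gff3(lines):
    """Check GFF3 lines for two problems seen in this thread.

    Returns (bad_lines, missing_refs):
      bad_lines    - 1-based numbers of feature lines without 9 tab-separated columns
      missing_refs - seqids used in column 1 before any line declared ID=<seqid>
    """
    bad_lines = []
    missing_refs = []
    declared = set()  # every ID= value seen so far
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # sequence data follows, no more features
        if not line.strip() or line.startswith("#"):
            continue  # blank lines, comments and directives
        columns = line.split("\t")
        if len(columns) != 9:
            bad_lines.append(number)  # e.g. spaces used instead of tabs
            continue
        for pair in columns[8].split(";"):
            tag, _, value = pair.partition("=")
            if tag == "ID":
                declared.add(value)
        seqid = columns[0]
        if seqid not in declared and seqid not in missing_refs:
            missing_refs.append(seqid)
    return bad_lines, missing_refs


def reference_line(seqid, length, so_type="chromosome"):
    """Build a tab-separated reference line in the shape Scott describes."""
    return "\t".join(
        [seqid, ".", so_type, "1", str(length), ".", ".", ".",
         f"ID={seqid};Name={seqid}"]
    )


def rename_tag(line, old="Is_circular", new="is_circular"):
    """Rename one attribute tag in column 9; other lines are returned unchanged."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 9:
        return line
    pairs = columns[8].split(";")
    columns[8] = ";".join(
        new + p[len(old):] if p.startswith(old + "=") else p for p in pairs
    )
    return "\t".join(columns) + ("\n" if line.endswith("\n") else "")
```

Run on a file where the added chromosome line uses four spaces as separator, `preflight_gff3` reports that line as malformed and also reports the chromosome as undefined, which matches what happened above: the loader could not use the space-separated line, so the srcfeature error stayed.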
From: Scott C. <sc...@sc...> - 2020-01-20 21:33:48
|
Ah, you have chado_properties (cv_id 3) but not feature_properties, which is where Note comes from. That should come from the make command but it appears to have not worked.

To run the make command again, my recollection is that you have to remove a directory from the "load" directory. Do this: try running that make command again and select feature properties from the menu. If nothing happens, my recollection is that running "make clean" will remove the lock files that prevent loading an ontology twice (which is what it thinks you'll be trying to do I think).

On Mon, Jan 20, 2020 at 1:11 PM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:
> Here it is
>
> select * from cvterm where cv_id=3
>
> cvterm_id cv_id name definition
> 3 3 "version" "Chado schema version"
>
> From: Scott Cain <sc...@sc...>
> Sent: Monday, 20 January 2020 23:47
> To: RAJAONISON Andriamiharimamy <mir...@ya...>
> Cc: GMOD Schema/Chado List <gmo...@li...>
> Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado
>
> Can you also do a "select * from cvterm where cv_id=3" and show us that?
>
> On Mon, Jan 20, 2020 at 12:45 PM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:
> Thank you, it worked, but now I run into the missing 'Note' cvterm.
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: I couldn't find the 'Note' cvterm in the database;
> Did you load the feature property controlled vocabulary?
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
> STACK: Bio::GMOD::DB::Adapter::handle_note /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm:3901
> STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:974
> -----------------------------------------------------------
>
> I used "make" to create the database and "make ontologies" to load the ontologies.
> I installed: Relationship Ontology, Sequence Ontology, Gene Ontology and Chado Feature Properties
>
> A select on the cv table shows:
>
> 1 "null"
> 2 "local" "Locally created terms"
> 3 "chado_properties" "Terms that are used in the chadoprop table to describe the state of the database"
> 4 "relationship"
> 5 "synonym_type"
> 6 "cvterm_property_type"
> 7 "anonymous"
> 8 "sequence"
> 9 "biological_process"
> 10 "molecular_function"
> 11 "cellular_component"
> 12 "external"
>
> Thank you
> Miharimamy
>
> From: Scott Cain <sc...@sc...>
> Sent: Monday, 20 January 2020 22:39
> To: RAJAONISON Andriamiharimamy <mir...@ya...>
> Cc: GMOD Schema/Chado List <gmo...@li...>
> Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado
>
> Is_circular isn't a valid GFF tag. You can change it to is_circular to fix it.

--
------------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research
|
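The diagnosis above comes down to asking which controlled vocabulary, if any, contains a term named `Note`. Below is a minimal Python sketch of that lookup. Everything in it is illustrative: it uses an in-memory SQLite database as a stand-in that holds only the `cv` and `cvterm` columns quoted in this thread, whereas a real Chado database runs on PostgreSQL and has more columns. The function names are hypothetical. The cv name `feature_property` in the usage note is an assumption about what the ontology load creates (Scott refers to it as "feature_properties").

```python
import sqlite3


def cvs_containing_term(conn, term_name):
    """Return the names of all cvs that contain a cvterm called term_name."""
    rows = conn.execute(
        "SELECT cv.name FROM cvterm "
        "JOIN cv ON cv.cv_id = cvterm.cv_id "
        "WHERE cvterm.name = ? ORDER BY cv.name",
        (term_name,),
    ).fetchall()
    return [name for (name,) in rows]


def demo_connection():
    """Build a tiny stand-in database mirroring the rows shown in the thread."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE cv (
            cv_id INTEGER PRIMARY KEY, name TEXT, definition TEXT);
        CREATE TABLE cvterm (
            cvterm_id INTEGER PRIMARY KEY, cv_id INTEGER,
            name TEXT, definition TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO cv (cv_id, name) VALUES (?, ?)",
        [(1, "null"), (2, "local"), (3, "chado_properties"), (8, "sequence")],
    )
    conn.execute(
        "INSERT INTO cvterm VALUES (3, 3, 'version', 'Chado schema version')"
    )
    return conn
```

With the stand-in data, `cvs_containing_term(demo_connection(), "Note")` returns an empty list, which corresponds to the state in which the loader raised the "couldn't find the 'Note' cvterm" exception. After a successful ontology load the same query would be expected to list the feature property cv. Against PostgreSQL the same SQL should apply, but the placeholder style depends on the driver (for example `%s` instead of `?`).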
From: RAJAONISON A. <mir...@ya...> - 2020-01-20 21:50:29
|
👌👌

Works perfectly! It is quite complicated, though. I thought of translating the Perl script to Python at some point.

Thank you for your help, Scott!

Miharimamy

From: Scott Cain <sc...@sc...>
Sent: Tuesday, 21 January 2020 00:33
To: RAJAONISON Andriamiharimamy <mir...@ya...>
Cc: GMOD Schema/Chado List <gmo...@li...>
Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado

Ah, you have chado_properties (cv_id 3) but not feature_properties, which is where Note comes from. That should come from the make command but it appears to have not worked. To run the make command again, my recollection is that you have to remove a directory from the "load" directory. Do this: try running that make command again and select feature properties from the menu. If nothing happens, my recollection is that running "make clean" will remove the lock files that prevent loading an ontology twice (which is what it thinks you'll be trying to do I think).

On Mon, Jan 20, 2020 at 1:11 PM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:

Here it is:

select * from cvterm where cv_id=3

cvterm_id  cv_id  name       definition
3          3      "version"  "Chado schema version"

From: Scott Cain <sc...@sc...>
Sent: Monday, 20 January 2020 23:47
To: RAJAONISON Andriamiharimamy <mir...@ya...>
Cc: GMOD Schema/Chado List <gmo...@li...>
Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado

Can you also do a "select * from cvterm where cv_id=3" and show us that?

On Mon, Jan 20, 2020 at 12:45 PM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:

Thank you, it worked, but I ran into the missing 'Note' cvterm error.

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: I couldn't find the 'Note' cvterm in the database;
Did you load the feature property controlled vocabulary?
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
STACK: Bio::GMOD::DB::Adapter::handle_note /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm:3901
STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:974
-----------------------------------------------------------

I used "make" to create the database and "make ontologies" to load the ontologies.
I installed: Relationship Ontology, Sequence Ontology, Gene Ontology and Chado Feature Properties.
A select on the cv table shows:

1 "null"
2 "local" "Locally created terms"
3 "chado_properties" "Terms that are used in the chadoprop table to describe the state of the database"
4 "relationship"
5 "synonym_type"
6 "cvterm_property_type"
7 "anonymous"
8 "sequence"
9 "biological_process"
10 "molecular_function"
11 "cellular_component"
12 "external"

Thank you
Miharimamy

From: Scott Cain <sc...@sc...>
Sent: Monday, 20 January 2020 22:39
To: RAJAONISON Andriamiharimamy <mir...@ya...>
Cc: GMOD Schema/Chado List <gmo...@li...>
Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado

Is_circular isn't a valid GFF tag. You can change it to is_circular to fix it.

On Mon, Jan 20, 2020 at 11:24 AM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:

Hi again Scott,

Here is the modified header of the file:

##gff-version 3
NZ_CP027390.1 . chromosome 1 5901472 . . . ID=NZ_CP027390.1;Name=NZ_CP027390.1
NZ_CP027391.1 . plasmid 1 98724 . . . ID=NZ_CP027391.1;Name=NZ_CP027391.1
NZ_CP027390.1 RefSeq gene 931 1197 . + . ID=gene0;Name=AX062_RS00010;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00010
NZ_CP027390.1 RefSeq gene 1191 1577 . - . ID=gene1;Name=AX062_RS00015;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00015

And here is the output of the script.
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: The following tag(s) are illegal and are causing this parser to die: Is_circular
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
STACK: Bio::FeatureIO::gff::_handle_feature /usr/local/share/perl5/Bio/FeatureIO/gff.pm:787
STACK: Bio::FeatureIO::gff::next_feature /usr/local/share/perl5/Bio/FeatureIO/gff.pm:187
STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:782
-----------------------------------------------------------

I checked the tag in the file and here is the snippet causing the issue:

NZ_CP027390.1 RefSeq region 1 5802748 . + . ID=id0;Dbxref=taxon:562;Is_circular=true;Name=ANONYMOUS;collected-by=CDC;country=USA;gbkey=Src;genome=chromosome;mol_type=genomic DNA;serovar=E. coli O26:Pending;strain=2015C-4944
NZ_CP027390.1 cmsearch riboswitch 4764335 4764483 . + . ID=id113;Dbxref=RFAM:RF00050;Note=FMN riboswitch;bound_moiety=flavin mononucleotide;gbkey=regulatory;inference=COORDINATES: profile:INFERNAL:1.1.1;regulatory_class=riboswitch
NZ_CP027390.1 RefSeq direct_repeat 5085858 5086435 . + . ID=id120;gbkey=repeat_region;inference=COORDINATES: alignment:pilercr:v1.02;rpt_family=CRISPR;rpt_type=direct;rpt_unit_range=5085862..5085885;rpt_unit_seq=tccccgcgccagcggggataaacc
NZ_CP027390.1 RefSeq direct_repeat 5112137 5112653 . + . ID=id121;gbkey=repeat_region;inference=COORDINATES: alignment:pilercr:v1.02;rpt_family=CRISPR;rpt_type=direct;rpt_unit_range=5112137..5112165;rpt_unit_seq=gtgttccccgcgccagcggggataaaccg
NZ_CP027391.1 RefSeq region 1 98724 . + . ID=id147;Dbxref=taxon:562;Is_circular=true;Name=unnamed;collected-by=CDC;country=USA;gbkey=Src;genome=plasmid;mol_type=genomic DNA;plasmid-name=unnamed;serovar=E. coli O26:Pending;strain=2015C-4944

Thank you
Miharimamy

From: Scott Cain <sc...@sc...>
Sent: Monday, 20 January 2020 22:03
To: RAJAONISON Andriamiharimamy <mir...@ya...>
Cc: GMOD Schema/Chado List <gmo...@li...>
Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado

Hi Miharimamy,

The snippet of GFF that you posted had a chromosome named "NZ_CP027390.1" which looks fine to me, but the error message is about a reference sequence named "NZ_CP027391.1". Is it defined anywhere in your GFF?

Scott

On Mon, Jan 20, 2020 at 10:58 AM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:

> Hi Scott,
>
> Even with the --recreate_cache and --remove_lock options, I still get the same error.
>
> Miharimamy
>
> (Re)creating the uniquename cache in the database...
> Creating table...
> Populating table...
> Creating indexes...
> Adjusting the primary key sequences (if necessary)...Done.
> Preparing data for inserting into the chado database
> (This may take a while ...)
> Unable to find srcfeature NZ_CP027391.1 in the database.
> Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 5939.
> Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x25623e0)', 'Bio::SeqFeature::Annotated=HASH(0x2680c18)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851
>
> Abnormal termination, trying to clean up...
>
> Attempting to clean up the loader temp table (so that --recreate_cache
> won't be needed)...
> Trying to remove the run lock (so that --remove_lock won't be needed)...
> Exiting...
>
> From: Scott Cain <sc...@sc...>
> Sent: Monday, 20 January 2020 20:56
> To: RAJAONISON Andriamiharimamy <mir...@ya...>
> Cc: GMOD Schema/Chado List <gmo...@li...>
> Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado
>
> Hi Miharimamy,
>
> Did you try running with the --recreate_cache option as suggested by the error message?
>
> On Mon, Jan 20, 2020 at 6:54 AM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:
>
> Hi Scott,
>
> You were right, the issue was in line 2. The field separator was 4 spaces instead of a tab.
> I converted it and tried to load the file again, but went back to the first error.
>
> Thank you for your time,
> Miharimamy
>
> Preparing data for inserting into the chado database
> (This may take a while ...)
> Unable to find srcfeature NZ_CP027391.1 in the database.
> Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 5939.
> Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x297c120)', 'Bio::SeqFeature::Annotated=HASH(0x2bcd420)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851
>
> Abnormal termination, trying to clean up...
>
> Attempting to clean up the loader temp table (so that --recreate_cache
> won't be needed)...
> Trying to remove the run lock (so that --remove_lock won't be needed)...
> Exiting...
>
> ##gff-version 3
> NZ_CP027390.1 . chromosome 1 5901472 . . . ID=NZ_CP027390.1;Name=NZ_CP027390.1
> NZ_CP027390.1 RefSeq gene 931 1197 . + . ID=gene0;Name=AX062_RS00010;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00010
> NZ_CP027390.1 RefSeq gene 1191 1577 . - . ID=gene1;Name=AX062_RS00015;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00015
> NZ_CP027390.1 RefSeq gene 10194 10736 . + . ID=gene10;Name=AX062_RS00060;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00060
> NZ_CP027390.1 RefSeq gene 80547 81860 . + . ID=gene100;Name=AX062_RS00510;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00510
> NZ_CP027390.1 RefSeq gene 918692 918997 . - . ID=gene1000;Name=AX062_RS05010;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05010
> NZ_CP027390.1 RefSeq gene 919105 919815 . + . ID=gene1001;Name=AX062_RS05015;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05015
> NZ_CP027390.1 RefSeq gene 919818 920378 . - . ID=gene1002;Name=AX062_RS05020;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05020
> NZ_CP027390.1 RefSeq gene 920413 920754 . - . ID=gene1003;Name=AX062_RS05025;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05025
> NZ_CP027390.1 RefSeq gene 920889 921215 . + . ID=gene1004;Name=AX062_RS05030;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05030
> NZ_CP027390.1 RefSeq gene 921252 921440 . + . ID=gene1005;Name=AX062_RS05035;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05035
> NZ_CP027390.1 RefSeq gene 921421 922635 . + . ID=gene1006;Name=AX062_RS05040;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05040
>
> From: Scott Cain <sc...@sc...>
> Sent: Friday, 17 January 2020 21:49
> To: RAJAONISON Andriamiharimamy <mir...@ya...>
> Cc: GMOD Schema/Chado List <gmo...@li...>
> Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado
>
> Hi Miharimamy,
>
> The end of the error message where it says "line 2" I'm pretty sure means the problem is in line 2 of your GFF file. Is that the line you added? What does it look like?
>
> Scott
>
> On Fri, Jan 17, 2020 at 5:57 AM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:
>
> Hi Scott,
>
> Thank you again for your help; adding the parent feature removes the srcfeature-related error.
> Yet I get another issue that I don't understand:
>
> --------------------- WARNING ---------------------
> MSG: Can not set Bio::Location::Simple::end() equal to start; start not set
> ---------------------------------------------------
> Use of uninitialized value $featuretype in pattern match (m//) at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 819, <GEN0> line 2.
> Use of uninitialized value $featuretype in pattern match (m//) at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 820, <GEN0> line 2.
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: no cvterm for
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
> STACK: Bio::GMOD::DB::Adapter::get_type /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm:4629
> STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:848
> -----------------------------------------------------------
>
> Abnormal termination, trying to clean up...
>
> Attempting to clean up the loader temp table (so that --recreate_cache
> won't be needed)...
> Trying to remove the run lock (so that --remove_lock won't be needed)...
> Exiting...
>
> Thank you
>
> Miharimamy
>
> From: Scott Cain <sc...@sc...>
> Sent: Thursday, 16 January 2020 22:31
> To: RAJAONISON Andriamiharimamy <mir...@ya...>
> Cc: GMOD Schema/Chado List <gmo...@li...>
> Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado
>
> Hi Miharimamy,
>
> Thanks for sending this report. Generally, loading GFF into Chado can be difficult, as the perl-based loader that you are using can be quite particular about the format of the GFF and producers of GFF generally aren't so particular. Since the loader makes the (in my view, correct) decision to not load anything if it can't load everything in a file, it quits. So, taking each of the problems you found in order:
>
> 1. "dbxref=NCBI:" isn't a valid accession (presumably there should be something after the "NCBI:") and so the loader won't continue. Two options here are to contact SGD and point out that there is a problem with their GFF, or delete the offending item and try again, which it looks like you did for item 2.
>
> 2. If the item Note is missing from the cvterm table, I think that probably means that you didn't install the schema using the "make" procedure that would have installed some necessary items in the cv and cvterm table.
>
> 3 and 4: Messages that srcfeatures can't be found mean that the feature on which the features in your GFF reside hasn't been defined yet (that is, the thing referred to in column 1 of the GFF doesn't exist). Frequently, creators of GFF don't define the reference sequence in the GFF for whatever reason (it's not required by the GFF3 spec, since it might be credibly defined elsewhere). To define it in the GFF you have, add a line before anything else that looks something like this:
>
> NZ_CP027390.1 . chromosome 1 1234566 . . . ID=NZ_CP027390.1;Name=NZ_CP027390.1
>
> Where "NZ_CP027390.1" is the name of the reference sequence, "chromosome" is the Sequence Ontology term for the type of thing (other options would include "contig" and "supercontig"), and the "1234566" is the length of the sequence.
>
> Finally, I would add that using Chado through Tripal frequently makes life easier, though I think all of these issues would still have been problems (with the probable exception of item 2--Tripal would have initialized the "Note" item in cvterm I think).
>
> Scott
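For anyone hitting the same errors, the GFF fix-ups discussed in this thread can be sketched in Python and run before calling gmod_bulk_load_gff3.pl: converting a space-separated feature line to tabs, renaming the Is_circular tag that this loader's parser rejected, and adding a defining line for every reference sequence named in column 1. This is only an illustrative sketch, not part of Chado. The function name fix_gff_lines and the references mapping (seqid to Sequence Ontology type and length) are made up for this example, and the lengths have to come from your own data. It also assumes the file has no embedded ##FASTA section.

```python
import re


def fix_gff_lines(lines, references):
    """Return cleaned GFF3 lines (without trailing newlines).

    lines:      iterable of raw GFF3 lines.
    references: hypothetical mapping seqid -> (SO type, length), used to
                write a defining line for reference sequences that the
                file itself never defines.
    """
    header, records, defined = [], [], set()
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue
        if line.startswith("#"):
            # Directives and comments are collected and written first.
            header.append(line)
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            # Columns separated by spaces instead of tabs: split on runs of
            # whitespace but keep column 9 whole, since attribute values
            # may contain spaces (e.g. "mol_type=genomic DNA").
            cols = line.split(None, 8)
        if len(cols) != 9:
            raise ValueError(f"expected 9 columns, got {len(cols)}: {line!r}")
        # Workaround suggested in the thread for the rejected tag.
        cols[8] = re.sub(r"(^|;)Is_circular=", r"\1is_circular=", cols[8])
        # A feature whose ID equals its own seqid defines that reference.
        if re.search(r"(^|;)ID=" + re.escape(cols[0]) + r"(;|$)", cols[8]):
            defined.add(cols[0])
        records.append(cols)

    # Reference sequences used in column 1 but never defined, in file order.
    missing = []
    for cols in records:
        if cols[0] not in defined and cols[0] not in missing:
            missing.append(cols[0])

    ref_lines = []
    for seqid in missing:
        so_type, length = references[seqid]  # KeyError if the length is unknown
        ref_lines.append("\t".join(
            [seqid, ".", so_type, "1", str(length), ".", ".", ".",
             f"ID={seqid};Name={seqid}"]))

    return header + ref_lines + ["\t".join(cols) for cols in records]
```

Called with references such as {"NZ_CP027390.1": ("chromosome", 5901472), "NZ_CP027391.1": ("plasmid", 98724)}, it returns the directives first, then one defining line per undefined reference sequence, then the features. Raising on a malformed line instead of skipping it mirrors the loader's all-or-nothing behaviour described above.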
From: Scott C. <sc...@sc...> - 2020-01-20 21:54:59
|
I agree that it's overly complicated. These days, most people who want to use Chado do so by installing Tripal, which makes it a lot easier. Believe me, I would be thrilled to have a simpler python script that still does the same job!
> > > > > Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x241a8d0)', > 'Bio::SeqFeature::Annotated=HASH(0x26c61d0)') called at chado-1.31/load/bin/ > gmod_bulk_load_gff3.pl line 851 > > > > > > > > A GFF3 from prokka > > > > Command : > > > > chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism bacteria -gfffile > GFF3/GCF_000455285.gff.sorted --fastafile > GFF3/GCF_000455285.gff.sorted.fasta > > > > Output : > > > > Unable to find srcfeature gnl|Prokka|GCF_000455285_1 in the database. > > > > Perhaps you need to rerun your data load with the '--recreate_cache' > option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> > line 2. > > > > > Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x33008f0)', > 'Bio::SeqFeature::Annotated=HASH(0x35ac0a0)') called at chado-1.31/load/bin/ > gmod_bulk_load_gff3.pl line 851 > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > Gmod-schema mailing list > > Gmo...@li... > > https://lists.sourceforge.net/lists/listinfo/gmod-schema > > > > > > > > -- > > > > ------------------------------------------------------------------------ > > Scott Cain, Ph. D. scott at scottcain > dot net > > GMOD Coordinator (http://gmod.org/) 216-392-3087 > > Ontario Institute for Cancer Research > > > > > > > > -- > > > > ------------------------------------------------------------------------ > > Scott Cain, Ph. D. scott at scottcain > dot net > > GMOD Coordinator (http://gmod.org/) 216-392-3087 > > Ontario Institute for Cancer Research > > > > > > > > -- > > > > ------------------------------------------------------------------------ > > Scott Cain, Ph. D. scott at scottcain > dot net > > GMOD Coordinator (http://gmod.org/) 216-392-3087 > > Ontario Institute for Cancer Research > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. 
scott at scottcain > dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > > > > -- > > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain > dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > > > > -- > > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain > dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > > > > -- > > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain > dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research |
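Two of the failures discussed in this thread (a reference line whose columns were separated by spaces instead of tabs, and a sequence named in column 1 that was never defined) can be caught before running the loader. What follows is only a minimal sketch in Python, not part of Chado or gmod_bulk_load_gff3.pl. The function name check_gff3 is hypothetical, the plasmid gene line in the sample is made up, and the sketch assumes that a reference sequence counts as defined when some feature line carries an ID equal to the column 1 name, as in the chromosome line suggested above.

```python
def check_gff3(lines):
    """Pre-flight check for a GFF3 file (hypothetical helper, not a Chado API).

    Returns (bad_lines, undefined_seqids):
      bad_lines        1-based numbers of feature lines that do not have
                       exactly 9 tab-separated columns
      undefined_seqids column 1 names for which no feature line has a
                       matching ID (assumption: this is what the loader
                       needs in order to find the srcfeature)
    """
    bad_lines = []
    seen = []        # seqids in order of first appearance
    defined = set()  # every ID declared in column 9
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # sequence data follows; no more features
        if not line.strip() or line.startswith("#"):
            continue  # blank line, comment or directive
        columns = line.split("\t")
        if len(columns) != 9:
            bad_lines.append(number)
            continue
        seqid, attributes = columns[0], columns[8]
        if seqid not in seen:
            seen.append(seqid)
        for pair in attributes.split(";"):
            key, _, value = pair.partition("=")
            if key.strip() == "ID":
                defined.add(value.strip())
    undefined = [seqid for seqid in seen if seqid not in defined]
    return bad_lines, undefined


sample = [
    "##gff-version 3",
    # Reference line typed with 4 spaces between fields instead of tabs.
    "NZ_CP027390.1    .    chromosome    1    5901472    .    .    .    "
    "ID=NZ_CP027390.1;Name=NZ_CP027390.1",
    "NZ_CP027390.1\tRefSeq\tgene\t931\t1197\t.\t+\t.\tID=gene0;Name=AX062_RS00010",
    # Made-up plasmid gene; the plasmid itself is never defined.
    "NZ_CP027391.1\tRefSeq\tgene\t10\t200\t.\t+\t.\tID=geneX",
]

bad, missing = check_gff3(sample)
print(bad)      # [2]
print(missing)  # ['NZ_CP027390.1', 'NZ_CP027391.1']
```

Because the space-separated line is rejected, the chromosome it was meant to define is reported as missing too, which mirrors the order in which the errors showed up in this thread.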
From: RAJAONISON A. <mir...@ya...> - 2020-01-20 21:59:18
|
I’m planning to use it inside a data warehouse (ETL, etc.) project, so I don’t know whether Tripal gives me the freedom to do so.

Here is the output:

This GFF file has CDS and/or UTR features that do not belong to a
'central dogma' gene (ie, gene/transcript/CDS). The features of
this type are being stored in the database as is.

Skipping organism table since the load file is empty...
Skipping analysis table since the load file is empty...
Skipping db table since the load file is empty...
Skipping dbxref table since the load file is empty...
Skipping cv table since the load file is empty...
Skipping cvterm table since the load file is empty...
Loading data into feature table ...
Loading data into featureloc table ...
Loading data into feature_relationship table ...
Loading data into featureprop table ...
Skipping feature_cvterm table since the load file is empty...
Loading data into synonym table ...
Loading data into feature_synonym table ...
Loading data into feature_dbxref table ...
Skipping analysisfeature table since the load file is empty...
Adding cvtermprop=MapReferenceType for 'plasmid' ...
Loading sequences (if any) ...

Done.

Is everything all right?

From: Scott Cain <sc...@sc...>
Sent: Tuesday, 21 January 2020 00:55
To: RAJAONISON Andriamiharimamy <mir...@ya...>
Cc: GMOD Schema/Chado List <gmo...@li...>
Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado

I agree that it's overly complicated. These days, most people who want to use Chado do so by installing Tripal, which makes it a lot easier. Believe me, I would be thrilled to have a simpler Python script that still does the same job!

On Mon, Jan 20, 2020 at 1:50 PM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:

👌👌
Works perfectly!
It is quite complicated though. I thought of translating the Perl script to Python at some point.
Thank you for your help, Scott!
Miharimamy

From: Scott Cain <sc...@sc...>
Sent: Tuesday, 21 January 2020 00:33
To: RAJAONISON Andriamiharimamy <mir...@ya...>
Cc: GMOD Schema/Chado List <gmo...@li...>
Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado

Ah, you have chado_properties (cv_id 3) but not feature_properties, which is where Note comes from. That should come from the make command, but it appears not to have worked. To run the make command again, my recollection is that you have to remove a directory from the "load" directory. Do this: try running that make command again and select feature properties from the menu. If nothing happens, my recollection is that running "make clean" will remove the lock files that prevent loading an ontology twice (which is what it thinks you'll be trying to do, I think).

On Mon, Jan 20, 2020 at 1:11 PM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:

Here it is:

select * from cvterm where cv_id=3

cvterm_id cv_id name definition
3 3 "version" "Chado schema version"

From: Scott Cain <sc...@sc...>
Sent: Monday, 20 January 2020 23:47
To: RAJAONISON Andriamiharimamy <mir...@ya...>
Cc: GMOD Schema/Chado List <gmo...@li...>
Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado

Can you also do a "select * from cvterm where cv_id=3" and show us that?

On Mon, Jan 20, 2020 at 12:45 PM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:

Thank you, it worked, but now I run into the missing 'Note' cvterm.

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: I couldn't find the 'Note' cvterm in the database;
Did you load the feature property controlled vocabulary?
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
STACK: Bio::GMOD::DB::Adapter::handle_note /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm:3901
STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:974
-----------------------------------------------------------

I used "make" to create the database and "make ontologies" to load the ontologies.
I installed: Relationship Ontology, Sequence Ontology, Gene Ontology and Chado Feature Properties.
A select on the cv table shows:

1 "null"
2 "local" "Locally created terms"
3 "chado_properties" "Terms that are used in the chadoprop table to describe the state of the database"
4 "relationship"
5 "synonym_type"
6 "cvterm_property_type"
7 "anonymous"
8 "sequence"
9 "biological_process"
10 "molecular_function"
11 "cellular_component"
12 "external"

Thank you
Miharimamy

From: Scott Cain <sc...@sc...>
Sent: Monday, 20 January 2020 22:39
To: RAJAONISON Andriamiharimamy <mir...@ya...>
Cc: GMOD Schema/Chado List <gmo...@li...>
Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado

Is_circular isn't a valid GFF tag. You can change it to is_circular to fix it.

On Mon, Jan 20, 2020 at 11:24 AM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:

Hi again Scott,

Here is the modified header of the file:

##gff-version 3
NZ_CP027390.1 . chromosome 1 5901472 . . . ID=NZ_CP027390.1;Name=NZ_CP027390.1
NZ_CP027391.1 . plasmid 1 98724 . . . ID=NZ_CP027391.1;Name=NZ_CP027391.1
NZ_CP027390.1 RefSeq gene 931 1197 . + . ID=gene0;Name=AX062_RS00010;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00010
NZ_CP027390.1 RefSeq gene 1191 1577 . - . ID=gene1;Name=AX062_RS00015;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00015

And here is the output of the script:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: The following tag(s) are illegal and are causing this parser to die: Is_circular
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
STACK: Bio::FeatureIO::gff::_handle_feature /usr/local/share/perl5/Bio/FeatureIO/gff.pm:787
STACK: Bio::FeatureIO::gff::next_feature /usr/local/share/perl5/Bio/FeatureIO/gff.pm:187
STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:782
-----------------------------------------------------------

I checked the tag in the file, and here is the snippet causing the issue:

NZ_CP027390.1 RefSeq region 1 5802748 . + . ID=id0;Dbxref=taxon:562;Is_circular=true;Name=ANONYMOUS;collected-by=CDC;country=USA;gbkey=Src;genome=chromosome;mol_type=genomic DNA;serovar=E. coli O26:Pending;strain=2015C-4944
NZ_CP027390.1 cmsearch riboswitch 4764335 4764483 . + . ID=id113;Dbxref=RFAM:RF00050;Note=FMN riboswitch;bound_moiety=flavin mononucleotide;gbkey=regulatory;inference=COORDINATES: profile:INFERNAL:1.1.1;regulatory_class=riboswitch
NZ_CP027390.1 RefSeq direct_repeat 5085858 5086435 . + . ID=id120;gbkey=repeat_region;inference=COORDINATES: alignment:pilercr:v1.02;rpt_family=CRISPR;rpt_type=direct;rpt_unit_range=5085862..5085885;rpt_unit_seq=tccccgcgccagcggggataaacc
NZ_CP027390.1 RefSeq direct_repeat 5112137 5112653 . + . ID=id121;gbkey=repeat_region;inference=COORDINATES: alignment:pilercr:v1.02;rpt_family=CRISPR;rpt_type=direct;rpt_unit_range=5112137..5112165;rpt_unit_seq=gtgttccccgcgccagcggggataaaccg
NZ_CP027391.1 RefSeq region 1 98724 . + . ID=id147;Dbxref=taxon:562;Is_circular=true;Name=unnamed;collected-by=CDC;country=USA;gbkey=Src;genome=plasmid;mol_type=genomic DNA;plasmid-name=unnamed;serovar=E. coli O26:Pending;strain=2015C-4944

Thank you
Miharimamy

From: Scott Cain <sc...@sc...>
Sent: Monday, 20 January 2020 22:03
To: RAJAONISON Andriamiharimamy <mir...@ya...>
Cc: GMOD Schema/Chado List <gmo...@li...>
Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado

Hi Miharimamy,

The snippet of GFF that you posted had a chromosome named "NZ_CP027390.1", which looks fine to me, but the error message is about a reference sequence named "NZ_CP027391.1". Is it defined anywhere in your GFF?

Scott

On Mon, Jan 20, 2020 at 10:58 AM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:

> Hi Scott,
>
> Even with --recreate_cache and --remove_lock, I still get the same error.
>
> Miharimamy
>
> (Re)creating the uniquename cache in the database...
> Creating table...
> Populating table...
> Creating indexes...
> Adjusting the primary key sequences (if necessary)...Done.
> Preparing data for inserting into the chado database
> (This may take a while ...)
> Unable to find srcfeature NZ_CP027391.1 in the database.
> Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 5939.
> Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x25623e0)', 'Bio::SeqFeature::Annotated=HASH(0x2680c18)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851
>
> Abnormal termination, trying to clean up...
>
> Attempting to clean up the loader temp table (so that --recreate_cache
> won't be needed)...
> Trying to remove the run lock (so that --remove_lock won't be needed)...
> Exiting...
>
> From: Scott Cain <sc...@sc...>
> Sent: Monday, 20 January 2020 20:56
> To: RAJAONISON Andriamiharimamy <mir...@ya...>
> Cc: GMOD Schema/Chado List <gmo...@li...>
> Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado
>
> Hi Miharimamy,
>
> Did you try running with the --recreate_cache option as suggested by the error message?
>
> On Mon, Jan 20, 2020 at 6:54 AM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:
>
> Hi Scott,
>
> You were right, the issue lay in line 2. The field separator was 4 spaces instead of a tab.
> I converted it and tried to load the file again, but I am back to the first error.
>
> Thank you for your time,
> Miharimamy
>
> Preparing data for inserting into the chado database
> (This may take a while ...)
> Unable to find srcfeature NZ_CP027391.1 in the database.
> Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 5939.
> Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x297c120)', 'Bio::SeqFeature::Annotated=HASH(0x2bcd420)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851
>
> Abnormal termination, trying to clean up...
>
> Attempting to clean up the loader temp table (so that --recreate_cache
> won't be needed)...
> Trying to remove the run lock (so that --remove_lock won't be needed)...
> Exiting...
>
> ##gff-version 3
> NZ_CP027390.1 . chromosome 1 5901472 . . . ID=NZ_CP027390.1;Name=NZ_CP027390.1
> NZ_CP027390.1 RefSeq gene 931 1197 . + . ID=gene0;Name=AX062_RS00010;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00010
> NZ_CP027390.1 RefSeq gene 1191 1577 . - . ID=gene1;Name=AX062_RS00015;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00015
> NZ_CP027390.1 RefSeq gene 10194 10736 . + . ID=gene10;Name=AX062_RS00060;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00060
> NZ_CP027390.1 RefSeq gene 80547 81860 . + . ID=gene100;Name=AX062_RS00510;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00510
> NZ_CP027390.1 RefSeq gene 918692 918997 . - . ID=gene1000;Name=AX062_RS05010;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05010
> NZ_CP027390.1 RefSeq gene 919105 919815 . + . ID=gene1001;Name=AX062_RS05015;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05015
> NZ_CP027390.1 RefSeq gene 919818 920378 . - . ID=gene1002;Name=AX062_RS05020;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05020
> NZ_CP027390.1 RefSeq gene 920413 920754 . - . ID=gene1003;Name=AX062_RS05025;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05025
> NZ_CP027390.1 RefSeq gene 920889 921215 . + . ID=gene1004;Name=AX062_RS05030;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05030
> NZ_CP027390.1 RefSeq gene 921252 921440 . + . ID=gene1005;Name=AX062_RS05035;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05035
> NZ_CP027390.1 RefSeq gene 921421 922635 . + . ID=gene1006;Name=AX062_RS05040;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05040
>
> From: Scott Cain <sc...@sc...>
> Sent: Friday, 17 January 2020 21:49
> To: RAJAONISON Andriamiharimamy <mir...@ya...>
> Cc: GMOD Schema/Chado List <gmo...@li...>
> Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado
>
> Hi Miharimamy,
>
> The end of the error message where it says "line 2" I'm pretty sure means the problem is in line 2 of your GFF file. Is that the line you added? What does it look like?
>
> Scott
>
> On Fri, Jan 17, 2020 at 5:57 AM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:
>
> Hi Scott,
>
> Thank you again for your help: adding the parent feature removes the srcfeature-related error.
> Yet I get another issue that I don’t understand:
>
> --------------------- WARNING ---------------------
> MSG: Can not set Bio::Location::Simple::end() equal to start; start not set
> ---------------------------------------------------
> Use of uninitialized value $featuretype in pattern match (m//) at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 819, <GEN0> line 2.
> Use of uninitialized value $featuretype in pattern match (m//) at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 820, <GEN0> line 2.
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: no cvterm for
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
> STACK: Bio::GMOD::DB::Adapter::get_type /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm:4629
> STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:848
> -----------------------------------------------------------
>
> Abnormal termination, trying to clean up...
>
> Attempting to clean up the loader temp table (so that --recreate_cache
> won't be needed)...
> Trying to remove the run lock (so that --remove_lock won't be needed)...
> Exiting...
>
> Thank you
>
> Miharimamy
>
> From: Scott Cain <sc...@sc...>
> Sent: Thursday, 16 January 2020 22:31
> To: RAJAONISON Andriamiharimamy <mir...@ya...>
> Cc: GMOD Schema/Chado List <gmo...@li...>
> Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado
>
> Hi Miharimamy,
>
> Thanks for sending this report. Generally, loading GFF into Chado can be difficult, as the Perl-based loader that you are using can be quite particular about the format of the GFF, and producers of GFF generally aren't so particular. Since the loader makes the (in my view, correct) decision to not load anything if it can't load everything in a file, it quits. So, taking each of the problems you found in order:
>
> 1. "dbxref=NCBI:" isn't a valid accession (presumably there should be something after the "NCBI:"), so the loader won't continue. Two options here are to contact SGD and point out that there is a problem with their GFF, or to delete the offending item and try again, which it looks like you did for item 2.
>
> 2. If the item Note is missing from the cvterm table, I think that probably means that you didn't install the schema using the "make" procedure that would have installed some necessary items in the cv and cvterm tables.
>
> 3 and 4. Messages that srcfeatures can't be found mean that the feature on which the features in your GFF reside hasn't been defined yet (that is, the thing referred to in column 1 of the GFF doesn't exist). Frequently, creators of GFF don't define the reference sequence in the GFF for whatever reason (it's not required by the GFF3 spec, since it might be credibly defined elsewhere). To define it in the GFF you have, add a line before anything else that looks something like this:
>
> NZ_CP027390.1 . chromosome 1 1234566 . . . ID=NZ_CP027390.1;Name=NZ_CP027390.1
>
> Where "NZ_CP027390.1" is the name of the reference sequence, "chromosome" is the Sequence Ontology term for the type of thing (other options would include "contig" and "supercontig"), and "1234566" is the length of the sequence.
>
> Finally, I would add that using Chado through Tripal frequently makes life easier, though I think all of these issues would still have been problems (with the probable exception of item 2, since I think Tripal would have initialized the "Note" item in cvterm).
>
> Scott
>
> On Thu, Jan 16, 2020 at 11:15 AM RAJAONISON Andriamiharimamy via Gmod-schema <gmo...@li...> wrote:
>
> Hi,
>
> I hope this mail finds you well. I send you my best wishes for the new year.
> I am reaching out to you because I have issues loading GFF files into Chado. I tried several files, but none of them seems to work.
>
> Thank you in advance for your help
>
> Miharimamy
>
> 1. The S.cerevisiae example in gmod.org
> Command :
> chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism yeast --gfffile saccharomyces/saccharomyces_cerevisiae.gff.sorted --fastafile saccharomyces/saccharomyces_cerevisiae.gff.sorted.fasta
>
> Output :
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Error in line:
> chrIII SGD chromosome 1 316620 . . . ID=chrIII;dbxref=NCBI:;Name=chrIII
>
> Dbxref value 'NCBI:' did not conform to GFF3 specification
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
> STACK: Bio::FeatureIO::gff::_handle_feature /usr/local/share/perl5/Bio/FeatureIO/gff.pm:659
> STACK: Bio::FeatureIO::gff::next_feature /usr/local/share/perl5/Bio/FeatureIO/gff.pm:187
> STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:782
> -----------------------------------------------------------
>
> 2. S.cerevisiae without dbxref
> Command :
> chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism yeast --gfffile saccharomyces/test_saccharomyces_ncbi --fastafile saccharomyces/saccharomyces_cerevisiae.gff.sorted.fasta
>
> Output :
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: I couldn't find the 'Note' cvterm in the database;
> Did you load the feature property controlled vocabulary?
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
> STACK: Bio::GMOD::DB::Adapter::handle_note /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm:3901
> STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:974
>
> 3. A GFF from NCBI
> Command :
> chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism bacteria --gfffile GCF_000008865.2/GCF_003018035.1_ASM301803v1_genomic.gff.sorted
>
> Output :
> Unable to find srcfeature NZ_CP027390.1 in the database.
> Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 2.
> Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x241a8d0)', 'Bio::SeqFeature::Annotated=HASH(0x26c61d0)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851
>
> 4. A GFF3 from prokka
> Command :
> chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism bacteria -gfffile GFF3/GCF_000455285.gff.sorted --fastafile GFF3/GCF_000455285.gff.sorted.fasta
>
> Output :
> Unable to find srcfeature gnl|Prokka|GCF_000455285_1 in the database.
> Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 2.
> Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x33008f0)', 'Bio::SeqFeature::Annotated=HASH(0x35ac0a0)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851
>
> _______________________________________________
> Gmod-schema mailing list
> Gmo...@li...
> https://lists.sourceforge.net/lists/listinfo/gmod-schema
|
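The Is_circular fix mentioned in this thread (renaming the tag to is_circular so that the parser accepts it) can also be applied with a short script instead of by hand. This is a minimal sketch in Python under stated assumptions, not something shipped with Chado: the function name lowercase_tag is hypothetical, it expects tab-separated GFF3 lines, and it only renames the attribute key in column 9, leaving values and every other column untouched.

```python
def lowercase_tag(line, tag="Is_circular"):
    """Rename one attribute key in column 9 to its lowercase form.

    Hypothetical helper: comment lines and lines without exactly
    9 tab-separated columns are returned unchanged.
    """
    text = line.rstrip("\n")
    columns = text.split("\t")
    if text.startswith("#") or len(columns) != 9:
        return text
    pairs = []
    for pair in columns[8].split(";"):
        key, sep, value = pair.partition("=")
        if key == tag:
            key = tag.lower()
        pairs.append(key + sep + value)
    columns[8] = ";".join(pairs)
    return "\t".join(columns)


# Shortened version of the plasmid region line quoted in this thread.
region = "\t".join([
    "NZ_CP027391.1", "RefSeq", "region", "1", "98724", ".", "+", ".",
    "ID=id147;Dbxref=taxon:562;Is_circular=true;Name=unnamed",
])

print(lowercase_tag(region).split("\t")[8])
# ID=id147;Dbxref=taxon:562;is_circular=true;Name=unnamed
```

Matching on the parsed key rather than doing a plain text substitution avoids touching a value that happens to contain the same string.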
From: Scott C. <sc...@sc...> - 2020-01-20 22:00:47
|
Yep, looks good!

On Mon, Jan 20, 2020 at 1:59 PM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:

> I'm planning to use it inside a data warehouse (ETL, etc.) project, so I don't know whether Tripal gives me the freedom to do so.
>
> Here is the output:
>
> This GFF file has CDS and/or UTR features that do not belong to a
> 'central dogma' gene (ie, gene/transcript/CDS). The features of
> this type are being stored in the database as is.
>
> Skipping organism table since the load file is empty...
> Skipping analysis table since the load file is empty...
> Skipping db table since the load file is empty...
> Skipping dbxref table since the load file is empty...
> Skipping cv table since the load file is empty...
> Skipping cvterm table since the load file is empty...
> Loading data into feature table ...
> Loading data into featureloc table ...
> Loading data into feature_relationship table ...
> Loading data into featureprop table ...
> Skipping feature_cvterm table since the load file is empty...
> Loading data into synonym table ...
> Loading data into feature_synonym table ...
> Loading data into feature_dbxref table ...
> Skipping analysisfeature table since the load file is empty...
> Adding cvtermprop=MapReferenceType for 'plasmid' ...
> Loading sequences (if any) ...
>
> Done.
>
> Is everything all right?
>
> From: Scott Cain <sc...@sc...>
> Sent: Tuesday, January 21, 2020 00:55
> To: RAJAONISON Andriamiharimamy <mir...@ya...>
> Cc: GMOD Schema/Chado List <gmo...@li...>
> Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado
>
> I agree that it's overly complicated. These days, most people who want to use Chado do so by installing Tripal, which makes it a lot easier. Believe me, I would be thrilled to have a simpler Python script that still does the same job!
>
> On Mon, Jan 20, 2020 at 1:50 PM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:
>
> 👌👌
>
> Works perfectly!
>
> It is quite complicated though. I thought of translating the Perl script to Python at some point.
>
> Thank you for your help, Scott!
>
> Miharimamy
>
> From: Scott Cain <sc...@sc...>
> Sent: Tuesday, January 21, 2020 00:33
> To: RAJAONISON Andriamiharimamy <mir...@ya...>
> Cc: GMOD Schema/Chado List <gmo...@li...>
> Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado
>
> Ah, you have chado_properties (cv_id 3) but not feature_properties, which is where Note comes from. That should come from the make command, but it appears not to have worked. To run the make command again, my recollection is that you have to remove a directory from the "load" directory. Do this: try running that make command again and select feature properties from the menu. If nothing happens, my recollection is that running "make clean" will remove the lock files that prevent loading an ontology twice (which, I think, is what it will assume you are trying to do).
>
> On Mon, Jan 20, 2020 at 1:11 PM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:
>
> Here it is:
>
> select * from cvterm where cv_id=3
>
> cvterm_id cv_id name definition
> 3 3 "version" "Chado schema version"
>
> From: Scott Cain <sc...@sc...>
> Sent: Monday, January 20, 2020 23:47
> To: RAJAONISON Andriamiharimamy <mir...@ya...>
> Cc: GMOD Schema/Chado List <gmo...@li...>
> Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado
>
> Can you also do a "select * from cvterm where cv_id=3" and show us that?
>
> On Mon, Jan 20, 2020 at 12:45 PM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:
>
> Thank you, it worked, but now I run into the missing 'Note' cvterm error.
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: I couldn't find the 'Note' cvterm in the database;
> Did you load the feature property controlled vocabulary?
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
> STACK: Bio::GMOD::DB::Adapter::handle_note /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm:3901
> STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:974
> -----------------------------------------------------------
>
> I used "make" to create the database and "make ontologies" to load the ontologies.
>
> I installed: Relationship Ontology, Sequence Ontology, Gene Ontology and Chado Feature Properties.
>
> A select on the cv table shows:
>
> 1 "null"
> 2 "local" "Locally created terms"
> 3 "chado_properties" "Terms that are used in the chadoprop table to describe the state of the database"
> 4 "relationship"
> 5 "synonym_type"
> 6 "cvterm_property_type"
> 7 "anonymous"
> 8 "sequence"
> 9 "biological_process"
> 10 "molecular_function"
> 11 "cellular_component"
> 12 "external"
>
> Thank you
>
> Miharimamy
>
> From: Scott Cain <sc...@sc...>
> Sent: Monday, January 20, 2020 22:39
> To: RAJAONISON Andriamiharimamy <mir...@ya...>
> Cc: GMOD Schema/Chado List <gmo...@li...>
> Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado
>
> Is_circular isn't a valid GFF tag. You can change it to is_circular to fix it.
>
> On Mon, Jan 20, 2020 at 11:24 AM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:
>
> Hi again Scott,
>
> Here is the modified header of the file:
>
> ##gff-version 3
> NZ_CP027390.1 . chromosome 1 5901472 . . . ID=NZ_CP027390.1;Name=NZ_CP027390.1
> NZ_CP027391.1 . plasmid 1 98724 . . . ID=NZ_CP027391.1;Name=NZ_CP027391.1
> NZ_CP027390.1 RefSeq gene 931 1197 . + . ID=gene0;Name=AX062_RS00010;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00010
> NZ_CP027390.1 RefSeq gene 1191 1577 . - . ID=gene1;Name=AX062_RS00015;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00015
>
> And here is the output of the script.
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: The following tag(s) are illegal and are causing this parser to die: Is_circular
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
> STACK: Bio::FeatureIO::gff::_handle_feature /usr/local/share/perl5/Bio/FeatureIO/gff.pm:787
> STACK: Bio::FeatureIO::gff::next_feature /usr/local/share/perl5/Bio/FeatureIO/gff.pm:187
> STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:782
> -----------------------------------------------------------
>
> I checked the tag in the file and here is the snippet causing the issue:
>
> NZ_CP027390.1 RefSeq region 1 5802748 . + . ID=id0;Dbxref=taxon:562;Is_circular=true;Name=ANONYMOUS;collected-by=CDC;country=USA;gbkey=Src;genome=chromosome;mol_type=genomic DNA;serovar=E. coli O26:Pending;strain=2015C-4944
> NZ_CP027390.1 cmsearch riboswitch 4764335 4764483 . + . ID=id113;Dbxref=RFAM:RF00050;Note=FMN riboswitch;bound_moiety=flavin mononucleotide;gbkey=regulatory;inference=COORDINATES: profile:INFERNAL:1.1.1;regulatory_class=riboswitch
> NZ_CP027390.1 RefSeq direct_repeat 5085858 5086435 . + . ID=id120;gbkey=repeat_region;inference=COORDINATES: alignment:pilercr:v1.02;rpt_family=CRISPR;rpt_type=direct;rpt_unit_range=5085862..5085885;rpt_unit_seq=tccccgcgccagcggggataaacc
> NZ_CP027390.1 RefSeq direct_repeat 5112137 5112653 . + . ID=id121;gbkey=repeat_region;inference=COORDINATES: alignment:pilercr:v1.02;rpt_family=CRISPR;rpt_type=direct;rpt_unit_range=5112137..5112165;rpt_unit_seq=gtgttccccgcgccagcggggataaaccg
> NZ_CP027391.1 RefSeq region 1 98724 . + . ID=id147;Dbxref=taxon:562;Is_circular=true;Name=unnamed;collected-by=CDC;country=USA;gbkey=Src;genome=plasmid;mol_type=genomic DNA;plasmid-name=unnamed;serovar=E. coli O26:Pending;strain=2015C-4944
>
> Thank you
>
> Miharimamy
>
> From: Scott Cain <sc...@sc...>
> Sent: Monday, January 20, 2020 22:03
> To: RAJAONISON Andriamiharimamy <mir...@ya...>
> Cc: GMOD Schema/Chado List <gmo...@li...>
> Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado
>
> Hi Miharimamy,
>
> The snippet of GFF that you posted had a chromosome named "NZ_CP027390.1", which looks fine to me, but the error message is about a reference sequence named "NZ_CP027391.1". Is it defined anywhere in your GFF?
>
> Scott
>
> On Mon, Jan 20, 2020 at 10:58 AM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:
>
> Hi Scott,
>
> Even with --recreate_cache and --remove_lock, I still get the same error.
>
> Miharimamy
>
> (Re)creating the uniquename cache in the database...
> Creating table...
> Populating table...
> Creating indexes...
> Adjusting the primary key sequences (if necessary)...Done.
> Preparing data for inserting into the chado database
> (This may take a while ...)
> Unable to find srcfeature NZ_CP027391.1 in the database.
> Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 5939.
> Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x25623e0)', 'Bio::SeqFeature::Annotated=HASH(0x2680c18)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851
>
> Abnormal termination, trying to clean up...
>
> Attempting to clean up the loader temp table (so that --recreate_cache
> won't be needed)...
> Trying to remove the run lock (so that --remove_lock won't be needed)...
> Exiting...
>
> From: Scott Cain <sc...@sc...>
> Sent: Monday, January 20, 2020 20:56
> To: RAJAONISON Andriamiharimamy <mir...@ya...>
> Cc: GMOD Schema/Chado List <gmo...@li...>
> Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado
>
> Hi Miharimamy,
>
> Did you try running with the --recreate_cache option as suggested by the error message?
>
> On Mon, Jan 20, 2020 at 6:54 AM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:
>
> Hi Scott,
>
> You were right, the issue was in line 2. The field separator was 4 spaces instead of a tab.
>
> I converted it and tried to load the file again, but I am back to the first error.
>
> Thank you for your time,
>
> Miharimamy
>
> Preparing data for inserting into the chado database
> (This may take a while ...)
> Unable to find srcfeature NZ_CP027391.1 in the database.
> Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 5939.
> Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x297c120)', 'Bio::SeqFeature::Annotated=HASH(0x2bcd420)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851
>
> Abnormal termination, trying to clean up...
>
> Attempting to clean up the loader temp table (so that --recreate_cache
> won't be needed)...
> Trying to remove the run lock (so that --remove_lock won't be needed)...
> Exiting...
>
> ##gff-version 3
> NZ_CP027390.1 . chromosome 1 5901472 . . . ID=NZ_CP027390.1;Name=NZ_CP027390.1
> NZ_CP027390.1 RefSeq gene 931 1197 . + . ID=gene0;Name=AX062_RS00010;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00010
> NZ_CP027390.1 RefSeq gene 1191 1577 . - . ID=gene1;Name=AX062_RS00015;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00015
> NZ_CP027390.1 RefSeq gene 10194 10736 . + . ID=gene10;Name=AX062_RS00060;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00060
> NZ_CP027390.1 RefSeq gene 80547 81860 . + . ID=gene100;Name=AX062_RS00510;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00510
> NZ_CP027390.1 RefSeq gene 918692 918997 . - . ID=gene1000;Name=AX062_RS05010;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05010
> NZ_CP027390.1 RefSeq gene 919105 919815 . + . ID=gene1001;Name=AX062_RS05015;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05015
> NZ_CP027390.1 RefSeq gene 919818 920378 . - . ID=gene1002;Name=AX062_RS05020;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05020
> NZ_CP027390.1 RefSeq gene 920413 920754 . - . ID=gene1003;Name=AX062_RS05025;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05025
> NZ_CP027390.1 RefSeq gene 920889 921215 . + . ID=gene1004;Name=AX062_RS05030;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05030
> NZ_CP027390.1 RefSeq gene 921252 921440 . + . ID=gene1005;Name=AX062_RS05035;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05035
> NZ_CP027390.1 RefSeq gene 921421 922635 . + . ID=gene1006;Name=AX062_RS05040;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05040
>
> From: Scott Cain <sc...@sc...>
> Sent: Friday, January 17, 2020 21:49
> To: RAJAONISON Andriamiharimamy <mir...@ya...>
> Cc: GMOD Schema/Chado List <gmo...@li...>
> Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado
>
> Hi Miharimamy,
>
> The end of the error message, where it says "line 2", I'm pretty sure means the problem is in line 2 of your GFF file. Is that the line you added? What does it look like?
>
> Scott
>
> On Fri, Jan 17, 2020 at 5:57 AM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:
>
> Hi Scott,
>
> Thank you again for your help: adding the parent feature removes the srcfeature-related error.
>
> Yet I get another issue that I don't understand:
>
> --------------------- WARNING ---------------------
> MSG: Can not set Bio::Location::Simple::end() equal to start; start not set
> ---------------------------------------------------
> Use of uninitialized value $featuretype in pattern match (m//) at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 819, <GEN0> line 2.
> Use of uninitialized value $featuretype in pattern match (m//) at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 820, <GEN0> line 2.
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: no cvterm for
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
> STACK: Bio::GMOD::DB::Adapter::get_type /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm:4629
> STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:848
> -----------------------------------------------------------
>
> Abnormal termination, trying to clean up...
>
> Attempting to clean up the loader temp table (so that --recreate_cache
> won't be needed)...
> Trying to remove the run lock (so that --remove_lock won't be needed)...
> Exiting...
>
> Thank you
>
> Miharimamy
>
> From: Scott Cain <sc...@sc...>
> Sent: Thursday, January 16, 2020 22:31
> To: RAJAONISON Andriamiharimamy <mir...@ya...>
> Cc: GMOD Schema/Chado List <gmo...@li...>
> Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado
>
> Hi Miharimamy,
>
> Thanks for sending this report. Generally, loading GFF into Chado can be difficult, as the Perl-based loader that you are using can be quite particular about the format of the GFF, and producers of GFF generally aren't so particular. Since the loader makes the (in my view, correct) decision to not load anything if it can't load everything in a file, it quits. So, taking each of the problems you found in order:
>
> 1. "dbxref=NCBI:" isn't a valid accession (presumably there should be something after the "NCBI:"), and so the loader won't continue. Two options here are to contact SGD and point out that there is a problem with their GFF, or to delete the offending item and try again, which it looks like you did for item 2.
>
> 2. If the item Note is missing from the cvterm table, I think that probably means that you didn't install the schema using the "make" procedure that would have installed some necessary items in the cv and cvterm tables.
>
> 3 and 4. Messages that srcfeatures can't be found mean that the feature on which the features in your GFF reside hasn't been defined yet (that is, the thing referred to in column 1 of the GFF doesn't exist). Frequently, creators of GFF don't define the reference sequence in the GFF for whatever reason (it's not required by the GFF3 spec, since it might be credibly defined elsewhere). To define it in the GFF you have, add a line before anything else that looks something like this:
>
> NZ_CP027390.1 . chromosome 1 1234566 . . . ID=NZ_CP027390.1;Name=NZ_CP027390.1
>
> Where "NZ_CP027390.1" is the name of the reference sequence, "chromosome" is the Sequence Ontology term for the type of thing (other options would include "contig" and "supercontig"), and "1234566" is the length of the sequence.
>
> Finally, I would add that using Chado through Tripal frequently makes life easier, though I think all of these issues would still have been problems (with the probable exception of item 2: Tripal would have initialized the "Note" item in cvterm, I think).
>
> Scott
>
> On Thu, Jan 16, 2020 at 11:15 AM RAJAONISON Andriamiharimamy via Gmod-schema <gmo...@li...> wrote:
>
> Hi,
>
> I hope this mail finds you well, and I send you my best wishes for the new year.
>
> I am reaching out to you because I have issues loading GFF files into Chado. I tried several files, but none of them seems to work.
>
> Thank you in advance for your help
>
> Miharimamy
>
> 1. The S.cerevisiae example in gmod.org
>
> Command:
>
> chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism yeast --gfffile saccharomyces/saccharomyces_cerevisiae.gff.sorted --fastafile saccharomyces/saccharomyces_cerevisiae.gff.sorted.fasta
>
> Output:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Error in line:
> chrIII SGD chromosome 1 316620 . . . ID=chrIII;dbxref=NCBI:;Name=chrIII
>
> Dbxref value 'NCBI:' did not conform to GFF3 specification
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
> STACK: Bio::FeatureIO::gff::_handle_feature /usr/local/share/perl5/Bio/FeatureIO/gff.pm:659
> STACK: Bio::FeatureIO::gff::next_feature /usr/local/share/perl5/Bio/FeatureIO/gff.pm:187
> STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:782
> -----------------------------------------------------------
>
> 2. S.cerevisiae without dbxref
>
> Command:
>
> chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism yeast --gfffile saccharomyces/test_saccharomyces_ncbi --fastafile saccharomyces/saccharomyces_cerevisiae.gff.sorted.fasta
>
> Output:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: I couldn't find the 'Note' cvterm in the database;
> Did you load the feature property controlled vocabulary?
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
> STACK: Bio::GMOD::DB::Adapter::handle_note /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm:3901
> STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:974
>
> 3. A GFF from NCBI
>
> Command:
>
> chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism bacteria --gfffile GCF_000008865.2/GCF_003018035.1_ASM301803v1_genomic.gff.sorted
>
> Output:
>
> Unable to find srcfeature NZ_CP027390.1 in the database.
> Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 2.
> Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x241a8d0)', 'Bio::SeqFeature::Annotated=HASH(0x26c61d0)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851
>
> 4. A GFF3 from prokka
>
> Command:
>
> chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism bacteria -gfffile GFF3/GCF_000455285.gff.sorted --fastafile GFF3/GCF_000455285.gff.sorted.fasta
>
> Output:
>
> Unable to find srcfeature gnl|Prokka|GCF_000455285_1 in the database.
> Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 2.
> Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x33008f0)', 'Bio::SeqFeature::Annotated=HASH(0x35ac0a0)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851
>
> _______________________________________________
> Gmod-schema mailing list
> Gmo...@li...
> https://lists.sourceforge.net/lists/listinfo/gmod-schema
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D. scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/) 216-392-3087
> Ontario Institute for Cancer Research

--
------------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research |
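The three file-level problems diagnosed in this thread (columns separated by spaces instead of tabs, a reference sequence in column 1 with no line defining it, and a capitalized tag the parser rejects) can be checked before running the loader. What follows is only a minimal Python sketch under stated assumptions: `preflight` and `KNOWN_RESERVED_TAGS` are hypothetical names, and the allow-list of tags is an illustrative guess, not taken from the loader or from BioPerl.

```python
# Tags this sketch treats as accepted by the parser. This allow-list is an
# assumption for illustration only, not copied from the loader's source.
KNOWN_RESERVED_TAGS = {
    "ID", "Name", "Alias", "Parent", "Target", "Gap",
    "Derives_from", "Note", "Dbxref", "Ontology_term",
}


def preflight(lines):
    """Return a list of (line_number, message) problems found in GFF3 lines."""
    problems = []
    defined_ids = set()   # every value seen in an ID= attribute
    used_seqids = {}      # seqid -> first line number where it appears in column 1

    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # sequence data follows; nothing more to check
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        columns = line.split("\t")
        if len(columns) != 9:
            problems.append(
                (number, f"expected 9 tab-separated columns, found {len(columns)}")
            )
            continue
        seqid, attributes = columns[0], columns[8]
        used_seqids.setdefault(seqid, number)
        for pair in attributes.split(";"):
            tag, _, value = pair.partition("=")
            tag = tag.strip()
            if tag == "ID":
                defined_ids.add(value)
            elif tag[:1].isupper() and tag not in KNOWN_RESERVED_TAGS:
                problems.append((number, f"unrecognized capitalized tag '{tag}'"))

    # A seqid needs some feature whose ID equals it (the srcfeature).
    for seqid, number in used_seqids.items():
        if seqid not in defined_ids:
            problems.append(
                (number, f"no feature with ID={seqid} defines this reference sequence")
            )
    return problems


if __name__ == "__main__":
    sample = [
        "##gff-version 3\n",
        "NZ_CP027390.1\t.\tchromosome\t1\t5901472\t.\t.\t.\tID=NZ_CP027390.1;Name=NZ_CP027390.1\n",
        # Space-separated on purpose, like the hand-added line in the thread.
        "NZ_CP027391.1    .    plasmid    1    98724    .    .    .    ID=NZ_CP027391.1\n",
        "NZ_CP027391.1\tRefSeq\tregion\t1\t98724\t.\t+\t.\tID=id147;Is_circular=true\n",
    ]
    for number, message in preflight(sample):
        print(f"line {number}: {message}")
```

The sketch only reports problems and does not rewrite the file, so the decision about how to fix each line (for example, lowercasing `Is_circular` as suggested above) stays with the person loading the data.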
From: Adhemar <az...@gm...> - 2020-01-21 13:26:35
|
Hi Scott and Miharimamy,

Since you've mentioned Python, I'd like to let you know that we've made significant advances in our Python project to store, search and visualize biological data using Chado. It might be useful for you.

Here is an instance of Machado: https://www.machado.cnptia.embrapa.br/plantannot

The source code is freely available at https://github.com/lmb-embrapa/machado and there's plenty of documentation at https://machado.readthedocs.io

On Mon, Jan 20, 2020 at 6:55 PM Scott Cain <sc...@sc...> wrote:

> I agree that it's overly complicated. These days, most people who want to use Chado do so by installing Tripal, which makes it a lot easier. Believe me, I would be thrilled to have a simpler Python script that still does the same job!
>
> On Mon, Jan 20, 2020 at 1:50 PM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:
>
>> 👌👌
>>
>> Works perfectly!
>>
>> It is quite complicated though. I thought of translating the Perl script to Python at some point.
>>
>> Thank you for your help, Scott!
>>
>> Miharimamy
>>
>> From: Scott Cain <sc...@sc...>
>> Sent: Tuesday, January 21, 2020 00:33
>> To: RAJAONISON Andriamiharimamy <mir...@ya...>
>> Cc: GMOD Schema/Chado List <gmo...@li...>
>> Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado
>>
>> Ah, you have chado_properties (cv_id 3) but not feature_properties, which is where Note comes from. That should come from the make command, but it appears not to have worked. To run the make command again, my recollection is that you have to remove a directory from the "load" directory. Do this: try running that make command again and select feature properties from the menu. If nothing happens, my recollection is that running "make clean" will remove the lock files that prevent loading an ontology twice (which, I think, is what it will assume you are trying to do).
>> ID=NZ_CP027390.1;Name=NZ_CP027390.1 >> > >> > >> > >> > Where "NZ_CP027390.1" is the name of the reference sequence, >> "chromosome" is the Sequence Ontology term for the type of thing (other >> options would include "contig" and "supercontig"), and the "123456" is the >> length of the sequence. >> > >> > >> > >> > Finally, I would add that using Chado through Tripal frequently makes >> life easier, though I think all of these issues would still have been >> problems (with the probable exception of item 2--Tripal would have >> initialized the "Note" item in cvterm I think). >> > >> > >> > >> > Scott >> > >> > >> > >> > >> > >> > On Thu, Jan 16, 2020 at 11:15 AM RAJAONISON Andriamiharimamy via >> Gmod-schema <gmo...@li...> wrote: >> > >> > Hi, >> > >> > >> > >> > Hi, >> > >> > Hope this mail will find you well. Send you my best wishes for this new >> year. >> > >> > I am reaching to you because I have issues loading GFF files into you >> Chado. I tried several files but none of them seems to work. >> > >> > >> > >> > Thank you in advance for you help >> > >> > >> > >> > Miharimamy >> > >> > >> > >> > The S.cerevisiae example in gmod.org >> > >> > Command : >> > >> > chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism yeast --gfffile >> saccharomyces/saccharomyces_cerevisiae.gff.sorted --fastafile >> saccharomyces/saccharomyces_cerevisiae.gff.sorted.fasta >> > >> > >> > >> > Output : >> > >> > ------------- EXCEPTION: Bio::Root::Exception ------------- >> > >> > MSG: Error in line: >> > >> > chrIII SGD chromosome 1 316620 . . . 
>> ID=chrIII;dbxref=NCBI:;Name=chrIII >> > >> > >> > >> > Dbxref value 'NCBI:' did not conform to GFF3 specification >> > >> > STACK: Error::throw >> > >> > STACK: Bio::Root::Root::throw >> /usr/local/share/perl5/Bio/Root/Root.pm:447 >> > >> > STACK: Bio::FeatureIO::gff::_handle_feature >> /usr/local/share/perl5/Bio/FeatureIO/gff.pm:659 >> > >> > STACK: Bio::FeatureIO::gff::next_feature >> /usr/local/share/perl5/Bio/FeatureIO/gff.pm:187 >> > >> > STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:782 >> > >> > ----------------------------------------------------------- >> > >> > S.cerevisiae without dbxref >> > >> > Command : >> > >> > chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism yeast --gfffile >> saccharomyces/test_saccharomyces_ncbi --fastafile >> saccharomyces/saccharomyces_cerevisiae.gff.sorted.fasta >> > >> > >> > >> > Output : >> > >> > ------------- EXCEPTION: Bio::Root::Exception ------------- >> > >> > MSG: I couldn't find the 'Note' cvterm in the database; >> > >> > Did you load the feature property controlled vocabulary? >> > >> > STACK: Error::throw >> > >> > STACK: Bio::Root::Root::throw >> /usr/local/share/perl5/Bio/Root/Root.pm:447 >> > >> > STACK: Bio::GMOD::DB::Adapter::handle_note >> /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm:3901 >> > >> > STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:974 >> > >> > >> > >> > A GFF from NCBI >> > >> > Command : >> > >> > chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism bacteria >> --gfffile GCF_000008865.2/GCF_003018035.1_ASM301803v1_genomic.gff.sorted >> > >> > >> > >> > Output : >> > >> > Unable to find srcfeature NZ_CP027390.1 in the database. >> > >> > Perhaps you need to rerun your data load with the '--recreate_cache' >> option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> >> line 2. 
>> > >> > >> Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x241a8d0)', >> 'Bio::SeqFeature::Annotated=HASH(0x26c61d0)') called at chado-1.31/load/bin/ >> gmod_bulk_load_gff3.pl line 851 >> > >> > >> > >> > A GFF3 from prokka >> > >> > Command : >> > >> > chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism bacteria >> -gfffile GFF3/GCF_000455285.gff.sorted --fastafile >> GFF3/GCF_000455285.gff.sorted.fasta >> > >> > Output : >> > >> > Unable to find srcfeature gnl|Prokka|GCF_000455285_1 in the database. >> > >> > Perhaps you need to rerun your data load with the '--recreate_cache' >> option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> >> line 2. >> > >> > >> Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x33008f0)', >> 'Bio::SeqFeature::Annotated=HASH(0x35ac0a0)') called at chado-1.31/load/bin/ >> gmod_bulk_load_gff3.pl line 851 >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > _______________________________________________ >> > Gmod-schema mailing list >> > Gmo...@li... >> > https://lists.sourceforge.net/lists/listinfo/gmod-schema >> > >> > >> > >> > -- >> > >> > ------------------------------------------------------------------------ >> > Scott Cain, Ph. D. scott at scottcain >> dot net >> > GMOD Coordinator (http://gmod.org/) 216-392-3087 >> > Ontario Institute for Cancer Research >> > >> > >> > >> > -- >> > >> > ------------------------------------------------------------------------ >> > Scott Cain, Ph. D. scott at scottcain >> dot net >> > GMOD Coordinator (http://gmod.org/) 216-392-3087 >> > Ontario Institute for Cancer Research >> > >> > >> > >> > -- >> > >> > ------------------------------------------------------------------------ >> > Scott Cain, Ph. D. 
scott at scottcain >> dot net >> > GMOD Coordinator (http://gmod.org/) 216-392-3087 >> > Ontario Institute for Cancer Research >> >> >> >> -- >> ------------------------------------------------------------------------ >> Scott Cain, Ph. D. scott at scottcain >> dot net >> GMOD Coordinator (http://gmod.org/) 216-392-3087 >> Ontario Institute for Cancer Research >> >> >> >> -- >> >> ------------------------------------------------------------------------ >> Scott Cain, Ph. D. scott at scottcain >> dot net >> GMOD Coordinator (http://gmod.org/) 216-392-3087 >> Ontario Institute for Cancer Research >> >> >> >> -- >> >> ------------------------------------------------------------------------ >> Scott Cain, Ph. D. scott at scottcain >> dot net >> GMOD Coordinator (http://gmod.org/) 216-392-3087 >> Ontario Institute for Cancer Research >> >> >> >> -- >> >> ------------------------------------------------------------------------ >> Scott Cain, Ph. D. scott at scottcain >> dot net >> GMOD Coordinator (http://gmod.org/) 216-392-3087 >> Ontario Institute for Cancer Research >> > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain > dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > _______________________________________________ > Gmod-schema mailing list > Gmo...@li... > https://lists.sourceforge.net/lists/listinfo/gmod-schema > |
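Two of the failures discussed in this thread have mechanical causes: a feature line separated by spaces instead of tabs, and a seqid in column 1 that no feature line defines. Below is a minimal Python sketch for checking and repairing both before running the loader. It assumes the GFF fits in memory and that the caller supplies the sequence lengths. The function names (`bad_separator_lines`, `add_reference_lines`) and the `lengths` and `so_types` arguments are hypothetical helpers, not part of gmod_bulk_load_gff3.pl.

```python
def bad_separator_lines(gff_lines):
    """Return 1-based numbers of feature lines that do not have 9 tab-separated columns."""
    return [
        number
        for number, line in enumerate(gff_lines, start=1)
        if line.strip() and not line.startswith("#") and len(line.split("\t")) != 9
    ]


def reference_line(seqid, length, so_type="chromosome"):
    """Build a feature line that defines a reference sequence, as suggested in the thread."""
    return "\t".join(
        [seqid, ".", so_type, "1", str(length), ".", ".", ".", f"ID={seqid};Name={seqid}"]
    )


def add_reference_lines(gff_lines, lengths, so_types=None):
    """Insert one defining line for every seqid that lacks one.

    lengths  -- hypothetical mapping seqid -> sequence length (e.g. from the FASTA file)
    so_types -- optional mapping seqid -> Sequence Ontology type; defaults to "chromosome"
    """
    so_types = so_types or {}
    seen, defined = [], set()
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        seqid = cols[0]
        if seqid not in seen:
            seen.append(seqid)
        # A seqid counts as defined when some feature on it carries ID=<seqid>.
        if len(cols) == 9 and f"ID={seqid}" in cols[8].split(";"):
            defined.add(seqid)
    new_lines = [
        reference_line(seqid, lengths[seqid], so_types.get(seqid, "chromosome"))
        for seqid in seen
        if seqid not in defined
    ]
    # Keep the leading directives (##gff-version and friends) at the top.
    first_feature = 0
    while first_feature < len(gff_lines) and gff_lines[first_feature].startswith("#"):
        first_feature += 1
    return gff_lines[:first_feature] + new_lines + gff_lines[first_feature:]
```

Run `bad_separator_lines` first and fix whatever it reports, since `add_reference_lines` relies on tab-separated columns. For the NCBI file in this thread, the call would need entries for both NZ_CP027390.1 and NZ_CP027391.1, with `so_types` marking the second one as "plasmid".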
From: RAJAONISON A. <mir...@ya...> - 2020-01-21 14:59:07
|
Hi Adhemar,

Seems neat. I'll get back to you as soon as I have tested it.

Thank you

Miharimamy

From: Adhemar <az...@gm...>
Sent: Tuesday, 21 January 2020 16:26
To: Scott Cain <sc...@sc...>
Cc: RAJAONISON Andriamiharimamy <mir...@ya...>; GMOD Schema/Chado List <gmo...@li...>
Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado

Hi Scott and Miharimamy,

Since you've mentioned Python, I'd like to let you know that we've made significant advances in our Python project to store, search and visualize biological data using Chado. It might be useful for you.

Here is an instance of Machado: https://www.machado.cnptia.embrapa.br/plantannot

The source code is freely available at https://github.com/lmb-embrapa/machado and there's plenty of documentation at https://machado.readthedocs.io

On Mon, Jan 20, 2020 at 6:55 PM Scott Cain <sc...@sc...> wrote:

I agree that it's overly complicated. These days, most people who want to use Chado do so by installing Tripal, which makes it a lot easier. Believe me, I would be thrilled to have a simpler python script that still does the same job!

On Mon, Jan 20, 2020 at 1:50 PM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:

👌👌 Works perfectly! It is quite complicated though. I thought of translating the perl script to python at some point.

Thank you for your help Scott!

Miharimamy

From: Scott Cain <sc...@sc...>
Sent: Tuesday, 21 January 2020 00:33
To: RAJAONISON Andriamiharimamy <mir...@ya...>
Cc: GMOD Schema/Chado List <gmo...@li...>
Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado

Ah, you have chado_properties (cv_id 3) but not feature_properties, which is where Note comes from. That should come from the make command but it appears to have not worked. To run the make command again, my recollection is that you have to remove a directory from the "load" directory.
Do this: try running that make command again and select feature properties from the menu. If nothing happens, my recollection is that running "make clean" will remove the lock files that prevent loading an ontology twice (which is what it thinks you'll be trying to do, I think).

On Mon, Jan 20, 2020 at 1:11 PM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:

Here it is:

select * from cvterm where cv_id=3

cvterm_id  cv_id  name       definition
3          3      "version"  "Chado schema version"

From: Scott Cain <sc...@sc...>
Sent: Monday, 20 January 2020 23:47
To: RAJAONISON Andriamiharimamy <mir...@ya...>
Cc: GMOD Schema/Chado List <gmo...@li...>
Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado

Can you also do a "select * from cvterm where cv_id=3" and show us that?

On Mon, Jan 20, 2020 at 12:45 PM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:

Thank you, it worked, but I ran into the missing 'Note' cvterm.

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: I couldn't find the 'Note' cvterm in the database;
Did you load the feature property controlled vocabulary?
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
STACK: Bio::GMOD::DB::Adapter::handle_note /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm:3901
STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:974
-----------------------------------------------------------

I used “make” to create the database and “make ontologies” to load ontologies.
I installed: Relationship Ontology, Sequence Ontology, Gene Ontology and Chado Feature Properties.

A select in the cv table shows:

1   "null"
2   "local"                 "Locally created terms"
3   "chado_properties"      "Terms that are used in the chadoprop table to describe the state of the database"
4   "relationship"
5   "synonym_type"
6   "cvterm_property_type"
7   "anonymous"
8   "sequence"
9   "biological_process"
10  "molecular_function"
11  "cellular_component"
12  "external"

Thank you

Miharimamy

From: Scott Cain <sc...@sc...>
Sent: Monday, 20 January 2020 22:39
To: RAJAONISON Andriamiharimamy <mir...@ya...>
Cc: GMOD Schema/Chado List <gmo...@li...>
Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado

Is_circular isn't a valid GFF tag. You can change it to is_circular to fix it.

On Mon, Jan 20, 2020 at 11:24 AM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:

Hi again Scott,

Here is the modified header of the file:

##gff-version 3
NZ_CP027390.1 . chromosome 1 5901472 . . . ID=NZ_CP027390.1;Name=NZ_CP027390.1
NZ_CP027391.1 . plasmid 1 98724 . . . ID=NZ_CP027391.1;Name=NZ_CP027391.1
NZ_CP027390.1 RefSeq gene 931 1197 . + . ID=gene0;Name=AX062_RS00010;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00010
NZ_CP027390.1 RefSeq gene 1191 1577 . - . ID=gene1;Name=AX062_RS00015;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00015

And here is the output of the script.
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: The following tag(s) are illegal and are causing this parser to die: Is_circular
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
STACK: Bio::FeatureIO::gff::_handle_feature /usr/local/share/perl5/Bio/FeatureIO/gff.pm:787
STACK: Bio::FeatureIO::gff::next_feature /usr/local/share/perl5/Bio/FeatureIO/gff.pm:187
STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:782
-----------------------------------------------------------

I checked the tag in the file and here is the snippet causing the issue:

NZ_CP027390.1 RefSeq region 1 5802748 . + . ID=id0;Dbxref=taxon:562;Is_circular=true;Name=ANONYMOUS;collected-by=CDC;country=USA;gbkey=Src;genome=chromosome;mol_type=genomic DNA;serovar=E. coli O26:Pending;strain=2015C-4944
NZ_CP027390.1 cmsearch riboswitch 4764335 4764483 . + . ID=id113;Dbxref=RFAM:RF00050;Note=FMN riboswitch;bound_moiety=flavin mononucleotide;gbkey=regulatory;inference=COORDINATES: profile:INFERNAL:1.1.1;regulatory_class=riboswitch
NZ_CP027390.1 RefSeq direct_repeat 5085858 5086435 . + . ID=id120;gbkey=repeat_region;inference=COORDINATES: alignment:pilercr:v1.02;rpt_family=CRISPR;rpt_type=direct;rpt_unit_range=5085862..5085885;rpt_unit_seq=tccccgcgccagcggggataaacc
NZ_CP027390.1 RefSeq direct_repeat 5112137 5112653 . + . ID=id121;gbkey=repeat_region;inference=COORDINATES: alignment:pilercr:v1.02;rpt_family=CRISPR;rpt_type=direct;rpt_unit_range=5112137..5112165;rpt_unit_seq=gtgttccccgcgccagcggggataaaccg
NZ_CP027391.1 RefSeq region 1 98724 . + . ID=id147;Dbxref=taxon:562;Is_circular=true;Name=unnamed;collected-by=CDC;country=USA;gbkey=Src;genome=plasmid;mol_type=genomic DNA;plasmid-name=unnamed;serovar=E. coli O26:Pending;strain=2015C-4944

Thank you

Miharimamy

From: Scott Cain <sc...@sc...>
Sent: Monday, 20 January 2020 22:03
To: RAJAONISON Andriamiharimamy <mir...@ya...>
Cc: GMOD Schema/Chado List <gmo...@li...>
Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado

Hi Miharimamy,

The snippet of GFF that you posted had a chromosome named "NZ_CP027390.1" which looks fine to me, but the error message is about a reference sequence named "NZ_CP027391.1". Is it defined anywhere in your GFF?

Scott

On Mon, Jan 20, 2020 at 10:58 AM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:
> Hi Scott,
>
> Even with the --recreate_cache and --remove_lock, I still get the same error.
>
> Miharimamy
>
> (Re)creating the uniquename cache in the database...
> Creating table...
> Populating table...
> Creating indexes...
> Adjusting the primary key sequences (if necessary)...Done.
> Preparing data for inserting into the chado database
> (This may take a while ...)
> Unable to find srcfeature NZ_CP027391.1 in the database.
> Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 5939.
> Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x25623e0)', 'Bio::SeqFeature::Annotated=HASH(0x2680c18)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851
>
> Abnormal termination, trying to clean up...
>
> Attempting to clean up the loader temp table (so that --recreate_cache
> won't be needed)...
> Trying to remove the run lock (so that --remove_lock won't be needed)...
> Exiting...
>
> From: Scott Cain <sc...@sc...>
> Sent: Monday, 20 January 2020 20:56
> To: RAJAONISON Andriamiharimamy <mir...@ya...>
> Cc: GMOD Schema/Chado List <gmo...@li...>
> Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado
>
> Hi Miharimamy,
>
> Did you try running with the --recreate_cache option as suggested by the error message?
>
> On Mon, Jan 20, 2020 at 6:54 AM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:
>
> Hi Scott,
>
> You were right, the issue lay in line 2. The field separator was 4 spaces instead of a tab.
>
> I converted it and tried to load the file again but went back to the first error.
>
> Thank you for your time,
>
> Miharimamy
>
> Preparing data for inserting into the chado database
> (This may take a while ...)
> Unable to find srcfeature NZ_CP027391.1 in the database.
> Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 5939.
> Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x297c120)', 'Bio::SeqFeature::Annotated=HASH(0x2bcd420)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851
>
> Abnormal termination, trying to clean up...
>
> Attempting to clean up the loader temp table (so that --recreate_cache
> won't be needed)...
> Trying to remove the run lock (so that --remove_lock won't be needed)...
> Exiting...
>
> ##gff-version 3
> NZ_CP027390.1 . chromosome 1 5901472 . . . ID=NZ_CP027390.1;Name=NZ_CP027390.1
> NZ_CP027390.1 RefSeq gene 931 1197 . + . ID=gene0;Name=AX062_RS00010;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00010
> NZ_CP027390.1 RefSeq gene 1191 1577 . - . ID=gene1;Name=AX062_RS00015;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00015
> NZ_CP027390.1 RefSeq gene 10194 10736 . + . ID=gene10;Name=AX062_RS00060;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00060
> NZ_CP027390.1 RefSeq gene 80547 81860 . + . ID=gene100;Name=AX062_RS00510;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00510
> NZ_CP027390.1 RefSeq gene 918692 918997 . - . ID=gene1000;Name=AX062_RS05010;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05010
> NZ_CP027390.1 RefSeq gene 919105 919815 . + . ID=gene1001;Name=AX062_RS05015;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05015
> NZ_CP027390.1 RefSeq gene 919818 920378 . - . ID=gene1002;Name=AX062_RS05020;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05020
> NZ_CP027390.1 RefSeq gene 920413 920754 . - . ID=gene1003;Name=AX062_RS05025;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05025
> NZ_CP027390.1 RefSeq gene 920889 921215 . + . ID=gene1004;Name=AX062_RS05030;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05030
> NZ_CP027390.1 RefSeq gene 921252 921440 . + . ID=gene1005;Name=AX062_RS05035;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05035
> NZ_CP027390.1 RefSeq gene 921421 922635 . + . ID=gene1006;Name=AX062_RS05040;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05040
>
> From: Scott Cain <sc...@sc...>
> Sent: Friday, 17 January 2020 21:49
> To: RAJAONISON Andriamiharimamy <mir...@ya...>
> Cc: GMOD Schema/Chado List <gmo...@li...>
> Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado
>
> Hi Miharimamy,
>
> The end of the error message where it says "line 2" I'm pretty sure means the problem is in line 2 of your GFF file. Is that the line you added? What does it look like?
>
> Scott
>
> On Fri, Jan 17, 2020 at 5:57 AM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:
>
> Hi Scott,
>
> Thank you again for your help: adding the parent feature removes the srcfeature-related error.
>
> Yet I get another issue that I don't understand:
>
> --------------------- WARNING ---------------------
> MSG: Can not set Bio::Location::Simple::end() equal to start; start not set
> ---------------------------------------------------
> Use of uninitialized value $featuretype in pattern match (m//) at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 819, <GEN0> line 2.
> Use of uninitialized value $featuretype in pattern match (m//) at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 820, <GEN0> line 2.
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: no cvterm for
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
> STACK: Bio::GMOD::DB::Adapter::get_type /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm:4629
> STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:848
> -----------------------------------------------------------
>
> Abnormal termination, trying to clean up...
>
> Attempting to clean up the loader temp table (so that --recreate_cache
> won't be needed)...
> Trying to remove the run lock (so that --remove_lock won't be needed)...
> Exiting...
>
> Thank you
>
> Miharimamy
>
> From: Scott Cain <sc...@sc...>
> Sent: Thursday, 16 January 2020 22:31
> To: RAJAONISON Andriamiharimamy <mir...@ya...>
> Cc: GMOD Schema/Chado List <gmo...@li...>
> Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado
>
> Hi Miharimamy,
>
> Thanks for sending this report. Generally, loading GFF into Chado can be difficult, as the perl-based loader that you are using can be quite particular about the format of the GFF and producers of GFF generally aren't so particular. Since the loader makes the (in my view, correct) decision to not load anything if it can't load everything in a file, it quits. So, taking each of the problems you found in order:
>
> 1. "dbxref=NCBI:" isn't a valid accession (presumably there should be something after the "NCBI:") and so the loader won't continue. Two options here are to contact SGD and point out that there is a problem with their GFF, or delete the offending item and try again, which it looks like you did for item 2.
>
> 2. If the item Note is missing from the cvterm table, I think that probably means that you didn't install the schema using the "make" procedure that would have installed some necessary items in the cv and cvterm table.
>
> 3 and 4: Messages that srcfeatures can't be found mean that the feature on which the features in your GFF reside hasn't been defined yet (that is, the thing referred to in column 1 of the GFF doesn't exist). Frequently, creators of GFF don't define the reference sequence in the GFF for whatever reason (it's not required by the GFF3 spec, since it might be credibly defined elsewhere). To define it in the GFF you have, add a line before anything else that looks something like this:
>
> NZ_CP027390.1 . chromosome 1 1234566 . . . ID=NZ_CP027390.1;Name=NZ_CP027390.1
>
> Where "NZ_CP027390.1" is the name of the reference sequence, "chromosome" is the Sequence Ontology term for the type of thing (other options would include "contig" and "supercontig"), and the "1234566" is the length of the sequence.
>
> Finally, I would add that using Chado through Tripal frequently makes life easier, though I think all of these issues would still have been problems (with the probable exception of item 2--Tripal would have initialized the "Note" item in cvterm I think).
>
> Scott
>
> On Thu, Jan 16, 2020 at 11:15 AM RAJAONISON Andriamiharimamy via Gmod-schema <gmo...@li...> wrote:
>
> Hi,
>
> Hope this mail will find you well. Send you my best wishes for this new year.
>
> I am reaching out to you because I have issues loading GFF files into Chado. I tried several files but none of them seems to work.
>
> Thank you in advance for your help
>
> Miharimamy
>
> 1. The S.cerevisiae example in gmod.org
>
> Command :
> chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism yeast --gfffile saccharomyces/saccharomyces_cerevisiae.gff.sorted --fastafile saccharomyces/saccharomyces_cerevisiae.gff.sorted.fasta
>
> Output :
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Error in line:
> chrIII SGD chromosome 1 316620 . . . ID=chrIII;dbxref=NCBI:;Name=chrIII
>
> Dbxref value 'NCBI:' did not conform to GFF3 specification
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
> STACK: Bio::FeatureIO::gff::_handle_feature /usr/local/share/perl5/Bio/FeatureIO/gff.pm:659
> STACK: Bio::FeatureIO::gff::next_feature /usr/local/share/perl5/Bio/FeatureIO/gff.pm:187
> STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:782
> -----------------------------------------------------------
>
> 2. S.cerevisiae without dbxref
>
> Command :
> chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism yeast --gfffile saccharomyces/test_saccharomyces_ncbi --fastafile saccharomyces/saccharomyces_cerevisiae.gff.sorted.fasta
>
> Output :
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: I couldn't find the 'Note' cvterm in the database;
> Did you load the feature property controlled vocabulary?
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
> STACK: Bio::GMOD::DB::Adapter::handle_note /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm:3901
> STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:974
>
> 3. A GFF from NCBI
>
> Command :
> chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism bacteria --gfffile GCF_000008865.2/GCF_003018035.1_ASM301803v1_genomic.gff.sorted
>
> Output :
> Unable to find srcfeature NZ_CP027390.1 in the database.
> Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 2.
> Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x241a8d0)', 'Bio::SeqFeature::Annotated=HASH(0x26c61d0)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851
>
> 4. A GFF3 from prokka
>
> Command :
> chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism bacteria -gfffile GFF3/GCF_000455285.gff.sorted --fastafile GFF3/GCF_000455285.gff.sorted.fasta
>
> Output :
> Unable to find srcfeature gnl|Prokka|GCF_000455285_1 in the database.
> Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 2.
> Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x33008f0)', 'Bio::SeqFeature::Annotated=HASH(0x35ac0a0)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851

--
------------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research

_______________________________________________
Gmod-schema mailing list
Gmo...@li...
https://lists.sourceforge.net/lists/listinfo/gmod-schema
|
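The fix Scott suggested for the "illegal tag" exception is to change `Is_circular` to `is_circular` in column 9. Below is a minimal Python sketch of that edit. It assumes tab-separated GFF3 lines passed in without their trailing newline. The function name `rename_attribute_tag` is a hypothetical helper, and whether other NCBI-specific tags would also trip the parser is not established in this thread.

```python
def rename_attribute_tag(line, old="Is_circular", new="is_circular"):
    """Rename one attribute tag in column 9 of a GFF3 feature line.

    Directives, comments, blank lines and lines without 9 tab-separated
    columns are returned unchanged.
    """
    if not line.strip() or line.startswith("#"):
        return line
    cols = line.split("\t")
    if len(cols) != 9:
        return line
    renamed = []
    for pair in cols[8].split(";"):
        # partition keeps valueless attributes intact (sep and value are empty).
        tag, sep, value = pair.partition("=")
        renamed.append((new if tag == old else tag) + sep + value)
    cols[8] = ";".join(renamed)
    return "\t".join(cols)
```

Applied to every line of the file before running gmod_bulk_load_gff3.pl, this touches only the matching tag and leaves values such as `mol_type=genomic DNA` as they are.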