From: Scott C. <sc...@sc...> - 2020-01-20 18:21:41
Hi Miharimamy,

Did you try running with the --recreate_cache option as suggested by the error message?

On Mon, Jan 20, 2020 at 6:54 AM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:

> Hi Scott,
>
> You were right, the issue lay in line 2. The field separator was 4 spaces instead of a tab.
>
> I converted it and tried to load the file again, but I went back to the first error.
>
> Thank you for your time,
>
> Miharimamy
>
> Preparing data for inserting into the chado database
> (This may take a while ...)
> Unable to find srcfeature NZ_CP027391.1 in the database.
> Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 5939.
> Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x297c120)', 'Bio::SeqFeature::Annotated=HASH(0x2bcd420)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851
>
> Abnormal termination, trying to clean up...
>
> Attempting to clean up the loader temp table (so that --recreate_cache won't be needed)...
> Trying to remove the run lock (so that --remove_lock won't be needed)...
> Exiting...
>
> ##gff-version 3
> NZ_CP027390.1 . chromosome 1 5901472 . . . ID=NZ_CP027390.1;Name=NZ_CP027390.1
> NZ_CP027390.1 RefSeq gene 931 1197 . + . ID=gene0;Name=AX062_RS00010;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00010
> NZ_CP027390.1 RefSeq gene 1191 1577 . - . ID=gene1;Name=AX062_RS00015;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00015
> NZ_CP027390.1 RefSeq gene 10194 10736 . + . ID=gene10;Name=AX062_RS00060;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00060
> NZ_CP027390.1 RefSeq gene 80547 81860 . + . ID=gene100;Name=AX062_RS00510;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00510
> NZ_CP027390.1 RefSeq gene 918692 918997 . - . ID=gene1000;Name=AX062_RS05010;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05010
> NZ_CP027390.1 RefSeq gene 919105 919815 . + . ID=gene1001;Name=AX062_RS05015;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05015
> NZ_CP027390.1 RefSeq gene 919818 920378 . - . ID=gene1002;Name=AX062_RS05020;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05020
> NZ_CP027390.1 RefSeq gene 920413 920754 . - . ID=gene1003;Name=AX062_RS05025;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05025
> NZ_CP027390.1 RefSeq gene 920889 921215 . + . ID=gene1004;Name=AX062_RS05030;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05030
> NZ_CP027390.1 RefSeq gene 921252 921440 . + . ID=gene1005;Name=AX062_RS05035;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05035
> NZ_CP027390.1 RefSeq gene 921421 922635 . + . ID=gene1006;Name=AX062_RS05040;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05040
>
> *From:* Scott Cain <sc...@sc...>
> *Sent:* Friday, January 17, 2020 21:49
> *To:* RAJAONISON Andriamiharimamy <mir...@ya...>
> *Cc:* GMOD Schema/Chado List <gmo...@li...>
> *Subject:* Re: [Gmod-schema] [ISSUE] Load gff in chado
>
> Hi Miharimamy,
>
> The end of the error message where it says "line 2" I'm pretty sure means the problem is in line 2 of your GFF file. Is that the line you added? What does it look like?
>
> Scott
>
> On Fri, Jan 17, 2020 at 5:57 AM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:
>
> Hi Scott,
>
> Thank you again for your help: adding the parent feature removed the srcfeature-related error.
>
> Yet I get another issue that I don’t understand:
>
> --------------------- WARNING ---------------------
> MSG: Can not set Bio::Location::Simple::end() equal to start; start not set
> ---------------------------------------------------
> Use of uninitialized value $featuretype in pattern match (m//) at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 819, <GEN0> line 2.
> Use of uninitialized value $featuretype in pattern match (m//) at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 820, <GEN0> line 2.
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: no cvterm for
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
> STACK: Bio::GMOD::DB::Adapter::get_type /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm:4629
> STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:848
> -----------------------------------------------------------
>
> Abnormal termination, trying to clean up...
>
> Attempting to clean up the loader temp table (so that --recreate_cache won't be needed)...
> Trying to remove the run lock (so that --remove_lock won't be needed)...
> Exiting...
>
> Thank you
>
> Miharimamy
>
> *From:* Scott Cain <sc...@sc...>
> *Sent:* Thursday, January 16, 2020 22:31
> *To:* RAJAONISON Andriamiharimamy <mir...@ya...>
> *Cc:* GMOD Schema/Chado List <gmo...@li...>
> *Subject:* Re: [Gmod-schema] [ISSUE] Load gff in chado
>
> Hi Miharimamy,
>
> Thanks for sending this report. Generally, loading GFF into Chado can be difficult, as the perl-based loader that you are using can be quite particular about the format of the GFF and producers of GFF generally aren't so particular. Since the loader makes the (in my view, correct) decision to not load anything if it can't load everything in a file, it quits. So, taking each of the problems you found in order:
>
> 1. "dbxref=NCBI:" isn't a valid accession (presumably there should be something after the "NCBI:") and so the loader won't continue. Two options here are to contact SGD and point out that there is a problem with their GFF, or delete the offending item and try again, which it looks like you did for item 2.
>
> 2. If the item Note is missing from the cvterm table, I think that probably means that you didn't install the schema using the "make" procedure that would have installed some necessary items in the cv and cvterm table.
>
> 3 and 4: Messages that srcfeatures can't be found mean that the feature on which the features in your GFF reside hasn't been defined yet (that is, the thing referred to in column 1 of the GFF doesn't exist). Frequently, creators of GFF don't define the reference sequence in the GFF for whatever reason (it's not required by the GFF3 spec, since it might be credibly defined elsewhere). To define it in the GFF you have, add a line before anything else that looks something like this:
>
> NZ_CP027390.1 . chromosome 1 1234566 . . . ID=NZ_CP027390.1;Name=NZ_CP027390.1
>
> Where "NZ_CP027390.1" is the name of the reference sequence, "chromosome" is the Sequence Ontology term for the type of thing (other options would include "contig" and "supercontig"), and the "1234566" is the length of the sequence.
>
> Finally, I would add that using Chado through Tripal frequently makes life easier, though I think all of these issues would still have been problems (with the probable exception of item 2--Tripal would have initialized the "Note" item in cvterm I think).
>
> Scott
>
> On Thu, Jan 16, 2020 at 11:15 AM RAJAONISON Andriamiharimamy via Gmod-schema <gmo...@li...> wrote:
>
> Hi,
>
> I hope this mail finds you well. I send you my best wishes for the new year.
>
> I am reaching out to you because I have issues loading GFF files into Chado.
> I tried several files but none of them seems to work.
>
> Thank you in advance for your help
>
> Miharimamy
>
> 1. The S.cerevisiae example on gmod.org
>
> *Command:*
> chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism yeast --gfffile saccharomyces/saccharomyces_cerevisiae.gff.sorted --fastafile saccharomyces/saccharomyces_cerevisiae.gff.sorted.fasta
>
> *Output:*
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Error in line:
> chrIII SGD chromosome 1 316620 . . . ID=chrIII;dbxref=NCBI:;Name=chrIII
>
> Dbxref value 'NCBI:' did not conform to GFF3 specification
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
> STACK: Bio::FeatureIO::gff::_handle_feature /usr/local/share/perl5/Bio/FeatureIO/gff.pm:659
> STACK: Bio::FeatureIO::gff::next_feature /usr/local/share/perl5/Bio/FeatureIO/gff.pm:187
> STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:782
> -----------------------------------------------------------
>
> 2. S.cerevisiae without dbxref
>
> *Command:*
> chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism yeast --gfffile saccharomyces/test_saccharomyces_ncbi --fastafile saccharomyces/saccharomyces_cerevisiae.gff.sorted.fasta
>
> *Output:*
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: I couldn't find the 'Note' cvterm in the database;
> Did you load the feature property controlled vocabulary?
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
> STACK: Bio::GMOD::DB::Adapter::handle_note /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm:3901
> STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:974
>
> 3. A GFF from NCBI
>
> *Command:*
> chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism bacteria --gfffile GCF_000008865.2/GCF_003018035.1_ASM301803v1_genomic.gff.sorted
>
> *Output:*
> Unable to find srcfeature NZ_CP027390.1 in the database.
> Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 2.
> Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x241a8d0)', 'Bio::SeqFeature::Annotated=HASH(0x26c61d0)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851
>
> 4. A GFF3 from prokka
>
> *Command:*
> chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism bacteria -gfffile GFF3/GCF_000455285.gff.sorted --fastafile GFF3/GCF_000455285.gff.sorted.fasta
>
> *Output:*
> Unable to find srcfeature gnl|Prokka|GCF_000455285_1 in the database.
> Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 2.
> Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x33008f0)', 'Bio::SeqFeature::Annotated=HASH(0x35ac0a0)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851
>
> _______________________________________________
> Gmod-schema mailing list
> Gmo...@li...
> https://lists.sourceforge.net/lists/listinfo/gmod-schema
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D. scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/) 216-392-3087
> Ontario Institute for Cancer Research

--
------------------------------------------------------------------------
Scott Cain, Ph. D.
scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research
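
The three file-level problems discussed in this thread (columns separated by spaces instead of tabs, an empty Dbxref accession such as "NCBI:", and a column 1 name with no reference feature) could be checked before running gmod_bulk_load_gff3.pl with something along these lines. This is only a sketch in Python and is not part of Chado or the loader: the function name check_gff3 and the sample lines (including the gene5000 ID) are made up for illustration. It looks only at the file itself and assumes the reference feature is defined in the same file with an ID equal to the column 1 name, so it knows nothing about what is already in the database. Note that the latest error in the thread names NZ_CP027391.1 rather than NZ_CP027390.1, which may mean that a second sequence in the file needs its own reference line as well.

```python
import re
from typing import Dict, List

# "DB:accession" with both parts non-empty; "NCBI:" alone does not match.
DBXREF_RE = re.compile(r"^[^:\s]+:\S+$")


def check_gff3(lines: List[str]) -> List[str]:
    """Return human-readable descriptions of problems found in GFF3 lines."""
    problems: List[str] = []
    defined_ids = set()              # every ID= value seen in column 9
    first_use: Dict[str, int] = {}   # column 1 name -> first line using it

    for n, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break                    # sequence data follows, no more features
        if not line.strip() or line.startswith("#"):
            continue                 # blank lines, directives, comments
        cols = line.split("\t")
        if len(cols) != 9:
            problems.append(
                f"line {n}: expected 9 tab-separated columns, found {len(cols)}"
            )
            continue
        seqid, attrs = cols[0], cols[8]
        first_use.setdefault(seqid, n)
        for pair in attrs.split(";"):
            key, _, value = pair.partition("=")
            if key == "ID":
                defined_ids.add(value)
            elif key.lower() == "dbxref":
                for ref in value.split(","):
                    if not DBXREF_RE.match(ref):
                        problems.append(
                            f"line {n}: Dbxref value '{ref}' is not of the form DB:accession"
                        )

    # Every name used in column 1 should be defined by a feature with that ID.
    for seqid, n in first_use.items():
        if seqid not in defined_ids:
            problems.append(
                f"line {n}: no feature with ID={seqid} defines this reference sequence"
            )
    return problems


# Hypothetical sample: line 3 uses spaces instead of tabs, line 4 has an
# empty Dbxref accession and sits on a sequence that is never defined.
sample = [
    "##gff-version 3\n",
    "NZ_CP027390.1\t.\tchromosome\t1\t5901472\t.\t.\t.\tID=NZ_CP027390.1;Name=NZ_CP027390.1\n",
    "NZ_CP027390.1    RefSeq    gene    931    1197    .    +    .    ID=gene0\n",
    "NZ_CP027391.1\tRefSeq\tgene\t10\t200\t.\t+\t.\tID=gene5000;Dbxref=NCBI:\n",
]
problems = check_gff3(sample)
for p in problems:
    print(p)
```

To check a real file, the same function can be given the lines read from it, for example check_gff3(open(path).readlines()) with path pointing at the GFF3 file.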