From: Scott C. <sc...@sc...> - 2020-01-20 20:47:49
Can you also run "select * from cvterm where cv_id=3" and show us the output?

On Mon, Jan 20, 2020 at 12:45 PM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:

> Thank you, it worked, but now I run into the missing 'Note' cvterm.
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: I couldn't find the 'Note' cvterm in the database;
> Did you load the feature property controlled vocabulary?
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
> STACK: Bio::GMOD::DB::Adapter::handle_note /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm:3901
> STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:974
> -----------------------------------------------------------
>
> I used "make" to create the database and "make ontologies" to load the ontologies.
> I installed: Relationship Ontology, Sequence Ontology, Gene Ontology and Chado Feature Properties.
>
> A select on the cv table shows:
>
> 1 "null"
> 2 "local" "Locally created terms"
> 3 "chado_properties" "Terms that are used in the chadoprop table to describe the state of the database"
> 4 "relationship"
> 5 "synonym_type"
> 6 "cvterm_property_type"
> 7 "anonymous"
> 8 "sequence"
> 9 "biological_process"
> 10 "molecular_function"
> 11 "cellular_component"
> 12 "external"
>
> Thank you
> Miharimamy
>
> From: Scott Cain <sc...@sc...>
> Sent: Monday, January 20, 2020 22:39
> To: RAJAONISON Andriamiharimamy <mir...@ya...>
> Cc: GMOD Schema/Chado List <gmo...@li...>
> Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado
>
> Is_circular isn't a valid GFF tag. You can change it to is_circular to fix it.
>
> On Mon, Jan 20, 2020 at 11:24 AM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:
>
> Hi again Scott,
>
> Here is the modified header of the file:
>
> ##gff-version 3
> NZ_CP027390.1 . chromosome 1 5901472 . . . ID=NZ_CP027390.1;Name=NZ_CP027390.1
> NZ_CP027391.1 . plasmid 1 98724 . . . ID=NZ_CP027391.1;Name=NZ_CP027391.1
> NZ_CP027390.1 RefSeq gene 931 1197 . + . ID=gene0;Name=AX062_RS00010;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00010
> NZ_CP027390.1 RefSeq gene 1191 1577 . - . ID=gene1;Name=AX062_RS00015;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00015
>
> And here is the output of the script:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: The following tag(s) are illegal and are causing this parser to die: Is_circular
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
> STACK: Bio::FeatureIO::gff::_handle_feature /usr/local/share/perl5/Bio/FeatureIO/gff.pm:787
> STACK: Bio::FeatureIO::gff::next_feature /usr/local/share/perl5/Bio/FeatureIO/gff.pm:187
> STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:782
> -----------------------------------------------------------
>
> I checked the tag in the file, and here is the snippet causing the issue:
>
> NZ_CP027390.1 RefSeq region 1 5802748 . + . ID=id0;Dbxref=taxon:562;Is_circular=true;Name=ANONYMOUS;collected-by=CDC;country=USA;gbkey=Src;genome=chromosome;mol_type=genomic DNA;serovar=E. coli O26:Pending;strain=2015C-4944
> NZ_CP027390.1 cmsearch riboswitch 4764335 4764483 . + . ID=id113;Dbxref=RFAM:RF00050;Note=FMN riboswitch;bound_moiety=flavin mononucleotide;gbkey=regulatory;inference=COORDINATES: profile:INFERNAL:1.1.1;regulatory_class=riboswitch
> NZ_CP027390.1 RefSeq direct_repeat 5085858 5086435 . + . ID=id120;gbkey=repeat_region;inference=COORDINATES: alignment:pilercr:v1.02;rpt_family=CRISPR;rpt_type=direct;rpt_unit_range=5085862..5085885;rpt_unit_seq=tccccgcgccagcggggataaacc
> NZ_CP027390.1 RefSeq direct_repeat 5112137 5112653 . + . ID=id121;gbkey=repeat_region;inference=COORDINATES: alignment:pilercr:v1.02;rpt_family=CRISPR;rpt_type=direct;rpt_unit_range=5112137..5112165;rpt_unit_seq=gtgttccccgcgccagcggggataaaccg
> NZ_CP027391.1 RefSeq region 1 98724 . + . ID=id147;Dbxref=taxon:562;Is_circular=true;Name=unnamed;collected-by=CDC;country=USA;gbkey=Src;genome=plasmid;mol_type=genomic DNA;plasmid-name=unnamed;serovar=E. coli O26:Pending;strain=2015C-4944
>
> Thank you
> Miharimamy
>
> From: Scott Cain <sc...@sc...>
> Sent: Monday, January 20, 2020 22:03
> To: RAJAONISON Andriamiharimamy <mir...@ya...>
> Cc: GMOD Schema/Chado List <gmo...@li...>
> Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado
>
> Hi Miharimamy,
>
> The snippet of GFF that you posted had a chromosome named "NZ_CP027390.1", which looks fine to me, but the error message is about a reference sequence named "NZ_CP027391.1". Is it defined anywhere in your GFF?
>
> Scott
>
> On Mon, Jan 20, 2020 at 10:58 AM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:
>
> > Hi Scott,
> >
> > Even with --recreate_cache and --remove_lock, I still get the same error.
> >
> > Miharimamy
> >
> > (Re)creating the uniquename cache in the database...
> > Creating table...
> > Populating table...
> > Creating indexes...
> > Adjusting the primary key sequences (if necessary)...Done.
> > Preparing data for inserting into the chado database
> > (This may take a while ...)
> > Unable to find srcfeature NZ_CP027391.1 in the database.
> > Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 5939.
> > Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x25623e0)', 'Bio::SeqFeature::Annotated=HASH(0x2680c18)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851
> >
> > Abnormal termination, trying to clean up...
> >
> > Attempting to clean up the loader temp table (so that --recreate_cache
> > won't be needed)...
> > Trying to remove the run lock (so that --remove_lock won't be needed)...
> > Exiting...
> >
> > From: Scott Cain <sc...@sc...>
> > Sent: Monday, January 20, 2020 20:56
> > To: RAJAONISON Andriamiharimamy <mir...@ya...>
> > Cc: GMOD Schema/Chado List <gmo...@li...>
> > Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado
> >
> > Hi Miharimamy,
> >
> > Did you try running with the --recreate_cache option as suggested by the error message?
> >
> > On Mon, Jan 20, 2020 at 6:54 AM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:
> >
> > Hi Scott,
> >
> > You were right, the issue lay in line 2. The field separator was 4 spaces instead of a tab.
> > I converted it and tried to load the file again, but went back to the first error.
> >
> > Thank you for your time,
> > Miharimamy
> >
> > Preparing data for inserting into the chado database
> > (This may take a while ...)
> > Unable to find srcfeature NZ_CP027391.1 in the database.
> > Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 5939.
> > Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x297c120)', 'Bio::SeqFeature::Annotated=HASH(0x2bcd420)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851
> >
> > Abnormal termination, trying to clean up...
> >
> > Attempting to clean up the loader temp table (so that --recreate_cache
> > won't be needed)...
> > Trying to remove the run lock (so that --remove_lock won't be needed)...
> > Exiting...
> >
> > ##gff-version 3
> > NZ_CP027390.1 . chromosome 1 5901472 . . . ID=NZ_CP027390.1;Name=NZ_CP027390.1
> > NZ_CP027390.1 RefSeq gene 931 1197 . + . ID=gene0;Name=AX062_RS00010;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00010
> > NZ_CP027390.1 RefSeq gene 1191 1577 . - . ID=gene1;Name=AX062_RS00015;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00015
> > NZ_CP027390.1 RefSeq gene 10194 10736 . + . ID=gene10;Name=AX062_RS00060;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00060
> > NZ_CP027390.1 RefSeq gene 80547 81860 . + . ID=gene100;Name=AX062_RS00510;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00510
> > NZ_CP027390.1 RefSeq gene 918692 918997 . - . ID=gene1000;Name=AX062_RS05010;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05010
> > NZ_CP027390.1 RefSeq gene 919105 919815 . + . ID=gene1001;Name=AX062_RS05015;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05015
> > NZ_CP027390.1 RefSeq gene 919818 920378 . - . ID=gene1002;Name=AX062_RS05020;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05020
> > NZ_CP027390.1 RefSeq gene 920413 920754 . - . ID=gene1003;Name=AX062_RS05025;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05025
> > NZ_CP027390.1 RefSeq gene 920889 921215 . + . ID=gene1004;Name=AX062_RS05030;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05030
> > NZ_CP027390.1 RefSeq gene 921252 921440 . + . ID=gene1005;Name=AX062_RS05035;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05035
> > NZ_CP027390.1 RefSeq gene 921421 922635 . + . ID=gene1006;Name=AX062_RS05040;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05040
> >
> > From: Scott Cain <sc...@sc...>
> > Sent: Friday, January 17, 2020 21:49
> > To: RAJAONISON Andriamiharimamy <mir...@ya...>
> > Cc: GMOD Schema/Chado List <gmo...@li...>
> > Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado
> >
> > Hi Miharimamy,
> >
> > I'm pretty sure the "line 2" at the end of the error message means the problem is in line 2 of your GFF file. Is that the line you added? What does it look like?
> >
> > Scott
> >
> > On Fri, Jan 17, 2020 at 5:57 AM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:
> >
> > Hi Scott,
> >
> > Thank you again for your help: adding the parent feature removes the srcfeature-related error.
> > However, I now get another issue that I don't understand:
> >
> > --------------------- WARNING ---------------------
> > MSG: Can not set Bio::Location::Simple::end() equal to start; start not set
> > ---------------------------------------------------
> > Use of uninitialized value $featuretype in pattern match (m//) at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 819, <GEN0> line 2.
> > Use of uninitialized value $featuretype in pattern match (m//) at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 820, <GEN0> line 2.
> >
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> > MSG: no cvterm for
> > STACK: Error::throw
> > STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
> > STACK: Bio::GMOD::DB::Adapter::get_type /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm:4629
> > STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:848
> > -----------------------------------------------------------
> >
> > Abnormal termination, trying to clean up...
> >
> > Attempting to clean up the loader temp table (so that --recreate_cache
> > won't be needed)...
> > Trying to remove the run lock (so that --remove_lock won't be needed)...
> > Exiting...
> >
> > Thank you
> >
> > Miharimamy
> >
> > From: Scott Cain <sc...@sc...>
> > Sent: Thursday, January 16, 2020 22:31
> > To: RAJAONISON Andriamiharimamy <mir...@ya...>
> > Cc: GMOD Schema/Chado List <gmo...@li...>
> > Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado
> >
> > Hi Miharimamy,
> >
> > Thanks for sending this report. Generally, loading GFF into Chado can be difficult, as the Perl-based loader that you are using can be quite particular about the format of the GFF, and producers of GFF generally aren't so particular. Since the loader makes the (in my view, correct) decision to not load anything if it can't load everything in a file, it quits. So, taking each of the problems you found in order:
> >
> > 1. "dbxref=NCBI:" isn't a valid accession (presumably there should be something after the "NCBI:"), so the loader won't continue. Two options here are to contact SGD and point out that there is a problem with their GFF, or to delete the offending item and try again, which it looks like you did for item 2.
> >
> > 2. If the item Note is missing from the cvterm table, I think that probably means that you didn't install the schema using the "make" procedure, which would have installed some necessary items in the cv and cvterm tables.
> >
> > 3 and 4: Messages that srcfeatures can't be found mean that the feature on which the features in your GFF reside hasn't been defined yet (that is, the thing referred to in column 1 of the GFF doesn't exist). Frequently, creators of GFF don't define the reference sequence in the GFF for whatever reason (it's not required by the GFF3 spec, since it might be credibly defined elsewhere).
> > To define it in the GFF you have, add a line before anything else that looks something like this:
> >
> > NZ_CP027390.1 . chromosome 1 1234566 . . . ID=NZ_CP027390.1;Name=NZ_CP027390.1
> >
> > Where "NZ_CP027390.1" is the name of the reference sequence, "chromosome" is the Sequence Ontology term for the type of thing (other options would include "contig" and "supercontig"), and "1234566" is the length of the sequence.
> >
> > Finally, I would add that using Chado through Tripal frequently makes life easier, though I think all of these issues would still have been problems (with the probable exception of item 2: I think Tripal would have initialized the "Note" item in cvterm).
> >
> > Scott
> >
> > On Thu, Jan 16, 2020 at 11:15 AM RAJAONISON Andriamiharimamy via Gmod-schema <gmo...@li...> wrote:
> >
> > Hi,
> >
> > I hope this mail finds you well. I send you my best wishes for the new year.
> > I am reaching out to you because I have issues loading GFF files into Chado. I tried several files, but none of them seems to work.
> >
> > Thank you in advance for your help
> >
> > Miharimamy
> >
> > The S.cerevisiae example in gmod.org
> > Command :
> > chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism yeast --gfffile saccharomyces/saccharomyces_cerevisiae.gff.sorted --fastafile saccharomyces/saccharomyces_cerevisiae.gff.sorted.fasta
> >
> > Output :
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> > MSG: Error in line:
> > chrIII SGD chromosome 1 316620 . . . ID=chrIII;dbxref=NCBI:;Name=chrIII
> >
> > Dbxref value 'NCBI:' did not conform to GFF3 specification
> > STACK: Error::throw
> > STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
> > STACK: Bio::FeatureIO::gff::_handle_feature /usr/local/share/perl5/Bio/FeatureIO/gff.pm:659
> > STACK: Bio::FeatureIO::gff::next_feature /usr/local/share/perl5/Bio/FeatureIO/gff.pm:187
> > STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:782
> > -----------------------------------------------------------
> >
> > S.cerevisiae without dbxref
> > Command :
> > chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism yeast --gfffile saccharomyces/test_saccharomyces_ncbi --fastafile saccharomyces/saccharomyces_cerevisiae.gff.sorted.fasta
> >
> > Output :
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> > MSG: I couldn't find the 'Note' cvterm in the database;
> > Did you load the feature property controlled vocabulary?
> > STACK: Error::throw
> > STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
> > STACK: Bio::GMOD::DB::Adapter::handle_note /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm:3901
> > STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:974
> >
> > A GFF from NCBI
> > Command :
> > chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism bacteria --gfffile GCF_000008865.2/GCF_003018035.1_ASM301803v1_genomic.gff.sorted
> >
> > Output :
> > Unable to find srcfeature NZ_CP027390.1 in the database.
> > Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 2.
> > Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x241a8d0)', 'Bio::SeqFeature::Annotated=HASH(0x26c61d0)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851
> >
> > A GFF3 from prokka
> > Command :
> > chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism bacteria -gfffile GFF3/GCF_000455285.gff.sorted --fastafile GFF3/GCF_000455285.gff.sorted.fasta
> >
> > Output :
> > Unable to find srcfeature gnl|Prokka|GCF_000455285_1 in the database.
> > Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 2.
> > Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x33008f0)', 'Bio::SeqFeature::Annotated=HASH(0x35ac0a0)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851
> >
> > _______________________________________________
> > Gmod-schema mailing list
> > Gmo...@li...
> > https://lists.sourceforge.net/lists/listinfo/gmod-schema

--
------------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research
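
Four of the GFF problems diagnosed in this thread (columns separated by spaces instead of tabs, a reference sequence that is never defined, the Is_circular attribute, and a Dbxref with nothing after the colon) can be looked for before the loader is run. What follows is only a minimal sketch in Python, not part of Chado or BioPerl: the function name check_gff3 and its messages are hypothetical, and the checks are modeled on the error messages quoted above rather than on the loader's actual validation logic. It also cannot see the database, so a reference sequence that was loaded earlier will still be reported as undefined in the file.

```python
def check_gff3(lines):
    """Return (line_number, message) pairs for problems seen in this thread.

    Hypothetical helper: it only inspects the file, never the Chado database.
    """
    problems = []
    seen_refs = set()  # column-1 names already defined (or already reported)
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith("##FASTA"):
            break  # sequence data follows; no more feature lines
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) != 9:
            problems.append(
                (number, "expected 9 tab-separated columns, found %d" % len(columns))
            )
            continue
        seqid, attributes = columns[0], columns[8]
        tags = {}
        for pair in attributes.split(";"):
            if "=" in pair:
                key, value = pair.split("=", 1)
                tags[key.strip()] = value.strip()
        # A line whose ID equals its own column 1 defines a reference sequence.
        if tags.get("ID") == seqid:
            seen_refs.add(seqid)
        elif seqid not in seen_refs:
            problems.append(
                (number, "reference sequence %s is not defined earlier in the file" % seqid)
            )
            seen_refs.add(seqid)  # report each missing reference only once
        if "Is_circular" in tags:
            problems.append((number, "Is_circular tag was rejected by the parser in this thread"))
        for key, value in tags.items():
            if key.lower() == "dbxref":
                for xref in value.split(","):
                    db, sep, accession = xref.partition(":")
                    if not (db and sep and accession):
                        problems.append((number, "Dbxref value '%s' has no accession" % xref))
    return problems
```

Passing an open file handle, for example check_gff3(open("my.gff")) with a file name of your own, yields one entry per suspicious line; an empty list only means that none of these four patterns was found, not that the load will succeed.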