From: Scott C. <sc...@sc...> - 2020-01-20 21:33:48
Ah, you have chado_properties (cv_id 3) but not feature_properties, which is where Note comes from. That should come from the make command, but it appears not to have worked. To run the make command again, my recollection is that you have to remove a directory from the "load" directory.

So do this: try running "make ontologies" again and select feature properties from the menu. If nothing happens, my recollection is that running "make clean" will remove the lock files that prevent loading an ontology twice (which, I think, is what it thinks you'll be trying to do).

On Mon, Jan 20, 2020 at 1:11 PM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:

> Here it is
>
> select * from cvterm where cv_id=3
>
> cvterm_id cv_id name definition
> 3 3 "version" "Chado schema version"
>
> *From:* Scott Cain <sc...@sc...>
> *Sent:* Monday, January 20, 2020 23:47
> *To:* RAJAONISON Andriamiharimamy <mir...@ya...>
> *Cc:* GMOD Schema/Chado List <gmo...@li...>
> *Subject:* Re: [Gmod-schema] [ISSUE] Load gff in chado
>
> Can you also do a "select * from cvterm where cv_id=3" and show us that?
>
> On Mon, Jan 20, 2020 at 12:45 PM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:
>
> Thank you, it worked, but I ran into the missing 'Note' cvterm.
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: I couldn't find the 'Note' cvterm in the database;
> Did you load the feature property controlled vocabulary?
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
> STACK: Bio::GMOD::DB::Adapter::handle_note /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm:3901
> STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:974
> -----------------------------------------------------------
>
> I used “make” to create the database and “make ontologies” to load the ontologies.
>
> I installed: Relationship Ontology, Sequence Ontology, Gene Ontology and Chado Feature Properties
>
> A select on the cv table shows:
>
> 1 "null"
> 2 "local" "Locally created terms"
> 3 "chado_properties" "Terms that are used in the chadoprop table to describe the state of the database"
> 4 "relationship"
> 5 "synonym_type"
> 6 "cvterm_property_type"
> 7 "anonymous"
> 8 "sequence"
> 9 "biological_process"
> 10 "molecular_function"
> 11 "cellular_component"
> 12 "external"
>
> Thank you
> Miharimamy
>
> *From:* Scott Cain <sc...@sc...>
> *Sent:* Monday, January 20, 2020 22:39
> *To:* RAJAONISON Andriamiharimamy <mir...@ya...>
> *Cc:* GMOD Schema/Chado List <gmo...@li...>
> *Subject:* Re: [Gmod-schema] [ISSUE] Load gff in chado
>
> Is_circular isn't a valid GFF tag. You can change it to is_circular to fix it.
>
> On Mon, Jan 20, 2020 at 11:24 AM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:
>
> Hi again Scott,
>
> Here is the modified header of the file:
>
> ##gff-version 3
> NZ_CP027390.1 . chromosome 1 5901472 . . . ID=NZ_CP027390.1;Name=NZ_CP027390.1
> NZ_CP027391.1 . plasmid 1 98724 . . . ID=NZ_CP027391.1;Name=NZ_CP027391.1
> NZ_CP027390.1 RefSeq gene 931 1197 . + . ID=gene0;Name=AX062_RS00010;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00010
> NZ_CP027390.1 RefSeq gene 1191 1577 . - . ID=gene1;Name=AX062_RS00015;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00015
>
> And here is the output of the script.
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: The following tag(s) are illegal and are causing this parser to die: Is_circular
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
> STACK: Bio::FeatureIO::gff::_handle_feature /usr/local/share/perl5/Bio/FeatureIO/gff.pm:787
> STACK: Bio::FeatureIO::gff::next_feature /usr/local/share/perl5/Bio/FeatureIO/gff.pm:187
> STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:782
> -----------------------------------------------------------
>
> I checked the tag in the file, and here is the snippet causing the issue:
>
> NZ_CP027390.1 RefSeq region 1 5802748 . + . ID=id0;Dbxref=taxon:562;Is_circular=true;Name=ANONYMOUS;collected-by=CDC;country=USA;gbkey=Src;genome=chromosome;mol_type=genomic DNA;serovar=E. coli O26:Pending;strain=2015C-4944
> NZ_CP027390.1 cmsearch riboswitch 4764335 4764483 . + . ID=id113;Dbxref=RFAM:RF00050;Note=FMN riboswitch;bound_moiety=flavin mononucleotide;gbkey=regulatory;inference=COORDINATES: profile:INFERNAL:1.1.1;regulatory_class=riboswitch
> NZ_CP027390.1 RefSeq direct_repeat 5085858 5086435 . + . ID=id120;gbkey=repeat_region;inference=COORDINATES: alignment:pilercr:v1.02;rpt_family=CRISPR;rpt_type=direct;rpt_unit_range=5085862..5085885;rpt_unit_seq=tccccgcgccagcggggataaacc
> NZ_CP027390.1 RefSeq direct_repeat 5112137 5112653 . + . ID=id121;gbkey=repeat_region;inference=COORDINATES: alignment:pilercr:v1.02;rpt_family=CRISPR;rpt_type=direct;rpt_unit_range=5112137..5112165;rpt_unit_seq=gtgttccccgcgccagcggggataaaccg
> NZ_CP027391.1 RefSeq region 1 98724 . + . ID=id147;Dbxref=taxon:562;Is_circular=true;Name=unnamed;collected-by=CDC;country=USA;gbkey=Src;genome=plasmid;mol_type=genomic DNA;plasmid-name=unnamed;serovar=E. coli O26:Pending;strain=2015C-4944
>
> Thank you
> Miharimamy
>
> *From:* Scott Cain <sc...@sc...>
> *Sent:* Monday, January 20, 2020 22:03
> *To:* RAJAONISON Andriamiharimamy <mir...@ya...>
> *Cc:* GMOD Schema/Chado List <gmo...@li...>
> *Subject:* Re: [Gmod-schema] [ISSUE] Load gff in chado
>
> Hi Miharimamy,
>
> The snippet of GFF that you posted had a chromosome named "NZ_CP027390.1", which looks fine to me, but the error message is about a reference sequence named "NZ_CP027391.1". Is it defined anywhere in your GFF?
>
> Scott
>
> On Mon, Jan 20, 2020 at 10:58 AM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:
>
> > Hi Scott,
> >
> > Even with the --recreate_cache and --remove_lock options, I still get the same error.
> >
> > Miharimamy
> >
> > (Re)creating the uniquename cache in the database...
> > Creating table...
> > Populating table...
> > Creating indexes...
> > Adjusting the primary key sequences (if necessary)...Done.
> > Preparing data for inserting into the chado database
> > (This may take a while ...)
> > Unable to find srcfeature NZ_CP027391.1 in the database.
> > Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 5939.
> > Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x25623e0)', 'Bio::SeqFeature::Annotated=HASH(0x2680c18)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851
> >
> > Abnormal termination, trying to clean up...
> >
> > Attempting to clean up the loader temp table (so that --recreate_cache
> > won't be needed)...
> > Trying to remove the run lock (so that --remove_lock won't be needed)...
> > Exiting...
> >
> > From: Scott Cain <sc...@sc...>
> > Sent: Monday, January 20, 2020 20:56
> > To: RAJAONISON Andriamiharimamy <mir...@ya...>
> > Cc: GMOD Schema/Chado List <gmo...@li...>
> > Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado
> >
> > Hi Miharimamy,
> >
> > Did you try running with the --recreate_cache option as suggested by the error message?
> >
> > On Mon, Jan 20, 2020 at 6:54 AM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:
> >
> > Hi Scott,
> >
> > You were right, the issue lay in line 2. The field separator was 4 spaces instead of a tab.
> >
> > I converted it and tried to load the file again, but I am back to the first error.
> >
> > Thank you for your time,
> > Miharimamy
> >
> > Preparing data for inserting into the chado database
> > (This may take a while ...)
> > Unable to find srcfeature NZ_CP027391.1 in the database.
> > Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 5939.
> > Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x297c120)', 'Bio::SeqFeature::Annotated=HASH(0x2bcd420)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851
> >
> > Abnormal termination, trying to clean up...
> >
> > Attempting to clean up the loader temp table (so that --recreate_cache
> > won't be needed)...
> > Trying to remove the run lock (so that --remove_lock won't be needed)...
> > Exiting...
> >
> > ##gff-version 3
> > NZ_CP027390.1 . chromosome 1 5901472 . . . ID=NZ_CP027390.1;Name=NZ_CP027390.1
> > NZ_CP027390.1 RefSeq gene 931 1197 . + . ID=gene0;Name=AX062_RS00010;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00010
> > NZ_CP027390.1 RefSeq gene 1191 1577 . - . ID=gene1;Name=AX062_RS00015;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00015
> > NZ_CP027390.1 RefSeq gene 10194 10736 . + . ID=gene10;Name=AX062_RS00060;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00060
> > NZ_CP027390.1 RefSeq gene 80547 81860 . + . ID=gene100;Name=AX062_RS00510;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00510
> > NZ_CP027390.1 RefSeq gene 918692 918997 . - . ID=gene1000;Name=AX062_RS05010;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05010
> > NZ_CP027390.1 RefSeq gene 919105 919815 . + . ID=gene1001;Name=AX062_RS05015;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05015
> > NZ_CP027390.1 RefSeq gene 919818 920378 . - . ID=gene1002;Name=AX062_RS05020;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05020
> > NZ_CP027390.1 RefSeq gene 920413 920754 . - . ID=gene1003;Name=AX062_RS05025;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05025
> > NZ_CP027390.1 RefSeq gene 920889 921215 . + . ID=gene1004;Name=AX062_RS05030;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05030
> > NZ_CP027390.1 RefSeq gene 921252 921440 . + . ID=gene1005;Name=AX062_RS05035;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05035
> > NZ_CP027390.1 RefSeq gene 921421 922635 . + . ID=gene1006;Name=AX062_RS05040;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05040
> >
> > From: Scott Cain <sc...@sc...>
> > Sent: Friday, January 17, 2020 21:49
> > To: RAJAONISON Andriamiharimamy <mir...@ya...>
> > Cc: GMOD Schema/Chado List <gmo...@li...>
> > Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado
> >
> > Hi Miharimamy,
> >
> > I'm pretty sure the "line 2" at the end of the error message means the problem is in line 2 of your GFF file. Is that the line you added? What does it look like?
> >
> > Scott
> >
> > On Fri, Jan 17, 2020 at 5:57 AM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:
> >
> > Hi Scott,
> >
> > Thank you again for your help: adding the parent feature removes the srcfeature-related error.
> >
> > Yet I get another issue that I don’t understand:
> >
> > --------------------- WARNING ---------------------
> > MSG: Can not set Bio::Location::Simple::end() equal to start; start not set
> > ---------------------------------------------------
> > Use of uninitialized value $featuretype in pattern match (m//) at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 819, <GEN0> line 2.
> > Use of uninitialized value $featuretype in pattern match (m//) at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 820, <GEN0> line 2.
> >
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> > MSG: no cvterm for
> > STACK: Error::throw
> > STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
> > STACK: Bio::GMOD::DB::Adapter::get_type /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm:4629
> > STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:848
> > -----------------------------------------------------------
> >
> > Abnormal termination, trying to clean up...
> >
> > Attempting to clean up the loader temp table (so that --recreate_cache
> > won't be needed)...
> > Trying to remove the run lock (so that --remove_lock won't be needed)...
> > Exiting...
> >
> > Thank you
> >
> > Miharimamy
> >
> > From: Scott Cain <sc...@sc...>
> > Sent: Thursday, January 16, 2020 22:31
> > To: RAJAONISON Andriamiharimamy <mir...@ya...>
> > Cc: GMOD Schema/Chado List <gmo...@li...>
> > Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado
> >
> > Hi Miharimamy,
> >
> > Thanks for sending this report. Generally, loading GFF into Chado can be difficult, as the Perl-based loader that you are using can be quite particular about the format of the GFF, and producers of GFF generally aren't so particular. Since the loader makes the (in my view, correct) decision to not load anything if it can't load everything in a file, it quits. So, taking each of the problems you found in order:
> >
> > 1. "dbxref=NCBI:" isn't a valid accession (presumably there should be something after the "NCBI:") and so the loader won't continue. Two options here are to contact SGD and point out that there is a problem with their GFF, or delete the offending item and try again, which it looks like you did for item 2.
> >
> > 2. If the item Note is missing from the cvterm table, I think that probably means that you didn't install the schema using the "make" procedure that would have installed some necessary items in the cv and cvterm tables.
> >
> > 3 and 4: Messages that srcfeatures can't be found mean that the feature on which the features in your GFF reside hasn't been defined yet (that is, the thing referred to in column 1 of the GFF doesn't exist). Frequently, creators of GFF don't define the reference sequence in the GFF for whatever reason (it's not required by the GFF3 spec, since it might be credibly defined elsewhere). To define it in the GFF you have, add a line before anything else that looks something like this:
> >
> > NZ_CP027390.1 . chromosome 1 1234566 . . . ID=NZ_CP027390.1;Name=NZ_CP027390.1
> >
> > Where "NZ_CP027390.1" is the name of the reference sequence, "chromosome" is the Sequence Ontology term for the type of thing (other options would include "contig" and "supercontig"), and the "1234566" is the length of the sequence.
> >
> > Finally, I would add that using Chado through Tripal frequently makes life easier, though I think all of these issues would still have been problems (with the probable exception of item 2--Tripal would have initialized the "Note" item in cvterm, I think).
> >
> > Scott
> >
> > On Thu, Jan 16, 2020 at 11:15 AM RAJAONISON Andriamiharimamy via Gmod-schema <gmo...@li...> wrote:
> >
> > Hi,
> >
> > I hope this mail finds you well. I send you my best wishes for the new year.
> >
> > I am reaching out to you because I have issues loading GFF files into Chado. I tried several files, but none of them seems to work.
> >
> > Thank you in advance for your help
> >
> > Miharimamy
> >
> > The S.cerevisiae example in gmod.org
> > Command:
> > chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism yeast --gfffile saccharomyces/saccharomyces_cerevisiae.gff.sorted --fastafile saccharomyces/saccharomyces_cerevisiae.gff.sorted.fasta
> >
> > Output:
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> > MSG: Error in line:
> > chrIII SGD chromosome 1 316620 . . . ID=chrIII;dbxref=NCBI:;Name=chrIII
> >
> > Dbxref value 'NCBI:' did not conform to GFF3 specification
> > STACK: Error::throw
> > STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
> > STACK: Bio::FeatureIO::gff::_handle_feature /usr/local/share/perl5/Bio/FeatureIO/gff.pm:659
> > STACK: Bio::FeatureIO::gff::next_feature /usr/local/share/perl5/Bio/FeatureIO/gff.pm:187
> > STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:782
> > -----------------------------------------------------------
> >
> > S.cerevisiae without dbxref
> > Command:
> > chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism yeast --gfffile saccharomyces/test_saccharomyces_ncbi --fastafile saccharomyces/saccharomyces_cerevisiae.gff.sorted.fasta
> >
> > Output:
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> > MSG: I couldn't find the 'Note' cvterm in the database;
> > Did you load the feature property controlled vocabulary?
> > STACK: Error::throw
> > STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
> > STACK: Bio::GMOD::DB::Adapter::handle_note /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm:3901
> > STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:974
> >
> > A GFF from NCBI
> > Command:
> > chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism bacteria --gfffile GCF_000008865.2/GCF_003018035.1_ASM301803v1_genomic.gff.sorted
> >
> > Output:
> > Unable to find srcfeature NZ_CP027390.1 in the database.
> > Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 2.
> > Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x241a8d0)', 'Bio::SeqFeature::Annotated=HASH(0x26c61d0)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851
> >
> > A GFF3 from prokka
> > Command:
> > chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism bacteria -gfffile GFF3/GCF_000455285.gff.sorted --fastafile GFF3/GCF_000455285.gff.sorted.fasta
> >
> > Output:
> > Unable to find srcfeature gnl|Prokka|GCF_000455285_1 in the database.
> > Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 2.
> > Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x33008f0)', 'Bio::SeqFeature::Annotated=HASH(0x35ac0a0)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851
> >
> > _______________________________________________
> > Gmod-schema mailing list
> > Gmo...@li...
> > https://lists.sourceforge.net/lists/listinfo/gmod-schema
> >
> > --
> > ------------------------------------------------------------------------
> > Scott Cain, Ph. D. scott at scottcain dot net
> > GMOD Coordinator (http://gmod.org/) 216-392-3087
> > Ontario Institute for Cancer Research

--
------------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research
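
Most of the failures in this thread (a space-separated line, a reference sequence used before it is defined, the Is_circular tag, the empty "NCBI:" Dbxref) could in principle be caught before the loader runs. Below is a minimal sketch of such a pre-flight check in Python. It is not part of Chado or BioPerl; the names preflight, parse_attributes, KNOWN_RESERVED_TAGS and SAMPLE are made up for illustration. Two assumptions are baked in: a reference sequence counts as "defined" when some earlier line has an ID equal to its column 1 value (as in the chromosome line suggested above), and the set of accepted capitalized tags only approximates what the parser in the thread allows.

```python
"""Pre-flight check for a GFF3 file before handing it to a bulk loader (sketch)."""

# ASSUMPTION: capitalized tags outside this set are treated as suspicious.
# 'Is_circular' is left out on purpose because the parser in the thread rejected it.
KNOWN_RESERVED_TAGS = {
    "ID", "Name", "Alias", "Parent", "Target", "Gap",
    "Derives_from", "Note", "Dbxref", "Ontology_term",
}


def parse_attributes(column9):
    """Split 'tag=value;tag=value' into a dict, ignoring malformed pairs."""
    tags = {}
    for pair in column9.split(";"):
        tag, sep, value = pair.partition("=")
        if sep:
            tags[tag.strip()] = value
    return tags


def preflight(lines):
    """Return a list of (line_number, message) for problems seen in the thread."""
    problems = []
    defined = set()  # seqids that an earlier line has defined (ID == column 1)
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith("##FASTA"):
            break  # sequence data follows, nothing more to check
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) != 9:
            problems.append(
                (number, f"expected 9 tab-separated columns, found {len(columns)}"))
            continue
        seqid = columns[0]
        tags = parse_attributes(columns[8])
        if tags.get("ID") == seqid:
            defined.add(seqid)
        elif seqid not in defined:
            problems.append(
                (number, f"reference sequence {seqid!r} is used before any line defines it"))
            defined.add(seqid)  # report each missing reference only once
        for tag, value in tags.items():
            if tag[:1].isupper() and tag not in KNOWN_RESERVED_TAGS:
                problems.append(
                    (number, f"attribute tag {tag!r} may be rejected; try lower case"))
            if tag.lower() == "dbxref":
                for ref in value.split(","):
                    db, sep, accession = ref.partition(":")
                    if not (db and sep and accession):
                        problems.append(
                            (number, f"Dbxref value {ref!r} lacks a database or accession"))
    return problems


# Lines modelled on the ones posted in the thread; line 3 uses spaces, not tabs.
SAMPLE = [
    "##gff-version 3",
    "NZ_CP027390.1\t.\tchromosome\t1\t5901472\t.\t.\t.\tID=NZ_CP027390.1;Name=NZ_CP027390.1",
    "NZ_CP027391.1    .    plasmid    1    98724    .    .    .    ID=NZ_CP027391.1",
    "NZ_CP027391.1\tRefSeq\tregion\t1\t98724\t.\t+\t.\tID=id147;Dbxref=taxon:562;Is_circular=true",
    "chrIII\tSGD\tchromosome\t1\t316620\t.\t.\t.\tID=chrIII;dbxref=NCBI:;Name=chrIII",
]

if __name__ == "__main__":
    for number, message in preflight(SAMPLE):
        print(f"line {number}: {message}")
```

To try it on a real file, pass an open file handle instead of SAMPLE, for example preflight(open("my.gff")). A clean report does not guarantee that the loader will accept the file; the check only covers the four problems discussed here.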