gmod-schema Mailing List for Generic Model Organism Database Project (Page 3)

Hi Scott,

You were right, the issue was in line 2. The field separator was four spaces instead of a tab.

I converted the spaces to tabs and tried to load the file again, but I am back to the first error:

Thank you for your time,

Miharimamy

Preparing data for inserting into the chado database

(This may take a while ...)

Unable to find srcfeature NZ_CP027391.1 in the database.

Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 5939.

        Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x297c120)', 'Bio::SeqFeature::Annotated=HASH(0x2bcd420)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851

Abnormal termination, trying to clean up...

Attempting to clean up the loader temp table (so that --recreate_cache

won't be needed)...

Trying to remove the run lock (so that --remove_lock won't be needed)...

Exiting...

##gff-version 3

NZ_CP027390.1   .       chromosome      1       5901472 .       .       .       ID=NZ_CP027390.1;Name=NZ_CP027390.1

NZ_CP027390.1   RefSeq  gene    931     1197    .       +       .       ID=gene0;Name=AX062_RS00010;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00010

NZ_CP027390.1   RefSeq  gene    1191    1577    .       -       .       ID=gene1;Name=AX062_RS00015;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00015

NZ_CP027390.1   RefSeq  gene    10194   10736   .       +       .       ID=gene10;Name=AX062_RS00060;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00060

NZ_CP027390.1   RefSeq  gene    80547   81860   .       +       .       ID=gene100;Name=AX062_RS00510;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00510

NZ_CP027390.1   RefSeq  gene    918692  918997  .       -       .       ID=gene1000;Name=AX062_RS05010;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05010

NZ_CP027390.1   RefSeq  gene    919105  919815  .       +       .       ID=gene1001;Name=AX062_RS05015;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05015

NZ_CP027390.1   RefSeq  gene    919818  920378  .       -       .       ID=gene1002;Name=AX062_RS05020;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05020

NZ_CP027390.1   RefSeq  gene    920413  920754  .       -       .       ID=gene1003;Name=AX062_RS05025;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05025

NZ_CP027390.1   RefSeq  gene    920889  921215  .       +       .       ID=gene1004;Name=AX062_RS05030;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05030

NZ_CP027390.1   RefSeq  gene    921252  921440  .       +       .       ID=gene1005;Name=AX062_RS05035;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05035

NZ_CP027390.1   RefSeq  gene    921421  922635  .       +       .       ID=gene1006;Name=AX062_RS05040;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05040
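The separator problem described in this message (spaces instead of tabs on the line that was added by hand) can be caught before running the loader. The following is only a minimal sketch: the function name `retab_gff_line` is made up for illustration, and it assumes that none of the first eight columns contains a space, which holds for the excerpt above but is not guaranteed for every GFF3 file.

```python
import re


def retab_gff_line(line: str, n_columns: int = 9) -> str:
    """Return a GFF3 line with its column separators normalised to tabs.

    Hypothetical helper: assumes columns 1-8 contain no spaces.
    """
    stripped = line.strip()
    # Leave blank lines, directives (##...) and comments (#...) untouched.
    if not stripped or stripped.startswith("#"):
        return stripped
    # Split on runs of spaces/tabs at most 8 times, so that column 9
    # (the attributes) is kept whole even if it contains spaces.
    fields = re.split(r"[ \t]+", stripped, maxsplit=n_columns - 1)
    return "\t".join(fields)
```

Applied to every line of the file, this would rewrite a space-separated reference line into nine tab-separated columns while leaving `##gff-version 3` as it is.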

From: Scott Cain <sc...@sc...>
Sent: Friday, 17 January 2020 21:49
To: RAJAONISON Andriamiharimamy <mir...@ya...>
Cc: GMOD Schema/Chado List <gmo...@li...>
Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado

Hi Miharimamy,

I'm fairly sure that the "line 2" at the end of the error message means the problem is in line 2 of your GFF file.  Is that the line you added? What does it look like?

Scott

On Fri, Jan 17, 2020 at 5:57 AM RAJAONISON Andriamiharimamy <mir...@ya...> wrote:

Hi Scott,

Thank you again for your help: adding the parent feature removed the srcfeature-related error.

However, I now get another error that I don't understand:

--------------------- WARNING ---------------------

MSG: Can not set Bio::Location::Simple::end() equal to start; start not set

---------------------------------------------------

Use of uninitialized value $featuretype in pattern match (m//) at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 819, <GEN0> line 2.

Use of uninitialized value $featuretype in pattern match (m//) at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 820, <GEN0> line 2.

------------- EXCEPTION: Bio::Root::Exception -------------

MSG: no cvterm for

STACK: Error::throw

STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447

STACK: Bio::GMOD::DB::Adapter::get_type /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm:4629

STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:848

-----------------------------------------------------------

Abnormal termination, trying to clean up...

Attempting to clean up the loader temp table (so that --recreate_cache

won't be needed)...

Trying to remove the run lock (so that --remove_lock won't be needed)...

Exiting...

Thank you 

Miharimamy

From: Scott Cain <sc...@sc...>
Sent: Thursday, 16 January 2020 22:31
To: RAJAONISON Andriamiharimamy <mir...@ya...>
Cc: GMOD Schema/Chado List <gmo...@li...>
Subject: Re: [Gmod-schema] [ISSUE] Load gff in chado

Hi Miharimamy,

Thanks for sending this report.  Generally, loading GFF into Chado can be difficult, as the Perl-based loader that you are using can be quite particular about the format of the GFF, and producers of GFF generally aren't so particular. Since the loader makes the (in my view, correct) decision to not load anything if it can't load everything in a file, it quits. So, taking each of the problems you found in order:

1. "dbxref=NCBI:" isn't a valid accession (presumably there should be something after the "NCBI:") and so the loader won't continue.  Two options here are to contact SGD and point out that there is a problem with their GFF, or delete the offending item and try again, which it looks like you did for item 2.

2. If the item Note is missing from the cvterm table, I think that probably means that you didn't install the schema using the "make" procedure that would have installed some necessary items in the cv and cvterm table.

3 and 4: Messages that srcfeatures can't be found mean that the feature on which the features in your GFF reside hasn't been defined yet (that is, the thing referred to in column 1 of the GFF doesn't exist). Frequently, creators of GFF don't define the reference sequence in the GFF for whatever reason (it's not required by the GFF3 spec, since it might be credibly defined elsewhere).  To define it in the GFF you have, add a line before anything else that looks something like this:

NZ_CP027390.1    .    chromosome   1     1234566  .    .    .  ID=NZ_CP027390.1;Name=NZ_CP027390.1

Where "NZ_CP027390.1" is the name of the reference sequence, "chromosome" is the Sequence Ontology term for the type of thing (other options would include "contig" and "supercontig"), and the "1234566" (the end coordinate in column 5) is the length of the sequence.
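A file can hold features on more than one sequence, and every distinct value in column 1 needs its own reference line. As a rough sketch of how to list the names that still lack one, assuming (as in the example line above) that the reference feature's ID attribute is identical to the column-1 name, something like the following could be run over the file first. The function name `missing_reference_lines` is hypothetical and not part of the loader, and a sequence already present in the database would be reported here even though the loader may accept it.

```python
def missing_reference_lines(gff_lines):
    """Return column-1 names that no feature in the file declares via ID=.

    Hypothetical pre-check; assumes tab-separated, nine-column GFF3 lines.
    """
    seqids = []            # column-1 values, in order of first appearance
    declared_ids = set()   # every ID= value seen in column 9
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break          # sequence data follows; no more feature lines
        if not line or line.startswith("#"):
            continue       # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue       # malformed line; not this check's concern
        if cols[0] not in seqids:
            seqids.append(cols[0])
        for attr in cols[8].split(";"):
            key, _, value = attr.partition("=")
            if key == "ID":
                declared_ids.add(value)
    return [name for name in seqids if name not in declared_ids]
```

Called as `missing_reference_lines(open("my.gff"))`, where the file name is a placeholder, it returns the sequence names for which a reference line would still have to be added.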

Finally, I would add that using Chado through Tripal frequently makes life easier, though I think all of these issues would still have been problems (with the probable exception of item 2--Tripal would have initialized the "Note" item in cvterm I think).

Scott

On Thu, Jan 16, 2020 at 11:15 AM RAJAONISON Andriamiharimamy via Gmod-schema <gmo...@li...> wrote:

Hi, 

I hope this email finds you well. Best wishes for the new year.

I am reaching out to you because I have issues loading GFF files into Chado. I tried several files, but none of them seems to work.

Thank you in advance for your help

Miharimamy

1.	The S. cerevisiae example from gmod.org

Command : 

chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism yeast --gfffile saccharomyces/saccharomyces_cerevisiae.gff.sorted --fastafile saccharomyces/saccharomyces_cerevisiae.gff.sorted.fasta

Output : 

------------- EXCEPTION: Bio::Root::Exception -------------

MSG: Error in line:

chrIII  SGD     chromosome      1       316620  .       .       .       ID=chrIII;dbxref=NCBI:;Name=chrIII

Dbxref value 'NCBI:' did not conform to GFF3 specification

STACK: Error::throw

STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447

STACK: Bio::FeatureIO::gff::_handle_feature /usr/local/share/perl5/Bio/FeatureIO/gff.pm:659

STACK: Bio::FeatureIO::gff::next_feature /usr/local/share/perl5/Bio/FeatureIO/gff.pm:187

STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:782

-----------------------------------------------------------

2.	S. cerevisiae without dbxref

Command : 

chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism yeast --gfffile saccharomyces/test_saccharomyces_ncbi --fastafile saccharomyces/saccharomyces_cerevisiae.gff.sorted.fasta

Output : 

------------- EXCEPTION: Bio::Root::Exception -------------

MSG: I couldn't find the 'Note' cvterm in the database;

Did you load the feature property controlled vocabulary?

STACK: Error::throw

STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447

STACK: Bio::GMOD::DB::Adapter::handle_note /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm:3901

STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:974

3.	A GFF from NCBI

Command : 

chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism bacteria --gfffile GCF_000008865.2/GCF_003018035.1_ASM301803v1_genomic.gff.sorted

Output :

Unable to find srcfeature NZ_CP027390.1 in the database.

Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 2.

        Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x241a8d0)', 'Bio::SeqFeature::Annotated=HASH(0x26c61d0)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851

4.	A GFF3 from prokka

Command :

chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism bacteria -gfffile GFF3/GCF_000455285.gff.sorted --fastafile GFF3/GCF_000455285.gff.sorted.fasta

Output :

Unable to find srcfeature gnl|Prokka|GCF_000455285_1 in the database.

Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 2.

        Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x33008f0)', 'Bio::SeqFeature::Annotated=HASH(0x35ac0a0)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851
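For the first case in the list above (Dbxref value 'NCBI:'), the offending attribute can be located before loading. The GFF3 specification gives the Dbxref format as DBTAG:ID; the sketch below simply treats a value as suspect when either side of the first colon is empty. The function name `bad_dbxrefs` is made up for illustration, the case-insensitive match on the key is an assumption made because the file in question writes it as lowercase `dbxref`, and the check is not claimed to be identical to the one the loader performs.

```python
def bad_dbxrefs(attributes: str):
    """Return Dbxref values in a column-9 string that are not DBTAG:ID.

    Hypothetical helper; a value is flagged when the database tag or
    the accession after the first colon is missing.
    """
    bad = []
    for attr in attributes.split(";"):
        key, _, value = attr.partition("=")
        if key.lower() != "dbxref":
            continue
        # Several cross references may be given, separated by commas.
        for ref in value.split(","):
            tag, sep, accession = ref.partition(":")
            if not (tag and sep and accession):
                bad.append(ref)
    return bad
```

For the chrIII line quoted in the error message, `bad_dbxrefs("ID=chrIII;dbxref=NCBI:;Name=chrIII")` flags `NCBI:`, which is the value to correct or delete.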

-- 

------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

gmod-schema — For discussion of GMOD schema development