From: RAJAONISON A. <mir...@ya...> - 2020-01-20 14:54:53
|
Hi Scott,

You were right, the issue lay in line 2. The field separator was 4 spaces instead of a tab. I converted it and tried to load the file again, but I am back to the first error.

Thank you for your time,
Miharimamy

Preparing data for inserting into the chado database
(This may take a while ...)
Unable to find srcfeature NZ_CP027391.1 in the database.
Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 5939.
Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x297c120)', 'Bio::SeqFeature::Annotated=HASH(0x2bcd420)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851

Abnormal termination, trying to clean up...

Attempting to clean up the loader temp table (so that --recreate_cache won't be needed)...
Trying to remove the run lock (so that --remove_lock won't be needed)...
Exiting...

##gff-version 3
NZ_CP027390.1 . chromosome 1 5901472 . . . ID=NZ_CP027390.1;Name=NZ_CP027390.1
NZ_CP027390.1 RefSeq gene 931 1197 . + . ID=gene0;Name=AX062_RS00010;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00010
NZ_CP027390.1 RefSeq gene 1191 1577 . - . ID=gene1;Name=AX062_RS00015;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00015
NZ_CP027390.1 RefSeq gene 10194 10736 . + . ID=gene10;Name=AX062_RS00060;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00060
NZ_CP027390.1 RefSeq gene 80547 81860 . + . ID=gene100;Name=AX062_RS00510;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS00510
NZ_CP027390.1 RefSeq gene 918692 918997 . - . ID=gene1000;Name=AX062_RS05010;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05010
NZ_CP027390.1 RefSeq gene 919105 919815 . + . ID=gene1001;Name=AX062_RS05015;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05015
NZ_CP027390.1 RefSeq gene 919818 920378 . - . ID=gene1002;Name=AX062_RS05020;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05020
NZ_CP027390.1 RefSeq gene 920413 920754 . - . ID=gene1003;Name=AX062_RS05025;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05025
NZ_CP027390.1 RefSeq gene 920889 921215 . + . ID=gene1004;Name=AX062_RS05030;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05030
NZ_CP027390.1 RefSeq gene 921252 921440 . + . ID=gene1005;Name=AX062_RS05035;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05035
NZ_CP027390.1 RefSeq gene 921421 922635 . + . ID=gene1006;Name=AX062_RS05040;gbkey=Gene;gene_biotype=protein_coding;locus_tag=AX062_RS05040
|
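The separator fix described in this message (4 spaces replaced by a tab) could be scripted roughly as follows. This is only a sketch: the function name spaces_to_tabs and the min_spaces parameter are made up for illustration, and it assumes that none of the first eight columns legitimately contains a run of that many spaces.

```python
import re


def spaces_to_tabs(line: str, min_spaces: int = 4) -> str:
    """Turn space-run column separators of a GFF3 feature line into tabs.

    Comment/directive lines (starting with '#') and blank lines are
    returned unchanged. Only the first 8 runs are replaced, so spaces
    inside the attributes column (column 9) are left alone.
    """
    if line.startswith("#") or not line.strip():
        return line
    pattern = r" {%d,}" % min_spaces
    return re.sub(pattern, "\t", line, count=8)


if __name__ == "__main__":
    broken = ("NZ_CP027390.1    .    chromosome    1    5901472"
              "    .    .    .    ID=NZ_CP027390.1;Name=NZ_CP027390.1\n")
    print(repr(spaces_to_tabs(broken)))
```

Applying it line by line to a copy of the file, rather than editing in place, keeps the original available for comparison.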
From: Scott C. <sc...@sc...> - 2020-01-17 18:49:42
|
Hi Miharimamy,

I'm pretty sure that the end of the error message, where it says "line 2", means the problem is in line 2 of your GFF file. Is that the line you added? What does it look like?

Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research
|
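To answer "what does it look like?" for a given line, it can help to print that line with its whitespace made visible. In Perl diagnostics, the trailing "<GEN0> line 2" is the input line counter of the last-read filehandle; that this handle is the GFF file is the assumption made in the message above. The helper below is a hypothetical sketch, not part of the Chado loader.

```python
def describe_line(lines, line_number):
    """Return (visible_text, column_count) for the 1-based line_number.

    visible_text is the repr() of the line, so tabs show up as '\\t'
    and runs of spaces stay visible; column_count is the number of
    tab-separated fields (a GFF3 feature line should have 9).
    """
    for number, line in enumerate(lines, start=1):
        if number == line_number:
            text = line.rstrip("\r\n")
            return repr(text), len(text.split("\t"))
    raise ValueError(f"input has fewer than {line_number} lines")


if __name__ == "__main__":
    sample = [
        "##gff-version 3\n",
        # Columns separated by 4 spaces instead of tabs.
        "NZ_CP027390.1    .    chromosome    1    5901472    .    .    .    ID=NZ_CP027390.1\n",
    ]
    print(describe_line(sample, 2))
```

An open file object can be passed as lines. A column count of 1 on a feature line means no tabs were found, which would leave the feature type and start coordinate unset when the line is parsed.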
From: RAJAONISON A. <mir...@ya...> - 2020-01-17 13:57:46
|
Hi Scott,

Thank you again for your help: adding the parent feature removes the srcfeature-related error.
However, I now get another issue that I don't understand:

--------------------- WARNING ---------------------
MSG: Can not set Bio::Location::Simple::end() equal to start; start not set
---------------------------------------------------
Use of uninitialized value $featuretype in pattern match (m//) at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 819, <GEN0> line 2.
Use of uninitialized value $featuretype in pattern match (m//) at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 820, <GEN0> line 2.

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: no cvterm for
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
STACK: Bio::GMOD::DB::Adapter::get_type /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm:4629
STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:848
-----------------------------------------------------------

Abnormal termination, trying to clean up...

Attempting to clean up the loader temp table (so that --recreate_cache won't be needed)...
Trying to remove the run lock (so that --remove_lock won't be needed)...
Exiting...

Thank you
Miharimamy
|
From: Scott C. <sc...@sc...> - 2020-01-16 21:16:52
|
Hi Miharimamy,

Thanks for sending this report. Generally, loading GFF into Chado can be difficult, as the perl-based loader that you are using can be quite particular about the format of the GFF and producers of GFF generally aren't so particular. Since the loader makes the (in my view, correct) decision to not load anything if it can't load everything in a file, it quits. So, taking each of the problems you found in order:

1. "dbxref=NCBI:" isn't a valid accession (presumably there should be something after the "NCBI:") and so the loader won't continue. Two options here are to contact SGD and point out that there is a problem with their GFF, or delete the offending item and try again, which it looks like you did for item 2.

2. If the item Note is missing from the cvterm table, I think that probably means that you didn't install the schema using the "make" procedure that would have installed some necessary items in the cv and cvterm tables.

3 and 4: Messages that srcfeatures can't be found mean that the feature on which the features in your GFF reside hasn't been defined yet (that is, the thing referred to in column 1 of the GFF doesn't exist). Frequently, creators of GFF don't define the reference sequence in the GFF for whatever reason (it's not required by the GFF3 spec, since it might be credibly defined elsewhere). To define it in the GFF you have, add a line before anything else that looks something like this:

NZ_CP027390.1 . chromosome 1 1234566 . . . ID=NZ_CP027390.1;Name=NZ_CP027390.1

Where "NZ_CP027390.1" is the name of the reference sequence, "chromosome" is the Sequence Ontology term for the type of thing (other options would include "contig" and "supercontig"), and "1234566" is the length of the sequence.

Finally, I would add that using Chado through Tripal frequently makes life easier, though I think all of these issues would still have been problems (with the probable exception of item 2--Tripal would have initialized the "Note" item in cvterm I think).

Scott

On Thu, Jan 16, 2020 at 11:15 AM RAJAONISON Andriamiharimamy via Gmod-schema <gmo...@li...> wrote:

> Hi,
>
> I hope this mail finds you well. I send you my best wishes for this new year.
>
> I am reaching out to you because I have issues loading GFF files into Chado. I tried several files but none of them seems to work.
>
> Thank you in advance for your help
>
> Miharimamy
>
> 1. The S.cerevisiae example in gmod.org
>
> Command:
> chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism yeast --gfffile saccharomyces/saccharomyces_cerevisiae.gff.sorted --fastafile saccharomyces/saccharomyces_cerevisiae.gff.sorted.fasta
>
> Output:
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Error in line:
> chrIII SGD chromosome 1 316620 . . . ID=chrIII;dbxref=NCBI:;Name=chrIII
>
> Dbxref value 'NCBI:' did not conform to GFF3 specification
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
> STACK: Bio::FeatureIO::gff::_handle_feature /usr/local/share/perl5/Bio/FeatureIO/gff.pm:659
> STACK: Bio::FeatureIO::gff::next_feature /usr/local/share/perl5/Bio/FeatureIO/gff.pm:187
> STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:782
> -----------------------------------------------------------
>
> 2. S.cerevisiae without dbxref
>
> Command:
> chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism yeast --gfffile saccharomyces/test_saccharomyces_ncbi --fastafile saccharomyces/saccharomyces_cerevisiae.gff.sorted.fasta
>
> Output:
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: I couldn't find the 'Note' cvterm in the database;
> Did you load the feature property controlled vocabulary?
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
> STACK: Bio::GMOD::DB::Adapter::handle_note /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm:3901
> STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:974
>
> 3. A GFF from NCBI
>
> Command:
> chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism bacteria --gfffile GCF_000008865.2/GCF_003018035.1_ASM301803v1_genomic.gff.sorted
>
> Output:
> Unable to find srcfeature NZ_CP027390.1 in the database.
> Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 2.
> Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x241a8d0)', 'Bio::SeqFeature::Annotated=HASH(0x26c61d0)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851
>
> 4. A GFF3 from prokka
>
> Command:
> chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism bacteria -gfffile GFF3/GCF_000455285.gff.sorted --fastafile GFF3/GCF_000455285.gff.sorted.fasta
>
> Output:
> Unable to find srcfeature gnl|Prokka|GCF_000455285_1 in the database.
> Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 2.
> Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x33008f0)', 'Bio::SeqFeature::Annotated=HASH(0x35ac0a0)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851
>
> _______________________________________________
> Gmod-schema mailing list
> Gmo...@li...
> https://lists.sourceforge.net/lists/listinfo/gmod-schema

--
------------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research
|
From: RAJAONISON A. <mir...@ya...> - 2020-01-15 11:30:32
|
Hi,

I hope this mail finds you well, and I send you my best wishes for the new year. I am reaching out to you because I have issues loading GFF files into Chado. I tried several files, but none of them seems to work. Thank you in advance for your help.

Miharimamy

1. The S.cerevisiae example on gmod.org

Command:
chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism yeast --gfffile saccharomyces/saccharomyces_cerevisiae.gff.sorted --fastafile saccharomyces/saccharomyces_cerevisiae.gff.sorted.fasta

Output:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Error in line:
chrIII SGD chromosome 1 316620 . . . ID=chrIII;dbxref=NCBI:;Name=chrIII
Dbxref value 'NCBI:' did not conform to GFF3 specification
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
STACK: Bio::FeatureIO::gff::_handle_feature /usr/local/share/perl5/Bio/FeatureIO/gff.pm:659
STACK: Bio::FeatureIO::gff::next_feature /usr/local/share/perl5/Bio/FeatureIO/gff.pm:187
STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:782
-----------------------------------------------------------

2. S.cerevisiae without dbxref

Command:
chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism yeast --gfffile saccharomyces/test_saccharomyces_ncbi --fastafile saccharomyces/saccharomyces_cerevisiae.gff.sorted.fasta

Output:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: I couldn't find the 'Note' cvterm in the database;
Did you load the feature property controlled vocabulary?
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:447
STACK: Bio::GMOD::DB::Adapter::handle_note /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm:3901
STACK: chado-1.31/load/bin/gmod_bulk_load_gff3.pl:974

3. A GFF from NCBI

Command:
chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism bacteria --gfffile GCF_000008865.2/GCF_003018035.1_ASM301803v1_genomic.gff.sorted

Output:
Unable to find srcfeature NZ_CP027390.1 in the database.
Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 2.
Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x241a8d0)', 'Bio::SeqFeature::Annotated=HASH(0x26c61d0)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851

4. A GFF3 from prokka

Command:
chado-1.31/load/bin/gmod_bulk_load_gff3.pl --organism bacteria -gfffile GFF3/GCF_000455285.gff.sorted --fastafile GFF3/GCF_000455285.gff.sorted.fasta

Output:
Unable to find srcfeature gnl|Prokka|GCF_000455285_1 in the database.
Perhaps you need to rerun your data load with the '--recreate_cache' option. at /usr/local/share/perl5/Bio/GMOD/DB/Adapter.pm line 4605, <GEN0> line 2.
Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x33008f0)', 'Bio::SeqFeature::Annotated=HASH(0x35ac0a0)') called at chado-1.31/load/bin/gmod_bulk_load_gff3.pl line 851 |
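The failure in item 1 of the message above (an empty accession after "NCBI:") could in principle be spotted before running the loader. The Python below is a minimal sketch of such a pre-check, not something shipped with Chado or BioPerl. The function name `find_bad_dbxrefs` is invented here, and the pattern is only an assumption about what counts as acceptable: a non-empty database tag, a colon, and a non-empty accession. The loader's own rule may be stricter.

```python
import re

# Assumption: a usable value looks like "DBTAG:ACCESSION", with something
# on both sides of the first colon. This is not the loader's actual check.
DBXREF_OK = re.compile(r"^[^:\s]+:\S+$")


def find_bad_dbxrefs(gff_lines):
    """Return (line_number, value) pairs for suspicious Dbxref values."""
    bad = []
    for number, line in enumerate(gff_lines, start=1):
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        for attr in cols[8].split(";"):
            key, _, values = attr.partition("=")
            # The yeast file spells the key "dbxref", so compare case-insensitively.
            if key.strip().lower() != "dbxref":
                continue
            for value in values.split(","):
                if not DBXREF_OK.match(value.strip()):
                    bad.append((number, value.strip()))
    return bad
```

Each reported pair gives the 1-based line number in the file and the offending value, which is enough to either delete the attribute by hand or report it upstream.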
From: Scott C. <sc...@sc...> - 2019-11-19 03:55:22
|
Hello, As announced last month, there will be a GMOD codefest held before the Plant and Animal Genome meeting in San Diego, January 9-10, 2020. The meeting will be held at the Handlery hotel (down the street from the Town and Country Hotel). Logistical and other information can be found at the GMOD wiki as the details become more solid: http://gmod.org/wiki/Codefest_2020 Several potential projects are described in this document: https://docs.google.com/document/d/1_CnUW_W4tNyl7lSlihCwZDKT45VQQxcI3I-VgjnC2Dc/edit We know for sure that there will be developers from the Tripal, Apollo, JBrowse, and Chado projects. Of course, other projects like Galaxy, InterMine, and MAKER are welcome too! See you in San Diego! Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research |
From: Nathan D. <nat...@lb...> - 2019-10-10 23:01:34
|
I plan on being there to work on users' Apollo-based problems (or JBrowse ones if that is more pertinent). I've copied the Apollo list. Looking forward to seeing you there, Nathan > On Oct 10, 2019, at 2:40 PM, Scott Cain <sc...@sc...> wrote: > > I am pleased to announce there will be a GMOD codefest occurring before the Plant and Animal Genomes meeting in San Diego. The codefest will be at the Town and Country hotel on January 9 and 10. If you would like to suggest a problem or project to address, add it to this Google Doc <https://docs.google.com/document/d/1_CnUW_W4tNyl7lSlihCwZDKT45VQQxcI3I-VgjnC2Dc/edit?usp=sharing>. The codefest is open to anyone who'd like to work on any GMOD project (or, better yet, any combination of GMOD projects), including but not limited to Tripal <http://gmod.org/wiki/Tripal>, Chado <http://gmod.org/wiki/Chado>, JBrowse <http://gmod.org/wiki/JBrowse> (1 and 2), Galaxy <http://gmod.org/wiki/Galaxy> and InterMine <http://gmod.org/wiki/InterMine>. We already know that there will be Tripal, Chado and JBrowse developers present. Registration information will be coming soon. > > You can follow the changes to this page: > > http://gmod.org/wiki/Codefest_2020 <http://gmod.org/wiki/Codefest_2020> > > to keep up to date, or follow The Tweet of GMOD (@gmodproject) on Twitter. > > See you in San Diego, > Scott > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/ <http://gmod.org/>) 216-392-3087 > Ontario Institute for Cancer Research > _______________________________________________ > Gmod-devel mailing list > Gmo...@li... > https://lists.sourceforge.net/lists/listinfo/gmod-devel |
From: Scott C. <sc...@sc...> - 2019-10-10 21:40:49
|
I am pleased to announce there will be a GMOD codefest occurring before the Plant and Animal Genomes meeting in San Diego. The codefest will be at the Town and Country hotel on January 9 and 10. If you would like to suggest a problem or project to address, add it to this Google Doc <https://docs.google.com/document/d/1_CnUW_W4tNyl7lSlihCwZDKT45VQQxcI3I-VgjnC2Dc/edit?usp=sharing>. The codefest is open to anyone who'd like to work on any GMOD project (or, better yet, any combination of GMOD projects), including but not limited to Tripal <http://gmod.org/wiki/Tripal>, Chado <http://gmod.org/wiki/Chado>, JBrowse <http://gmod.org/wiki/JBrowse> (1 and 2), Galaxy <http://gmod.org/wiki/Galaxy> and InterMine <http://gmod.org/wiki/InterMine>. We already know that there will be Tripal, Chado and JBrowse developers present. Registration information will be coming soon. You can follow the changes to this page: http://gmod.org/wiki/Codefest_2020 to keep up to date, or follow The Tweet of GMOD (@gmodproject) on Twitter. See you in San Diego, Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research |
From: Meg S. <me...@gm...> - 2019-05-31 15:17:12
|
Hi, The second Tripal position, working with i5k, has now posted: https://ut.taleo.net/careersection/ut_system/jobdetail.ftl?job=19000000ZD&tz=GMT-04%3A00&tzname=America%2FNew_York Please share widely! Thanks, Meg Staton On Tue, May 14, 2019 at 8:22 AM Meg Staton <me...@gm...> wrote: > Hi, > I've got two Tripal positions opening up in my lab. The first is currently > written as a postdoc but could be altered to be a developer if the right > person comes along. Its to work on the hardwood genomics site, and > currently only has funding for 1 year. We added a machine learning > component to this job description as its of interest generally to my lab, > but this could be altered to fit the candidate. > > > https://ut.taleo.net/careersection/ut_system/jobdetail.ftl?job=19000000W9&tz=GMT-04%3A00&tzname=America%2FNew_York > > The second position is a developer for i5k. This person will work closely > with me and with Monica Polechau and Chris Childers of the USDA. The > position is funded for 18 months with a good possibility of continuing > renewal with the USDA. > > This second position is not yet posted (I'm having trouble getting the > paperwork accepted, but it should be soon!). But I'm eager to fill it and > hoping to get folks thinking about it, so I'm posting the draft job ad > here. I'll send the official one when its up. > > Remote work is feasible for both positions. Free tuition is a guaranteed > benefit of full time employment with the University of Tennessee, so > relocation to Knoxville opens the possibility of pursuing an MS or PhD > degree. > > I'm happy to answer any additional questions or discuss details more with > anyone interested. Please share widely! > > Meg > > -- > Margaret Staton > Assistant Professor > Department of Entomology and Plant Pathology > Office: 154 PBB > Mail: 370 PBB, 2505 EJ Chapman Drive > Knoxville, TN 37996-4560 > she/her/hers > > 864-506-4515 Mobile > mst...@ut... 
> > > > > -- Margaret Staton Assistant Professor Department of Entomology and Plant Pathology Office: 154 PBB Mail: 370 PBB, 2505 EJ Chapman Drive Knoxville, TN 37996-4560 she/her/hers 864-506-4515 Mobile mst...@ut... |
From: Meg S. <me...@gm...> - 2019-05-14 12:22:54
|
Hi, I've got two Tripal positions opening up in my lab. The first is currently written as a postdoc but could be altered to be a developer if the right person comes along. It's to work on the hardwood genomics site, and currently only has funding for 1 year. We added a machine learning component to this job description as it's of interest generally to my lab, but this could be altered to fit the candidate. https://ut.taleo.net/careersection/ut_system/jobdetail.ftl?job=19000000W9&tz=GMT-04%3A00&tzname=America%2FNew_York The second position is a developer for i5k. This person will work closely with me and with Monica Poelchau and Chris Childers of the USDA. The position is funded for 18 months with a good possibility of continuing renewal with the USDA. This second position is not yet posted (I'm having trouble getting the paperwork accepted, but it should be soon!). But I'm eager to fill it and hoping to get folks thinking about it, so I'm posting the draft job ad here. I'll send the official one when it's up. Remote work is feasible for both positions. Free tuition is a guaranteed benefit of full-time employment with the University of Tennessee, so relocation to Knoxville opens the possibility of pursuing an MS or PhD degree. I'm happy to answer any additional questions or discuss details more with anyone interested. Please share widely! Meg -- Margaret Staton Assistant Professor Department of Entomology and Plant Pathology Office: 154 PBB Mail: 370 PBB, 2505 EJ Chapman Drive Knoxville, TN 37996-4560 she/her/hers 864-506-4515 Mobile mst...@ut... |
From: Scott C. <sc...@sc...> - 2019-03-06 20:13:16
|
Hi All, In an effort to speed up Chado development from its current glacial pace, we've decided to change the governance style from benevolent dictator (yours truly) to a Project Management Committee (PMC). The committee is composed of Stephen Ficklin, Lacey-Anne Sanderson, Bradford Condon, Josh Goodman, Naama Menda and myself. One of the first tasks we're undertaking is establishing guidelines for how to operate, and one of the guidelines is to have 6 reviewers for a pull request that institutes a "large" change, where large is defined in the guidelines. Now, I contended that establishing guidelines is a large change, and Lacey issued a pull request for the guidelines and everybody else on the PMC approved of the pull request, but that's only 5 acceptances. Would anybody else on the channel be willing to take a look and give a thumbs-up review? Thanks! https://github.com/GMOD/Chado/pull/97 Also, I should mention that there is a Chado Slack channel in the Tripal project, which is open to non-Tripal users. If you'd like to be in the channel, let me know and I can get you an invite. Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research |
From: Scott C. <sc...@sc...> - 2019-01-11 22:41:28
|
Hi All, For the last day and a half, a few people have been working on modernizing Chado and its build procedure (lots of good stuff going on). One of the things we're considering is removing "modules" from the Chado build paradigm. Using automated build/migration tools (Flyway) makes maintaining modules as individual files much more difficult. As a result, we're thinking of doing away with them, to be replaced with a single default schema SQL file (which is what gets released when we build releases anyway). There is an issue open in the Chado issue tracker to discuss this: https://github.com/GMOD/Chado/issues/90 and I wanted to bring it to the community's attention in case anybody has any strong feelings about this one way or the other. I feel like this is a concept that has probably outlived its usefulness, at least as actual individual files representing the modules. While I think I will continue to think of the Chado schema as composed of individual modules, I don't think it's necessary to maintain it as these separate files any more. Thoughts, anyone? Thanks, Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research |
From: Scott C. <sc...@sc...> - 2019-01-08 22:19:56
|
Hi all, Whether you are attending the codefest in San Diego this week or not, please take a few minutes to review the github issues that I've tagged to be considered for this event. That is, everything with the "2019 PAG Hackathon" tag. https://github.com/GMOD/Chado/issues?q=is%3Aissue+is%3Aopen+label%3A%222019+PAG+Hackathon%22 Note that most or all of them are tagged with "priority low"; that's because I didn't want to be the only one assigning priority to items. If I tagged it to be looked at this week, it's because I think it's at least moderately important. If there are issues that you feel are important, please feel free to add a "priority high" tag to it, and perhaps a note as to why you think it should be a high priority. And for those of you attending the codefest, I look forward to seeing you in a few days! Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research |
From: Scott C. <sc...@sc...> - 2019-01-07 21:22:58
|
And if I remember correctly, we plan to look at issues on Thursday and governance related topics on Friday (and continuing to work on issues as time allows). For issues, see https://github.com/GMOD/Chado/labels/2019%20PAG%20Hackathon About governance: the benevolent dictator-for-life thing is definitely becoming strained, so time to look to something else! I look forward to seeing some of you on Thursday, Scott On Mon, Jan 7, 2019 at 9:36 AM Stephen Ficklin <spf...@gm...> wrote: > Dear Chado Community, > > As a reminder at this year's Tripal Codefest we are hosting a session on > Chado at the Plant and Animal Genome Conference. Anyone is welcome to > attend, even if you are not involved with Tripal. The session will run > Thurs 8am-5pm and Friday 8am-2:45pm in the Crescent room at the Town and > Country. Scott Cain is leading the session. > > Best, > Stephen > > > > _______________________________________________ > Gmod-schema mailing list > Gmo...@li... > https://lists.sourceforge.net/lists/listinfo/gmod-schema > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research |
From: Stephen F. <spf...@gm...> - 2019-01-07 17:36:48
|
Dear Chado Community, As a reminder at this year's Tripal Codefest we are hosting a session on Chado at the Plant and Animal Genome Conference. Anyone is welcome to attend, even if you are not involved with Tripal. The session will run Thurs 8am-5pm and Friday 8am-2:45pm in the Crescent room at the Town and Country. Scott Cain is leading the session. Best, Stephen |
From: Guignon, V. (Bioversity-France) <v.g...@cg...> - 2018-12-04 17:21:58
|
Hi, For the "2. Tripal v3 Breeders API (BrAPI) Implementation", you already got the GoToMeeting link https://global.gotomeeting.com/join/173361157 and here is a shared Google doc for the notes: https://docs.google.com/document/d/1xm18hsKFoLThkXxZ8FxGZPUREyoW44DWhSQ_N29P-qc/edit?usp=sharing See you in a couple of minutes! Val -----Original Message----- From: Stephen Ficklin [mailto:ste...@ws...] Sent: Monday, 3 December 2018 17:52 To: GMOD Tripal <gmo...@li...>; GMOD Schema/Chado List <gmo...@li...> Subject: [Gmod-tripal] Tripal & Chado Codefest Meetings Tomorrow Hi All, Tomorrow the normal Tripal User's Meeting is being moved to plan for the Tripal/Chado Codefest just before PAG in San Diego. To help ensure our Codefest in January is as productive as possible we will have three separate meetings for each of the 3 primary topics for the Codefest. If you will be attending the Codefest and if you can attend the online meeting tomorrow please join one of the of the following meetings below: 1. Tripal v3 Python Proof of Concept. Group Lead: Stephen Ficklin Connect via Zoom: https://zoom.us/j/455940438 2. Tripal v3 Breeders API (BrAPI) Implementation: Group Lead: Valentin Guignon Connect via GoToMeeting https://global.gotomeeting.com/join/173361157 3. Chado Updates (open to non-Tripal developers). Group Lead: Scott Cain Connect via Zoom: https://zoom.us/u/acrqoPpqZ6 The meetings all begin at 5pm European Central, 12pm US Eastern, 10am Regina, 8am US Pacific. You find the time in your area via this link: https://www.timeanddate.com/worldclock/fixedtime.html?iso=20181204T1600 If you are considering attending but have not yet signed up please add your name to our registration document: https://docs.google.com/spreadsheets/d/1agEK8S4yxphTOC7cojvsVjTxsPH5PenvtHDBDz5XotE/edit#gid=0 Chat soon! 
Stephen -- Stephen Ficklin, PhD Assistant Professor Bioinformatics and Systems Genetics Department of Horticulture Washington State University 153 Johnson Hall Pullman, WA 99164-6414 Office: (509) 335-4295 http://ficklinlab.cahnrs.wsu.edu/ _______________________________________________ Gmod-tripal mailing list Gmo...@li... https://lists.sourceforge.net/lists/listinfo/gmod-tripal |
From: Scott C. <sc...@sc...> - 2018-12-04 16:25:39
|
Hi Stephen, The link for the Chado call doesn't appear to be a "connect" link. It should be a long string of digits I think. Scott On Mon, Dec 3, 2018 at 7:25 AM Stephen Ficklin <ste...@ws...> wrote: > Hi All, > > Tomorrow the normal Tripal User's Meeting is being moved to plan for the > Tripal/Chado Codefest just before PAG in San Diego. To help ensure our > Codefest in January is as productive as possible we will have three > separate meetings for each of the 3 primary topics for the Codefest. If > you will be attending the Codefest and if you can attend the online > meeting tomorrow please join one of the of the following meetings below: > > 1. Tripal v3 Python Proof of Concept. > Group Lead: Stephen Ficklin > Connect via Zoom: https://zoom.us/j/455940438 > > 2. Tripal v3 Breeders API (BrAPI) Implementation: > Group Lead: Valentin Guignon > Connect via GoToMeeting https://global.gotomeeting.com/join/173361157 > > 3. Chado Updates (open to non-Tripal developers). > Group Lead: Scott Cain > Connect via Zoom: https://zoom.us/u/acrqoPpqZ6 > > The meetings all begin at 5pm European Central, 12pm US Eastern, 10am > Regina, 8am US Pacific. You find the time in your area via this link: > > https://www.timeanddate.com/worldclock/fixedtime.html?iso=20181204T1600 > > If you are considering attending but have not yet signed up please add > your name to our registration document: > > https://docs.google.com/spreadsheets/d/1agEK8S4yxphTOC7cojvsVjTxsPH5PenvtHDBDz5XotE/edit#gid=0 > > Chat soon! > Stephen > > -- > Stephen Ficklin, PhD > Assistant Professor > Bioinformatics and Systems Genetics > Department of Horticulture > Washington State University > 153 Johnson Hall > Pullman, WA 99164-6414 > Office: (509) 335-4295 > > http://ficklinlab.cahnrs.wsu.edu/ > > > > _______________________________________________ > Gmod-tripal mailing list > Gmo...@li... 
> https://lists.sourceforge.net/lists/listinfo/gmod-tripal > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research |
From: bradford c. <bra...@gm...> - 2018-12-04 16:01:30
|
Hi all, I think Stephen copied the wrong link for the Chado meeting. This is the correct link: https://utia.zoom.us/j/811848876 <https://utia.zoom.us/j/811848876> Apologies Bradford > On Dec 3, 2018, at 11:52 AM, Stephen Ficklin <ste...@ws...> wrote: > > Hi All, > > Tomorrow the normal Tripal User's Meeting is being moved to plan for the Tripal/Chado Codefest just before PAG in San Diego. To help ensure our Codefest in January is as productive as possible we will have three separate meetings for each of the 3 primary topics for the Codefest. If you will be attending the Codefest and if you can attend the online meeting tomorrow please join one of the of the following meetings below: > > 1. Tripal v3 Python Proof of Concept. > Group Lead: Stephen Ficklin > Connect via Zoom: https://zoom.us/j/455940438 > > 2. Tripal v3 Breeders API (BrAPI) Implementation: > Group Lead: Valentin Guignon > Connect via GoToMeeting https://global.gotomeeting.com/join/173361157 > > 3. Chado Updates (open to non-Tripal developers). > Group Lead: Scott Cain > Connect via Zoom: https://zoom.us/u/acrqoPpqZ6 > > The meetings all begin at 5pm European Central, 12pm US Eastern, 10am Regina, 8am US Pacific. You find the time in your area via this link: > > https://www.timeanddate.com/worldclock/fixedtime.html?iso=20181204T1600 > > If you are considering attending but have not yet signed up please add your name to our registration document: > https://docs.google.com/spreadsheets/d/1agEK8S4yxphTOC7cojvsVjTxsPH5PenvtHDBDz5XotE/edit#gid=0 > > Chat soon! > Stephen > > -- > Stephen Ficklin, PhD > Assistant Professor > Bioinformatics and Systems Genetics > Department of Horticulture > Washington State University > 153 Johnson Hall > Pullman, WA 99164-6414 > Office: (509) 335-4295 > > http://ficklinlab.cahnrs.wsu.edu/ > > > > _______________________________________________ > Gmod-tripal mailing list > Gmo...@li... > https://lists.sourceforge.net/lists/listinfo/gmod-tripal |
From: Stephen F. <ste...@ws...> - 2018-12-03 17:25:39
|
Hi All, Tomorrow the normal Tripal User's Meeting is being moved to plan for the Tripal/Chado Codefest just before PAG in San Diego. To help ensure our Codefest in January is as productive as possible, we will have three separate meetings, one for each of the 3 primary topics for the Codefest. If you will be attending the Codefest and can attend the online meeting tomorrow, please join one of the following meetings: 1. Tripal v3 Python Proof of Concept. Group Lead: Stephen Ficklin Connect via Zoom: https://zoom.us/j/455940438 2. Tripal v3 Breeders API (BrAPI) Implementation. Group Lead: Valentin Guignon Connect via GoToMeeting: https://global.gotomeeting.com/join/173361157 3. Chado Updates (open to non-Tripal developers). Group Lead: Scott Cain Connect via Zoom: https://zoom.us/u/acrqoPpqZ6 The meetings all begin at 5pm European Central, 11am US Eastern, 10am Regina, 8am US Pacific. You can find the time in your area via this link: https://www.timeanddate.com/worldclock/fixedtime.html?iso=20181204T1600 If you are considering attending but have not yet signed up, please add your name to our registration document: https://docs.google.com/spreadsheets/d/1agEK8S4yxphTOC7cojvsVjTxsPH5PenvtHDBDz5XotE/edit#gid=0 Chat soon! Stephen -- Stephen Ficklin, PhD Assistant Professor Bioinformatics and Systems Genetics Department of Horticulture Washington State University 153 Johnson Hall Pullman, WA 99164-6414 Office: (509) 335-4295 http://ficklinlab.cahnrs.wsu.edu/ |
From: Stephen F. <spf...@gm...> - 2018-11-16 19:44:00
|
Dear Chado Community, Last year a GMOD Hackathon was held prior to PAG, with some work done on Chado. Also, for the last few years we have held a Tripal Hackathon (now Codefest) on the Thursday and Friday before PAG in San Diego. This year we will combine the Chado and Tripal hackathons into a Tripal/Chado Codefest, held Jan 10th-11th in San Diego. This Codefest will have several topics, one of which is dedicated to Chado. If you use Chado, whether or not you use Tripal, we invite you to attend the Chado session of the Codefest. Scott Cain, GMOD Project Lead, will direct the session. If you are interested in attending, please let us know by adding your name to the Chado section of our signup sheet: https://docs.google.com/spreadsheets/d/1agEK8S4yxphTOC7cojvsVjTxsPH5PenvtHDBDz5XotE/edit#gid=0 If you have specific issues you want to address, please let either me or Scott Cain know. We may schedule an online meeting on Tuesday, Dec 4th, to discuss priorities if discussion is needed. Hope to see you then! Stephen |
From: Scott C. <sc...@sc...> - 2018-10-18 19:12:56
|
Hi All, It's that time of year again: if you would be interested in presenting in the GMOD workshop at PAG in San Diego this January, please let me know. You don't have to have a full blown abstract yet; a general idea of what you'd like to talk about is fine. If you know anybody who might be interested but isn't on one of these lists, please feel free to forward this to them. Thanks, Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research |
From: Sofia R. <so...@so...> - 2018-09-13 17:05:28
|
Thanks Ethy!

This is a lot to figure out; there are quite a few different routes to take. I really like the idea of using the assembly version name in the identifier for the gene model. I need to figure out which approach makes sense both for loading and retrieving the data with Tripal and for the individuals generating the gene models. I think the plan is to add manually curated genes as well. Versioning the entire set is also a good thought.

Sofia
|
From: Cannon, E. K [C. S] <ekc...@ia...> - 2018-09-13 14:54:11
|
Sorry to be late to the party; this is something I've worked on at length with maize genes.

First, I'll note that the AgBioData consortium <https://www.agbiodata.org> is forming a genome and gene model nomenclature group. Anyone working with genome and/or gene model nomenclature is welcome to join. There is a recording <https://www.youtube.com/watch?v=kNW6YReFP28&feature=youtu.be> of our nomenclature discussion last week, and copies of the slides <https://www.agbiodata.org/sites/default/files/Genome%20Nomenclature%20meeting%20slides.pdf> are available.

After playing with the idea of versioning gene models, the maize group decided to version the sets instead. We haven't (yet) been successful with hand-curation of gene models and instead improve the gene model sets via re-analysis.

Note that the .<digit> suffix indicates alternative isoforms in many nomenclature patterns.

An analysis record represents each gene model set, and gene models are linked to it via analysisfeature. The same gene model feature record can be attached to multiple versions if it hasn't changed. Sequence isn't stored in the feature record but retrieved from the appropriate BLAST db as needed. This takes care of (rare?) situations in which the name stays the same but there are minor changes to the sequence. I have a rather clunky way of indicating the current version via analysisprop.

There is a request in for the addition of an analysis.type_id field for Chado 1.4 (https://github.com/GMOD/Chado/pull/52).

For maintaining history, I use feature_relationship with a set of cvterms indicating, for example, whether a gene model has been split or merged. Split and merged gene models get new names.

Because we have gene models from several different maize genome assemblies, we run an analysis to find likely orthologs across the multiple gene model sets. These are also linked via feature_relationship records.

Hope this helps.

Ethy
|
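The set-versioning layout described in this message can be sketched in a few lines. This is only an illustration using Python's sqlite3 module, not the maize group's actual schema or code: the tables are cut-down stand-ins for Chado's feature, analysis, analysisfeature and analysisprop, and the `type_name` column, the `current_version` property name, and the gene and set names are all hypothetical.

```python
import sqlite3

# Simplified stand-ins for Chado tables; the real tables have more columns.
SCHEMA = """
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT UNIQUE);
CREATE TABLE analysis (analysis_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE analysisfeature (
    analysis_id INTEGER REFERENCES analysis(analysis_id),
    feature_id  INTEGER REFERENCES feature(feature_id),
    UNIQUE (analysis_id, feature_id)
);
CREATE TABLE analysisprop (
    analysis_id INTEGER REFERENCES analysis(analysis_id),
    type_name   TEXT,  -- hypothetical; Chado uses a type_id pointing at cvterm
    value       TEXT
);
"""


def build_demo():
    """Create an in-memory database holding two gene model sets."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO feature VALUES (?, ?)",
                    [(1, "geneA"), (2, "geneB"), (3, "geneC")])
    con.executemany("INSERT INTO analysis VALUES (?, ?)",
                    [(1, "gene_models_v1"), (2, "gene_models_v2")])
    # geneA did not change, so the same feature row is linked to both sets.
    con.executemany("INSERT INTO analysisfeature VALUES (?, ?)",
                    [(1, 1), (1, 2), (2, 1), (2, 3)])
    # Flag the second set as current through a property on the analysis.
    con.execute("INSERT INTO analysisprop VALUES (2, 'current_version', 'true')")
    return con


def current_gene_models(con):
    """Return the gene models that belong to the set flagged as current."""
    rows = con.execute("""
        SELECT f.uniquename
        FROM feature f
        JOIN analysisfeature af ON af.feature_id = f.feature_id
        JOIN analysisprop ap ON ap.analysis_id = af.analysis_id
        WHERE ap.type_name = 'current_version' AND ap.value = 'true'
        ORDER BY f.uniquename
    """).fetchall()
    return [row[0] for row in rows]


def sets_containing(con, uniquename):
    """Return the names of every gene model set that includes a feature."""
    rows = con.execute("""
        SELECT a.name
        FROM analysis a
        JOIN analysisfeature af ON af.analysis_id = a.analysis_id
        JOIN feature f ON f.feature_id = af.feature_id
        WHERE f.uniquename = ?
        ORDER BY a.name
    """, (uniquename,)).fetchall()
    return [row[0] for row in rows]
```

Calling `current_gene_models(build_demo())` lists only the members of the second set, while `sets_containing(con, "geneA")` shows that an unchanged gene model is shared by both sets without being duplicated.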
From: Joe C. <jwc...@lb...> - 2018-09-12 21:36:33
|
> On Sep 12, 2018, at 1:43 PM, Sofia Robb <so...@so...> wrote:
>
> Good point about merging and splitting genes.
>
> I think this is meant to be a pretty stable assembly, and the hope is that the annotations are good. But split and merged genes are quite typical issues I have seen in many different annotation sets, and I suspect we will find some in this one as well. My first gut solution to merging or splitting is that these would have to have new stable IDs, if we go the stable ID route.
>
> When you say using the feature_relationship table for tracking, are you thinking that the cvterm_id would be some term like version_of, the subject would be the versioned feature, and the object would be the stable feature (new_version 'version_of' stable_version)? Or are you saying that the stable ID route isn't great in your opinion and that the cvterm should be something like new_feature 'is_new_version_of' old_feature?

I was thinking of having a ‘previous_version_of’ (or some such label) and linking annotations through the feature_relationship table. I really don’t know which solution is best: it depends on what you want the tracking to do, or how fine-grained you need the tracking to be. My one concern with merges is that you’ll not be able to have multiple stable IDs for one gene unless you keep track of the rank field or modify the schema.

joe

> Thank you for taking the time to discuss this with me.
> Sofia
|
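To make the relationship-based tracking discussed in this message concrete, here is a minimal sketch in Python with sqlite3. It is an assumption-laden illustration rather than anyone's production setup: the two tables are reduced stand-ins for Chado's feature and feature_relationship, the `type_name` column replaces the real `type_id` reference to cvterm, and the term names (`previous_version_of`, `merged_into`) and gene names are hypothetical. The older feature is stored as the subject and the newer one as the object.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with one revision and one merge."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE feature (
            feature_id INTEGER PRIMARY KEY,
            uniquename TEXT UNIQUE
        );
        CREATE TABLE feature_relationship (
            subject_id INTEGER REFERENCES feature(feature_id),
            object_id  INTEGER REFERENCES feature(feature_id),
            type_name  TEXT,  -- hypothetical; Chado uses type_id -> cvterm
            UNIQUE (subject_id, object_id, type_name)
        );
    """)
    con.executemany("INSERT INTO feature VALUES (?, ?)",
                    [(1, "geneX.1"), (2, "geneX.2"),
                     (3, "geneY.1"), (4, "geneZ.1")])
    con.executemany("INSERT INTO feature_relationship VALUES (?, ?, ?)", [
        (1, 2, "previous_version_of"),  # geneX.1 was revised into geneX.2
        (2, 4, "merged_into"),          # geneX.2 and geneY.1 were merged,
        (3, 4, "merged_into"),          # and the merged model got a new name
    ])
    return con


def history(con, uniquename):
    """Return every (older_feature, relationship) pair leading to a feature."""
    found, todo = set(), [uniquename]
    while todo:
        newer = todo.pop()
        rows = con.execute("""
            SELECT s.uniquename, fr.type_name
            FROM feature_relationship fr
            JOIN feature s ON s.feature_id = fr.subject_id
            JOIN feature o ON o.feature_id = fr.object_id
            WHERE o.uniquename = ?
        """, (newer,)).fetchall()
        for older, type_name in rows:
            if (older, type_name) not in found:
                found.add((older, type_name))
                todo.append(older)  # keep walking back through older versions
    return sorted(found)
```

Because a merged gene simply has several incoming relationship rows, this layout records more than one predecessor without needing a second stable ID property on the same feature.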
From: Sofia R. <so...@so...> - 2018-09-12 20:43:24
|
Good point about merging and splitting genes.

I think this is meant to be a pretty stable assembly, and the hope is that the annotations are good. But split and merged genes are quite typical issues I have seen in many different annotation sets, and I suspect we will find some in this one as well. My first gut solution to merging or splitting is that these would have to have new stable IDs, if we go the stable ID route.

When you say using the feature_relationship table for tracking, are you thinking that the cvterm_id would be some term like version_of, the subject would be the versioned feature, and the object would be the stable feature (new_version 'version_of' stable_version)? Or are you saying that the stable ID route isn't great in your opinion and that the cvterm should be something like new_feature 'is_new_version_of' old_feature?

Thank you for taking the time to discuss this with me.
Sofia

On Wed, Sep 12, 2018 at 2:28 PM, Joe Carlson <jwc...@lb...> wrote:

> On Sep 12, 2018, at 1:09 PM, Sofia Robb <so...@so...> wrote:
>
> > Hi Joe and other Chado users,
> >
> > Joe, thanks for your response. I would like to know more about your data. I have a few questions and will follow them up with a dump of my current ideas on how to solve this.
>
> I’m managing the backend db for the Phytozome project at JGI (phytozome.jgi.doe.gov), a comparative land plant db. We have ~250 plant genomes (assemblies, annotation and analysis results) loaded right now. The size of the db is ~1.5T.
>
> > Are you the source of the sequence?
>
> We have the land plants sequenced by the JGI, things done by collaborators, and other model organisms. It’s roughly an equal mixture of each.
>
> > Or are you pulling the data from another database?
>
> Data import is with FASTA files for chromosomes and proteins; GFF3 for structure.
>
> > What do you do if the actual sequence changes? Do you just overwrite the previous sequence data?
>
> I never overwrite or delete. Once it goes in the database, it stays in the database.
>
> > We are going to be the official repository of this data and have been asked to keep track of the history of changes. This is more than I have had to keep track of in the past.
> >
> > I had been thinking of loading the data in a way that gets across the idea that each feature has a stable version, which is equal to its current version, and any number of older versions. Now this is just an idea (largely based on the representation of data from Ensembl).
> >
> > The stable version would have a stable ID which lacks the '.\d' suffix. And there would be a feature record for each version which includes the '.\d' suffix. I would mark older versions obsolete. What I am still working on in this idea is what I could add as properties (GFF 9th column) to help with searches. Perhaps I could add a stableID=xyz in each record? I think this would help with a query: I could search for the stableID and obsolete when I need to retrieve the history of changes?
> >
> > feature.uniquename: some_gene.1
> > featureprop.cvterm_id: some term that indicates the concept stableID
> > featureprop.value: some_gene
> > feature.is_obsolete: true
> >
> > feature.uniquename: some_gene.2
> > featureprop.cvterm_id: some term that indicates the concept stableID
> > featureprop.value: some_gene
> > feature.is_obsolete: false
> >
> > feature.uniquename: some_gene
> > featureprop.cvterm_id: some term that indicates the concept stableID
> > featureprop.value: some_gene
> > feature.is_obsolete: false
>
> How you do this depends a bit on the nature of the reannotations. If you have a fairly stable assembly and annotation then it entirely makes sense to count on there being a stable identifier. In what I have, we often have dramatically different assemblies from one version to another (many of our assemblies do not have pseudomolecules) and we cannot count on stable IDs.
>
> Your assigning a stable ID as a property will work if the changes are not too extensive. But think of the case where 2 genes in 1 version are modified in such a way that 1 gene is split and half is merged into another gene. What rules are you going to use to assign the stable ID for the merged gene?
>
> An alternative tracking mechanism between versions is to use a feature_relationship. You could keep track of things a bit better with this table if there are extensive merges and splits. For the most part we are not maintaining gene history except in a few of our important genomes.
>
> Joe
>
> > Thank you,
> > Sofia
> >
> > On Wed, Sep 12, 2018 at 1:34 PM, Joe Carlson <jwc...@lb...> wrote:
> >
> > > For what it’s worth, I’ve been using dbxrefs to track annotation versions. I’ve modified the schema to make dbxref_id in the feature table not null, and use a record in the dbxref table to label the source - and version - of the data.
> > >
> > > Appending a numerical identifier to the name means that a query for a particular version will require a VERY expensive SQL constraint "and name like '%.N'" in the queries.
> > >
> > > Joe
> > >
> > > On Sep 12, 2018, at 12:16 PM, Sofia Robb <so...@so...> wrote:
> > >
> > > > Hello All,
> > > >
> > > > I have a question about how others are handling sequence feature versions. I am using Tripal and have posted this question in the Tripal repository Issues as well.
> > > >
> > > > I have a group that is developing gene/mRNA models. They are using an Ensembl-like system for versioning of gene and transcript IDs, and they want to maintain a history of previous versions.
> > > >
> > > > They plan on incrementing a digit after the ID when a new version is generated.
> > > >
> > > > gene nv2m00005394.1
> > > > mRNA nv2m00005394.1.mRNA.1
> > > >
> > > > Chr11 GFF3Conv gene 3598792 3603486 . - . Alias=Sox9;Name=nv2m00005394.1;ID=nv2m00005394.1
> > > > Chr11 GFF3Conv mRNA 3598792 3603486 . - . ID=nv2m00005394.1.mRNA.1;Parent=nv2m00005394.1
> > > >
> > > > How should I handle this? Create a new feature for each version and mark the old one obsolete? How do I make it easy for users to find the correct ID when they don't know there has been an update? I have some ideas, but it would require the geneID and mRNAIDs to have different bases, i.e. nv2g00005394 (change g->m) for gene and nv2m00005394 for mRNA.
> > > >
> > > > Any advice would be fantastic!!!
> > > > Thank you!
> > > > Sofia
> > > >
> > > > _______________________________________________
> > > > Gmod-schema mailing list
> > > > Gmo...@li...
> > > > https://lists.sourceforge.net/lists/listinfo/gmod-schema
|
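The stableID property idea from this part of the thread can be tried out with a small sketch. The code below is a hypothetical illustration in Python with sqlite3, not a Tripal or Chado loader: the tables are reduced stand-ins for Chado's feature and featureprop, `type_name` replaces the real `type_id` reference to cvterm, and `split_versioned_id` and `versions_for` are helper names invented here.

```python
import sqlite3


def split_versioned_id(identifier):
    """Split 'some_gene.2' into ('some_gene', 2); no suffix gives None."""
    base, dot, suffix = identifier.rpartition(".")
    if dot and suffix.isdigit():
        return base, int(suffix)
    return identifier, None


def build_demo():
    """Load the three example records and tag each with its stableID."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE feature (
            feature_id  INTEGER PRIMARY KEY,
            uniquename  TEXT UNIQUE,
            is_obsolete INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE featureprop (
            feature_id INTEGER REFERENCES feature(feature_id),
            type_name  TEXT,  -- hypothetical; Chado uses type_id -> cvterm
            value      TEXT
        );
        CREATE INDEX featureprop_value_idx ON featureprop (type_name, value);
    """)
    records = [("some_gene.1", 1), ("some_gene.2", 0), ("some_gene", 0)]
    for uniquename, obsolete in records:
        stable_id, _version = split_versioned_id(uniquename)
        cur = con.execute(
            "INSERT INTO feature (uniquename, is_obsolete) VALUES (?, ?)",
            (uniquename, obsolete))
        con.execute(
            "INSERT INTO featureprop (feature_id, type_name, value) "
            "VALUES (?, 'stableID', ?)",
            (cur.lastrowid, stable_id))
    return con


def versions_for(con, stable_id):
    """Return (uniquename, is_obsolete) for every record sharing a stableID."""
    rows = con.execute("""
        SELECT f.uniquename, f.is_obsolete
        FROM feature f
        JOIN featureprop fp ON fp.feature_id = f.feature_id
        WHERE fp.type_name = 'stableID' AND fp.value = ?
        ORDER BY f.uniquename
    """, (stable_id,)).fetchall()
    return [(name, bool(obsolete)) for name, obsolete in rows]
```

Looking versions up through the property is an equality match on the property value, which can use an ordinary index, whereas a name pattern with a leading wildcard such as '%.2' generally cannot. Note that `split_versioned_id` only strips a trailing numeric suffix, so an mRNA ID such as nv2m00005394.1.mRNA.1 would keep the gene version inside its base.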