From: Joe C. <jwc...@lb...> - 2018-09-12 20:28:59
> On Sep 12, 2018, at 1:09 PM, Sofia Robb <so...@so...> wrote:
>
> Hi Joe and other Chado users,
>
> Joe, Thanks for your response. I would like to know more about your data. I have a few questions and will follow them up with a dump of my current ideas on how to solve this.

I’m managing the backend db for the phytozome project at JGI (phytozome.jgi.doe.gov), a comparative land plant db. We have ~250 plant genomes (assemblies, annotation and analysis results) loaded right now. The size of the db is ~1.5T.

> Are you the source of the sequence?

We have the land plants sequenced by the JGI, things done by collaborators and other model organisms. It’s roughly an equal mixture of each.

> Or are you pulling the data from another database?

Data import is with fasta files for chromosomes and proteins; gff3 for structure.

> What do you do if the actual sequence changes? Do you just overwrite the previous sequence data?

I never overwrite or delete. Once it goes in the database, it stays in the database.

> We are going to be the official repository of this data and have been asked to keep track of the history of changes. This is more than I have had to keep track of in the past.
>
> I had been thinking of trying to implement some loading of the data which gets across the idea that each feature has a stable version which is equal to its current version and any number of older versions. Now this is just an idea (largely based on the representation of data from ensembl).
>
> The stable version would have a stable id which lacks the '.\d' suffix. And there would be a feature record for each version which includes the '.\d' suffix. I would mark older versions obsolete. What I am still working on in this idea is what I could add as properties (gff 9th column) to help with searches. Perhaps I could add a stableID=xyz in each record? I think this would help with a query, I could search for the stableID and obsolete when I need to retrieve the history of changes?
>
> feature.uniquename: some_gene.1
> featureprop.cvterm_id: some term that indicates the concept stableID
> featureprop.value: some_gene
> feature.is_obsolete: true
>
> feature.uniquename: some_gene.2
> featureprop.cvterm_id: some term that indicates the concept stableID
> featureprop.value: some_gene
> feature.is_obsolete: false
>
> feature.uniquename: some_gene
> featureprop.cvterm_id: some term that indicates the concept stableID
> featureprop.value: some_gene
> feature.is_obsolete: false

How you do this depends a bit on the nature of the reannotations. If you have a fairly stable assembly and annotation then it entirely makes sense to count on there being a stable identifier. In what I have, we often have dramatically different assemblies from one version to another (many of our assemblies do not have pseudomolecules) and we cannot count on stable ids.

Your assigning a stable id as a property will work if the changes are not too extensive. But think of the case where 2 genes in 1 version are modified in such a way that 1 gene is split and half is merged into another gene. What rules are you going to use to assign the stable id for the merged gene?

An alternative tracking mechanism between versions is to use a feature_relationship. You could keep track of things a bit better with this table if there are extensive merges and splits.

For the most part we are not maintaining gene history except in a few of our important genomes.

Joe
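One way to picture the feature_relationship suggestion for the split-and-merge case Joe describes is the rough sketch below. It uses Python's built-in sqlite3 with heavily simplified stand-ins for the Chado tables (the real feature_relationship table points at a cvterm through type_id rather than storing a string). The gene names, the `derives_from` label and the helper functions are all assumptions made for illustration, not anything prescribed in the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-ins for Chado tables; most columns are omitted.
    CREATE TABLE feature (
        feature_id INTEGER PRIMARY KEY,
        uniquename TEXT NOT NULL UNIQUE
    );
    CREATE TABLE feature_relationship (
        subject_id INTEGER NOT NULL REFERENCES feature(feature_id),
        object_id  INTEGER NOT NULL REFERENCES feature(feature_id),
        type       TEXT NOT NULL  -- assumption: plain string instead of type_id -> cvterm
    );
""")

# Hypothetical data: geneA.1 is split; one half stays geneA.2,
# the other half is merged with geneB.1 to give geneB.2.
conn.executemany(
    "INSERT INTO feature (feature_id, uniquename) VALUES (?, ?)",
    [(1, "geneA.1"), (2, "geneB.1"), (3, "geneA.2"), (4, "geneB.2")],
)
# subject (newer feature) derives_from object (older feature)
conn.executemany(
    "INSERT INTO feature_relationship VALUES (?, ?, 'derives_from')",
    [(3, 1), (4, 1), (4, 2)],
)

def predecessors(connection, uniquename):
    """Older features that the given feature was derived from."""
    rows = connection.execute(
        """SELECT older.uniquename
           FROM feature newer
           JOIN feature_relationship fr ON fr.subject_id = newer.feature_id
           JOIN feature older ON older.feature_id = fr.object_id
           WHERE newer.uniquename = ? AND fr.type = 'derives_from'
           ORDER BY older.uniquename""",
        (uniquename,),
    )
    return [row[0] for row in rows]

def successors(connection, uniquename):
    """Newer features that were derived from the given feature."""
    rows = connection.execute(
        """SELECT newer.uniquename
           FROM feature older
           JOIN feature_relationship fr ON fr.object_id = older.feature_id
           JOIN feature newer ON newer.feature_id = fr.subject_id
           WHERE older.uniquename = ? AND fr.type = 'derives_from'
           ORDER BY newer.uniquename""",
        (uniquename,),
    )
    return [row[0] for row in rows]
```

Because the history is stored as edges rather than as a shared identifier, a merged gene can point back at both of its parents without any rule for which stable id it should inherit.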
From: Sofia R. <so...@so...> - 2018-09-12 20:09:41
Hi Joe and other Chado users,

Joe, Thanks for your response. I would like to know more about your data. I have a few questions and will follow them up with a dump of my current ideas on how to solve this.

Are you the source of the sequence? Or are you pulling the data from another database? What do you do if the actual sequence changes? Do you just overwrite the previous sequence data?

We are going to be the official repository of this data and have been asked to keep track of the history of changes. This is more than I have had to keep track of in the past.

I had been thinking of trying to implement some loading of the data which gets across the idea that each feature has a stable version which is equal to its current version and any number of older versions. Now this is just an idea (largely based on the representation of data from ensembl).

The stable version would have a stable id which lacks the '.\d' suffix. And there would be a feature record for each version which includes the '.\d' suffix. I would mark older versions obsolete. What I am still working on in this idea is what I could add as properties (gff 9th column) to help with searches. Perhaps I could add a stableID=xyz in each record? I think this would help with a query, I could search for the stableID and obsolete when I need to retrieve the history of changes?

feature.uniquename: some_gene.1
featureprop.cvterm_id: some term that indicates the concept stableID
featureprop.value: some_gene
feature.is_obsolete: true

feature.uniquename: some_gene.2
featureprop.cvterm_id: some term that indicates the concept stableID
featureprop.value: some_gene
feature.is_obsolete: false

feature.uniquename: some_gene
featureprop.cvterm_id: some term that indicates the concept stableID
featureprop.value: some_gene
feature.is_obsolete: false

Thank you,
Sofia
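The stableID idea above could be exercised roughly as follows. This is only a sketch using Python's built-in sqlite3 and cut-down versions of the Chado tables; in Chado itself the property type column on featureprop is called type_id (a reference to cvterm), which is what the sketch uses. The cvterm name `stableID` comes from the message, while the helper `version_history` and the table simplifications are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-ins for Chado tables; most columns are omitted.
    CREATE TABLE cvterm (
        cvterm_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE feature (
        feature_id  INTEGER PRIMARY KEY,
        uniquename  TEXT NOT NULL UNIQUE,
        is_obsolete INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE featureprop (
        featureprop_id INTEGER PRIMARY KEY,
        feature_id     INTEGER NOT NULL REFERENCES feature(feature_id),
        type_id        INTEGER NOT NULL REFERENCES cvterm(cvterm_id),
        value          TEXT
    );
""")

# The three records from the message: two versioned features plus the stable one.
conn.execute("INSERT INTO cvterm (cvterm_id, name) VALUES (1, 'stableID')")
conn.executemany(
    "INSERT INTO feature (feature_id, uniquename, is_obsolete) VALUES (?, ?, ?)",
    [(1, "some_gene.1", 1), (2, "some_gene.2", 0), (3, "some_gene", 0)],
)
conn.executemany(
    "INSERT INTO featureprop (feature_id, type_id, value) VALUES (?, 1, ?)",
    [(1, "some_gene"), (2, "some_gene"), (3, "some_gene")],
)

def version_history(connection, stable_id):
    """Return (uniquename, is_obsolete) for every feature sharing a stableID."""
    rows = connection.execute(
        """SELECT f.uniquename, f.is_obsolete
           FROM feature f
           JOIN featureprop fp ON fp.feature_id = f.feature_id
           JOIN cvterm c ON c.cvterm_id = fp.type_id
           WHERE c.name = 'stableID' AND fp.value = ?
           ORDER BY f.uniquename""",
        (stable_id,),
    )
    return [(name, bool(obsolete)) for name, obsolete in rows]
```

Filtering the same result on the obsolete flag would give either the current record or the retired ones, which seems to be the kind of search the message has in mind.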
From: Joe C. <jwc...@lb...> - 2018-09-12 19:35:23
For what it’s worth, I’ve been using dbxrefs to track annotation versions. I’ve modified the schema to make dbxref_id in the feature table not null, and I use a record in the dbxref table to label the source, and version, of the data.

Appending a numerical identifier to the name means that a query for a particular version will require a VERY expensive SQL constraint, "and name like '%.N'", in the queries.

Joe
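To make the contrast concrete, here is a small sketch, again with Python's sqlite3 and simplified stand-in tables, of selecting one annotation version by an equality match on a dbxref record versus a suffix match on the name. It is one possible reading of the approach, not Joe's actual schema: the accession `example_annotation`, the sample rows, the index and both helper functions are invented for illustration. Chado's dbxref table does have a version column, which the sketch relies on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-ins for Chado's dbxref and feature tables.
    CREATE TABLE dbxref (
        dbxref_id INTEGER PRIMARY KEY,
        accession TEXT NOT NULL,
        version   TEXT NOT NULL DEFAULT ''
    );
    CREATE TABLE feature (
        feature_id INTEGER PRIMARY KEY,
        dbxref_id  INTEGER NOT NULL REFERENCES dbxref(dbxref_id),
        uniquename TEXT NOT NULL
    );
    CREATE INDEX dbxref_version_idx ON dbxref (version);
""")

# One dbxref row labels each (source, version) of the data.
conn.executemany(
    "INSERT INTO dbxref (dbxref_id, accession, version) VALUES (?, ?, ?)",
    [(1, "example_annotation", "1"), (2, "example_annotation", "2")],
)
conn.executemany(
    "INSERT INTO feature (feature_id, dbxref_id, uniquename) VALUES (?, ?, ?)",
    [(1, 1, "nv2m00005394.1"), (2, 2, "nv2m00005394.2"), (3, 2, "nv2m00005395.2")],
)

def features_in_version(connection, version):
    """Equality match on dbxref.version; an ordinary index can serve this."""
    rows = connection.execute(
        """SELECT f.uniquename
           FROM feature f
           JOIN dbxref d ON d.dbxref_id = f.dbxref_id
           WHERE d.version = ?
           ORDER BY f.uniquename""",
        (version,),
    )
    return [row[0] for row in rows]

def features_by_suffix(connection, version):
    """Suffix match on the name; the leading wildcard typically forces a scan."""
    rows = connection.execute(
        "SELECT uniquename FROM feature WHERE uniquename LIKE ? ORDER BY uniquename",
        ("%." + version,),
    )
    return [row[0] for row in rows]
```

Both functions return the same rows on this toy data; the difference is that a pattern with a leading wildcard generally cannot use an ordinary B-tree index, which is presumably the cost the message warns about.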
From: Sofia R. <so...@so...> - 2018-09-12 19:16:24
Hello All,

I have a question about how others are handling sequence feature versions. I am using Tripal and have posted this question in the Tripal repository Issues as well.

I have a group that is developing gene/mRNA models. They are using an Ensembl-like system for versioning of gene and transcript ids. And they want to maintain a history of previous versions.

They plan on incrementing a digit after the id when a new version is generated.

gene nv2m00005394.1
mRNA nv2m00005394.1.mRNA.1

Chr11 GFF3Conv gene 3598792 3603486 . - . Alias=Sox9;Name=nv2m00005394.1;ID=nv2m00005394.1
Chr11 GFF3Conv mRNA 3598792 3603486 . - . ID=nv2m00005394.1.mRNA.1;Parent=nv2m00005394.1

How should I handle this? Create a new feature for each version and mark the old one obsolete? How do I make it easy for users to find the correct ID when they don't know there has been an update? I have some ideas, but it would require the geneID and mRNAIDs to have different bases, i.e. nv2g00005394 (change g->m) for gene and nv2m00005394 for mRNA.

Any advice would be fantastic!!!

Thank you!
Sofia
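For readers who want to experiment with the identifier scheme in this message, the sketch below parses the ninth GFF3 column and separates a versioned ID into its base and its trailing version digit. The function names and the regular expression are assumptions for illustration; the parser is deliberately minimal (no URL-unescaping, no multi-valued attributes), and the tab-separated sample line is rebuilt from the space-separated one shown in the message.

```python
import re

def parse_gff3_attributes(column9):
    """Turn 'key=value;key=value' into a dict (minimal: no unescaping)."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attrs[key] = value
    return attrs

# Assumption: everything before the final '.<digits>' is the stable part.
VERSIONED_ID = re.compile(r"^(?P<stable>.+)\.(?P<version>\d+)$")

def split_versioned_id(identifier):
    """Return (stable_id, version); version is None without a numeric suffix."""
    match = VERSIONED_ID.match(identifier)
    if match is None:
        return identifier, None
    return match.group("stable"), int(match.group("version"))

gene_line = (
    "Chr11\tGFF3Conv\tgene\t3598792\t3603486\t.\t-\t.\t"
    "Alias=Sox9;Name=nv2m00005394.1;ID=nv2m00005394.1"
)
attrs = parse_gff3_attributes(gene_line.split("\t")[8])
gene_stable, gene_version = split_versioned_id(attrs["ID"])
mrna_stable, mrna_version = split_versioned_id("nv2m00005394.1.mRNA.1")
```

Note that the mRNA id carries the gene version inside its own base (`nv2m00005394.1.mRNA`), so under this naming a new gene version changes the base of every transcript id as well.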
From: Scott C. <sc...@sc...> - 2018-08-11 02:47:15
Hi Blake,

I'm not sure there is a standard way of dealing with this, but I'm not clear what feature that featureprop would be hanging off of, if you see my meaning. At WormBase, potential sequencing errors have features of their own, so that information can be hung off them as properties (publications, curator comments, etc). In GFF it looks like this:

I RNASeq possible_base_call_error 369894 369894 . + . Name=WBsf899478

Note that "possible_base_call_error" is a valid SO term that is a child of sequence_feature, so it is valid to use in GFF and in the feature.type_id column in Chado. Given that there is a SO term for this situation, perhaps this is the closest thing to a standard for it. See http://www.sequenceontology.org/browser/current_svn/term/SO:0000701 for the term.

Note that "base_call_error_correction" is also a SO term for identifying where you've made corrections, so that you can fix the sequence and hang properties off that too.

Scott

--
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research
From: Inderski, B. - A. <Blake.Inderski@ARS.USDA.GOV> - 2018-08-10 19:39:42
GMOD Schema/Chado community,

I’ve noticed sequences that contain high dissimilarity at the terminal end(s). What’s considered best practice for labeling data that contains sequencing error? My best guess is to create a featureprop entry.

Thanks,
Blake

This electronic message contains information generated by the USDA solely for the intended recipients. Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties. If you believe you have received this message in error, please notify the sender and delete the email immediately.
From: Scott C. <sc...@sc...> - 2018-08-03 20:47:38
Hi All,

So I'm thinking having Chado 1.4 out for PAG would be a good idea, and since it feels like I devote about a day a month to thinking about Chado, it's about time to really get started.

So, starting with the current status, there are 4 pull requests outstanding:

https://github.com/GMOD/Chado/pulls

(I just pulled a few others that I was pretty confident in.) These four seem pretty reasonable to me, but I'll leave them out for a little while, as we work through the issues.

There are 24 open issues with the tag "Chado 1.4 suggestion":

https://github.com/GMOD/Chado/issues?q=is%3Aissue+is%3Aopen+label%3A%22Chado+1.4+Suggestion%22

most of which have several comments, and several have some code.

My primary task for the 1.4 release is rewriting the build and installation code. I think the installation code will remain Makefile.PL-based, but it could certainly be modernized. The build code will probably require a fair amount of reworking so that we can get away from the perl-Tk code, which is quite the albatross when it comes to building releases.

So, does anybody have any other issues that should be considered for a 1.4 release?

Thanks,
Scott

--
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research
From: Bradford C. <bra...@gm...> - 2018-05-04 17:46:47
Hi Scott,

erasche’s Chado container is great. Our group also has Tripal 2 and Tripal 3 containers for Chado: https://github.com/statonlab/docker-containers

We also have a command-line Docker management tool, https://github.com/statonlab/tripaldock, for those who find Docker confusing to set up. These images don't contain JBrowse or Apollo, sorry.

> Also, do you have any suggestions for what else should be included? I figured I would put in some yeast sample data, but having some sample data that could be edited by Apollo and be integrated into Tripal would be really nice.

Check out https://github.com/statonlab/tripal_dev_mini_dataset

The idea is to provide a truncated dataset with all of the data one might want to load into Chado. I have detailed instructions on how to load everything into a Tripal 3 site. In theory I'm also keeping SQL dumps there, but I'm not great about that; I might switch to using the Tripal Python API to write a script that adds everything.

Bradford Condon
Postdoctoral Scholar, University of Tennessee Knoxville
www.bradfordcondon.com
From: Josh G. <jog...@in...> - 2018-05-04 16:15:37
These are a little out of date, but might be a good starting point.

https://hub.docker.com/r/erasche/chado/
https://hub.docker.com/r/jbrowse/gmod-jbrowse/

I've worked with the JBrowse image but not the Chado one.

I'm happy to help if you would like some assistance.

Josh
From: Scott C. <sc...@sc...> - 2018-05-04 14:35:44
Hi All,

I'm thinking about creating a new GMOD in the Cloud to replace the now very old one. Since technology has advanced quite a bit since I last did this, I'm thinking I might be able to use Docker to do it all. This instance would contain Chado, Tripal, JBrowse and Apollo. Are there "canonical" Docker images for all of these? I assume I'll probably want to modify them to work together in a nice way, but having a good starting point would be nice.

Also, do you have any suggestions for what else should be included? I figured I would put in some yeast sample data, but having some sample data that could be edited by Apollo and be integrated into Tripal would be really nice.

Thanks,
Scott

--
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research
From: Bradford C. <bra...@gm...> - 2018-04-27 16:56:20
Hi everyone,

Friendly reminder: our Chado groups discussion is starting soon, at 1 PM Eastern time, Friday April 27th.

The Google doc is available here: https://docs.google.com/document/d/1J3KVdavfjEXqeZ6tBR_5yDDEaZzSJF9VqM086SEC2wM/edit#

Please find the Zoom meeting details below.

Topic: Chado feature groups
Time: Apr 27, 2018 1 PM Eastern Time (US and Canada)

Join from PC, Mac, Linux, iOS or Android: https://utia.zoom.us/j/338953789

Or iPhone one-tap:
US: +16699006833,,338953789# or +14086380968,,338953789#

Or Telephone:
Dial (for higher quality, dial a number based on your current location):
US: +1 669 900 6833 or +1 408 638 0968 or +1 646 876 9923
Meeting ID: 338 953 789
International numbers available: https://zoom.us/u/TzbaY80G

Or an H.323/SIP room system:
H.323:
162.255.37.11 (US West)
162.255.36.11 (US East)
221.122.88.195 (China)
115.114.131.7 (India)
213.19.144.110 (EMEA)
202.177.207.158 (Australia)
209.9.211.110 (Hong Kong)
64.211.144.160 (Brazil)
69.174.57.160 (Canada)
Meeting ID: 338 953 789
SIP: 338...@zo...

Bradford Condon
Postdoctoral Scholar, University of Tennessee Knoxville
www.bradfordcondon.com
From: Bradford C. <bra...@gm...> - 2018-04-24 17:47:28
Hi all,

Looking forward to discussing grouping features this Friday.

The Google doc for this meeting can be found here: https://docs.google.com/document/d/1J3KVdavfjEXqeZ6tBR_5yDDEaZzSJF9VqM086SEC2wM/edit

Please find the Zoom meeting details below.

Topic: Chado feature groups
Time: Apr 27, 2018 1 PM Eastern Time (US and Canada)

Join from PC, Mac, Linux, iOS or Android: https://utia.zoom.us/j/338953789

Or iPhone one-tap:
US: +16699006833,,338953789# or +14086380968,,338953789#

Or Telephone:
Dial (for higher quality, dial a number based on your current location):
US: +1 669 900 6833 or +1 408 638 0968 or +1 646 876 9923
Meeting ID: 338 953 789
International numbers available: https://zoom.us/u/TzbaY80G

Or an H.323/SIP room system:
H.323:
162.255.37.11 (US West)
162.255.36.11 (US East)
221.122.88.195 (China)
115.114.131.7 (India)
213.19.144.110 (EMEA)
202.177.207.158 (Australia)
209.9.211.110 (Hong Kong)
64.211.144.160 (Brazil)
69.174.57.160 (Canada)
Meeting ID: 338 953 789
SIP: 338...@zo...

Cheers
Bradford

Bradford Condon
Postdoctoral Scholar, University of Tennessee Knoxville
www.bradfordcondon.com
From: Bradford C. <bra...@gm...> - 2018-04-20 13:38:39
Hi everyone,

I apologize for not including the time zone. 1 PM Eastern <http://www.thetimezoneconverter.com/?t=1pm&tz=Eastern%20Daylight%20Time%20(EDT)&>.

Bradford Condon
Postdoctoral Scholar, University of Tennessee Knoxville
www.bradfordcondon.com

> On Apr 20, 2018, at 9:34 AM, Surya Saha <ss...@co...> wrote:
>
> Hi Bradford,
>
> What is the time zone for the meeting at 1pm on Friday April 27th? Thanks
>
> -Surya
From: Bradford C. <bra...@gm...> - 2018-04-20 12:36:26
Hi all,

Thank you for participating in the Doodle poll. No time worked for everyone, but let’s meet at 1pm Friday April 27th. I’ll send out connection information and a Google document summary soon.

Thank you
Bradford

Bradford Condon
Postdoctoral Scholar, University of Tennessee Knoxville
www.bradfordcondon.com
From: Adhemar <az...@gm...> - 2018-04-18 20:07:08
We're working on it and will let you know when it becomes available.

Here's a screenshot: https://raw.githubusercontent.com/lmb-embrapa/machado/master/static/screenshot.png
From: Scott C. <sc...@sc...> - 2018-04-18 19:46:26
Hi Adhemar,

That's pretty cool. Do you have (or plan on having) a publicly available demo that people could try out? Having a REST interface could potentially be very useful.

Thanks,
Scott

--
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research
From: Adhemar <az...@gm...> - 2018-04-18 19:13:09
Hello Scott,

Last year, we started to develop a Django framework for Chado using this reference: http://gmod.org/wiki/Chado_Django_HOWTO

The code is finally publicly available at https://github.com/lmb-embrapa/machado

Though the software is in its early stages, there are already tools for loading ontology, Fasta, GFF, Blast, organisms, taxonomy, and publication files. We've set up a REST framework and there are a few APIs ready.

We would like to let the GMOD community know about this endeavor and have feedback.

Cheers,
Adhemar
From: Bradford C. <bra...@gm...> - 2018-04-16 15:31:10
|
Hi everyone,

We in the Tripal community would really like to come up with a simple group module (at least for features) so we can standardize across sites. We hope to include this module in the next Chado release so it can be used throughout the GMOD Chado community.

I’ve attached a Doodle poll below to help us pick a time for the discussion, sometime between this Friday (April 20) and the next (April 27). Times are in EST. If you’d like to take part, please indicate your availability by Wednesday April 18th, 12pm EST.

https://doodle.com/poll/x5pkgebgu5xzt4bp

Thank you
Bradford

Bradford Condon
Postdoctoral Scholar, University of Tennessee Knoxville
www.bradfordcondon.com |
From: Scott C. <sc...@sc...> - 2018-04-02 16:08:56
|
---------- Forwarded message ----------
From: Iddo Friedberg <id...@gm...>
Date: Mon, Apr 2, 2018 at 10:19 AM
Subject: [isb-biocuration] Function @ISMB 2018: Abstract Deadline this week
To: isb...@go...

Call for Abstracts: Function Special Interest Group at ISMB 2018

We call upon all researchers involved in the computational study of macromolecular function to submit an abstract to the Function-SIG meeting. Authors of selected abstracts will be invited to give a talk and/or present a poster. Travel fellowships will be available for qualifying graduate students and postdoctoral trainees.

Time and place: Chicago, IL, USA July 6, 2018
ISMB 2018 home page: https://www.iscb.org/ismb2018

Key Dates:
*Thursday April 5, 2018*: Abstract Submission Deadline
*April 26, 2018*: Notification of Acceptance
*July 7, 2018*: Function SIG at ISMB/ECCB 2018

Abstract submission: http://biofunctionprediction.org/content/submit-abstract

Sequence and structure genomics have generated a wealth of data, but extracting meaningful information from genomic data is challenging. Both the number and the diversity of discovered sequences are increasing. In addition, there is a need for standardized annotation that could be incorporated into functional annotation on a large scale. Finally, there is a need to assess the quality of the function prediction algorithms and software. For these reasons and many more, automated protein function prediction is of interest to computational biologists in academia and industry.

The Function-SIG meetings offer researchers and students in the field the opportunity to meet with like-minded colleagues, learn about the latest developments in the field, and forge new collaborations. More on the Function SIG: http://biofunctionprediction.org/

We will also present more results from the third round of the Critical Assessment of Functional Annotation (CAFA3) and from the additional CAFA3.14 (CAF-Pi).
CAFA is an international computational challenge, funded by the NSF and BBSRC, which aims to assess the accuracy of protein function prediction methods and is drawing dozens of participating groups worldwide. More on CAFA: http://biofunctionprediction.org/cafa/

--
Iddo Friedberg
http://iddo-friedberg.net/contact.html

--
------------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research |
From: Bradford C. <bra...@gm...> - 2018-03-26 12:49:29
|
Hello Scott,

I’m writing to propose we have a committee meeting to discuss a group module for Chado 1.4. Please let me know what you think.

I’ve been having discussions with many groups in the context of adding comparative genomic data to our Tripal site and how to store those groupings in Chado. It’s clear there’s a need in the community for this, as others have added groups as feature properties, or as a new feature type. Neither of these approaches is perfect.

Stephen has proposed adding a simple groups module in the model of the property tables. I think this is a good idea. There’s a definite need for groups, and it’s in the community’s best interest when a standard approach is taken.

Thank you
Bradford

Bradford Condon
Postdoctoral Scholar, University of Tennessee Knoxville
www.bradfordcondon.com |
From: Scott C. <sc...@sc...> - 2018-03-20 14:24:57
|
Hi Nichole,

I just looked at the Sequence Ontology website, and it seems that ARS_consensus_sequence is in SO (http://sequenceontology.org/browser/current_svn/term/SO:0002004), so I'm wondering if it made it into your database or if something went wrong with the ontology load. Perhaps try this query to check:

  SELECT * FROM cvterm WHERE name = 'ARS_consensus_sequence';

About the 'unknown' thing: your original email said there were "scaffolds" with an ID of "unknown". Does that mean more than one scaffold with the same ID? That would likely be a problem, and off the top of my head, I don't even remember how the loader would deal with it (I feel like it should throw an error and die, but it might do something silly like "uniquify" the ID).

Scott

On Tue, Mar 20, 2018 at 9:53 AM, Nichole Wespe <nic...@gm...> wrote:
> Hi Scott,
>
> Thank you very much for your help. Since I am just using the SGD gff to
> build an example database to see if this structure will fit my company's
> needs, I deleted the "dbxref=NCBI:" occurrences from the gff file. The next
> error I got is "MSG: no cvterm for ARS_consensus_sequence." This term
> appears in the third column of the gff for sequences located within ARS.
>
> I had loaded the first four ontologies ([1] Relationship Ontology [2]
> Sequence Ontology [3] Gene Ontology [4] Chado Feature Properties) when
> running "make ontologies." I assume that the cvterm is missing because it
> is not included in one of these ontologies. Is this a correct
> understanding? How can I fix this to enable the file to load into the
> database?
>
> I also had another question in my first email about the error I get when
> loading a custom gff file derived from a draft de novo assembly. The
> error I receive is “Unable to find srcfeature unknown in the database.” The
> GFF file has the scaffolds labeled with ID=unknown, Name=unknown. Can you
> tell me exactly what the “srcfeature unknown” is referring to?
>
> Thank you!
> Nichole

--
------------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research |
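Scott's question about whether several scaffolds share the ID "unknown" can be checked before loading. The following is a minimal sketch, not part of the GMOD tools: it assumes a tab-delimited GFF3 file with nine columns, and the function names are made up for illustration. Note that GFF3 allows a discontinuous feature (for example a multi-exon CDS) to repeat one ID across lines, so a reported duplicate is a prompt to look, not proof of an error.

```python
from collections import Counter


def parse_attributes(column9):
    """Split a GFF3 attribute column into a dict of tag -> value."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            tag, value = pair.split("=", 1)
            attrs[tag.strip()] = value.strip()
    return attrs


def duplicate_ids(gff_lines):
    """Return {ID: count} for every ID that appears on more than one feature line."""
    counts = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # sequence data follows; no more feature lines
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            continue  # not a well-formed feature line
        feature_id = parse_attributes(columns[8]).get("ID")
        if feature_id is not None:
            counts[feature_id] += 1
    return {fid: n for fid, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Hypothetical lines modelled on the description in the thread.
    sample = [
        "##gff-version 3\n",
        "unknown\tassembler\tcontig\t1\t5000\t.\t.\t.\tID=unknown;Name=unknown\n",
        "unknown\tassembler\tcontig\t1\t7200\t.\t.\t.\tID=unknown;Name=unknown\n",
        "scaffold_3\tassembler\tcontig\t1\t900\t.\t.\t.\tID=scaffold_3\n",
    ]
    print(duplicate_ids(sample))
```

To run it on a real file, pass an open file handle, e.g. `duplicate_ids(open("P_avenaria.gff"))`.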
From: Scott C. <sc...@sc...> - 2018-03-20 01:47:03
|
Hi Nichole,

I cc'ed the schema mailing list, where Chado issues are discussed.

Usually, SGD's GFF is very good, but looking at what they currently have, I'd say they have a bug in their production pipeline. The "dbxref=NCBI:" thing is weird for two reasons: 1) the d in dbxref should be capitalized, and 2) the NCBI: prefix should be followed by some sort of identifier. What you should do depends on what you want to do with the data: if this is a test data set, then I would just delete the dbxref entries from the GFF file. They only occur on chromosome lines, so there aren't that many of them. On the other hand, if the data are important to you, I'd suggest that you complain to the people at SGD about the problem with their GFF.

Scott

On Mon, Mar 19, 2018 at 6:14 PM, Nichole Wespe <nic...@gm...> wrote:
> Hello,
>
> I am trying to load data into a chado database from a GFF3 file.
>
> I first attempted to load a personal GFF file created from a draft de novo assembly and annotation, with the following command:
>
> $ perl /usr/local/bin/gmod_bulk_load_gff3.pl --dbname testdb --analysis --organism Phaeosphaeria_avenaria --gfffile P_avenaria.gff
>
> The error I receive is “Unable to find srcfeature unknown in the database.” The GFF file has the scaffolds labeled with ID=unknown, Name=unknown. Can you tell me exactly what the “srcfeature unknown” is referring to?
>
> Then I decided that I would like to load an example GFF file to see what a working one looks like. I am following the instructions here (http://gmod.org/wiki/Load_GFF_Into_Chado) using the sample gff file for Saccharomyces cerevisiae linked to on the page.
>
> I added the organism to the database as follows:
>
> testdb=> INSERT INTO organism (abbreviation, genus, species, common_name) values ('S. cerevisiae', 'Saccharomyces', 'cerevisiae', 'budding_yeast');
>
> When I run this code:
>
> $ perl /usr/local/bin/gmod_gff3_preprocessor.pl --gfffile saccharomyces_cerevisiae.gff --outfile saccharomyces_cerevisiae.sorted.gff
> $ perl /usr/local/bin/gmod_bulk_load_gff3.pl --organism budding_yeast --gfffile saccharomyces_cerevisiae.gff.sorted
>
> I get this error:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Error in line:
> chrXIV SGD chromosome 1 784333 . . . ID=chrXIV;dbxref=NCBI:;Name=chrXIV
>
> Dbxref value 'NCBI:' did not conform to GFF3 specification
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.22.1/Bio/Root/Root.pm:486
> STACK: Bio::FeatureIO::gff::_handle_feature /usr/local/share/perl/5.22.1/Bio/FeatureIO/gff.pm:659
> STACK: Bio::FeatureIO::gff::next_feature /usr/local/share/perl/5.22.1/Bio/FeatureIO/gff.pm:187
> STACK: /usr/local/bin/gmod_bulk_load_gff3.pl:772
> -----------------------------------------------------------
>
> Can you please help me fix this error, or point me to an example GFF file that will work?
>
> Thank you very much,
> Nichole

--
------------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research |
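For a test data set, the "just delete the dbxref entries" suggestion can be scripted rather than done by hand. This is a rough sketch under stated assumptions: the file is tab-delimited GFF3, attributes in column 9 are separated by semicolons, and the only thing to remove is a Dbxref attribute whose value is a database prefix with no accession (as in `dbxref=NCBI:`). The function name is hypothetical; it does not touch well-formed Dbxref values and does not change capitalization.

```python
import re

# Matches a Dbxref attribute (any capitalisation) whose value is a bare
# "DB:" prefix with nothing after the colon, e.g. "dbxref=NCBI:".
EMPTY_DBXREF = re.compile(r"^dbxref=[^:;,]+:$", re.IGNORECASE)


def drop_empty_dbxref(line):
    """Remove accession-less Dbxref attributes from one GFF3 feature line."""
    if line.startswith("#") or not line.strip():
        return line  # directives, comments and blank lines pass through
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 9:
        return line  # FASTA or otherwise non-feature content passes through
    kept = [
        attr
        for attr in columns[8].split(";")
        if attr and not EMPTY_DBXREF.match(attr.strip())
    ]
    columns[8] = ";".join(kept)
    return "\t".join(columns) + "\n"


if __name__ == "__main__":
    bad = "chrXIV\tSGD\tchromosome\t1\t784333\t.\t.\t.\tID=chrXIV;dbxref=NCBI:;Name=chrXIV\n"
    print(drop_empty_dbxref(bad), end="")
```

Applied line by line to the input file and written to a new file, this leaves the original untouched, so the cleaned copy can be compared against it before loading.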
From: Scott C. <sc...@sc...> - 2018-02-23 17:57:31
|
*GCCBOSC 2018: A Bioinformatics Community Conference*
*Call for Abstracts*

*Dates*: June 25-30, 2018
*Location*: Reed College, Portland, OR
*GCCBOSC website*: https://gccbosc2018.sched.com/
*BOSC website*: https://www.open-bio.org/wiki/BOSC_2018
*Email BOSC organizers*: bo...@op...
*BOSC announcements mailing list*: http://lists.open-bio.org/mailman/listinfo/bosc-announce
*Twitter*: @OBF_BOSC <https://twitter.com/OBF_BOSC>, #GCCBOSC

*Important Dates*
- *Abstract submission <https://www.open-bio.org/wiki/BOSC_Abstract_Submission> deadline: March 16, 2018*
- Authors notified: April 10, 2018
- Travel fellowship <https://github.com/OBF/obf-docs/blob/master/Travel_fellowships.md> application deadline: April 15, 2017
- GCCBOSC 2018 Training: June 25-26, 2018
- GCCBOSC 2018 Talks: June 27-28
- GCCBOSC CollaborationFest: June 29-30

*About BOSC*
Since 2000, the yearly Bioinformatics Open Source Conference (BOSC) has provided a forum for developers and users to interact and share research results and ideas in open source bioinformatics. BOSC’s broad spectrum of topics includes practical techniques for solving bioinformatics problems; software development practices; standards and ontologies; approaches that promote open science and sharing of data, results and software; and ways to grow open source communities while promoting diversity within them.

*Why is BOSC partnering with GCC in 2018?*
In past years, BOSC has been part of the ISMB conference. Because of our continuing focus on broadening and deepening the BOSC community, we've been exploring ways to reach those in the bioinformatics community who aren’t already part of the audience attracted by ISMB. As part of that exploration, we have looked at other organizations and conferences that have been successful at establishing a strong and growing community of participants, such as the Galaxy Community Conference (GCC).
After much discussion and planning, we decided to hold BOSC in conjunction with GCC in 2018. We hope that this will be an enjoyable and productive experience for all participants, and we welcome your feedback before, during and after the event.

As always, BOSC 2018 will include two days of talks and posters, two keynote speakers <https://galaxyproject.org/events/gccbosc2018/keynotes/>, a panel discussion, Birds of a Feather, and more. BOSC sessions will run in parallel with GCC 2018 sessions, with some sessions shared. The two days of talks will be preceded by two days of training <https://galaxyproject.org/events/gccbosc2018/training/> on topics nominated by the community, and will be followed by a two-day CollaborationFest that merges BOSC's Codefest and Galaxy's Developer and User Hackathon Days.

*Abstract submission*
We encourage you to submit one-page abstracts (due March 16) on any topic relevant to open source bioinformatics or open science. After review, some abstracts will be selected for lightning talks, longer talks, demos and/or posters. Abstract submission instructions and a link to the EasyChair submission portal can be found on https://www.open-bio.org/wiki/BOSC_Abstract_Submission

*BOSC session topics include* (but are not limited to):
- Open Science and Reproducible Research
- Open Biomedical Data
- Citizen/Participatory Science
- Standards and Interoperability
- Data Science
- Workflows
- Visualization
- Medical and Translational Bioinformatics
- Developer Tools and Libraries
- Bioinformatics Open Source Project Progress Reports

We look forward to receiving your abstract and meeting you at GCCBOSC 2018!

Sincerely,
BOSC 2018 Organizing Committee: Nomi Harris (chair), Heather Wiencko (co-chair), Brad Chapman (co-chair), Peter Cock, Christopher Fields, Bastian Greshake, Karsten Hokamp, Hilmar Lapp, Monica Munoz-Torres

P.S. Don't forget to submit your BOSC abstract by March 16 at https://www.open-bio.org/wiki/BOSC_Abstract_Submission!
Please share this announcement with your colleagues! -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research |
From: Stephen F. <spf...@gm...> - 2018-02-16 00:00:58
|
Hi Bradford,

Related to the dbxref option... The dbxref is meant to store accessions for an "object" or "entity", not to be the entity. So, I would avoid using the dbxref entry as the sole representation of an orthologous group.

Related to the feature table option... You want to organize features into orthologous groups, so I agree it doesn't make sense to add a feature record to represent a group. A group isn't a feature but a relationship between features. Moreover, I think using the feature_relationship table would become problematic too. With the feature_relationship table, you could associate orthologs with one another, but you'd have to do that on a pairwise basis and have a relationship of 'SO:orthologous_to' for every gene in the group with every other gene. That seems a bit overkill.

Related to the featureprop table... I would agree that it doesn't really provide a "grouping" and makes it problematic if you do want to create your own dbxrefs for your orthologous groups.

Related to the group module... I think the challenge with it is that it is a bit complex to put data into. But there are several other issues listed on the Group module page (http://gmod.org/wiki/Chado_Group_Module).

In summary, I honestly don't think there's a good way to store orthologous groups in Chado the way it is now. Perhaps someone else may think otherwise... and I'd be happy to be corrected.

So rather than a group module, what if we propose to add a set of "group" tables that span all modules, similar to the "relationship" tables that span all modules? Here's an example:

Table Name: feature_group: a table just for feature groups.
Fields: feature_group_id (PK); type_id (FK); name; description; and dbxref_id (FK)

Table Name: feature_group_feature: groups features.
Fields: feature_group_feature_id (PK); feature_group_id (FK); feature_id (FK).

We can copy that structure for stock, organism, libraries, etc., which would allow us to make any type of grouping of records within the same tables.
It wouldn't have the same power as the group module, but it would allow us to at least make simple groupings, which are very much needed in Chado in some form.

Just my two cents...
Stephen

On 2/13/2018 6:48 AM, Bradford Condon wrote:
> Hi all,
>
> I am working on an ortholog module that will read in OrthoFinder
> output and store it in Chado.
>
> I’ve read the discussion on the Group Module
> <http://gmod.org/wiki/Chado_Group_Module> but after reaching out to
> the authors, they chose to implement ortholog groups for features
> using feature_dbxref
> <https://laceysanderson.github.io/chado-docs/tables/feature_dbxref.html>.
>
> Do people have recommendations they’d like to share for ortholog
> groups? I’ve heard three general solutions:
>
> Group module - This solution appealed to me but I’m gathering it has
> issues and may just be too complicated if feature groups are the only
> goal.
>
> dbxref - You create a db with your families, and assign each family an
> accession in the db. This works well if you’re using established
> groups hosted on say Phytozome, but what if I can’t link my groups to
> established ones and am using an internal db? I guess extra info is
> stored in cvtermprop for each accession?
>
> feature - You create a feature that is the group feature, and
> associate the members with it. Unfortunately this seems to contradict
> the definition of a feature.
>
> featureprop - you have to annotate each feature as part of the group
> via featureprop. This seems very problematic.
>
> Any input or solutions would be greatly appreciated.
>
> Thank you!
>
> Bradford
>
> Bradford Condon
> Postdoctoral Scholar, University of Tennessee Knoxville
> www.bradfordcondon.com <http://www.bradfordcondon.com> |
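The two tables Stephen proposes can be tried out quickly in an in-memory SQLite database. The sketch below is an illustration only, not an agreed Chado schema: the column names follow his message, but the column types, the UNIQUE constraint, the cut-down `feature` table, the `type_id` value, and the group name `OG0000001` are all assumptions made for the example. Chado itself targets PostgreSQL.

```python
import sqlite3

SCHEMA = """
CREATE TABLE feature (                  -- simplified stand-in for Chado's feature table
    feature_id  INTEGER PRIMARY KEY,
    uniquename  TEXT NOT NULL
);
CREATE TABLE feature_group (
    feature_group_id INTEGER PRIMARY KEY,
    type_id          INTEGER NOT NULL,  -- would reference cvterm in Chado
    name             TEXT NOT NULL,
    description      TEXT,
    dbxref_id        INTEGER            -- would reference dbxref in Chado
);
CREATE TABLE feature_group_feature (
    feature_group_feature_id INTEGER PRIMARY KEY,
    feature_group_id INTEGER NOT NULL REFERENCES feature_group (feature_group_id),
    feature_id       INTEGER NOT NULL REFERENCES feature (feature_id),
    UNIQUE (feature_group_id, feature_id)
);
"""


def build_demo():
    """Create the tables and load one hypothetical orthologous group."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO feature (feature_id, uniquename) VALUES (?, ?)",
        [(1, "geneA_sp1"), (2, "geneA_sp2"), (3, "geneB_sp1")],
    )
    conn.execute(
        "INSERT INTO feature_group (feature_group_id, type_id, name) VALUES (?, ?, ?)",
        (1, 100, "OG0000001"),
    )
    conn.executemany(
        "INSERT INTO feature_group_feature (feature_group_id, feature_id) VALUES (?, ?)",
        [(1, 1), (1, 2)],
    )
    return conn


def members(conn, group_name):
    """Return the uniquenames of all features in the named group."""
    rows = conn.execute(
        """
        SELECT f.uniquename
        FROM feature_group g
        JOIN feature_group_feature gf ON gf.feature_group_id = g.feature_group_id
        JOIN feature f ON f.feature_id = gf.feature_id
        WHERE g.name = ?
        ORDER BY f.uniquename
        """,
        (group_name,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(members(build_demo(), "OG0000001"))
```

With this layout a group of n features needs n linking rows, whereas relating every member to every other member in feature_relationship grows roughly with n squared.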
From: Andrew F. <ad...@nc...> - 2018-02-15 23:39:42
|
Hi Bradford-

continuing the discussion that we started over on the github: https://github.com/legumeinfo/tripal_phylotree/issues/23#issuecomment-365651510

We are using something of a hybrid approach that is focused on representation of trees for the families (using the phylotree module), but which we extend to be able to:
a) deal with assignments of new gene sets that come along "in between" builds of our trees,
b) deal with assignments of gene sets for species for which we have multiple annotated genomes but don't necessarily want to include them all in the phylogenetic representation.

This "extension" is done simply using featureprop with a special type_id to indicate that it is a gene family assignment and a value indicating the name of the family (= name of phylotree, which is really just the id of the HMM that defines the family in the methods that we're using). Our naming includes a namespace component (e.g. phytozome_10_2.<family_id>) which effectively plays the same role as the db in the dbxref solution (as I think you know, phylotree uses dbxrefs). This may not be the most rigorous approach that we could have taken to the extension, but it is nicely lightweight and we haven't been unhappy with it yet (but there's still time!)

One other wrinkle that may be worth mentioning: we are currently building all of our trees from polypeptide features that represent the longest translated splice form for a gene. These features are what the phylonodes actually link to, and there is some associated use of featureloc.residue_info against a feature representing the family consensus (in our case, the hmmemit'd sequence) to capture the multiple sequence alignment from which the trees were derived. However, we treat the family assignments as properties of the genes themselves, and we are using another featureprop to indicate the "family representative" of the gene (i.e. which of its products was used either in the tree or as the basis of the HMM-based assignment).
This is a bit denormalized in terms of the fact that some of the info going into featureprop is also present in the trees (i.e. both the assignment to the family and the specific feature), but it helps to homogenize the representation of the non-tree'd family members for use by other tools that depend on gene family assignments, such as our multi-genome context viewer: https://github.com/legumeinfo/lis_context_viewer (which I think you are also considering adopting?). I should also note to the GMOD group that some development on this tool was supported by last year's Google Summer of Code under the Open Genome Informatics organization, so thanks a lot for supporting that work, which has added some interesting new features above the initial release...

Anyway, we can continue discussion on the github(s) if you want to pursue and/or help refine these pragmatically adopted strategies; just wanted to respond on list in case anyone else wanted to chime in about their approach and/or consider adopting ours.

regards
Andrew Farmer

--
...all concepts in which an entire process is semiotically concentrated elude definition; only that which has no history is definable.
Friedrich Nietzsche |
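The featureprop-based assignment Andrew describes can be pictured with a small in-memory example. This is a simplified sketch, not the LIS implementation: the `featureprop` columns mirror Chado's (feature_id, type_id, value, rank), but the cut-down `cvterm` and `feature` tables, the term name `gene family`, the gene names, and the family identifier `phytozome_10_2.FAM0001` are hypothetical placeholders.

```python
import sqlite3

SCHEMA = """
CREATE TABLE cvterm  (cvterm_id  INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT NOT NULL UNIQUE);
CREATE TABLE featureprop (
    featureprop_id INTEGER PRIMARY KEY,
    feature_id     INTEGER NOT NULL REFERENCES feature (feature_id),
    type_id        INTEGER NOT NULL REFERENCES cvterm (cvterm_id),
    value          TEXT,
    "rank"         INTEGER NOT NULL DEFAULT 0,
    UNIQUE (feature_id, type_id, "rank")
);
"""

FAMILY_TERM = "gene family"  # hypothetical cvterm name for the assignment type


def build_demo():
    """Create the simplified tables with a few hypothetical genes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO cvterm (name) VALUES (?)", (FAMILY_TERM,))
    conn.executemany(
        "INSERT INTO feature (uniquename) VALUES (?)",
        [("gene1",), ("gene2",), ("gene3",)],
    )
    return conn


def assign_family(conn, gene, family):
    """Record a gene family assignment as a featureprop on the gene."""
    conn.execute(
        """
        INSERT INTO featureprop (feature_id, type_id, value)
        SELECT f.feature_id, t.cvterm_id, ?
        FROM feature f, cvterm t
        WHERE f.uniquename = ? AND t.name = ?
        """,
        (family, gene, FAMILY_TERM),
    )


def genes_in_family(conn, family):
    """Return the uniquenames of genes assigned to the given family."""
    rows = conn.execute(
        """
        SELECT f.uniquename
        FROM featureprop p
        JOIN feature f ON f.feature_id = p.feature_id
        JOIN cvterm  t ON t.cvterm_id  = p.type_id
        WHERE t.name = ? AND p.value = ?
        ORDER BY f.uniquename
        """,
        (FAMILY_TERM, family),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_demo()
    assign_family(conn, "gene1", "phytozome_10_2.FAM0001")
    assign_family(conn, "gene3", "phytozome_10_2.FAM0001")
    print(genes_in_family(conn, "phytozome_10_2.FAM0001"))
```

The namespace prefix in the value plays the role that the db would play in a dbxref-based design, which is why no extra table is needed here; the trade-off is that the family itself has no row of its own to hang a description on.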