From: Bradford C. <bra...@gm...> - 2018-02-13 14:49:09
|
Hi all,

I am working on an ortholog module that will read in OrthoFinder output and store it in Chado.

I’ve read the discussion on the Group Module <http://gmod.org/wiki/Chado_Group_Module>, but when I reached out to the authors I learned that they chose to implement ortholog groups for features using feature_dbxref <https://laceysanderson.github.io/chado-docs/tables/feature_dbxref.html>.

Do people have recommendations they’d like to share for ortholog groups? I’ve heard four general solutions:

Group module - This solution appealed to me, but I’m gathering it has issues and may just be too complicated if feature groups are the only goal.

dbxref - You create a db with your families and assign each family an accession in the db. This works well if you’re using established groups hosted on, say, Phytozome, but what if I can’t link my groups to established ones and am using an internal db? I guess extra info is stored in cvtermprop for each accession?

feature - You create a feature that is the group feature, and associate the members with it. Unfortunately this seems to contradict the definition of a feature.

featureprop - You have to annotate each feature as part of the group via featureprop. This seems very problematic.

Any input or solutions would be greatly appreciated.

Thank you!

Bradford

Bradford Condon
Postdoctoral Scholar, University of Tennessee Knoxville
www.bradfordcondon.com |
From: Andrew F. <ad...@nc...> - 2018-02-15 23:39:42
|
Hi Bradford-

continuing the discussion that we started over on the github:
https://github.com/legumeinfo/tripal_phylotree/issues/23#issuecomment-365651510

We are using something of a hybrid approach that is focused on representation of trees for the families (using the phylotree module), but which we extend to be able to:
a) deal with assignments of new gene sets that come along "in between" builds of our trees,
b) deal with assignments of gene sets for species for which we have multiple annotated genomes but don't necessarily want to include them all in the phylogenetic representation.

This "extension" is done simply using featureprop with a special type_id to indicate that it is a gene family assignment and a value indicating the name of the family (= name of the phylotree, which is really just the id of the HMM that defines the family in the methods that we're using). Our naming includes a namespace component (e.g. phytozome_10_2.<family_id>) which effectively plays the same role as the db in the dbxref solution (as I think you know, phylotree uses dbxrefs). This may not be the most rigorous approach that we could have taken to the extension, but it is nicely lightweight and we haven't been unhappy with it yet (but there's still time!).

One other wrinkle that may be worth mentioning: we are currently building all of our trees from polypeptide features that represent the longest translated splice form for a gene. These features are what the phylonodes actually link to, and there is some associated use of featureloc.residue_info against a feature representing the family consensus (in our case, the hmmemit'd sequence) to capture the multiple sequence alignment from which the trees were derived. However, we treat the family assignments as properties of the genes themselves, and we are using another featureprop to indicate the "family representative" of the gene (i.e. which of its products was used either in the tree or as the basis of the HMM-based assignment).
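
[Editor's sketch] The featureprop-based assignment described above could look roughly like the following. This is only a minimal illustration, assuming heavily simplified stand-ins for Chado's cvterm, feature and featureprop tables in an in-memory SQLite database; the cvterm names ("gene family", "family representative"), the family id "phytozome_10_2.FAM0001" and the gene names are hypothetical and are not taken from the thread.

```python
import sqlite3

# Simplified stand-ins for Chado's cvterm, feature and featureprop tables.
# Real Chado tables carry more columns and constraints; this is only a sketch.
SCHEMA = """
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT UNIQUE NOT NULL,
    type_id INTEGER NOT NULL REFERENCES cvterm(cvterm_id)
);
CREATE TABLE featureprop (
    featureprop_id INTEGER PRIMARY KEY,
    feature_id INTEGER NOT NULL REFERENCES feature(feature_id),
    type_id INTEGER NOT NULL REFERENCES cvterm(cvterm_id),
    value TEXT,
    rank INTEGER NOT NULL DEFAULT 0,
    UNIQUE (feature_id, type_id, rank)
);
"""


def build_demo():
    """Create an in-memory database holding two genes assigned to one family."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Hypothetical term names; a real site would pick terms from its own cv.
    for name in ("gene", "polypeptide", "gene family", "family representative"):
        conn.execute("INSERT INTO cvterm (name) VALUES (?)", (name,))
    term = {name: cid for cid, name in conn.execute("SELECT cvterm_id, name FROM cvterm")}

    features = [("geneA", "gene"), ("geneA.p1", "polypeptide"), ("geneB", "gene")]
    for uniquename, type_name in features:
        conn.execute(
            "INSERT INTO feature (uniquename, type_id) VALUES (?, ?)",
            (uniquename, term[type_name]),
        )
    fid = {name: i for i, name in conn.execute("SELECT feature_id, uniquename FROM feature")}

    family = "phytozome_10_2.FAM0001"  # namespace + family id (hypothetical)
    props = [
        (fid["geneA"], term["gene family"], family),
        (fid["geneA"], term["family representative"], "geneA.p1"),
        (fid["geneB"], term["gene family"], family),
    ]
    conn.executemany(
        "INSERT INTO featureprop (feature_id, type_id, value) VALUES (?, ?, ?)", props
    )
    return conn


def genes_in_family(conn, family):
    """Return the uniquenames of all features assigned to the given family."""
    rows = conn.execute(
        """SELECT f.uniquename
           FROM featureprop fp
           JOIN feature f ON f.feature_id = fp.feature_id
           JOIN cvterm t ON t.cvterm_id = fp.type_id
           WHERE t.name = 'gene family' AND fp.value = ?
           ORDER BY f.uniquename""",
        (family,),
    )
    return [r[0] for r in rows]
```

Calling `genes_in_family(build_demo(), "phytozome_10_2.FAM0001")` lists the member genes with a single query on the property value, which is the lightweight lookup the approach relies on; the family itself has no row of its own, so any metadata about it would have to live elsewhere (for example in phylotree).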
This is a bit denormalized, in that some of the info going into featureprop is also present in the trees (i.e. both the assignment to the family and the specific feature), but it helps to homogenize the representation of the non-tree'd family members for use by other tools that depend on gene family assignments, such as our multi-genome context viewer: https://github.com/legumeinfo/lis_context_viewer (which I think you are also considering adopting? I should also note to the GMOD group that some development on this tool was supported by last year's Google Summer of Code under the Open Genome Informatics organization, so thanks a lot for supporting that work, which has added some interesting new features above the initial release...)

Anyway, we can continue discussion on the github(s) if you want to pursue and/or help refine these pragmatically adopted strategies; I just wanted to respond on list in case anyone else wanted to chime in about their approach and/or consider adopting ours.

regards
Andrew Farmer

--
...all concepts in which an entire process is semiotically concentrated elude definition; only that which has no history is definable. Friedrich Nietzsche |
From: Cannon, E. K [C. S] <ekc...@ia...> - 2018-02-15 23:06:38
|
Hi Bradford,

I was eager to put the group module to use when developing a method for storing post-composed terms, specifically EQ statements. The table count exploded, so Naama and I settled on a completely different solution <http://gmod.org/wiki/Chado_Post-Composed_Phenotypes>.

It's odd that creating a means of generically grouping data objects should be so difficult. Maybe the problem was thinking that grouping should be provided via a module, when instead each module should have its own set of grouping tables, similar to all modules having Xprop, X_relationship, X_dbxref, X_Y tables. (Xgroup, Xgroupprop, Xgroup_relationship...?)

Ethy |
From: Stephen F. <spf...@gm...> - 2018-02-16 00:00:58
|
Hi Bradford,

Related to the dbxref option... The dbxref is meant to store accessions for an "object" or "entity", not be the entity. So, I would avoid using the dbxref entry to be the sole representation of an orthologous group.

Related to the feature table option... You want to organize features into orthologous groups, so I agree it doesn't make sense to add a feature record to represent a group. A group isn't a feature but a relationship between features. Moreover, I think using the feature_relationship table would become problematic too. With the feature_relationship table, you could associate orthologs with one another, but you'd have to do that on a pair-wise basis and have a relationship of 'SO:orthologous_to' for every gene in the group with every other gene. That seems a bit overkill.

Related to the featureprop table... I would agree that it doesn't really provide a "grouping" and makes it problematic if you do want to create your own dbxrefs for your orthologous groups.

Related to the group module... I think the challenge with it is that it is a bit complex to put data into. But there are several other issues listed on the Group module page (http://gmod.org/wiki/Chado_Group_Module).

In summary, I honestly don't think there's a good way to store orthologous groups in Chado the way it is now. Perhaps someone else may think otherwise... and I'd be happy to be corrected.

So rather than a group module, what if we propose to add a set of "group" tables that span all modules, similar to the "relationship" tables that span all modules? Here's an example:

Table Name: feature_group: a table just for groups of features.
Fields: feature_group_id (PK); type_id (FK); name; description; dbxref_id (FK)

Table Name: feature_group_feature: assigns features to groups.
Fields: feature_group_feature_id (PK); feature_group_id (FK); feature_id (FK)

We can copy that structure for stock, organism, library, etc., which would allow us to make any type of group of records within the same tables.
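
[Editor's sketch] The two proposed tables can be tried out quickly. The following is a rough illustration only, using an in-memory SQLite database with a stripped-down feature table; the column names follow the proposal above, while the UNIQUE constraint, the helper functions, the group name "OG0000001" and the gene names are assumptions added for the example and are not part of Chado.

```python
import sqlite3

# feature is reduced to the bare minimum; feature_group and
# feature_group_feature use the fields listed in the proposal.
SCHEMA = """
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT UNIQUE NOT NULL
);
CREATE TABLE feature_group (
    feature_group_id INTEGER PRIMARY KEY,
    type_id INTEGER NOT NULL,   -- would be an FK to cvterm in Chado
    name TEXT NOT NULL,
    description TEXT,
    dbxref_id INTEGER           -- would be an FK to dbxref in Chado
);
CREATE TABLE feature_group_feature (
    feature_group_feature_id INTEGER PRIMARY KEY,
    feature_group_id INTEGER NOT NULL REFERENCES feature_group(feature_group_id),
    feature_id INTEGER NOT NULL REFERENCES feature(feature_id),
    UNIQUE (feature_group_id, feature_id)  -- assumption: one membership row per pair
);
"""


def add_group(conn, name, member_uniquenames, type_id=1, description=None):
    """Create a group and one membership row per member feature."""
    cur = conn.execute(
        "INSERT INTO feature_group (type_id, name, description) VALUES (?, ?, ?)",
        (type_id, name, description),
    )
    group_id = cur.lastrowid
    for uniquename in member_uniquenames:
        # Create the feature if it is not there yet, then look up its id.
        conn.execute("INSERT OR IGNORE INTO feature (uniquename) VALUES (?)", (uniquename,))
        (feature_id,) = conn.execute(
            "SELECT feature_id FROM feature WHERE uniquename = ?", (uniquename,)
        ).fetchone()
        conn.execute(
            "INSERT INTO feature_group_feature (feature_group_id, feature_id) VALUES (?, ?)",
            (group_id, feature_id),
        )
    return group_id


def members_of(conn, group_name):
    """Return the uniquenames of all features in the named group."""
    rows = conn.execute(
        """SELECT f.uniquename
           FROM feature f
           JOIN feature_group_feature fgf ON fgf.feature_id = f.feature_id
           JOIN feature_group fg ON fg.feature_group_id = fgf.feature_group_id
           WHERE fg.name = ?
           ORDER BY f.uniquename""",
        (group_name,),
    )
    return [r[0] for r in rows]


def pairwise_rows(n):
    """Rows needed to link every member to every other member once."""
    return n * (n - 1) // 2


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    add_group(conn, "OG0000001", ["geneA", "geneB", "geneC"])
    return conn
```

With this layout a group of n features needs n membership rows, whereas the pair-wise feature_relationship approach needs `pairwise_rows(n)` rows (1225 for a 50-gene group, and twice that if both directions are stored), which is the overhead the message calls overkill.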
These tables wouldn't have the same power as the group module, but they would allow us to at least make simple groupings, which are very much needed in Chado in some form.

Just my two cents...

Stephen |