From: Arnaud K. <ax...@sa...> - 2003-02-12 15:15:41

Hi Jonathan

Great, thanks for all you've done! Can you clarify the structure of the Oracle instance?

As far as I understand, GUS30 has 5 namespaces, which are implemented in Oracle as schema names or, in other words, as users. To access the different schemata there are two users: GUSrw, which has read/write access to all of them, and GUSdevReadOnly, which has read access only.

Is that correct?

cheers
Arnaud

Jonathan Crabtree wrote:

>I've just committed the preliminary GUS 3.0 schema into the shared CVS
>repository on cvs.sanger.ac.uk (in GUS/Model/schema/oracle). It's
>preliminary because I haven't done a full test yet (i.e., check out a
>clean copy from CVS and use it to create a new GUS instance.) In fact,
>I know there's at least one bug that was probably introduced by some
>changes that Steve and I made to the build system earlier today. These
>changes should make it much easier to install the schema, because the
>user/DBA is now presented with a single file that has to be customized
>(with Oracle passwords, tablespace names, quotas, etc.). Once that file
>is customized, building the system will generate a site-specific set of
>schema installation scripts that can then be run directly from SQL*PLUS,
>without further modification. However, I believe that these changes are
>interacting in an undesirable way with another part of the build system,
>and we have to debug the problem (which I don't think will take long.)
>In the meantime, there is a simple workaround; when the build process
>fails--complaining that somedirectory/Model/Core does not exist--simply
>create that directory (and the corresponding ones for the other GUS
>namespaces) and re-run the build command.
>
>In any case, there are also a few other things that need to be cleaned up.
>The installation documentation needs to be brought up to date and, as
>mentioned above, I have to finish testing what's now in CVS before tagging
>it as an official release. In particular, I suspect that the schema
>creation files may contain some illegal identifiers (e.g., some
>automatically-generated names may exceed the Oracle-imposed 30 character
>limit.) We plan to tag the first release as version '3.0-1.0', to
>indicate that the GUS schema version is 3.0 and the code version is 1.0;
>this convention should make it easy to tell whether the schema has changed
>in any given release. We also plan to keep migration scripts (e.g. to
>convert a GUS 3.0 database instance into a GUS 3.1 database instance) in
>GUS/Model/schema/oracle/migrate. Eventually, when we add support for
>MySQL or PostgreSQL, those files will go in GUS/Model/mysql or
>GUS/Model/postgresql. By the way, "Model" is Steve's abbreviation for
>"data model", the idea being that this directory encompasses everything
>that relates to our data model. This includes both the database schema
>and also any behavior associated with the "objects" defined in the schema
>(i.e., the Perl and Java code.)
>
>I think that's all I have to report for now. Arnaud, I'm afraid I
>haven't done anything about the repeat regions yet; it's on my short
>list of things to do once we get this initial release done. If you have
>any questions about the build process/install scripts before I wake up
>today, give Steve a call. Once the schema files have been successfully
>"built", look at $GUS_HOME/schema/oracle/create-db.sh; this script uses
>SQL*PLUS to run each of the individual .sql files in the correct order.
>The output of each .sql file is logged to a corresponding .log file,
>although I have yet to implement any kind of checking mechanism (for
>example, to grep through the .log file and check that the correct number
>of tables/views were created.)
>
>Jonathan

From: Steve F. <st...@pc...> - 2003-02-12 14:18:30

ok, sure, i'll look into setting it up. might take a couple of days.

steve

Arnaud Kerhornou wrote:

> Hi Steve
>
> Why not setting everything right from the beginning ?
> Instead of using the feature request, another solution is to customize
> a new tracker "schema request" in sourceforge ? We quite like this
> idea, anyway let us know what you think about it ?
>
> cheers
> Arnaud
>
> steve fischer wrote:
>
>> yes, source forge allows us to set up trackers for bugs, features, etc.
>>
>> i had in mind to start out real simple with only one tracker...
>> called bugs, to get the hang of things. we can move to more trackers
>> as needed. or, if you are pretty clear that having one for bugs and
>> one for features will be a good way to start, we can consider that too.
>>
>> steve
>>
>> Arnaud Kerhornou wrote:
>>
>>> Hi
>>>
>>> sourceforge also has feature requests. Would it make more sense to
>>> submit schema changes as new features instead of bugs ?
>>>
>>> Arnaud
>>>
>>> steve fischer wrote:
>>>
>>>> Folks-
>>>>
>>>> we are gearing up to be a bit more formal about changing the schema
>>>> mostly so that we can have distinct releases, which hopefully will
>>>> be spread out over time.
>>>>
>>>> so while we obviously need to continue our active dialogue in this
>>>> mail group and other ways, i am hoping that when we actually
>>>> resolve to make a schema change, we use our bug tracker (see
>>>> www.gusdb.org) to record the request for the change.
>>>>
>>>> let me know how this works
>>>>
>>>> steve

From: Arnaud K. <ax...@sa...> - 2003-02-12 14:11:07

Hi Steve

Why not setting everything right from the beginning ?
Instead of using the feature request, another solution is to customize a new tracker "schema request" in sourceforge ? We quite like this idea, anyway let us know what you think about it ?

cheers
Arnaud

steve fischer wrote:

> yes, source forge allows us to set up trackers for bugs, features, etc.
>
> i had in mind to start out real simple with only one tracker... called
> bugs, to get the hang of things. we can move to more trackers as
> needed. or, if you are pretty clear that having one for bugs and one
> for features will be a good way to start, we can consider that too.
>
> steve
>
> Arnaud Kerhornou wrote:
>
>> Hi
>>
>> sourceforge also has feature requests. Would it make more sense to
>> submit schema changes as new features instead of bugs ?
>>
>> Arnaud
>>
>> steve fischer wrote:
>>
>>> Folks-
>>>
>>> we are gearing up to be a bit more formal about changing the schema
>>> mostly so that we can have distinct releases, which hopefully will
>>> be spread out over time.
>>>
>>> so while we obviously need to continue our active dialogue in this
>>> mail group and other ways, i am hoping that when we actually resolve
>>> to make a schema change, we use our bug tracker (see www.gusdb.org)
>>> to record the request for the change.
>>>
>>> let me know how this works
>>>
>>> steve

From: Jonathan C. <cra...@sn...> - 2003-02-12 09:45:41

I've just committed the preliminary GUS 3.0 schema into the shared CVS
repository on cvs.sanger.ac.uk (in GUS/Model/schema/oracle). It's
preliminary because I haven't done a full test yet (i.e., check out a
clean copy from CVS and use it to create a new GUS instance.) In fact,
I know there's at least one bug that was probably introduced by some
changes that Steve and I made to the build system earlier today. These
changes should make it much easier to install the schema, because the
user/DBA is now presented with a single file that has to be customized
(with Oracle passwords, tablespace names, quotas, etc.). Once that file
is customized, building the system will generate a site-specific set of
schema installation scripts that can then be run directly from SQL*PLUS,
without further modification. However, I believe that these changes are
interacting in an undesirable way with another part of the build system,
and we have to debug the problem (which I don't think will take long.)
In the meantime, there is a simple workaround; when the build process
fails--complaining that somedirectory/Model/Core does not exist--simply
create that directory (and the corresponding ones for the other GUS
namespaces) and re-run the build command.

In any case, there are also a few other things that need to be cleaned up.
The installation documentation needs to be brought up to date and, as
mentioned above, I have to finish testing what's now in CVS before tagging
it as an official release. In particular, I suspect that the schema
creation files may contain some illegal identifiers (e.g., some
automatically-generated names may exceed the Oracle-imposed 30 character
limit.) We plan to tag the first release as version '3.0-1.0', to
indicate that the GUS schema version is 3.0 and the code version is 1.0;
this convention should make it easy to tell whether the schema has changed
in any given release. We also plan to keep migration scripts (e.g. to
convert a GUS 3.0 database instance into a GUS 3.1 database instance) in
GUS/Model/schema/oracle/migrate. Eventually, when we add support for
MySQL or PostgreSQL, those files will go in GUS/Model/mysql or
GUS/Model/postgresql. By the way, "Model" is Steve's abbreviation for
"data model", the idea being that this directory encompasses everything
that relates to our data model. This includes both the database schema
and also any behavior associated with the "objects" defined in the schema
(i.e., the Perl and Java code.)

I think that's all I have to report for now. Arnaud, I'm afraid I
haven't done anything about the repeat regions yet; it's on my short
list of things to do once we get this initial release done. If you have
any questions about the build process/install scripts before I wake up
today, give Steve a call. Once the schema files have been successfully
"built", look at $GUS_HOME/schema/oracle/create-db.sh; this script uses
SQL*PLUS to run each of the individual .sql files in the correct order.
The output of each .sql file is logged to a corresponding .log file,
although I have yet to implement any kind of checking mechanism (for
example, to grep through the .log file and check that the correct number
of tables/views were created.)

Jonathan

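The two checks mentioned in the message above (identifiers longer than Oracle's 30 character limit, and counting created tables/views in the .log files) could be sketched roughly as follows. This is only an illustration in Python, not part of the GUS build system: the function names are made up here, the regular expression is a rough pattern rather than a real SQL parser, and the log check assumes SQL*Plus is run with feedback on, so that each successful statement prints "Table created." or "View created.".

```python
import re

# Limit cited in the message above for Oracle identifiers.
ORACLE_MAX_IDENTIFIER_LENGTH = 30

# Rough pattern for names introduced by common DDL keywords.
# Assumption: this is good enough for generated schema files; it is not a SQL parser.
_NAME_PATTERN = re.compile(
    r"\b(?:CREATE\s+(?:OR\s+REPLACE\s+)?(?:UNIQUE\s+)?"
    r"(?:TABLE|VIEW|INDEX|SEQUENCE)|CONSTRAINT)\s+([\w.$#]+)",
    re.IGNORECASE,
)


def find_long_identifiers(sql_text, limit=ORACLE_MAX_IDENTIFIER_LENGTH):
    """Return the object names in sql_text that are longer than limit."""
    too_long = []
    for match in _NAME_PATTERN.finditer(sql_text):
        # Drop any schema prefix such as "Core." before measuring the name.
        name = match.group(1).split(".")[-1]
        if len(name) > limit and name not in too_long:
            too_long.append(name)
    return too_long


def count_created_objects(log_text):
    """Count the SQL*Plus success messages for tables and views in a log."""
    return {
        "tables": len(re.findall(r"^Table created\.", log_text, re.MULTILINE)),
        "views": len(re.findall(r"^View created\.", log_text, re.MULTILINE)),
    }
```

The counts returned by count_created_objects could then be compared with the number of CREATE TABLE / CREATE VIEW statements in the corresponding .sql file.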
From: steve f. <sfi...@pc...> - 2003-02-11 14:10:11

yes, source forge allows us to set up trackers for bugs, features, etc.

i had in mind to start out real simple with only one tracker... called bugs, to get the hang of things. we can move to more trackers as needed. or, if you are pretty clear that having one for bugs and one for features will be a good way to start, we can consider that too.

steve

Arnaud Kerhornou wrote:

> Hi
>
> sourceforge also has feature requests. Would it make more sense to
> submit schema changes as new features instead of bugs ?
>
> Arnaud
>
> steve fischer wrote:
>
>> Folks-
>>
>> we are gearing up to be a bit more formal about changing the schema
>> mostly so that we can have distinct releases, which hopefully will be
>> spread out over time.
>>
>> so while we obviously need to continue our active dialogue in this
>> mail group and other ways, i am hoping that when we actually resolve
>> to make a schema change, we use our bug tracker (see www.gusdb.org)
>> to record the request for the change.
>>
>> let me know how this works
>>
>> steve
>>
>> -------------------------------------------------------
>> This SF.NET email is sponsored by:
>> SourceForge Enterprise Edition + IBM + LinuxWorld = Something 2 See!
>> http://www.vasoftware.com
>> _______________________________________________
>> Gusdev-gusdev mailing list
>> Gus...@li...
>> https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev

From: Arnaud K. <ax...@sa...> - 2003-02-11 10:10:21

Hi

sourceforge also has feature requests. Would it make more sense to submit schema changes as new features instead of bugs ?

Arnaud

steve fischer wrote:

> Folks-
>
> we are gearing up to be a bit more formal about changing the schema
> mostly so that we can have distinct releases, which hopefully will be
> spread out over time.
>
> so while we obviously need to continue our active dialogue in this
> mail group and other ways, i am hoping that when we actually resolve
> to make a schema change, we use our bug tracker (see www.gusdb.org) to
> record the request for the change.
>
> let me know how this works
>
> steve

From: steve f. <sfi...@pc...> - 2003-02-10 17:41:50

Folks-

we are gearing up to be a bit more formal about changing the schema mostly so that we can have distinct releases, which hopefully will be spread out over time.

so while we obviously need to continue our active dialogue in this mail group and other ways, i am hoping that when we actually resolve to make a schema change, we use our bug tracker (see www.gusdb.org) to record the request for the change.

let me know how this works

steve

From: steve f. <sfi...@pc...> - 2003-01-31 20:18:51

Folks-

i have configured our sourceforge bug tracker with a few categories. it is now available from www.gusdb.org.

if you have bugs, use the tracker! I will be "managing" the bugs, ie, i will get an email when a bug is submitted, and assign the bug to the best person.

steve

From: Joan M. <ma...@pc...> - 2003-01-30 15:50:04

Hi Arnaud,

The XML for mutagen looks fine. I am wondering about the phenotypeClass and Phenotype terms, and if there won't be some overlap between the two.

Joan

Arnaud Kerhornou wrote:

> Hi
>
> I've attached the xml files for two controlled vocabulary tables :
> * Mutagen,
> * PhenotypeClass.
>
> They're both from Flybase.
> Re. PhenotypeClass I've just selected a minimal set from the Flybase
> one. It covers actually two different things :
> * inheritance description: recessive, dominant, codominant,
> semidominant. only one term per allele
>
> * Effects : wild-type, lethal, viable etc. zero, one or more term per
> allele
> This set will probably need to be extended specifically for each organism.
>
> Let me know if they are fine with you.
>
> cheers
> Arnaud
>
> ------------------------------------------------------------------------
> Name: Mutagen.xml
> Type: XML Document (text/xml)
> Encoding: 7bit
>
> Name: PhenotypeClass.xml
> Type: XML Document (text/xml)
> Encoding: 7bit

--
Joan Mazzarelli
Computational Biology and Informatics Laboratory
Center for Bioinformatics
1429 Blockley Hall
University of Pennsylvania
Philadelphia, PA 19104

From: Arnaud K. <ax...@sa...> - 2003-01-29 16:00:45

Hi

I've attached the xml files for two controlled vocabulary tables :
* Mutagen,
* PhenotypeClass.

They're both from Flybase.
Re. PhenotypeClass I've just selected a minimal set from the Flybase one. It covers actually two different things :
* inheritance description: recessive, dominant, codominant, semidominant. only one term per allele
* Effects : wild-type, lethal, viable etc. zero, one or more term per allele

This set will probably need to be extended specifically for each organism.

Let me know if they are fine with you.

cheers
Arnaud

From: Joan M. <ma...@pc...> - 2003-01-28 19:16:25

Hi,

(I think in this use case there are two things we want to represent.
* First is that an effector, E3, let's say activates the formation of Complex C1, made of two proteins, P1 and P2.
* Second is how E3 activates the formation of C1. For example E3 is a kinase and by phosphorylating P1, it induces the dimerization of P1 and P2 to form C1.)

So just for me: entity 3 (E3) interacts with P1 (unphosphorylated); (phosphorylated) P1 can now interact with P2

So P1 has two states, one is phosphorylated in which it acts as a complex component, the other one is unphosphorylated. Would it be worth to represent these two states of a protein (or more generally active/inactive states) ?

Yes, but I think this gets into something that interaction does not strictly cover. And the answer requires thinking about GUS proteins. So the protein involved in the interaction is not the same, in other words, in protein land, it has a phosphorylated residue which participates in the interaction, so I think in the database we would have to say this is a new "instance" of the protein (I am not sure this is the right word to use) ....so there was a RNA which has a protein associated with it and then this protein is modified which changes not strictly its overall amino acid sequence but one of its amino acids "chemical nature". (although protein instances (sequences) derived for an RNA can vary depending on the "source".)

If we can represent both the protein forms (phosphorylated and unphos.) somehow, we could use the form which does the interacting as the entity (effector) in the interaction table. But I think this gets into protein areas which we have not discussed in any depth because we would have to be able to create the feature on the amino sequence (ie amino acid 23 of this amino acid sequence is phosphorylated). I guess something like the amino acid residue S at position 200 has been changed to S*. This gets into how to handle posttranslational protein modifications. I think you may have had some discussions with Crabtree on this.

Do you currently have a way to do this when annotating or do you just have this info. associated in the protein (e.g., protein X is phosphorylated on residue 34; pubmed reference)?

Joan

Arnaud Kerhornou wrote:

> Hi
>
> Joan Mazzarelli wrote:
>
> >Hi Jonathan,
> >
> >*formation* (e.g. dimerization)? Assuming that we had reason to explicitly
> >represent the formation of a Complex (versus the mere fact of its existence,
> >which is handled by Complex/ComplexComponent), wouldn't this be done with
> >the Interaction table? If it were, then you'd have to be able to support
> >multiple effectors. To represent dimerization, for example, you'd have
> >2 inputs (effectors) and 1 output (the target.) The effectors would be
> >the same entities referenced by the ComplexComponents and the target would
> >be the Complex itself. This sounds redundant, but if (yet another
> >hypotheticals) you wanted to represent the fact that a second or third
> >protein acted to inhibit the dimerization process (through some as-yet-
> >undetermined mechanism) then you'd need to create the dimerization
> >Interaction so that you could reference it in yet another Interaction (as
> >a target being inhibited by the new protein).
> >
> >
> I think it makes sense representing the dimerization by an interaction.
> This way we can differentiate that an effector modulates the activity of
> Complex C1 from another situation where another effector modulates the
> formation of C1, even though one could argue that regulating the
> formation of C1 is likely to also regulate the activity of C1!
>
> I think in this use case there are two things we want to represent.
> * First is that an effector, E3, let's say activates the formation of
> Complex C1, made of two proteins, P1 and P2.
> * Second is how E3 activates the formation of C1. For example E3 is a
> kinase and by phosphorylating P1, it induces the dimerization of P1 and
> P2 to form C1.
>
> So P1 has two states, one is phosphorylated in which it acts as a
> complex component, the other one is unphosphorylated. Would it be worth
> to represent these two states of a protein (or more generally
> active/inactive states) ?
>
> >Yes this is true. Think modeling all interactions regardless of knowing that they
> >form a dimer complex.
> >
> >If we take out the effect-target wording in the interaction table and say:
> >entity 1 (effector) interacts with entity 2 (target) to create a dimer; entity 1 can
> >equal entity 2
> >
> >Entity 1 interacts with entity 2 ; now if a third entity inhibits the dimerization
> >between 1 and 2 then entity 3 would need to be able to interact with entity 1 (or 2).
> >
> >I think the trouble comes in when if you had a dimer or a 2 component complex (entity
> >1 and entity 2 as a complex) and the entity 3 could only interact with the complex to
> >disassemble it (or on the molecular level you need surfaces of both entity 1 and
> >entity 2 for interaction with entity 3).
> >
> >I think maybe saying that the complex of 1 and 2 interacts with entity 3 takes care of
> >this, but this uses both the interaction table and the complex table to assign the
> >complex of 1 and 2 as interacting with entity 3.
> >
> >I think that the terms effector and target are confusing in the interaction table.
> >Actually when we originally designed the interaction table I am remembering we
> >struggled with these words, but if we now have direction is known (or not known) in
> >the table do we need effector-target? I am not sure about this.
> >Also the line of evidence tables for interaction may assume that you are only adding
> >evidence for a single direct interaction even if there are multiple lines of evidence
> >to support the interaction (yeast 2 hybrid exp., in vitro binding exp.).
> >
> >Joan
> >
>
> Arnaud

--
Joan Mazzarelli
Computational Biology and Informatics Laboratory
Center for Bioinformatics
1429 Blockley Hall
University of Pennsylvania
Philadelphia, PA 19104

From: Angel P. <an...@sn...> - 2003-01-28 17:18:15

On Tue, 28 Jan 2003, Arnaud Kerhornou wrote:

> Hi
>
> A quick question. Do I need to specify the primary key or it is
> automatically generated by the plugin if it is missing ?

Only if you want to update a specific row in the DB. Otherwise the PK is automatically generated.

Angel

>
> As soon as I know the answer to this, I can generate XML for two controlled
> vocabulary tables: Mutagen and PhenotypeClass.
> FYI the controlled vocabularies will be based on Flybase ones.
>
> Arnaud
>
> Angel Pizarro wrote:
>
> >The XML parsing is currently handled by the GUSRow module (formerly
> >RelationalRow). As such, there are a few constraints on the creation of
> >GUS XML objects, in addition to using the fully qualified object names.
> >1) It relies on newlines to get valid input. YOU MUST put object
> >declaration on a separate line than attribute declarations. For instance
> >the following do not work:
> >
> ><GUS::Model::RAD::Array><name> PancChip </name>
> ></GUS::Model::RAD::Array>
> >
> ><GUS::Model::RAD::Array>
> ><name> PancChip </name> <version> 1.2</version>
> ></GUS::Model::RAD::Array>
> >
> >2) To force a submit while parsing a GUS XML doc, enter the characters
> >"//" on a separate line. I am not sure this is valid XML, but I suspect it
> >is due to backwards compatibility issues with SGML.
> >
> >
> >I am working in my spare time (of which there is not much) to switch to an
> >actual XML parser, rather than use of regular expressions (as is now the
> >case). At the same time, I will switch the XML syntax to match a more
> >database centric scheme, by using the tag attributes as the table columns,
> >instead of subelements:
> >
> ><Object
> >  att1 = 'val1'
> >  att2 = 'val2'
> >  ...
> >  >
> >  <Child Object .../>
> ></Object>
> >
> >But this is not anywhere near finished (in fact just started) so for now
> >we must live with the constraints as they stand.
> >
> >Angel

--
Angel Pizarro
Programmer Analyst
Center for Bioinformatics
an...@pc...

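For anyone generating GUS XML from a script, the line-oriented constraints quoted in this message (the object declaration on its own line, one attribute per line, and "//" on a separate line to force a submit) could be respected with something like the sketch below. This is a hedged illustration in Python only: the helper names gus_xml_object and gus_xml_document are invented here and are not part of GUS, values are written verbatim without any escaping, and the primary key is simply left out so that it is generated automatically, as described above.

```python
def gus_xml_object(object_name, attributes):
    """Render one object with the declaration and each attribute on its own line.

    object_name is the fully qualified name, e.g. "GUS::Model::RAD::Array".
    attributes maps column names to values (no primary key is included).
    """
    lines = ["<%s>" % object_name]
    for column, value in attributes.items():
        text = str(value)
        # The parser described above is line-based, so a value spanning
        # several lines would break the one-attribute-per-line rule.
        if "\n" in text:
            raise ValueError("value for %r contains a newline" % column)
        lines.append("<%s>%s</%s>" % (column, text, column))
    lines.append("</%s>" % object_name)
    return "\n".join(lines)


def gus_xml_document(objects, submit_each=True):
    """Join several objects, optionally adding '//' on its own line after each."""
    chunks = []
    for object_name, attributes in objects:
        chunks.append(gus_xml_object(object_name, attributes))
        if submit_each:
            chunks.append("//")  # forces a submit, per the message above
    return "\n".join(chunks) + "\n"
```

Called with ("GUS::Model::RAD::Array", {"name": "PancChip", "version": "1.2"}), this produces the four-line form that avoids both failing layouts shown in the quoted message.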
From: Arnaud K. <ax...@sa...> - 2003-01-28 17:03:04

Hi

A quick question. Do I need to specify the primary key or it is automatically generated by the plugin if it is missing ?

As soon as I know the answer to this, I can generate XML for two controlled vocabulary tables: Mutagen and PhenotypeClass.
FYI the controlled vocabularies will be based on Flybase ones.

Arnaud

Angel Pizarro wrote:

>The XML parsing is currently handled by the GUSRow module (formerly
>RelationalRow). As such, there are a few constraints on the creation of
>GUS XML objects, in addition to using the fully qualified object names.
>1) It relies on newlines to get valid input. YOU MUST put object
>declaration on a separate line than attribute declarations. For instance
>the following do not work:
>
><GUS::Model::RAD::Array><name> PancChip </name>
></GUS::Model::RAD::Array>
>
><GUS::Model::RAD::Array>
><name> PancChip </name> <version> 1.2</version>
></GUS::Model::RAD::Array>
>
>2) To force a submit while parsing a GUS XML doc, enter the characters
>"//" on a separate line. I am not sure this is valid XML, but I suspect it
>is due to backwards compatibility issues with SGML.
>
>
>I am working in my spare time (of which there is not much) to switch to an
>actual XML parser, rather than use of regular expressions (as is now the
>case). At the same time, I will switch the XML syntax to match a more
>database centric scheme, by using the tag attributes as the table columns,
>instead of subelements:
>
><Object
>  att1 = 'val1'
>  att2 = 'val2'
>  ...
>  >
>  <Child Object .../>
></Object>
>
>But this is not anywhere near finished (in fact just started) so for now
>we must live with the constraints as they stand.
>
>Angel

From: Arnaud K. <ax...@sa...> - 2003-01-28 15:30:14

Hi

Joan Mazzarelli wrote:

>Hi Jonathan,
>
>*formation* (e.g. dimerization)? Assuming that we had reason to explicitly
>represent the formation of a Complex (versus the mere fact of its existence,
>which is handled by Complex/ComplexComponent), wouldn't this be done with
>the Interaction table? If it were, then you'd have to be able to support
>multiple effectors. To represent dimerization, for example, you'd have
>2 inputs (effectors) and 1 output (the target.) The effectors would be
>the same entities referenced by the ComplexComponents and the target would
>be the Complex itself. This sounds redundant, but if (yet another
>hypotheticals) you wanted to represent the fact that a second or third
>protein acted to inhibit the dimerization process (through some as-yet-
>undetermined mechanism) then you'd need to create the dimerization
>Interaction so that you could reference it in yet another Interaction (as
>a target being inhibited by the new protein).
>

I think it makes sense representing the dimerization by an interaction. This way we can differentiate that an effector modulates the activity of Complex C1 from another situation where another effector modulates the formation of C1, even though one could argue that regulating the formation of C1 is likely to also regulate the activity of C1!

I think in this use case there are two things we want to represent.
* First is that an effector, E3, let's say activates the formation of Complex C1, made of two proteins, P1 and P2.
* Second is how E3 activates the formation of C1. For example E3 is a kinase and by phosphorylating P1, it induces the dimerization of P1 and P2 to form C1.

So P1 has two states, one is phosphorylated in which it acts as a complex component, the other one is unphosphorylated. Would it be worth to represent these two states of a protein (or more generally active/inactive states) ?

>Yes this is true. Think modeling all interactions regardless of knowing that they
>form a dimer complex.
>
>If we take out the effect-target wording in the interaction table and say:
>entity 1 (effector) interacts with entity 2 (target) to create a dimer; entity 1 can
>equal entity 2
>
>Entity 1 interacts with entity 2 ; now if a third entity inhibits the dimerization
>between 1 and 2 then entity 3 would need to be able to interact with entity 1 (or 2).
>
>I think the trouble comes in when if you had a dimer or a 2 component complex (entity
>1 and entity 2 as a complex) and the entity 3 could only interact with the complex to
>disassemble it (or on the molecular level you need surfaces of both entity 1 and
>entity 2 for interaction with entity 3).
>
>I think maybe saying that the complex of 1 and 2 interacts with entity 3 takes care of
>this, but this uses both the interaction table and the complex table to assign the
>complex of 1 and 2 as interacting with entity 3.
>
>I think that the terms effector and target are confusing in the interaction table.
>Actually when we originally designed the interaction table I am remembering we
>struggled with these words, but if we now have direction is known (or not known) in
>the table do we need effector-target? I am not sure about this.
>Also the line of evidence tables for interaction may assume that you are only adding
>evidence for a single direct interaction even if there are multiple lines of evidence
>to support the interaction (yeast 2 hybrid exp., in vitro binding exp.).
>
>Joan
>

Arnaud

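To make the use case in this message concrete (E3 activating the formation of C1 from P1 and P2), here is a small Python sketch of the idea that the dimerization is itself an interaction with two effectors and one target, and that a second interaction can then take the first one as its target. The class and field names are invented for illustration and do not correspond to the actual GUS tables or columns.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Protein:
    name: str


@dataclass(frozen=True)
class Complex:
    name: str
    components: Tuple[Protein, ...]  # plays the role of ComplexComponent rows


@dataclass(frozen=True)
class Interaction:
    kind: str                        # e.g. "formation", "activates", "inhibits"
    effectors: Tuple["Entity", ...]  # one or more inputs
    target: "Entity"                 # single output


# An interaction may itself be the effector or target of another interaction.
Entity = Union[Protein, Complex, Interaction]

p1, p2, e3 = Protein("P1"), Protein("P2"), Protein("E3")
c1 = Complex("C1", (p1, p2))

# Dimerization: 2 inputs (effectors) and 1 output (the target).
dimerization = Interaction("formation", (p1, p2), c1)

# E3 acts on the formation of C1, not on C1's activity.
activation = Interaction("activates", (e3,), dimerization)
```

An effector that modulates the activity of C1, as opposed to its formation, would instead be written with c1 as the target, which is the distinction drawn above.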
From: Joan M. <ma...@pc...> - 2003-01-27 16:13:39

Hi Jonathan,

*formation* (e.g. dimerization)? Assuming that we had reason to explicitly represent the formation of a Complex (versus the mere fact of its existence, which is handled by Complex/ComplexComponent), wouldn't this be done with the Interaction table? If it were, then you'd have to be able to support multiple effectors. To represent dimerization, for example, you'd have 2 inputs (effectors) and 1 output (the target.) The effectors would be the same entities referenced by the ComplexComponents and the target would be the Complex itself. This sounds redundant, but if (yet another hypotheticals) you wanted to represent the fact that a second or third protein acted to inhibit the dimerization process (through some as-yet-undetermined mechanism) then you'd need to create the dimerization Interaction so that you could reference it in yet another Interaction (as a target being inhibited by the new protein).

Yes this is true. Think modeling all interactions regardless of knowing that they form a dimer complex.

If we take out the effect-target wording in the interaction table and say:
entity 1 (effector) interacts with entity 2 (target) to create a dimer; entity 1 can equal entity 2

Entity 1 interacts with entity 2 ; now if a third entity inhibits the dimerization between 1 and 2 then entity 3 would need to be able to interact with entity 1 (or 2).

I think the trouble comes in when if you had a dimer or a 2 component complex (entity 1 and entity 2 as a complex) and the entity 3 could only interact with the complex to disassemble it (or on the molecular level you need surfaces of both entity 1 and entity 2 for interaction with entity 3).

I think maybe saying that the complex of 1 and 2 interacts with entity 3 takes care of this, but this uses both the interaction table and the complex table to assign the complex of 1 and 2 as interacting with entity 3.

I think that the terms effector and target are confusing in the interaction table. Actually when we originally designed the interaction table I am remembering we struggled with these words, but if we now have direction is known (or not known) in the table do we need effector-target? I am not sure about this.
Also the line of evidence tables for interaction may assume that you are only adding evidence for a single direct interaction even if there are multiple lines of evidence to support the interaction (yeast 2 hybrid exp., in vitro binding exp.).

Joan

Jonathan Crabtree wrote:

> Joan-
>
> > So with the way the tables complex and interaction are set up now, if a complex
> > participates in an interaction
> > to find this then you have to see if the row_id in complexComponent is also a
> > row_id in row set member of interaction.
>
> Not quite; if the complex is itself the effector or target of the
> interaction then you'd want to join ComplexComponent.complex_id (not row_id)
> with RowSetMember.row_id (and you'd also constrain RowSetMember.table_id with
> the table_id for Complex.) You'd only use the row_id of ComplexComponent
> if your Complex of interest was itself a component of another Complex that
> acted as the effector or target.
>
> > With the interaction table two things are interacting (why is row set needed)?
>
> No; as discussed previously, we decided to extend the Interaction table
> to represent the interaction of two *sets* of things (where those sets of
> things cannot be represented as Complexes.) Given that, we had to add
> something like the RowSet table.
>
> > I am confused about why it is not possible to build up sequential interactions
> > using just single interacting components (see below).
>
> I believe that it *is* possible to build up sequential interactions using
> single interacting components; this is what the Pathway and PathwayInteraction
> tables allow you to do. Having to create a new RowSet object--even when
> you have only a single effector/target--does require some extra work, but
> it's by no means impossible.
This is what I was talking about last week > when I said that I could have left the Interaction table as it was > originally (i.e., with a table_id & row_id for both effector and target.) > However, we decided that we might as well replace these with references to > RowSet, so that the joins would always be consistent regardless of the > number of objects acting as the target/effector of the Interaction. > > > Then maybe use pathwayinteraction and pathway (even if the pathway just consists > > of a A binds B which binds C). Or do you want to model biological reactions > > which seems sequential to me like a pathway ? > > The goal is simply to be able to represent interactions between sets of > things. If this happens not to be a useful feature then we could consider > doing away with it. I'm not crazy about the extra joins that it entails, > but at the time it seemed like it would be a reasonable generalization to > make (and nobody objected when it was done.) We could consider changing > it back, but definitely not until after I've finished the 3.0 migration, > since I've already moved the Pathway data into the new Interaction schema. > > > >>> (the target.) Previously the Interaction table could only > > >>> represent the interaction > > >>> between a single pair of entities (OK if they happened to be > > >>> Complexes, for example, > > >>> but a potential problem in other situations.) > > > > but a potential problem in other situations? What are more of these? > > I'm not sure that I have any great examples, but since we've been talking > about Complexes, what about using Interaction to represent Complex > *formation* (e.g. dimerization)? Assuming that we had reason to explicitly > represent the formation of a Complex (versus the mere fact of its existence, > which is handled by Complex/ComplexComponent), wouldn't this be done with > the Interaction table? If it were, then you'd have to be able to support > multiple effectors. 
To represent dimerization, for example, you'd have > 2 inputs (effectors) and 1 output (the target.) The effectors would be > the same entities referenced by the ComplexComponents and the target would > be the Complex itself. This sounds redundant, but if (yet another > hypotheticals) you wanted to represent the fact that a second or third > protein acted to inhibit the dimerization process (through some as-yet- > undetermined mechanism) then you'd need to create the dimerization > Interaction so that you could reference it in yet another Interaction (as > a target being inhibited by the new protein). > > > (Although the current schema lets us group effectors together, it > > > doesn't let > > > us say (for example) that E1 interacts *directly* with T1 to > > > phosphorylate > > > it, but that E1's active site is only exposed when E1 is bound to E2. In > > > other words, E1's role in the activity can be viewed as "primary", and > > > E2's > > > role is secondary (in some sense) but all we can say in the schema is > > > that > > > the Complex consisting of E1 and E2 interacts with T1 to phosphorylate > > > it.) > > > > The above case: > > > > E2 interacts with E1 (they also happen to be represented by a complex; but now > > consider this only an interaction); then E1 interacts with T1 to modify it. > > Yes, although if the interaction between E2 and E1 were transient then the > result of E2 and E1's interaction would not be a complex, but rather a > modified E1. > > > E2 affects E1; then E1 affects T1. or > > > > Protein X (effector) interacts with protein Y (target); protein Y (effector ) > > modifies protein Z (target). > > I'm not sure I see the difference between these two alternatives, apart from > the fact that the second uses X,Y, and Z instead of E2, E1, and T1?? > > > I guess I have a problem with the E1-E2 concept (or multiple effectors if one > > does not effect or interact with a target directly) in the interaction table. 
> > Well, I think complex formation is a good example of a situation in which > both effectors interact directly with the target, because they both become > part of it. Whether we actually need to represent this is another > question. Can anyone else come up with any good examples of multiple- > effector/target interactions (that couldn't be easily modeled, as Joan > points out, with a series of simpler "single-valued" Interactions)? > > Jonathan > > ------------------------------------------------------- > This SF.NET email is sponsored by: > SourceForge Enterprise Edition + IBM + LinuxWorld = Something 2 See! > http://www.vasoftware.com > _______________________________________________ > Gusdev-gusdev mailing list > Gus...@li... > https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev -- Joan Mazzarelli Computational Biology and Informatics Laboratory Center for Bioinformatics 1429 Blockley Hall University of Pennsylvania Philadelphia, PA 19104 |
From: Jonathan C. <cra...@sn...> - 2003-01-27 06:46:18
|
Joan- > So with the way the tables complex and interaction are set up now, if a complex > participates in an interaction > to find this then you have to see if the row_id in complexComponent is also a > row_id in row set member of interaction. Not quite; if the complex is itself the effector or target of the interaction then you'd want to join ComplexComponent.complex_id (not row_id) with RowSetMember.row_id (and you'd also constrain RowSetMember.table_id with the table_id for Complex.) You'd only use the row_id of ComplexComponent if your Complex of interest was itself a component of another Complex that acted as the effector or target. > With the interaction table two things are interacting (why is row set needed)? No; as discussed previously, we decided to extend the Interaction table to represent the interaction of two *sets* of things (where those sets of things cannot be represented as Complexes.) Given that, we had to add something like the RowSet table. > I am confused about why it is not possible to build up sequential interactions > using just single interacting components (see below). I believe that it *is* possible to build up sequential interactions using single interacting components; this is what the Pathway and PathwayInteraction tables allow you to do. Having to create a new RowSet object--even when you have only a single effector/target--does require some extra work, but it's by no means impossible. This is what I was talking about last week when I said that I could have left the Interaction table as it was originally (i.e., with a table_id & row_id for both effector and target.) However, we decided that we might as well replace these with references to RowSet, so that the joins would always be consistent regardless of the number of objects acting as the target/effector of the Interaction. > Then maybe use pathwayinteraction and pathway (even if the pathway just consists > of a A binds B which binds C). 
Or do you want to model biological reactions > which seems sequential to me like a pathway ? The goal is simply to be able to represent interactions between sets of things. If this happens not to be a useful feature then we could consider doing away with it. I'm not crazy about the extra joins that it entails, but at the time it seemed like it would be a reasonable generalization to make (and nobody objected when it was done.) We could consider changing it back, but definitely not until after I've finished the 3.0 migration, since I've already moved the Pathway data into the new Interaction schema. > >>> (the target.) Previously the Interaction table could only > >>> represent the interaction > >>> between a single pair of entities (OK if they happened to be > >>> Complexes, for example, > >>> but a potential problem in other situations.) > > but a potential problem in other situations? What are more of these? I'm not sure that I have any great examples, but since we've been talking about Complexes, what about using Interaction to represent Complex *formation* (e.g. dimerization)? Assuming that we had reason to explicitly represent the formation of a Complex (versus the mere fact of its existence, which is handled by Complex/ComplexComponent), wouldn't this be done with the Interaction table? If it were, then you'd have to be able to support multiple effectors. To represent dimerization, for example, you'd have 2 inputs (effectors) and 1 output (the target.) The effectors would be the same entities referenced by the ComplexComponents and the target would be the Complex itself. This sounds redundant, but if (yet another hypotheticals) you wanted to represent the fact that a second or third protein acted to inhibit the dimerization process (through some as-yet- undetermined mechanism) then you'd need to create the dimerization Interaction so that you could reference it in yet another Interaction (as a target being inhibited by the new protein). 
> (Although the current schema lets us group effectors together, it > > doesn't let > > us say (for example) that E1 interacts *directly* with T1 to > > phosphorylate > > it, but that E1's active site is only exposed when E1 is bound to E2. In > > other words, E1's role in the activity can be viewed as "primary", and > > E2's > > role is secondary (in some sense) but all we can say in the schema is > > that > > the Complex consisting of E1 and E2 interacts with T1 to phosphorylate > > it.) > > The above case: > > E2 interacts with E1 (they also happen to be represented by a complex; but now > consider this only an interaction); then E1 interacts with T1 to modify it. Yes, although if the interaction between E2 and E1 were transient then the result of E2 and E1's interaction would not be a complex, but rather a modified E1. > E2 affects E1; then E1 affects T1. or > > Protein X (effector) interacts with protein Y (target); protein Y (effector ) > modifies protein Z (target). I'm not sure I see the difference between these two alternatives, apart from the fact that the second uses X,Y, and Z instead of E2, E1, and T1?? > I guess I have a problem with the E1-E2 concept (or multiple effectors if one > does not effect or interact with a target directly) in the interaction table. Well, I think complex formation is a good example of a situation in which both effectors interact directly with the target, because they both become part of it. Whether we actually need to represent this is another question. Can anyone else come up with any good examples of multiple- effector/target interactions (that couldn't be easily modeled, as Joan points out, with a series of simpler "single-valued" Interactions)? Jonathan |
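The join described at the top of this message, and the dimerization/inhibition case, can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python's sqlite3 module rather than Oracle, the table definitions are cut down to the columns mentioned in this thread, and the table_id values, row ids and helper names (build_demo, complex_interactions, interactions_targeting_interaction) are made up for the example.

```python
import sqlite3

# Hypothetical, heavily simplified stand-ins for the tables discussed above;
# the real GUS 3.0 schema has many more columns.
SCHEMA = """
CREATE TABLE TableInfo (table_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Complex (complex_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE ComplexComponent (
    complex_component_id INTEGER PRIMARY KEY,
    complex_id INTEGER, table_id INTEGER, row_id INTEGER);
CREATE TABLE RowSet (row_set_id INTEGER PRIMARY KEY);
CREATE TABLE RowSetMember (row_set_id INTEGER, table_id INTEGER, row_id INTEGER);
CREATE TABLE Interaction (
    interaction_id INTEGER PRIMARY KEY,
    effector_row_set_id INTEGER, target_row_set_id INTEGER);
"""

PROTEIN, COMPLEX, INTERACTION = 1, 2, 3  # made-up table_id values


def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO TableInfo VALUES (?, ?)",
                     [(PROTEIN, "Protein"), (COMPLEX, "Complex"),
                      (INTERACTION, "Interaction")])
    # Complex 10 is a dimer of proteins 101 and 102.
    conn.execute("INSERT INTO Complex VALUES (10, 'P1-P2 dimer')")
    conn.executemany("INSERT INTO ComplexComponent VALUES (?, ?, ?, ?)",
                     [(1, 10, PROTEIN, 101), (2, 10, PROTEIN, 102)])
    conn.executemany("INSERT INTO RowSet VALUES (?)", [(1,), (2,), (3,), (4,)])
    conn.executemany("INSERT INTO RowSetMember VALUES (?, ?, ?)", [
        (1, PROTEIN, 101), (1, PROTEIN, 102),  # set 1: the two monomers
        (2, COMPLEX, 10),                      # set 2: the resulting Complex
        (3, PROTEIN, 103),                     # set 3: the inhibiting protein
        (4, INTERACTION, 500),                 # set 4: the dimerization itself
    ])
    conn.executemany("INSERT INTO Interaction VALUES (?, ?, ?)", [
        (500, 1, 2),  # dimerization: two effectors, target is the Complex
        (501, 3, 4),  # inhibition: target is Interaction 500
    ])
    return conn


def complex_interactions(conn):
    """Interactions in which a Complex itself is the effector or target.

    Joins ComplexComponent.complex_id (not row_id) to RowSetMember.row_id,
    constraining RowSetMember.table_id to the table_id for Complex.
    """
    return conn.execute("""
        SELECT DISTINCT i.interaction_id, cc.complex_id,
               CASE WHEN m.row_set_id = i.effector_row_set_id
                    THEN 'effector' ELSE 'target' END
        FROM ComplexComponent cc
        JOIN RowSetMember m
          ON m.row_id = cc.complex_id AND m.table_id = ?
        JOIN Interaction i
          ON m.row_set_id IN (i.effector_row_set_id, i.target_row_set_id)
        ORDER BY i.interaction_id
    """, (COMPLEX,)).fetchall()


def interactions_targeting_interaction(conn, interaction_id):
    """Interactions whose target set contains another Interaction."""
    rows = conn.execute("""
        SELECT i.interaction_id
        FROM Interaction i
        JOIN RowSetMember m ON m.row_set_id = i.target_row_set_id
        WHERE m.table_id = ? AND m.row_id = ?
        ORDER BY i.interaction_id
    """, (INTERACTION, interaction_id)).fetchall()
    return [r[0] for r in rows]
```

In this toy data the dimerization (500) comes back as an Interaction whose target is Complex 10, and the inhibition (501) is found by asking which Interactions target Interaction 500. The extra join through RowSetMember is needed even when a set has a single member, which is the cost mentioned above.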
From: mazz <ma...@sn...> - 2003-01-27 00:31:09
|
Hi Arnaud, I have some questions for you. So with the way the tables complex and interaction are set up now, if a complex participates in an interaction to find this then you have to see if the row_id in complexComponent is also a row_id in row set member of interaction. With the interaction table two things are interacting (why is row set needed)? What did you have in mind for interaction type (protein-DNA? or more detailed) and effector action type (inhibits)? I am confused about why it is not possible to build up sequential interactions using just single interacting components (see below). Then maybe use pathwayinteraction and pathway (even if the pathway just consists of a A binds B which binds C). Or do you want to model biological reactions which seems sequential to me like a pathway ? (This allows >>> us to represent >>> the interaction of a set of objects (the effector) with another set >>> of objects >>> (the target.) Previously the Interaction table could only >>> represent the interaction >>> between a single pair of entities (OK if they happened to be >>> Complexes, for example, >>> but a potential problem in other situations.) but a potential problem in other situations? What are more of these? (Although the current schema lets us group effectors together, it > doesn't let > us say (for example) that E1 interacts *directly* with T1 to > phosphorylate > it, but that E1's active site is only exposed when E1 is bound to E2. In > other words, E1's role in the activity can be viewed as "primary", and > E2's > role is secondary (in some sense) but all we can say in the schema is > that > the Complex consisting of E1 and E2 interacts with T1 to phosphorylate > it.) The above case: E2 interacts with E1 (they also happen to be represented by a complex; but now consider this only an interaction); then E1 interacts with T1 to modify it. E2 affects E1; then E1 affects T1. 
or Protein X (effector) interacts with protein Y (target); protein Y (effector ) modifies protein Z (target). I guess I have a problem with the E1-E2 concept (or multiple effectors if one does not effect or interact with a target directly) in the interaction table. I guess I may also think of Complex and Interaction more separately. For example, the TFIID complex has several components; the complex consists of several proteins (protein-protein interactions; complex type - protein) of which there is no known direction (effector-target concept at least for now). This complex with its complexComponents (which we represent) can than interact with a DNA sequence (target). Interaction type (protein-DNA); (effector action type - binds?). Although, in this example, we know that we would also have the TATA-binding protein (TBP; effector) interacting with the DNA sequence target as an entry in interaction (separate from the entry that the complex TFIID can interact with the DNA target). Also the other component interactions ...TBP-associated factor 70 interacts with TBP .(direction not known) and so on ... if all the interactions individually are known to define the complex entirely. Joan Arnaud Kerhornou wrote: > Hi Jonathan > > Jonathan Crabtree wrote: > > > > > Arnaud- > > > > Thanks for the feedback; I think we're getting close to agreement here. > > I think so too ! > > >> I have noticed that your changes don't cover the DNA/RNA features. Is > >> there any reason for this ? I know there are quite a lot of them and > >> there might be another way of storing data some information such as > >> telomere or centromere regions, origin of replication, inflection > >> point etc. All these features are covered by Sequence Ontology, so a > >> new ChromosomeElement or ChromosomeRegion feature could be generic > >> enough to cover most of them. > >> Let me know what you think. > > > > > > Which DNA/RNA features do you mean (other than those mentioned above)? 
>
> The file I sent you should include views on top of the NAFeatureImp
> table. Here is the list:
>
> * ChromosomeElement, or we can keep CentromereFeature and TelomereFeature
>   as they are in gusdev - IMPORTANT
> * InfectionPointFeature
> * ReplicationFeature, for annotated origins of replication
> * RNARegulatory - as there is a DNARegulatory feature => regulatory
>   element at the RNA level
> * RNASecondaryStructure
> * SpliceSiteFeature
> * TransposableElement
>
> + an extra attribute in RestrictionFragmentFeature, "type_of_cut"
>   (sticky or blunt)
> + an extra attribute in GeneSynonym, "is_obsolete"
> + a new view on top of NASequenceImp, "GenomicSequence", instead of
>   the existing one, ExternalNASequence.
>
> I can send the files to you if you want.
>
> > It's possible that I misplaced the e-mail or notes where we discussed
> > these. Or are you just saying that we will eventually have a view for
> > each type of DNA/RNA feature in the Sequence Ontology? I think that
> > this is true, although I hadn't planned to make the change immediately,
> > since I believe we had agreed on a "transitional" period in which the
> > various NAFeature views would first be given a nullable
> > sequence_ontology_id
>
> Yes we had! So regarding chromosome regions, shall we keep
> TelomereFeature and CentromereFeature?
>
> > and we would then decide how to best rearrange the views to more closely
> > match the ontology terms. I haven't added the sequence_ontology_id
> > column to the NAFeature views, but I will do so right away.
We do > > currently have some relevant NAFeature views in gusdev that have not > > been migrated into 3.0: > > > > CentromereFeature > > LowComplexityNAFeature > > ScaffoldGapFeature > > TelomereFeature > > > > I have no objection to merging the telomere and centromere features into > > a single view--along with any other chromosomal regions covered by the > > ontology--although it would mean that we wouldn't have a 1-1 mapping > > between sequence ontology terms and views on NAFeature. I think that > > at one point this was proposed as the eventual goal of the rearrangement. > > Anyway, given that I'm not certain of the plan here, I'm going to add > > the sequence_ontology_id column but leave the views unchanged for now. > > They can easily be changed without interfering with our data migration, > > so their fate doesn't have to be settled immediately. We have yet to > > establish a consistent set of rules for deciding when different types > > of features get grouped into a single view and when they get their own > > views, so this is probably a good opportunity to settle the question > > once and for all. The Sequence Ontology is big enough that we probably > > *don't* want a view for each and every term in the ontology; it would > > make maintenance quite difficult. But we could, for example, create a > > view for every top-level (or second-level) sequence ontology term. > > However, even a relatively high-level feature like "chromosomal region" > > (SO:0000711) looks like it's already a 4th or 5th level feature... > > > At > > the other extreme, we could continue what we're doing now, i.e. using > > an ad-hoc classification of features based on the data we actually have > > available, and just make sure that every feature is tagged with the > > correct sequence ontology term. Any thoughts? > > It makes sense as SO may undergo revisions this year. 
> > > > >>> > >>> alter table DOTS.PROTEINPROPERTY add constraint PROTEINPROPERTY_CK01 > >>> check (property_name in ('isoelectric point', 'molecular mass', > >>> 'charge', 'average residue mass')); > >>> > >>> The table allows multiple protein properties of the same type to be > >>> associated with > >>> entries in DoTS.AASequenceImp. Arnaud had suggested originally that > >>> the last property, average residue mass, could actually be an > >>> attribute of the table that stores the protein sequence itself. > >>> However, it seemed that if the molecular mass attribute could have > >>> multiple values (e.g., from different experiments) then > >>> the same should be true of the average residue mass, which is > >>> essentially a derived property. Let me know if you disagree with > >>> this, or think we should create an explicit controlled vocab. for > >>> these 4 properties. > >>> > >>> > >> A controlled vocabulary table with the four attributes you've > >> mentioned is fine. > > > > > > OK, I'll make this change. > > > >>> -Protein features > >>> *Signal peptide features (stored in DoTS.SignalPeptideFeature) > >>> This view exists already, as DoTS.SignalPeptideFeature, but we need > >>> to add the > >>> ability to store curated data, such as targetting information. It > >>> should be straightforward to modify the view to accomodate this, > >>> but I'm not sure exactly > >>> what needs to be stored. Currently we use the view exclusively for > >>> SignalP > >>> predictions, and from what I understand SignalP is only concerned > >>> with predicting > >>> secreted proteins, meaning that we don't currently have any > >>> explicit targetting information. Is this something we could > >>> represent using the GO ontology for cellular localization? Do we > >>> also need some free text columns? Let me know and I'll make > >>> the changes. 
All the SignalP-specific columns appear to be > >>> nullable, so we don't > >>> necessarily have to do anything except add the new columns for the > >>> manually curated > >>> information. > >>> > >>> > >> After talking to the curators it appears that GO component suplements > >> targetting information at the feature level but will not be enough. > >> The targeting information is represented by the component ontology in > >> one context i.e. mitochondrial, nuclear, membrane localization but > >> not in the context of the actual residues involved. > >> The actual residues involved in the targeting (or any other > >> phenomena) need to be represented by a protein feature ontology can > >> be mapped onto specific amino acids of a protein. > >> This ontology is the equivalent of Sequence Ontology (SO) which is > >> meant for DNA features. It is being prepared by Val Wood with input > >> from Swiss-prot. > > > > > > OK, so the idea is that the various signal peptides have been classified > > into named classes that should be represented by some kind of ontology? > > > >> As you're going to add a extra attribute sequence_ontology_id to the > >> NA Features, could you do the same to any AA Features ? > > > > > > This will only work if the new ontology is actually part of the Sequence > > Ontology (or if we use the SequenceOntology table to store both > > ontologies.) > > Do you know if this is the case? It's quite possible, since the SO does > > already cover amino acid features. Otherwise we'll have to create a > > separate AASequenceOntology (or whatever the new ontology is called). > > It is at the moment a different project but it would make sense they > merge in the future. 
Just to give you an idea about Localization > Signals, here is a snapshot: > > %localization signal > %N-terminal signal sequence > %nuclear localization signal > %bipartite nuclear localization signal > %etc > %mitochondrial localization sequence > %thylakoid localization signal > %ER retention signal > > The way the SignalPeptideFeature is designed make difficult the > annotation of localization signal features. We can leave > SignalPeptideFeature as it is as it fits with SignalP software > prediction and in the future create a new feature LocalizationSignalFeature. > > > > >>> *Transmembrane domain features (stored in DoTS.PredictedAAFeature) > >>> "PlasmoDB web site shows hydrophobicity graphics, where is it > >>> stored in GUS?" > >>> The hydrophobicity plots are computed dynamically based on the > >>> amino acid sequence. > >>> Transmembrane domains are currently stored in the > >>> PredictedAAFeature view, although > >>> I will probably create a new view for them when I get around to > >>> eliminating PredictedAAFeature. Another possibility would be to > >>> treat TM domains as another > >>> type of domain, and store them in DomainFeature. What do you think > >>> about this? > >>> > >>> > >> I reckon they could be merged. > > > > > > OK, sounds good. > > > >>> *Post-translational modification features (new view: > >>> DoTS:PostTranslationalModFeature) > >>> Has a "type" column to represent the type of modification. It was > >>> also suggested > >>> that we have a column called "modified_by", which would be a > >>> reference to the Interaction table. However, isn't it possible > >>> that the same post-translational > >>> modification (e.g., phosphorylation of a specific amino acid) could > >>> be the result > >>> of one of several Interactions? > >> > >> yes you're right, the effector could be different. In that case the > >> attribute > >> "modified_by" is not useful. 
> >> > >>> This argues for an additional relationship between Interaction and > >>> PostTranslationalModFeature, unless we're OK creating multiple > >>> PostTranslationalModFeatures, identical except for their modified_by > >>> attribute. Comments on this? > >>> > >>> > >> I don't think they should be duplicated as they corresponds to a > >> unique site. This unique feature would > >> be associated with different interaction entries. We might not need > >> an extra table between Interaction and PostTranslationalModFeature > >> though. We still can do the following query : "give me all the > >> interaction entries which target is a PostTranslationalModFeature > >> which id is ...". > >> How does it sound ? > > > > > > We could do this, although one question is whether, semantically > > speaking, > > the "target" of an Interaction should be "the thing to be modified" > > (e.g. an > > unphosphorylated sequence or residue) or "the resulting modification" > > (e.g. > > the feature that represents a phosphorylated residue at the appropriate > > location.) The answer is probably that we just shouldn't worry about it > > and should just do whatever is most convenient on a case-by-case basis. > > To do it "correctly" would be problematic either way. For example, if we > > say that the target is the thing to be modified, then we have to create a > > feature that represents a region of sequence that *could* be modified in > > some way and then create another feature to represent the actual > > modification. > > But if we say that the target is the result of the modification then > > we may > > have to create equally unusual tables/views. For example, if the > > result of > > a given interaction is to degrade a protein, then do we have to create a > > table/object that represents a degraded protein (or "nothing", or > > whatever > > it is that's left after the modification)? 
For now I have no problem > > with > > interpreting the "target" based on context, but in the longer term we may > > want to consider separating the notions of "target prior to modification" > > and either "target after modification" or "effect of modification". > > > > I also realized belatedly that I could have left the Interaction table > > unchanged, rather than introducing specific references to RowSet. This > > would have allowed us to represent either singleton effectors/targets or > > set-valued effectors/targets, without having to always join through > > RowSet > > in the singleton case. On the other hand, if we do associate some > > additional information with the RowSets, then the current representation > > is correct. > > It depends if we want to represent many-to-many relationship between > interaction and members of this interaction. Without the RowSet table, > we can't assign a set of several effectors/targets, right ? Unless we > consider that this set of effectors are being part of a complex and act > as the whole. > > > > >>> *AA repeats (new view: RepeatRegionAAFeature) > >>> I called this view RepeatRegionAAFeature in case we want to have a > >>> similar view > >>> for NASequences. I also created only a single view, instead of > >>> following Arnaud's > >>> original suggestion, which was for both: > >>> > >>> * RepeatRegionFeature as a set of RepeatUnitFeatures, > >>> * RepeatUnitFeature, with the consensus sequence, name and size > >>> > >>> I based the design of this view on that of TandemRepeatFeature, > >>> which we have for > >>> NASequences already. Instead of splitting the consensus sequence, > >>> name, and size > >>> into a separate table, they occupy columns in > >>> RepeatRegionAAFeature. This works > >>> quite well for the tandem repeats we already have (for DNA > >>> sequences.) 
However, if > >>> there is a known set of named amino acid sequence repeats, then it > >>> would probably > >>> make sense to do what Arnaud suggested, and store these in a > >>> separate table (likely named RepeatUnit, not RepeatUnitFeature, > >>> since they would have no unique locations.) Does this sound > >>> reasonable? That is, put the consensus seqs in the > >>> repeat region table itself if they're anonymous, but if they've > >>> been named, then store them in a separate table. Also note that > >>> this view has a reference to RepeatType, although the current > >>> contents of this table are probably applicable only to DNA sequence > >>> repeats (LINEs, SINEs, ALUs, etc.), since I believe that I parsed > >>> them out of RepBase. > >>> > >>> > >> I proposed a separate repeat feature because one may want to annotate > >> a repeat outside a repeat region, for example LTR repeats attached to > >> a given transposable element. These RepeatFeatures or > >> RepeatUnitFeatures can then have a location. > >> The other case is when a repeat region is made of a set of different > >> repeat units. > > > > > > OK, I didn't realize that this was what you were trying to represent. As > > currently conceived, RepeatRegionAAFeature is meant to represent a region > > that contains one or more immediately adjacent copies of the same type > > of (amino acid sequence) repeat. The assumption is also that these > > regions > > will typically be maximal (with respect to the chosen repeat type, > > consensus, > > and max. mismatch, the last of which is not represented directly in the > > table.) We can still represent more complex repeat structures using this > > single table, but the representation is implicit, not explicit (i.e. you > > have to do a query to find out what other repeats lie within the > > bounds of > > the transposon, meaning that there's no easy way to query for all > > transposable > > elements with a particular flanking LTR structure.) 
Do you want to > > come up > > with a 2-table version of what I've done? The use cases aren't clear > > enough > > in my mind yet for me to be able to do it. It seems that the bare > > minimum we > > need is just another column in the RepeatRegionAAFeature, parent_id; > > which > > would let us represent explicitly that a particular repeat is a > > *necessary* > > (versus incidental) component of another NA/AAFeature. Both AAFeatureImp > > and NAFeatureImp already have a parent_id, so this would be a > > straightforward > > change. The queries still might not be terribly efficient, but I > > don't know > > what exactly you wanted to support in terms of queries, versus just > > making > > sure that the representation is sufficiently rich to capture the > > structure. > > A case we came across here for Tbrucei is nested repeat regions (at the > DNA level). Each repeat region has coordinates and is annotated with a > unique repeat unit type. This repeat region can be within a bigger > repeat region annotated with a different repeat unit type. > ... which is in other words your suggestion with parent_id as an extra > attribute ... > > Regarding transposon repeat types, if we have a TransposableElement > feature and its type is given as an attribute, a repeat feature will > just be useful to locate the LTRs within a given a transposable element. > Can we keep this functionality ? Then the feature will be simple, just a > repeat_type, and a parent_id atributes. > > > > >> In any case, NA repeats and AA repeats should have the same design. > >> Just the controlled vocabulary representing the types of repeats will > >> be different. > > > > > > Absolutely, yes, although one question is whether AA repeats can have the > > same kind of nested structure that you mention as a possibility for NA > > repeats (the transposon with LTRs). I don't know the answer to this. 
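The parent_id idea for nested repeats discussed above could look something like the sketch below. It is an assumption-laden illustration, not the actual view definition: RepeatFeature is a hypothetical stand-in for a view on NAFeatureImp/AAFeatureImp, the coordinates and repeat_type values are invented, and it runs on SQLite via Python rather than on Oracle.

```python
import sqlite3


def build_demo():
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: only the columns needed to show nesting via parent_id.
    conn.executescript("""
        CREATE TABLE RepeatFeature (
            na_feature_id INTEGER PRIMARY KEY,
            parent_id     INTEGER REFERENCES RepeatFeature(na_feature_id),
            repeat_type   TEXT,
            start_pos     INTEGER,
            end_pos       INTEGER);
    """)
    conn.executemany("INSERT INTO RepeatFeature VALUES (?, ?, ?, ?, ?)", [
        (1, None, "transposable_element", 1000, 6000),
        (2, 1, "LTR", 1000, 1300),   # 5' LTR of the element
        (3, 1, "LTR", 5700, 6000),   # 3' LTR of the element
        (4, None, "outer_repeat", 8000, 9000),
        (5, 4, "inner_repeat", 8200, 8400),   # repeat region inside a bigger one
        (6, 5, "innermost_repeat", 8250, 8300),
    ])
    return conn


def nested_repeats(conn, feature_id):
    """All features nested (directly or indirectly) under feature_id."""
    return conn.execute("""
        WITH RECURSIVE nested(na_feature_id, repeat_type, depth) AS (
            SELECT na_feature_id, repeat_type, 1
            FROM RepeatFeature WHERE parent_id = ?
            UNION ALL
            SELECT r.na_feature_id, r.repeat_type, n.depth + 1
            FROM RepeatFeature r
            JOIN nested n ON r.parent_id = n.na_feature_id)
        SELECT na_feature_id, repeat_type, depth
        FROM nested ORDER BY depth, na_feature_id
    """, (feature_id,)).fetchall()
```

With an explicit parent_id, the LTRs of a transposable element or the repeats inside a larger repeat region are found by following the parent links, without comparing coordinates. On Oracle the same traversal would presumably be written as a hierarchical (CONNECT BY) query instead of a recursive WITH clause.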
> > > >>> -DoTS.Interaction (table modified, dependent tables added) > >>> *Added "has_direction" column, as discussed previously. The idea > >>> here is that > >>> not all interactions (particularly physical ones, e.g., > >>> dimerization) have a > >>> direction. If has_direction == 0, then the value of > >>> direction_is_known can > >>> be ignored. > >>> *Added non-nullable "effector_action_type_id" column, referencing > >>> DoTS.EffectorActionType (a new table.) This column/table > >>> represents the possible > >>> things that an effector can do to a target. For example, the > >>> InteractionType > >>> associated with the Interaction could be "binds to" (e.g., a > >>> promoter region), and > >>> the EffectorActionType for that Interaction could be to either > >>> "inhibit" or "enhance" > >>> expression of the coresponding gene. > >>> *Replaced effector_table_id and effector_row_id with > >>> effector_row_set_id, and > >>> similarly for the target_table_id and target_row_id. This allows > >>> us to represent > >>> the interaction of a set of objects (the effector) with another set > >>> of objects > >>> (the target.) Previously the Interaction table could only > >>> represent the interaction > >>> between a single pair of entities (OK if they happened to be > >>> Complexes, for example, > >>> but a potential problem in other situations.) Now both effector > >>> and target are represented as references to DoTS.RowSet, which in > >>> tun references DoTS.RowSetMember, > >>> which...in turn...references the individual database rows that > >>> comprise the effector > >>> or target. These tables (RowSet and RowSetMember) are essentially > >>> the same as Complex and ComplexComponent, except that they are > >>> totally generic; they can be used to group any set of rows in the > >>> database and they store no additional information. 
However, if > >>> there are any additional columns that we can think of (that are > >>> specific to Interactions) then these tables should be replaced by > >>> less generic ones (e.g. InteractingEntitySet or InteractionSet, or > >>> something along those lines.) > >>> > >>> > >> Sounds fine. The only thing I can see is regarding the > >> EffectorActionType. If each effector, member of a RowSet, has a > >> different action type, the attribute, effector_action_type_id, should > >> go in the RowSetMember table. I don't have any example though. > > > > > > OK, I think I'd be inclined to wait until we have some use cases for > > this. > > Although the current schema lets us group effectors together, it > > doesn't let > > us say (for example) that E1 interacts *directly* with T1 to > > phosphorylate > > it, but that E1's active site is only exposed when E1 is bound to E2. In > > other words, E1's role in the activity can be viewed as "primary", and > > E2's > > role is secondary (in some sense) but all we can say in the schema is > > that > > the Complex consisting of E1 and E2 interacts with T1 to phosphorylate > > it. > > I think that the solution we have now is OK, but it only lets us > > represent > > the overall action of the entire set of effectors. > > Let's leave the design as it is for now. Curators are not going to > curate interactions data in the short term. We shall come back later > with more precise ideas/use cases about them. > > > > > Jonathan > > > > Arnaud > > ------------------------------------------------------- > This SF.NET email is sponsored by: Thawte.com > Understand how to protect your customers personal information by implementing > SSL on your Apache Web Server. Click here to get our FREE Thawte Apache > Guide: http://ads.sourceforge.net/cgi-bin/redirect.pl?thaw0029en > _______________________________________________ > Gusdev-gusdev mailing list > Gus...@li... > https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev |
From: mazz <ma...@sn...> - 2003-01-27 00:26:23
|
Hi Arnaud,

I have some questions for you.

With the way the Complex and Interaction tables are set up now, finding out
whether a complex participates in an interaction means checking whether a
row_id in ComplexComponent is also a row_id in the RowSetMember rows of an
Interaction. In the Interaction table two things are interacting, so why is
the row set needed?

What did you have in mind for interaction type (protein-DNA? or more
detailed) and effector action type (inhibits)?

I am confused about why it is not possible to build up sequential
interactions using just single interacting components (see below), and then
maybe use pathwayinteraction and pathway (even if the pathway just consists
of A binds B, which binds C). Or do you want to model biological reactions,
which seem sequential to me, like a pathway ?

>>> This allows us to represent
>>> the interaction of a set of objects (the effector) with another set of objects
>>> (the target.) Previously the Interaction table could only represent the interaction
>>> between a single pair of entities (OK if they happened to be Complexes, for example,
>>> but a potential problem in other situations.)

"but a potential problem in other situations": what are some of these
situations?

> Although the current schema lets us group effectors together, it doesn't let
> us say (for example) that E1 interacts *directly* with T1 to phosphorylate
> it, but that E1's active site is only exposed when E1 is bound to E2. In
> other words, E1's role in the activity can be viewed as "primary", and E2's
> role is secondary (in some sense) but all we can say in the schema is that
> the Complex consisting of E1 and E2 interacts with T1 to phosphorylate it.

In the above case: E2 interacts with E1 (they also happen to be represented
by a complex, but for now consider this only an interaction); then E1
interacts with T1 to modify it. E2 affects E1; then E1 affects T1.

or

Protein X (effector) interacts with protein Y (target); protein Y (effector)
modifies protein Z (target).

I guess I have a problem with the E1-E2 concept (or multiple effectors, if
one of them does not affect or interact with a target directly) in the
Interaction table.

I guess I may also think of Complex and Interaction more separately. For
example, the TFIID complex has several components; the complex consists of
several proteins (protein-protein interactions; complex type - protein) for
which there is no known direction (effector-target concept, at least for
now). This complex, with its ComplexComponents (which we represent), can
then interact with a DNA sequence (target). Interaction type: protein-DNA;
effector action type: binds? In this example, though, we know that we would
also have the TATA-binding protein (TBP; effector) interacting with the DNA
sequence target as an entry in Interaction (separate from the entry stating
that the complex TFIID can interact with the DNA target). The other
component interactions would be entered as well (TBP-associated factor 70
interacts with TBP, direction not known, and so on) if all the individual
interactions are known and define the complex entirely.

Joan

Arnaud Kerhornou wrote:
> Hi Jonathan
>
> Jonathan Crabtree wrote:
>
> > Arnaud-
> >
> > Thanks for the feedback; I think we're getting close to agreement here.
>
> I think so too !
>
> >> I have noticed that your changes don't cover the DNA/RNA features. Is
> >> there any reason for this ? I know there are quite a lot of them and
> >> there might be another way of storing some information such as
> >> telomere or centromere regions, origin of replication, inflection
> >> point etc. All these features are covered by Sequence Ontology, so a
> >> new ChromosomeElement or ChromosomeRegion feature could be generic
> >> enough to cover most of them.
> >> Let me know what you think.
> >
> > Which DNA/RNA features do you mean (other than those mentioned above)?
>
> The file I sent you should include views on top of the NAFeatureImp
> table. Here is the list :
>
> * ChromosomeElement, or we can keep CentromereFeature and TelomereFeature
>   as they are in gusdev - IMPORTANT
> * InfectionPointFeature
> * ReplicationFeature, for annotated origins of replication
> * RNARegulatory - as there is a DNARegulatory feature => regulatory
>   element at the RNA level
> * RNASecondaryStructure
> * SpliceSiteFeature
> * TransposableElement
>
> + an extra attribute in RestrictionFragmentFeature, "type_of_cut"
>   (sticky or blunt)
> + an extra attribute in GeneSynonym, "is_obsolete"
>
> + a new view on top of NASequenceImp, "GenomicSequence", instead of
>   the existing one, ExternalNASequence.
>
> I can send the files to you if you want.
>
> > It's possible that I misplaced the e-mail or notes where we discussed
> > these. Or are you just saying that we will eventually have a view for
> > each type of DNA/RNA feature in the Sequence Ontology? I think that
> > this is true, although I hadn't planned to make the change immediately,
> > since I believe we had agreed on a "transitional" period in which the
> > various NAFeature views would first be given a nullable
> > sequence_ontology_id
>
> Yes we had! So regarding chromosome regions, shall we keep
> TelomereFeature and CentromereFeature ?
>
> > and we would then decide how to best rearrange the views to more closely
> > match the ontology terms. I haven't added the sequence_ontology_id
> > column to the NAFeature views, but I will do so right away.
We do > > currently have some relevant NAFeature views in gusdev that have not > > been migrated into 3.0: > > > > CentromereFeature > > LowComplexityNAFeature > > ScaffoldGapFeature > > TelomereFeature > > > > I have no objection to merging the telomere and centromere features into > > a single view--along with any other chromosomal regions covered by the > > ontology--although it would mean that we wouldn't have a 1-1 mapping > > between sequence ontology terms and views on NAFeature. I think that > > at one point this was proposed as the eventual goal of the rearrangement. > > Anyway, given that I'm not certain of the plan here, I'm going to add > > the sequence_ontology_id column but leave the views unchanged for now. > > They can easily be changed without interfering with our data migration, > > so their fate doesn't have to be settled immediately. We have yet to > > establish a consistent set of rules for deciding when different types > > of features get grouped into a single view and when they get their own > > views, so this is probably a good opportunity to settle the question > > once and for all. The Sequence Ontology is big enough that we probably > > *don't* want a view for each and every term in the ontology; it would > > make maintenance quite difficult. But we could, for example, create a > > view for every top-level (or second-level) sequence ontology term. > > However, even a relatively high-level feature like "chromosomal region" > > (SO:0000711) looks like it's already a 4th or 5th level feature... > > > At > > the other extreme, we could continue what we're doing now, i.e. using > > an ad-hoc classification of features based on the data we actually have > > available, and just make sure that every feature is tagged with the > > correct sequence ontology term. Any thoughts? > > It makes sense as SO may undergo revisions this year. 
> > > > >>> > >>> alter table DOTS.PROTEINPROPERTY add constraint PROTEINPROPERTY_CK01 > >>> check (property_name in ('isoelectric point', 'molecular mass', > >>> 'charge', 'average residue mass')); > >>> > >>> The table allows multiple protein properties of the same type to be > >>> associated with > >>> entries in DoTS.AASequenceImp. Arnaud had suggested originally that > >>> the last property, average residue mass, could actually be an > >>> attribute of the table that stores the protein sequence itself. > >>> However, it seemed that if the molecular mass attribute could have > >>> multiple values (e.g., from different experiments) then > >>> the same should be true of the average residue mass, which is > >>> essentially a derived property. Let me know if you disagree with > >>> this, or think we should create an explicit controlled vocab. for > >>> these 4 properties. > >>> > >>> > >> A controlled vocabulary table with the four attributes you've > >> mentioned is fine. > > > > > > OK, I'll make this change. > > > >>> -Protein features > >>> *Signal peptide features (stored in DoTS.SignalPeptideFeature) > >>> This view exists already, as DoTS.SignalPeptideFeature, but we need > >>> to add the > >>> ability to store curated data, such as targetting information. It > >>> should be straightforward to modify the view to accomodate this, > >>> but I'm not sure exactly > >>> what needs to be stored. Currently we use the view exclusively for > >>> SignalP > >>> predictions, and from what I understand SignalP is only concerned > >>> with predicting > >>> secreted proteins, meaning that we don't currently have any > >>> explicit targetting information. Is this something we could > >>> represent using the GO ontology for cellular localization? Do we > >>> also need some free text columns? Let me know and I'll make > >>> the changes. 
All the SignalP-specific columns appear to be > >>> nullable, so we don't > >>> necessarily have to do anything except add the new columns for the > >>> manually curated > >>> information. > >>> > >>> > >> After talking to the curators it appears that GO component suplements > >> targetting information at the feature level but will not be enough. > >> The targeting information is represented by the component ontology in > >> one context i.e. mitochondrial, nuclear, membrane localization but > >> not in the context of the actual residues involved. > >> The actual residues involved in the targeting (or any other > >> phenomena) need to be represented by a protein feature ontology can > >> be mapped onto specific amino acids of a protein. > >> This ontology is the equivalent of Sequence Ontology (SO) which is > >> meant for DNA features. It is being prepared by Val Wood with input > >> from Swiss-prot. > > > > > > OK, so the idea is that the various signal peptides have been classified > > into named classes that should be represented by some kind of ontology? > > > >> As you're going to add a extra attribute sequence_ontology_id to the > >> NA Features, could you do the same to any AA Features ? > > > > > > This will only work if the new ontology is actually part of the Sequence > > Ontology (or if we use the SequenceOntology table to store both > > ontologies.) > > Do you know if this is the case? It's quite possible, since the SO does > > already cover amino acid features. Otherwise we'll have to create a > > separate AASequenceOntology (or whatever the new ontology is called). > > It is at the moment a different project but it would make sense they > merge in the future. 
Just to give you an idea about Localization > Signals, here is a snapshot: > > %localization signal > %N-terminal signal sequence > %nuclear localization signal > %bipartite nuclear localization signal > %etc > %mitochondrial localization sequence > %thylakoid localization signal > %ER retention signal > > The way the SignalPeptideFeature is designed make difficult the > annotation of localization signal features. We can leave > SignalPeptideFeature as it is as it fits with SignalP software > prediction and in the future create a new feature LocalizationSignalFeature. > > > > >>> *Transmembrane domain features (stored in DoTS.PredictedAAFeature) > >>> "PlasmoDB web site shows hydrophobicity graphics, where is it > >>> stored in GUS?" > >>> The hydrophobicity plots are computed dynamically based on the > >>> amino acid sequence. > >>> Transmembrane domains are currently stored in the > >>> PredictedAAFeature view, although > >>> I will probably create a new view for them when I get around to > >>> eliminating PredictedAAFeature. Another possibility would be to > >>> treat TM domains as another > >>> type of domain, and store them in DomainFeature. What do you think > >>> about this? > >>> > >>> > >> I reckon they could be merged. > > > > > > OK, sounds good. > > > >>> *Post-translational modification features (new view: > >>> DoTS:PostTranslationalModFeature) > >>> Has a "type" column to represent the type of modification. It was > >>> also suggested > >>> that we have a column called "modified_by", which would be a > >>> reference to the Interaction table. However, isn't it possible > >>> that the same post-translational > >>> modification (e.g., phosphorylation of a specific amino acid) could > >>> be the result > >>> of one of several Interactions? > >> > >> yes you're right, the effector could be different. In that case the > >> attribute > >> "modified_by" is not useful. 
> >> > >>> This argues for an additional relationship between Interaction and > >>> PostTranslationalModFeature, unless we're OK creating multiple > >>> PostTranslationalModFeatures, identical except for their modified_by > >>> attribute. Comments on this? > >>> > >>> > >> I don't think they should be duplicated as they corresponds to a > >> unique site. This unique feature would > >> be associated with different interaction entries. We might not need > >> an extra table between Interaction and PostTranslationalModFeature > >> though. We still can do the following query : "give me all the > >> interaction entries which target is a PostTranslationalModFeature > >> which id is ...". > >> How does it sound ? > > > > > > We could do this, although one question is whether, semantically > > speaking, > > the "target" of an Interaction should be "the thing to be modified" > > (e.g. an > > unphosphorylated sequence or residue) or "the resulting modification" > > (e.g. > > the feature that represents a phosphorylated residue at the appropriate > > location.) The answer is probably that we just shouldn't worry about it > > and should just do whatever is most convenient on a case-by-case basis. > > To do it "correctly" would be problematic either way. For example, if we > > say that the target is the thing to be modified, then we have to create a > > feature that represents a region of sequence that *could* be modified in > > some way and then create another feature to represent the actual > > modification. > > But if we say that the target is the result of the modification then > > we may > > have to create equally unusual tables/views. For example, if the > > result of > > a given interaction is to degrade a protein, then do we have to create a > > table/object that represents a degraded protein (or "nothing", or > > whatever > > it is that's left after the modification)? 
For now I have no problem > > with > > interpreting the "target" based on context, but in the longer term we may > > want to consider separating the notions of "target prior to modification" > > and either "target after modification" or "effect of modification". > > > > I also realized belatedly that I could have left the Interaction table > > unchanged, rather than introducing specific references to RowSet. This > > would have allowed us to represent either singleton effectors/targets or > > set-valued effectors/targets, without having to always join through > > RowSet > > in the singleton case. On the other hand, if we do associate some > > additional information with the RowSets, then the current representation > > is correct. > > It depends if we want to represent many-to-many relationship between > interaction and members of this interaction. Without the RowSet table, > we can't assign a set of several effectors/targets, right ? Unless we > consider that this set of effectors are being part of a complex and act > as the whole. > > > > >>> *AA repeats (new view: RepeatRegionAAFeature) > >>> I called this view RepeatRegionAAFeature in case we want to have a > >>> similar view > >>> for NASequences. I also created only a single view, instead of > >>> following Arnaud's > >>> original suggestion, which was for both: > >>> > >>> * RepeatRegionFeature as a set of RepeatUnitFeatures, > >>> * RepeatUnitFeature, with the consensus sequence, name and size > >>> > >>> I based the design of this view on that of TandemRepeatFeature, > >>> which we have for > >>> NASequences already. Instead of splitting the consensus sequence, > >>> name, and size > >>> into a separate table, they occupy columns in > >>> RepeatRegionAAFeature. This works > >>> quite well for the tandem repeats we already have (for DNA > >>> sequences.) 
However, if > >>> there is a known set of named amino acid sequence repeats, then it > >>> would probably > >>> make sense to do what Arnaud suggested, and store these in a > >>> separate table (likely named RepeatUnit, not RepeatUnitFeature, > >>> since they would have no unique locations.) Does this sound > >>> reasonable? That is, put the consensus seqs in the > >>> repeat region table itself if they're anonymous, but if they've > >>> been named, then store them in a separate table. Also note that > >>> this view has a reference to RepeatType, although the current > >>> contents of this table are probably applicable only to DNA sequence > >>> repeats (LINEs, SINEs, ALUs, etc.), since I believe that I parsed > >>> them out of RepBase. > >>> > >>> > >> I proposed a separate repeat feature because one may want to annotate > >> a repeat outside a repeat region, for example LTR repeats attached to > >> a given transposable element. These RepeatFeatures or > >> RepeatUnitFeatures can then have a location. > >> The other case is when a repeat region is made of a set of different > >> repeat units. > > > > > > OK, I didn't realize that this was what you were trying to represent. As > > currently conceived, RepeatRegionAAFeature is meant to represent a region > > that contains one or more immediately adjacent copies of the same type > > of (amino acid sequence) repeat. The assumption is also that these > > regions > > will typically be maximal (with respect to the chosen repeat type, > > consensus, > > and max. mismatch, the last of which is not represented directly in the > > table.) We can still represent more complex repeat structures using this > > single table, but the representation is implicit, not explicit (i.e. you > > have to do a query to find out what other repeats lie within the > > bounds of > > the transposon, meaning that there's no easy way to query for all > > transposable > > elements with a particular flanking LTR structure.) 
Do you want to > > come up > > with a 2-table version of what I've done? The use cases aren't clear > > enough > > in my mind yet for me to be able to do it. It seems that the bare > > minimum we > > need is just another column in the RepeatRegionAAFeature, parent_id; > > which > > would let us represent explicitly that a particular repeat is a > > *necessary* > > (versus incidental) component of another NA/AAFeature. Both AAFeatureImp > > and NAFeatureImp already have a parent_id, so this would be a > > straightforward > > change. The queries still might not be terribly efficient, but I > > don't know > > what exactly you wanted to support in terms of queries, versus just > > making > > sure that the representation is sufficiently rich to capture the > > structure. > > A case we came across here for Tbrucei is nested repeat regions (at the > DNA level). Each repeat region has coordinates and is annotated with a > unique repeat unit type. This repeat region can be within a bigger > repeat region annotated with a different repeat unit type. > ... which is in other words your suggestion with parent_id as an extra > attribute ... > > Regarding transposon repeat types, if we have a TransposableElement > feature and its type is given as an attribute, a repeat feature will > just be useful to locate the LTRs within a given a transposable element. > Can we keep this functionality ? Then the feature will be simple, just a > repeat_type, and a parent_id atributes. > > > > >> In any case, NA repeats and AA repeats should have the same design. > >> Just the controlled vocabulary representing the types of repeats will > >> be different. > > > > > > Absolutely, yes, although one question is whether AA repeats can have the > > same kind of nested structure that you mention as a possibility for NA > > repeats (the transposon with LTRs). I don't know the answer to this. 
> > > >>> -DoTS.Interaction (table modified, dependent tables added) > >>> *Added "has_direction" column, as discussed previously. The idea > >>> here is that > >>> not all interactions (particularly physical ones, e.g., > >>> dimerization) have a > >>> direction. If has_direction == 0, then the value of > >>> direction_is_known can > >>> be ignored. > >>> *Added non-nullable "effector_action_type_id" column, referencing > >>> DoTS.EffectorActionType (a new table.) This column/table > >>> represents the possible > >>> things that an effector can do to a target. For example, the > >>> InteractionType > >>> associated with the Interaction could be "binds to" (e.g., a > >>> promoter region), and > >>> the EffectorActionType for that Interaction could be to either > >>> "inhibit" or "enhance" > >>> expression of the coresponding gene. > >>> *Replaced effector_table_id and effector_row_id with > >>> effector_row_set_id, and > >>> similarly for the target_table_id and target_row_id. This allows > >>> us to represent > >>> the interaction of a set of objects (the effector) with another set > >>> of objects > >>> (the target.) Previously the Interaction table could only > >>> represent the interaction > >>> between a single pair of entities (OK if they happened to be > >>> Complexes, for example, > >>> but a potential problem in other situations.) Now both effector > >>> and target are represented as references to DoTS.RowSet, which in > >>> tun references DoTS.RowSetMember, > >>> which...in turn...references the individual database rows that > >>> comprise the effector > >>> or target. These tables (RowSet and RowSetMember) are essentially > >>> the same as Complex and ComplexComponent, except that they are > >>> totally generic; they can be used to group any set of rows in the > >>> database and they store no additional information. 
However, if > >>> there are any additional columns that we can think of (that are > >>> specific to Interactions) then these tables should be replaced by > >>> less generic ones (e.g. InteractingEntitySet or InteractionSet, or > >>> something along those lines.) > >>> > >>> > >> Sounds fine. The only thing I can see is regarding the > >> EffectorActionType. If each effector, member of a RowSet, has a > >> different action type, the attribute, effector_action_type_id, should > >> go in the RowSetMember table. I don't have any example though. > > > > > > OK, I think I'd be inclined to wait until we have some use cases for > > this. > > Although the current schema lets us group effectors together, it > > doesn't let > > us say (for example) that E1 interacts *directly* with T1 to > > phosphorylate > > it, but that E1's active site is only exposed when E1 is bound to E2. In > > other words, E1's role in the activity can be viewed as "primary", and > > E2's > > role is secondary (in some sense) but all we can say in the schema is > > that > > the Complex consisting of E1 and E2 interacts with T1 to phosphorylate > > it. > > I think that the solution we have now is OK, but it only lets us > > represent > > the overall action of the entire set of effectors. > > Let's leave the design as it is for now. Curators are not going to > curate interactions data in the short term. We shall come back later > with more precise ideas/use cases about them. > > > > > Jonathan > > > > Arnaud > > ------------------------------------------------------- > This SF.NET email is sponsored by: Thawte.com > Understand how to protect your customers personal information by implementing > SSL on your Apache Web Server. Click here to get our FREE Thawte Apache > Guide: http://ads.sourceforge.net/cgi-bin/redirect.pl?thaw0029en > _______________________________________________ > Gusdev-gusdev mailing list > Gus...@li... > https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev |
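The RowSet/RowSetMember arrangement discussed in the quoted message can be sketched roughly as follows. This is only an illustration using Python's built-in sqlite3 module rather than Oracle; the lower-case table names, the `table_id`/`row_id` columns of `row_set_member`, and the sample ids are assumptions made for the example, and columns such as effector_action_type_id are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified stand-ins for DoTS.RowSet, DoTS.RowSetMember
-- and DoTS.Interaction; only the columns needed for the example are kept.
CREATE TABLE row_set (row_set_id INTEGER PRIMARY KEY);
CREATE TABLE row_set_member (
    row_set_id INTEGER NOT NULL REFERENCES row_set(row_set_id),
    table_id   INTEGER NOT NULL,   -- which table the member row lives in
    row_id     INTEGER NOT NULL    -- primary key of the member row
);
CREATE TABLE interaction (
    interaction_id      INTEGER PRIMARY KEY,
    effector_row_set_id INTEGER NOT NULL REFERENCES row_set(row_set_id),
    target_row_set_id   INTEGER NOT NULL REFERENCES row_set(row_set_id),
    has_direction       INTEGER NOT NULL CHECK (has_direction IN (0, 1))
);
-- Effectors E1 and E2 (rows 101 and 102 of table 7) act on target T1 (row 201).
INSERT INTO row_set VALUES (1), (2);
INSERT INTO row_set_member VALUES (1, 7, 101), (1, 7, 102), (2, 7, 201);
INSERT INTO interaction VALUES (10, 1, 2, 1);
""")


def interactions_targeting(conn, table_id, row_id):
    """Ids of interactions whose target set contains the given row."""
    rows = conn.execute(
        """
        SELECT i.interaction_id
        FROM interaction i
        JOIN row_set_member m ON m.row_set_id = i.target_row_set_id
        WHERE m.table_id = ? AND m.row_id = ?
        ORDER BY i.interaction_id
        """,
        (table_id, row_id),
    ).fetchall()
    return [interaction_id for (interaction_id,) in rows]


def effectors_of(conn, interaction_id):
    """(table_id, row_id) pairs making up the effector set of an interaction."""
    return conn.execute(
        """
        SELECT m.table_id, m.row_id
        FROM interaction i
        JOIN row_set_member m ON m.row_set_id = i.effector_row_set_id
        WHERE i.interaction_id = ?
        ORDER BY m.table_id, m.row_id
        """,
        (interaction_id,),
    ).fetchall()
```

The first helper corresponds to the kind of query mentioned in the thread ("all the interaction entries whose target is a given feature"); it costs one extra join through the member table even when the target is a single row, which is the trade-off noted in the discussion.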
From: Arnaud K. <ax...@sa...> - 2003-01-23 22:45:22
|
Hi

Quoting Jonathan Crabtree <cra...@pc...>:

> Arnaud-
>
> > * What about an extra "automatically created" entry, alongside the
> > "manually created" one ?
>
> We've decided to drop the "manually created" term, and while it would be
> useful to know which entries were created automatically vs. manually,
> this information can be tracked using a combination of the Evidence and/or
> AlgorithmInvocation tables. If you think it's crucial to record how an
> entry was *originally* created, then we should consider adding an extra
> column in the relevant tables to record this (and also to enable fast
> queries to retrieve entries based on their origin.)
>
> > * Curators here have raised another point : they want to be able to track
> > when a feature was last reviewed. By reviewed I mean checked, even if
> > the review status is already set to "reviewed, correct".
> > Is there any way of storing a "last_checked_date" ?
> > I'm thinking of curated similarity evidence. New searches would be run
> > regularly, and a curator would want to check whether any new hit
> > confirms or cancels a prediction.
>
> I think that we're currently working under the assumption that we'll use the
> modification_date for this (which may or may not meet your needs). When an
> annotator first reviews an unreviewed entry, its status is set to either
> "manually reviewed, correct" or "manually reviewed, incorrect" and its
> modification_date is updated (and the old row versioned.) The
> modification_date now records the date of last review. Now, when something
> happens that might change the status of the entry (e.g., new searches are
> performed), its status gets changed to "updated" and its modification_date
> is updated. At this point the only way to tell the date of last manual
> review is to look in the version table. (Although one does know that the
> last manual review must have been before the stated modification_date.)
> When the entry comes up for review again (since it's now marked "updated"),
> its status is changed once more, and its modification_date will once again
> reflect the time of the most recent manual review.
>
> So the reason (in our current system) that we don't independently store the
> last_checked_date is because we don't care when the entry was last checked
> in absolute terms; we only care whether anything has changed since then.
> One problem with this (versus just storing a last_checked_date) is that it
> means that any program that makes changes (e.g., the one that runs the
> similarity searches) must determine what entries in the database *may* have
> been affected, and update their review_status.

I think it would be an interesting functionality: a way for the
annotators/curators to be informed that a run may affect some entries in the
database and that they should be reviewed. It sounds like some sort of
"triggers" with a set of rules that specify, for a given run, which entries
may be affected. But it's probably not a simple task to implement!

> I think I agree that last_manual_review_date would be a useful thing to have,
> but I think that its addition will have to be deferred until after I've
> finished working on the migration, because it will affect several tables.
> I'll put it on my list of things to deal with after the migration is done.

fine

> Jonathan

Arnaud
|
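The review cycle described in this message (status change, modification_date refresh, old row versioned) might be modelled roughly as below. This is a sketch under stated assumptions, not GUS code: the `Row` and `Entry` classes, the in-memory `versions` list standing in for the version table, and the method names are all hypothetical, and the status strings follow the wording of the proposed vocabulary.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

# Statuses that can only result from a manual review.
REVIEWED = ("reviewed, correct", "reviewed, incorrect")


@dataclass(frozen=True)
class Row:
    review_status: str
    modification_date: date


@dataclass
class Entry:
    current: Row
    versions: List[Row] = field(default_factory=list)  # stand-in for the version table

    def _update(self, status: str, when: date) -> None:
        self.versions.append(self.current)   # the old row is versioned
        self.current = Row(status, when)     # modification_date is refreshed

    def manual_review(self, correct: bool, when: date) -> None:
        """A curator reviews the entry."""
        self._update(REVIEWED[0] if correct else REVIEWED[1], when)

    def automatic_change(self, when: date) -> None:
        """Some program (e.g. a new similarity search) may have affected the entry."""
        self._update("updated", when)

    def last_manual_review_date(self) -> Optional[date]:
        """Walk from the current row back through the versions, newest first."""
        for row in [self.current] + self.versions[::-1]:
            if row.review_status in REVIEWED:
                return row.modification_date
        return None
```

Once an entry is marked "updated", the date of the last manual review is only recoverable from the versioned rows, which is the limitation that a dedicated last_manual_review_date column would remove.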
From: Arnaud K. <ax...@sa...> - 2003-01-23 22:23:56
|
Hi Jonathan

Quoting Jonathan Crabtree <cra...@pc...>:

> Arnaud-
>
> Returning to some slightly older business...
>
> >>
> >>                <DoTS::GenomicSequence>
> >>    ^                         ^      ^     ^       ^
> >>    |                         |      |     |       |
> >> <DoTS::GeneFeature (RHS)>    |      |     |       |
> >>    ^                         |      |     |       |
> >>    |                         |      |     |       |
> >> <DoTS::TransposableElement (INGI)>  |     |       |
> >>    ^                ^               |     |       |
> >>    |                |               |     |       |
> >>    |      <DoTS::RepeatRegionNAFeature>   |       |
> >>    |                    ^                 |       |
> >>    |                    |                 |       |
> >>    |          2 x <DoTS::RepeatFeature (RIME)>    |
> >>    |                                              |
> >>    |                                              |
> >>    ------------------------<DoTS::GeneFeature (pseudo)>
> >>
> >
> > My proposal is this representation without the repeat region feature. I
> > would see the repeat region feature used to cluster together a sequence,
> > whatever the sequence is (even one base, or more), repeated X times, but
> > not being used in this situation.
>
> Meaning that you would only use the repeat region feature when X > 1,
> right?

yes

> I'm suggesting that we combine the two tables, meaning that we would have
> one uniform representation for all X. I suppose that's probably my strongest
> argument against the 2-table representation, namely that it seems arbitrary
> to say that something is only a "repeat region" when it contains > 1 copy of
> a repeat. Wouldn't such a thing be better described as a tandem repeat?

Yes, I guess it could, as a repeat region was meant to annotate tandemly
repeated DNA sequences. I can see a TandemRepeatFeature in GUS very similar
to the proposed RepeatRegionFeature. Are you planning to keep it as a
replacement for the RepeatRegionNAFeature ?

> >>region". Specifically, can a repeat region contain things that are not
> >>repeats,
> >
> > Yes ! a gene for example !! A repeat region would be used to cluster
> > tandemly repeated genes. But this should be fine as long as a gene
> > feature can be attached to a repeat region.
>
> My question wasn't quite correct; I should have asked whether a repeat
> region can contain things that are not repeated. That is, could you use a
> repeat region to cluster tandemly repeated genes if those genes were
> separated by some additional non-repeating sequences. It sounds like the
> answer is probably "yes",

I think so.

> and that your definition of repeat region is simply any region that
> contains two or more copies of some type of sequence. Is this accurate?

yes

> > I think we want to represent a transposable element in a given context,
> > ie at a given location because this insertion may have consequences,
> > (in)activating a gene or shifting the frame of a gene etc.
> >
> > A core transposon should be represented as an entity on its own like
> > genes are.
>
> OK, I agree, and I think that fits with the current schema (except that we
> have yet to create a table to represent the transposons independent of
> their location.)

ok, I guess this can wait. I reckon that would imply that the Central Dogma
side would not only represent genes, right ?

> Jonathan
>
> --
> Jonathan Crabtree
> Center for Bioinformatics, University of Pennsylvania
> 1406 Blockley Hall, 423 Guardian Drive Philadelphia, PA 19104-6021
> 215-573-3115

Arnaud
|
From: Joan M. <ma...@pc...> - 2003-01-23 19:34:56
|
Jonathan,

For now, I think this is the best proposal.

Joan

Jonathan Crabtree wrote:
> I forgot to include the latest proposal in the previous e-mail; it's the
> same as before, but with "manually created" removed:
>
> 0  unreviewed           Entry has never been manually reviewed.
> 1  reviewed, correct    Entry has been manually reviewed and is deemed to be correct.
> 2  reviewed, incorrect  Entry has been manually reviewed and is deemed to be incorrect.
> 3  updated              Entry has been updated since last being manually reviewed.
>
> Manually-created entries will initially be assigned "reviewed, correct".
>
> Jonathan

--
Joan Mazzarelli
Computational Biology and Informatics Laboratory
Center for Bioinformatics
1429 Blockley Hall
University of Pennsylvania
Philadelphia, PA 19104 |
From: Jonathan C. <cra...@pc...> - 2003-01-23 19:31:23
|
Arnaud-

Returning to some slightly older business...

>>
>>                     <DoTS::GenomicSequence>
>>           ^                   ^     ^       ^       ^
>>           |                   |     |       |       |
>> <DoTS::GeneFeature (RHS)>     |     |       |       |
>>           ^                   |     |       |       |
>>           |                   |     |       |       |
>> <DoTS::TransposableElement (INGI)>  |       |       |
>>   ^               ^                 |       |       |
>>   |               |                 |       |       |
>>   |       <DoTS::RepeatRegionNAFeature>     |       |
>>   |                     ^                   |       |
>>   |                     |                   |       |
>>   |             2 x <DoTS::RepeatFeature (RIME)>    |
>>   |                                                 |
>>   |                                                 |
>>   ------------------------<DoTS::GeneFeature (pseudo)>
>>
>
> My proposal is this representation without the repeat region feature. I would
> see the repeat region feature to cluster together a sequence, whatever the
> sequence is (even one base, or more), repeated X times, but not being used in
> this situation.

Meaning that you would only use the repeat region feature when X > 1, right? I'm suggesting that we combine the two tables, meaning that we would have one uniform representation for all X. I suppose that's probably my strongest argument against the 2-table representation, namely that it seems arbitrary to say that something is only a "repeat region" when it contains > 1 copy of a repeat. Wouldn't such a thing be better described as a tandem repeat?

>>region". Specifically, can a repeat region contain things that are not
>>repeats,
>
> Yes ! a gene for example !! A repeat region would be used to cluster tandemly
> repeated genes. But this should be fine as long as a gene feature can be
> attached to a repeat region.

My question wasn't quite correct; I should have asked whether a repeat region can contain things that are not repeated. That is, could you use a repeat region to cluster tandemly repeated genes if those genes were separated by some additional non-repeating sequences. It sounds like the answer is probably "yes", and that your definition of repeat region is simply any region that contains two or more copies of some type of sequence. Is this accurate?

> I think we want to represent a transposable element in a given context, ie at a
> given location because this insertion may have consequences, (in)activating a
> gene or shifting the frame of a gene etc.
>
> A core transposon should be represented as an entity on its own like genes are.

OK, I agree, and I think that fits with the current schema (except that we have yet to create a table to represent the transposons independent of their location.)

Jonathan

--
Jonathan Crabtree
Center for Bioinformatics, University of Pennsylvania
1406 Blockley Hall, 423 Guardian Drive
Philadelphia, PA 19104-6021
215-573-3115 |
From: Jonathan C. <cra...@pc...> - 2003-01-23 18:13:49
|
Arnaud-

> * What about an extra "automatically created" entry, along the "manually
> created" one ?

We've decided to drop the "manually created" term, and while it would be useful to know which entries were created automatically vs. manually, this information can be tracked using the Evidence and/or AlgorithmInvocation tables. If you think it's crucial to record how an entry was *originally* created, then we should consider adding an extra column in the relevant tables to record this (and also to enable fast queries to retrieve entries based on their origin.)

> * Curators here have raised another point: they want to be able to track
> when a feature was last reviewed. By reviewed I mean
> checked even if the review status is already set on "reviewed, correct".
> Is there any way of storing a "last_checked_date" ?
> I'm thinking of curated similarity evidences. Regularly new searches
> would be done and a curator would want to check that any new hit would
> confirm or cancel a prediction.

I think that we're currently working under the assumption that we'll use the modification_date for this (which may or may not meet your needs). When an annotator first reviews an unreviewed entry, its status is set to either "manually reviewed, correct" or "manually reviewed, incorrect" and its modification_date is updated (and the old row versioned.) The modification_date now records the date of last review. Now, when something happens that might change the status of the entry (e.g., new searches are performed), its status gets changed to "updated" and its modification_date is updated. At this point the only way to tell the date of last manual review is to look in the version table. (Although one does know that the last manual review must have been before the stated modification_date.) When the entry comes up for review again (since it's now marked "updated"), its status is changed once more, and its modification_date will once again reflect the time of the most recent manual review.

So the reason (in our current system) that we don't independently store the last_checked_date is because we don't care when the entry was last checked in absolute terms; we only care whether anything has changed since then. One problem with this (versus just storing a last_checked_date) is that it means that any program that makes changes (e.g., the one that runs the similarity searches) must determine what entries in the database *may* have been affected, and update their review_status.

I think I agree that last_manual_review_date would be a useful thing to have, but I think that its addition will have to be deferred until after I've finished working on the migration, because it will affect several tables. I'll put it on my list of things to deal with after the migration is done.

Jonathan |
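The review cycle described in this message can be sketched as a small in-memory model. This is only an illustration under stated assumptions: the `Entry` class, its method names, and the `versions` list (standing in for the version table) are hypothetical and are not part of the GUS schema or code base.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class Entry:
    """Hypothetical stand-in for a row that carries a review status."""
    review_status: str = "unreviewed"
    modification_date: Optional[datetime] = None
    # Stand-in for the version table: one (status, modification_date) per old row.
    versions: List[Tuple[str, Optional[datetime]]] = field(default_factory=list)

    def _change(self, status: str, when: datetime) -> None:
        # Version the old row, then overwrite status and modification_date.
        self.versions.append((self.review_status, self.modification_date))
        self.review_status = status
        self.modification_date = when

    def manual_review(self, correct: bool, when: datetime) -> None:
        self._change("reviewed, correct" if correct else "reviewed, incorrect", when)

    def mark_updated(self, when: datetime) -> None:
        # Called by any program whose changes *may* affect this entry.
        self._change("updated", when)

    def last_manual_review_date(self) -> Optional[datetime]:
        # While the entry is in a reviewed state, modification_date is the answer.
        if self.review_status.startswith("reviewed"):
            return self.modification_date
        # Otherwise the date has to be dug out of the versioned rows.
        for status, when in reversed(self.versions):
            if status.startswith("reviewed"):
                return when
        return None


entry = Entry()
entry.manual_review(correct=True, when=datetime(2003, 1, 10))
entry.mark_updated(when=datetime(2003, 1, 20))       # e.g. new similarity searches
review_date_while_updated = entry.last_manual_review_date()
entry.manual_review(correct=True, when=datetime(2003, 1, 23))
```

The lookup through `versions` is the step that a dedicated last_manual_review_date column would make unnecessary.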
From: Jonathan C. <cra...@pc...> - 2003-01-23 18:06:14
|
I forgot to include the latest proposal in the previous e-mail; it's the same as before, but with "manually created" removed:

0  unreviewed           Entry has never been manually reviewed.
1  reviewed, correct    Entry has been manually reviewed and is deemed to be correct.
2  reviewed, incorrect  Entry has been manually reviewed and is deemed to be incorrect.
3  updated              Entry has been updated since last being manually reviewed.

Manually-created entries will initially be assigned "reviewed, correct".

Jonathan |
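For readers who prefer code, the proposed vocabulary could be written down roughly as follows. This is a sketch, not GUS code: the `ReviewStatus` enum and the `initial_status` helper are hypothetical names, and assigning "unreviewed" to automatically created entries is an assumption drawn from the definition of that term. The numeric values only mirror the list above and should not be relied on across installations.

```python
from enum import IntEnum


class ReviewStatus(IntEnum):
    """The four proposed terms; the numbers mirror the proposal only."""
    UNREVIEWED = 0          # Entry has never been manually reviewed.
    REVIEWED_CORRECT = 1    # Manually reviewed and deemed to be correct.
    REVIEWED_INCORRECT = 2  # Manually reviewed and deemed to be incorrect.
    UPDATED = 3             # Updated since last being manually reviewed.


def initial_status(created_manually: bool) -> ReviewStatus:
    # Manually-created entries start as "reviewed, correct" (per the proposal).
    # Assumption: automatically created entries start as "unreviewed".
    if created_manually:
        return ReviewStatus.REVIEWED_CORRECT
    return ReviewStatus.UNREVIEWED
```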
From: Jonathan C. <cra...@pc...> - 2003-01-23 17:55:26
|
Debbie-

> "manually created" doesn't seem as if it is a review status and this
> situation could be covered with "manually reviewed correct" with evidence
> being that it was manually created.

It is a review status if you want to differentiate between "implicitly reviewed" (i.e., the annotator created it, and he/she would not have done so if he/she did not believe it to be correct) and "explicitly reviewed" (i.e., the entry, which already exists in the database, was retrieved and then examined to determine whether it's correct.) However, it's not clear that this is a distinction (between two different kinds of "reviewed, correct") that we should be making in ReviewStatus. There are at least two separate questions here:

1. Do we want to track which entries in the database were created manually, versus those that were created automatically and then approved by an annotator?

I think that we're all in agreement that the answer to this question is a resounding "yes". Given that, the second question is:

2. Where should this information be stored?

As you point out, we could record this information using the Evidence table. And, as I mentioned in a previous e-mail, we *have* to do it this way unless we change our ReviewStatus vocabulary so that each and every term in the vocabulary records whether the entry was originally created manually or automatically (so that we can track its original status through one or more rounds of update/re-review.) I don't think that this is a good idea, and after talking to Jonathan about it I think we're in agreement that we should drop the term for "manually created."

We also have to bear in mind that our current notion of ReviewStatus is something that's fairly closely tied to the annotation process that we use in DoTS. There's nothing wrong with that, but it's quite possible that other sites will have different ideas about how ReviewStatus should be used. So at some point we should revisit this, but as long as the revised set of terms (see below) is agreeable to everyone on the mailing list, I think that we should stick with it for the time being.

> I have to agree with Joan, that it may be safer to stick as closely to the
> existing manually_reviewed values as possible, 0=unreviewed and 1=manually
> reviewed correct and add 2=manually reviewed incorrect as well as
> 4=updated.

Can you be more specific about why changing the actual ids would be unsafe? (I hope you're not threatening me :)) I trust that you're not planning to rely on having hard-coded review_status_ids in your GUS 3.0 programs and queries, right? I myself have plenty of GUS 2.x scripts and queries that contain hard-coded internal identifiers (e.g., sequence_type_ids and external_db_ids, to name two of the most frequently-used ones.) However, when I convert these scripts to GUS 3.0 I'm going to have to rewrite them to be portable, meaning that I can't assume that other copies of GUS (perhaps running at other sites) will have the same internal ids. Unless we're willing to take these ids and publish them (as, for example, the GO consortium has done with their GO IDs), we can't rely on their being constant across different copies of GUS; it's just not good programming practice.

Jonathan |
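The portability point can be illustrated with a short sketch that looks a status up by name instead of hard-coding its id. Everything concrete here is an assumption made for the example: SQLite stands in for the Oracle instance, and the `ReviewStatus` table layout (`review_status_id`, `name`), the `make_site` helper, and the `review_status_id` function are hypothetical rather than taken from the GUS 3.0 schema.

```python
import sqlite3


def make_site(rows):
    """Build an in-memory stand-in for one site's copy of the database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ReviewStatus ("
        "review_status_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL)"
    )
    conn.executemany("INSERT INTO ReviewStatus VALUES (?, ?)", rows)
    return conn


def review_status_id(conn, name):
    """Resolve a status name to whatever id this particular site uses."""
    row = conn.execute(
        "SELECT review_status_id FROM ReviewStatus WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(f"unknown review status: {name!r}")
    return row[0]


# Two sites with the same vocabulary but different internal ids.
site_a = make_site([(0, "unreviewed"), (1, "reviewed, correct"),
                    (2, "reviewed, incorrect"), (3, "updated")])
site_b = make_site([(10, "unreviewed"), (11, "reviewed, correct"),
                    (12, "reviewed, incorrect"), (14, "updated")])

id_at_a = review_status_id(site_a, "updated")
id_at_b = review_status_id(site_b, "updated")
```

A script written this way depends only on the term names being shared between sites, which is a much weaker requirement than sharing the ids themselves.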