From: Arnaud K. <ax...@sa...> - 2003-01-14 11:27:06
Hi Jonathan Thanks for doing this. Please find below some comments I've inserted. I have noticed that your changes don't cover the DNA/RNA features. Is there any reason for this ? I know there are quite a lot of them and there might be another way of storing data some information such as telomere or centromere regions, origin of replication, inflection point etc. All these features are covered by Sequence Ontology, so a new ChromosomeElement or ChromosomeRegion feature could be generic enough to cover most of them. Let me know what you think. cheers Arnaud Jonathan Crabtree wrote: >Hi all- > >The attached text file describes the schema changes that I just finished >implementing. It's attached as a separate file to avoid problems with the >mail clients changing the line wrapping. Sorry if there are any typos, >but it's getting late and I want to get this out there for everyone to >look at in the morning. > >Jonathan > > > >------------------------------------------------------------------------ > > >Hi all- > >Here are the schema changes that I've just finished implementing: > >-Protein properties (new table: DoTS.ProteinProperty) > A new table that Arnaud requested back in July, but was overlooked in the earlier > schema changes. There are four possible protein properties as represented by the > following constraint (we could instead have a ProteinPropertyType table and treat > this as a controlled vocabulary): > > alter table DOTS.PROTEINPROPERTY add constraint PROTEINPROPERTY_CK01 check > (property_name in ('isoelectric point', 'molecular mass', 'charge', 'average residue mass')); > > The table allows multiple protein properties of the same type to be associated with > entries in DoTS.AASequenceImp. Arnaud had suggested originally that the last > property, average residue mass, could actually be an attribute of the table that > stores the protein sequence itself. 
However, it seemed that if the molecular > mass attribute could have multiple values (e.g., from different experiments) then > the same should be true of the average residue mass, which is essentially a > derived property. Let me know if you disagree with this, or think we should > create an explicit controlled vocab. for these 4 properties. > > A controlled vocabulary table with the four attributes you've mentioned is fine. >-Protein features > *Signal peptide features (stored in DoTS.SignalPeptideFeature) > This view exists already, as DoTS.SignalPeptideFeature, but we need to add the > ability to store curated data, such as targetting information. It should be > straightforward to modify the view to accomodate this, but I'm not sure exactly > what needs to be stored. Currently we use the view exclusively for SignalP > predictions, and from what I understand SignalP is only concerned with predicting > secreted proteins, meaning that we don't currently have any explicit targetting > information. Is this something we could represent using the GO ontology for cellular > localization? Do we also need some free text columns? Let me know and I'll make > the changes. All the SignalP-specific columns appear to be nullable, so we don't > necessarily have to do anything except add the new columns for the manually curated > information. > > After talking to the curators it appears that GO component suplements targetting information at the feature level but will not be enough. The targeting information is represented by the component ontology in one context i.e. mitochondrial, nuclear, membrane localization but not in the context of the actual residues involved. The actual residues involved in the targeting (or any other phenomena) need to be represented by a protein feature ontology can be mapped onto specific amino acids of a protein. This ontology is the equivalent of Sequence Ontology (SO) which is meant for DNA features. 
It is being prepared by Val Wood with input from Swiss-prot. As you're going to add a extra attribute sequence_ontology_id to the NA Features, could you do the same to any AA Features ? > *Domain/motif features (new view: DoTS.DomainFeature) > I've created this as a view on AAFeatureImp. You can either use the NAME column to > specify the type of domain (e.g., "leucine zipper" or "coiled coil"), or include > an explicit reference to a domain/motif database (SMART, ProSite) using the > external_database_release_id and source_id columns. PFam is handled as a special > case, with a specific pfam_entry_id column that references the PfamEntry table. > This was originally done because the entries in the PFam database are HMMs, so > they don't fit too well in the sequence-related tables. Most other motif databases > have consensus sequences for their motifs that we can store in MotifAASequence. > > Note that motif/domain features are currently stored in GUS in the PredictedAAFeature > table, which is also a view on AAFeatureImp. After the migration I plan to eliminate > the PredictedAAFeature view and move its contents into feature-specific tables (like > DomainFeature) instead. > > *Transmembrane domain features (stored in DoTS.PredictedAAFeature) > "PlasmoDB web site shows hydrophobicity graphics, where is it stored in GUS?" > The hydrophobicity plots are computed dynamically based on the amino acid sequence. > Transmembrane domains are currently stored in the PredictedAAFeature view, although > I will probably create a new view for them when I get around to eliminating > PredictedAAFeature. Another possibility would be to treat TM domains as another > type of domain, and store them in DomainFeature. What do you think about this? > > I reckon they could be merged. > *Post-translational modification features (new view: DoTS:PostTranslationalModFeature) > Has a "type" column to represent the type of modification. 
It was also suggested > that we have a column called "modified_by", which would be a reference to the > Interaction table. However, isn't it possible that the same post-translational > modification (e.g., phosphorylation of a specific amino acid) could be the result > of one of several Interactions? > yes you're right, the effector could be different. In that case the attribute "modified_by" is not useful. > This argues for an additional relationship > between Interaction and PostTranslationalModFeature, unless we're OK creating > multiple PostTranslationalModFeatures, identical except for their modified_by > attribute. Comments on this? > > I don't think they should be duplicated as they corresponds to a unique site. This unique feature would be associated with different interaction entries. We might not need an extra table between Interaction and PostTranslationalModFeature though. We still can do the following query : "give me all the interaction entries which target is a PostTranslationalModFeature which id is ...". How does it sound ? > *AA repeats (new view: RepeatRegionAAFeature) > I called this view RepeatRegionAAFeature in case we want to have a similar view > for NASequences. I also created only a single view, instead of following Arnaud's > original suggestion, which was for both: > > * RepeatRegionFeature as a set of RepeatUnitFeatures, > * RepeatUnitFeature, with the consensus sequence, name and size > > I based the design of this view on that of TandemRepeatFeature, which we have for > NASequences already. Instead of splitting the consensus sequence, name, and size > into a separate table, they occupy columns in RepeatRegionAAFeature. This works > quite well for the tandem repeats we already have (for DNA sequences.) 
However, if > there is a known set of named amino acid sequence repeats, then it would probably > make sense to do what Arnaud suggested, and store these in a separate table > (likely named RepeatUnit, not RepeatUnitFeature, since they would have no unique > locations.) Does this sound reasonable? That is, put the consensus seqs in the > repeat region table itself if they're anonymous, but if they've been named, then > store them in a separate table. Also note that this view has a reference to > RepeatType, although the current contents of this table are probably applicable > only to DNA sequence repeats (LINEs, SINEs, ALUs, etc.), since I believe that I > parsed them out of RepBase. > > I proposed a separate repeat feature because one may want to annotate a repeat outside a repeat region, for example LTR repeats attached to a given transposable element. These RepeatFeatures or RepeatUnitFeatures can then have a location. The other case is when a repeat region is made of a set of different repeat units. In any case, NA repeats and AA repeats should have the same design. Just the controlled vocabulary representing the types of repeats will be different. > *2D structures (not currently represented) > "Another question : What about 2D structures (beta-sheet and alpha-helice) in GUS?" > I don't *believe* we have any of these. They should be easy to add as either a > single feature view, or a set of views. > > fine. >-DoTS.Interaction (table modified, dependent tables added) > *Added "has_direction" column, as discussed previously. The idea here is that > not all interactions (particularly physical ones, e.g., dimerization) have a > direction. If has_direction == 0, then the value of direction_is_known can > be ignored. > *Added non-nullable "effector_action_type_id" column, referencing > DoTS.EffectorActionType (a new table.) This column/table represents the possible > things that an effector can do to a target. 
>    For example, the InteractionType
>    associated with the Interaction could be "binds to" (e.g., a promoter region), and
>    the EffectorActionType for that Interaction could be to either "inhibit" or "enhance"
>    expression of the corresponding gene.
>  *Replaced effector_table_id and effector_row_id with effector_row_set_id, and
>    similarly for the target_table_id and target_row_id. This allows us to represent
>    the interaction of a set of objects (the effector) with another set of objects
>    (the target.) Previously the Interaction table could only represent the interaction
>    between a single pair of entities (OK if they happened to be Complexes, for example,
>    but a potential problem in other situations.) Now both effector and target are
>    represented as references to DoTS.RowSet, which in turn references DoTS.RowSetMember,
>    which...in turn...references the individual database rows that comprise the effector
>    or target. These tables (RowSet and RowSetMember) are essentially the same as
>    Complex and ComplexComponent, except that they are totally generic; they can be
>    used to group any set of rows in the database and they store no additional information.
>    However, if there are any additional columns that we can think of (that are specific
>    to Interactions) then these tables should be replaced by less generic ones (e.g.
>    InteractingEntitySet or InteractionSet, or something along those lines.)
>

Sounds fine. The only thing I can see is regarding the EffectorActionType. If each effector, member of a RowSet, has a different action type, the attribute, effector_action_type_id, should go in the RowSetMember table. I don't have any example though.

>-DoTS.Attribution (new table)
>  A table intended to allow us to attribute data to people and/or organizations, using
>  the Contact table. It is a many-to-1 relationship between SRes.Contact and any row in
>  the DoTS schema.
>

Fine. We already agreed on this implementation.
>-SRes.BibRefType (new table), SRes.BibliographicReference (modified table)
>  Added new table, BibRefType, to represent the different types of references/publications
>  that one might encounter. I've populated this table based on a combination of the terms
>  used in MEDLINE 2003 and those from FlyBase, as well as one or two rows of my own
>  devising. Correspondingly, a non-nullable column has been added to BibliographicReference
>  to allow one to specify the BibRefType of the reference. I've also added a contact_id
>  column to BibliographicReference, to be used in the case where the BibRefType ==
>  "personal communication". You can find the rows that I've placed in BibRefType in the
>  3.0 db creation scripts mentioned below (in the file gus30-sres-BibRefType-rows.sql).
>
>I think those are the main changes. I also wrote a couple of scripts to help check
>and maintain the version tables (those ending in "Ver") and to check that the actual
>database schema and the information stored in Core.TableInfo actually agree. In the
>process I fixed a number of problems, although there are still some things to be done,
>such as:
>
>  -Create SEQUENCE objects for all the tables (or at least modify the database dump
>   script to generate CREATE SEQUENCE statements for all the tables)
>  -Check that all the foreign key constraints have been defined correctly
>  -Check that all the foreign key columns are indexed correctly (I have a script
>   that will do this)
>  -Add sequence_ontology_id to all the views on NAFeatureImp.
>
>I've updated the schema browser, so you should be able to see all the new and
>modified tables online:
>
>  <http://www.cbil.upenn.edu/cgi-bin/GUS30/schemaBrowser.pl?db=GUS30>
>
>There's also a preliminary dump of the create database scripts, which should be consistent
>with what's shown in the schema browser:
>
>  <http://www.cbil.upenn.edu/downloads/GUS/releases/3.0-beta/schema/>
>
>Jonathan
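
The two options discussed above for DoTS.ProteinProperty (a CHECK constraint on property_name versus a separate ProteinPropertyType controlled-vocabulary table) can be sketched roughly as follows. This is only an illustration using Python's built-in sqlite3 module, not the GUS/Oracle DDL: the lowercase table names, the aa_sequence_id and property_value columns, and the add_property helper are all assumptions made for the example; only the four property names come from the thread.

```python
import sqlite3

# The four property names listed in the PROTEINPROPERTY_CK01 constraint.
PROPERTY_NAMES = [
    "isoelectric point",
    "molecular mass",
    "charge",
    "average residue mass",
]


def build_demo_db():
    """Create an in-memory database holding both hypothetical designs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        -- Option 1: vocabulary enforced by a CHECK constraint.
        CREATE TABLE protein_property_ck (
            protein_property_id INTEGER PRIMARY KEY,
            aa_sequence_id      INTEGER NOT NULL,   -- assumed column
            property_name       TEXT NOT NULL CHECK (property_name IN (
                'isoelectric point', 'molecular mass',
                'charge', 'average residue mass')),
            property_value      REAL NOT NULL       -- assumed column
        );

        -- Option 2: vocabulary stored in its own table.
        CREATE TABLE protein_property_type (
            protein_property_type_id INTEGER PRIMARY KEY,
            name                     TEXT NOT NULL UNIQUE
        );
        CREATE TABLE protein_property (
            protein_property_id      INTEGER PRIMARY KEY,
            aa_sequence_id           INTEGER NOT NULL,
            protein_property_type_id INTEGER NOT NULL
                REFERENCES protein_property_type (protein_property_type_id),
            property_value           REAL NOT NULL
        );
    """)
    conn.executemany(
        "INSERT INTO protein_property_type (name) VALUES (?)",
        [(name,) for name in PROPERTY_NAMES],
    )
    return conn


def add_property(conn, aa_sequence_id, name, value):
    """Attach one property value to a sequence (option 2 design)."""
    row = conn.execute(
        "SELECT protein_property_type_id FROM protein_property_type"
        " WHERE name = ?",
        (name,),
    ).fetchone()
    if row is None:
        raise ValueError(f"unknown protein property type: {name!r}")
    conn.execute(
        "INSERT INTO protein_property"
        " (aa_sequence_id, protein_property_type_id, property_value)"
        " VALUES (?, ?, ?)",
        (aa_sequence_id, row[0], value),
    )
```

Either design allows several values of the same property type per sequence (for example, molecular masses from different experiments). The difference is that adding a new property type means altering a constraint in the first design and inserting a row in the second.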
From: Arnaud K. <ax...@sa...> - 2003-01-16 14:02:56
Hi Jonathan

Jonathan Crabtree wrote:

> Arnaud-
>
> Thanks for the feedback; I think we're getting close to agreement here.

I think so too!

>> I have noticed that your changes don't cover the DNA/RNA features. Is
>> there any reason for this ? I know there are quite a lot of them and
>> there might be another way of storing some information such as
>> telomere or centromere regions, origin of replication, inflection
>> point etc. All these features are covered by Sequence Ontology, so a
>> new ChromosomeElement or ChromosomeRegion feature could be generic
>> enough to cover most of them.
>> Let me know what you think.
>
> Which DNA/RNA features do you mean (other than those mentioned above)?

The file I sent you should include views on top of the NAFeatureImp table. Here is the list:

* ChromosomeElement, or we can keep CentromereFeature and TelomereFeature as they are in gusdev - IMPORTANT
* InfectionPointFeature
* ReplicationFeature, for annotated origins of replication
* RNARegulatory - as there is a DNARegulatory feature => regulatory element at the RNA level
* RNASecondaryStructure
* SpliceSiteFeature
* TransposableElement

+ an extra attribute in RestrictionFragmentFeature, "type_of_cut" (sticky or blunt)
+ an extra attribute in GeneSynonym, "is_obsolete"
+ a new view on top of NASequenceImp, "GenomicSequence", instead of the existing one, ExternalNASequence.

I can send the files to you if you want.

> It's possible that I misplaced the e-mail or notes where we discussed
> these. Or are you just saying that we will eventually have a view for
> each type of DNA/RNA feature in the Sequence Ontology? I think that
> this is true, although I hadn't planned to make the change immediately,
> since I believe we had agreed on a "transitional" period in which the
> various NAFeature views would first be given a nullable
> sequence_ontology_id

Yes we had! So regarding chromosome regions, shall we keep TelomereFeature and CentromereFeature?
> and we would then decide how to best rearrange the views to more closely > match the ontology terms. I haven't added the sequence_ontology_id > column to the NAFeature views, but I will do so right away. We do > currently have some relevant NAFeature views in gusdev that have not > been migrated into 3.0: > > CentromereFeature > LowComplexityNAFeature > ScaffoldGapFeature > TelomereFeature > > I have no objection to merging the telomere and centromere features into > a single view--along with any other chromosomal regions covered by the > ontology--although it would mean that we wouldn't have a 1-1 mapping > between sequence ontology terms and views on NAFeature. I think that > at one point this was proposed as the eventual goal of the rearrangement. > Anyway, given that I'm not certain of the plan here, I'm going to add > the sequence_ontology_id column but leave the views unchanged for now. > They can easily be changed without interfering with our data migration, > so their fate doesn't have to be settled immediately. We have yet to > establish a consistent set of rules for deciding when different types > of features get grouped into a single view and when they get their own > views, so this is probably a good opportunity to settle the question > once and for all. The Sequence Ontology is big enough that we probably > *don't* want a view for each and every term in the ontology; it would > make maintenance quite difficult. But we could, for example, create a > view for every top-level (or second-level) sequence ontology term. > However, even a relatively high-level feature like "chromosomal region" > (SO:0000711) looks like it's already a 4th or 5th level feature... > At > the other extreme, we could continue what we're doing now, i.e. using > an ad-hoc classification of features based on the data we actually have > available, and just make sure that every feature is tagged with the > correct sequence ontology term. Any thoughts? 
It makes sense as SO may undergo revisions this year. > >>> >>> alter table DOTS.PROTEINPROPERTY add constraint PROTEINPROPERTY_CK01 >>> check (property_name in ('isoelectric point', 'molecular mass', >>> 'charge', 'average residue mass')); >>> >>> The table allows multiple protein properties of the same type to be >>> associated with >>> entries in DoTS.AASequenceImp. Arnaud had suggested originally that >>> the last property, average residue mass, could actually be an >>> attribute of the table that stores the protein sequence itself. >>> However, it seemed that if the molecular mass attribute could have >>> multiple values (e.g., from different experiments) then >>> the same should be true of the average residue mass, which is >>> essentially a derived property. Let me know if you disagree with >>> this, or think we should create an explicit controlled vocab. for >>> these 4 properties. >>> >>> >> A controlled vocabulary table with the four attributes you've >> mentioned is fine. > > > OK, I'll make this change. > >>> -Protein features >>> *Signal peptide features (stored in DoTS.SignalPeptideFeature) >>> This view exists already, as DoTS.SignalPeptideFeature, but we need >>> to add the >>> ability to store curated data, such as targetting information. It >>> should be straightforward to modify the view to accomodate this, >>> but I'm not sure exactly >>> what needs to be stored. Currently we use the view exclusively for >>> SignalP >>> predictions, and from what I understand SignalP is only concerned >>> with predicting >>> secreted proteins, meaning that we don't currently have any >>> explicit targetting information. Is this something we could >>> represent using the GO ontology for cellular localization? Do we >>> also need some free text columns? Let me know and I'll make >>> the changes. 
All the SignalP-specific columns appear to be >>> nullable, so we don't >>> necessarily have to do anything except add the new columns for the >>> manually curated >>> information. >>> >>> >> After talking to the curators it appears that GO component suplements >> targetting information at the feature level but will not be enough. >> The targeting information is represented by the component ontology in >> one context i.e. mitochondrial, nuclear, membrane localization but >> not in the context of the actual residues involved. >> The actual residues involved in the targeting (or any other >> phenomena) need to be represented by a protein feature ontology can >> be mapped onto specific amino acids of a protein. >> This ontology is the equivalent of Sequence Ontology (SO) which is >> meant for DNA features. It is being prepared by Val Wood with input >> from Swiss-prot. > > > OK, so the idea is that the various signal peptides have been classified > into named classes that should be represented by some kind of ontology? > >> As you're going to add a extra attribute sequence_ontology_id to the >> NA Features, could you do the same to any AA Features ? > > > This will only work if the new ontology is actually part of the Sequence > Ontology (or if we use the SequenceOntology table to store both > ontologies.) > Do you know if this is the case? It's quite possible, since the SO does > already cover amino acid features. Otherwise we'll have to create a > separate AASequenceOntology (or whatever the new ontology is called). It is at the moment a different project but it would make sense they merge in the future. 
Just to give you an idea about Localization Signals, here is a snapshot:

%localization signal
%N-terminal signal sequence
%nuclear localization signal
%bipartite nuclear localization signal
%etc
%mitochondrial localization sequence
%thylakoid localization signal
%ER retention signal

The way the SignalPeptideFeature is designed makes the annotation of localization signal features difficult. We can leave SignalPeptideFeature as it is, since it fits SignalP software predictions, and in the future create a new feature, LocalizationSignalFeature.

>>>  *Transmembrane domain features (stored in DoTS.PredictedAAFeature)
>>>    "PlasmoDB web site shows hydrophobicity graphics, where is it
>>>    stored in GUS?"
>>>    The hydrophobicity plots are computed dynamically based on the
>>>    amino acid sequence.
>>>    Transmembrane domains are currently stored in the PredictedAAFeature
>>>    view, although I will probably create a new view for them when I get
>>>    around to eliminating PredictedAAFeature. Another possibility would be
>>>    to treat TM domains as another type of domain, and store them in
>>>    DomainFeature. What do you think about this?
>>
>> I reckon they could be merged.
>
> OK, sounds good.
>
>>>  *Post-translational modification features (new view:
>>>    DoTS.PostTranslationalModFeature)
>>>    Has a "type" column to represent the type of modification. It was
>>>    also suggested that we have a column called "modified_by", which
>>>    would be a reference to the Interaction table. However, isn't it
>>>    possible that the same post-translational modification (e.g.,
>>>    phosphorylation of a specific amino acid) could be the result
>>>    of one of several Interactions?
>>
>> yes you're right, the effector could be different. In that case the
>> attribute "modified_by" is not useful.
>> >>> This argues for an additional relationship between Interaction and >>> PostTranslationalModFeature, unless we're OK creating multiple >>> PostTranslationalModFeatures, identical except for their modified_by >>> attribute. Comments on this? >>> >>> >> I don't think they should be duplicated as they corresponds to a >> unique site. This unique feature would >> be associated with different interaction entries. We might not need >> an extra table between Interaction and PostTranslationalModFeature >> though. We still can do the following query : "give me all the >> interaction entries which target is a PostTranslationalModFeature >> which id is ...". >> How does it sound ? > > > We could do this, although one question is whether, semantically > speaking, > the "target" of an Interaction should be "the thing to be modified" > (e.g. an > unphosphorylated sequence or residue) or "the resulting modification" > (e.g. > the feature that represents a phosphorylated residue at the appropriate > location.) The answer is probably that we just shouldn't worry about it > and should just do whatever is most convenient on a case-by-case basis. > To do it "correctly" would be problematic either way. For example, if we > say that the target is the thing to be modified, then we have to create a > feature that represents a region of sequence that *could* be modified in > some way and then create another feature to represent the actual > modification. > But if we say that the target is the result of the modification then > we may > have to create equally unusual tables/views. For example, if the > result of > a given interaction is to degrade a protein, then do we have to create a > table/object that represents a degraded protein (or "nothing", or > whatever > it is that's left after the modification)? 
For now I have no problem > with > interpreting the "target" based on context, but in the longer term we may > want to consider separating the notions of "target prior to modification" > and either "target after modification" or "effect of modification". > > I also realized belatedly that I could have left the Interaction table > unchanged, rather than introducing specific references to RowSet. This > would have allowed us to represent either singleton effectors/targets or > set-valued effectors/targets, without having to always join through > RowSet > in the singleton case. On the other hand, if we do associate some > additional information with the RowSets, then the current representation > is correct. It depends if we want to represent many-to-many relationship between interaction and members of this interaction. Without the RowSet table, we can't assign a set of several effectors/targets, right ? Unless we consider that this set of effectors are being part of a complex and act as the whole. > >>> *AA repeats (new view: RepeatRegionAAFeature) >>> I called this view RepeatRegionAAFeature in case we want to have a >>> similar view >>> for NASequences. I also created only a single view, instead of >>> following Arnaud's >>> original suggestion, which was for both: >>> >>> * RepeatRegionFeature as a set of RepeatUnitFeatures, >>> * RepeatUnitFeature, with the consensus sequence, name and size >>> >>> I based the design of this view on that of TandemRepeatFeature, >>> which we have for >>> NASequences already. Instead of splitting the consensus sequence, >>> name, and size >>> into a separate table, they occupy columns in >>> RepeatRegionAAFeature. This works >>> quite well for the tandem repeats we already have (for DNA >>> sequences.) 
However, if >>> there is a known set of named amino acid sequence repeats, then it >>> would probably >>> make sense to do what Arnaud suggested, and store these in a >>> separate table (likely named RepeatUnit, not RepeatUnitFeature, >>> since they would have no unique locations.) Does this sound >>> reasonable? That is, put the consensus seqs in the >>> repeat region table itself if they're anonymous, but if they've >>> been named, then store them in a separate table. Also note that >>> this view has a reference to RepeatType, although the current >>> contents of this table are probably applicable only to DNA sequence >>> repeats (LINEs, SINEs, ALUs, etc.), since I believe that I parsed >>> them out of RepBase. >>> >>> >> I proposed a separate repeat feature because one may want to annotate >> a repeat outside a repeat region, for example LTR repeats attached to >> a given transposable element. These RepeatFeatures or >> RepeatUnitFeatures can then have a location. >> The other case is when a repeat region is made of a set of different >> repeat units. > > > OK, I didn't realize that this was what you were trying to represent. As > currently conceived, RepeatRegionAAFeature is meant to represent a region > that contains one or more immediately adjacent copies of the same type > of (amino acid sequence) repeat. The assumption is also that these > regions > will typically be maximal (with respect to the chosen repeat type, > consensus, > and max. mismatch, the last of which is not represented directly in the > table.) We can still represent more complex repeat structures using this > single table, but the representation is implicit, not explicit (i.e. you > have to do a query to find out what other repeats lie within the > bounds of > the transposon, meaning that there's no easy way to query for all > transposable > elements with a particular flanking LTR structure.) Do you want to > come up > with a 2-table version of what I've done? 
> The use cases aren't clear enough
> in my mind yet for me to be able to do it. It seems that the bare minimum we
> need is just another column in the RepeatRegionAAFeature, parent_id, which
> would let us represent explicitly that a particular repeat is a *necessary*
> (versus incidental) component of another NA/AAFeature. Both AAFeatureImp
> and NAFeatureImp already have a parent_id, so this would be a straightforward
> change. The queries still might not be terribly efficient, but I don't know
> what exactly you wanted to support in terms of queries, versus just making
> sure that the representation is sufficiently rich to capture the structure.

A case we came across here for T. brucei is nested repeat regions (at the DNA level). Each repeat region has coordinates and is annotated with a unique repeat unit type. This repeat region can be within a bigger repeat region annotated with a different repeat unit type. ... which is in other words your suggestion with parent_id as an extra attribute ...

Regarding transposon repeat types, if we have a TransposableElement feature and its type is given as an attribute, a repeat feature will just be useful to locate the LTRs within a given transposable element. Can we keep this functionality? Then the feature will be simple, with just repeat_type and parent_id attributes.

>> In any case, NA repeats and AA repeats should have the same design.
>> Just the controlled vocabulary representing the types of repeats will
>> be different.
>
> Absolutely, yes, although one question is whether AA repeats can have the
> same kind of nested structure that you mention as a possibility for NA
> repeats (the transposon with LTRs). I don't know the answer to this.
>
>>> -DoTS.Interaction (table modified, dependent tables added)
>>>  *Added "has_direction" column, as discussed previously. The idea here is that
>>>    not all interactions (particularly physical ones, e.g., dimerization) have a
>>>    direction.
If has_direction == 0, then the value of >>> direction_is_known can >>> be ignored. >>> *Added non-nullable "effector_action_type_id" column, referencing >>> DoTS.EffectorActionType (a new table.) This column/table >>> represents the possible >>> things that an effector can do to a target. For example, the >>> InteractionType >>> associated with the Interaction could be "binds to" (e.g., a >>> promoter region), and >>> the EffectorActionType for that Interaction could be to either >>> "inhibit" or "enhance" >>> expression of the coresponding gene. >>> *Replaced effector_table_id and effector_row_id with >>> effector_row_set_id, and >>> similarly for the target_table_id and target_row_id. This allows >>> us to represent >>> the interaction of a set of objects (the effector) with another set >>> of objects >>> (the target.) Previously the Interaction table could only >>> represent the interaction >>> between a single pair of entities (OK if they happened to be >>> Complexes, for example, >>> but a potential problem in other situations.) Now both effector >>> and target are represented as references to DoTS.RowSet, which in >>> tun references DoTS.RowSetMember, >>> which...in turn...references the individual database rows that >>> comprise the effector >>> or target. These tables (RowSet and RowSetMember) are essentially >>> the same as Complex and ComplexComponent, except that they are >>> totally generic; they can be used to group any set of rows in the >>> database and they store no additional information. However, if >>> there are any additional columns that we can think of (that are >>> specific to Interactions) then these tables should be replaced by >>> less generic ones (e.g. InteractingEntitySet or InteractionSet, or >>> something along those lines.) >>> >>> >> Sounds fine. The only thing I can see is regarding the >> EffectorActionType. 
If each effector, member of a RowSet, has a >> different action type, the attribute, effector_action_type_id, should >> go in the RowSetMember table. I don't have any example though. > > > OK, I think I'd be inclined to wait until we have some use cases for > this. > Although the current schema lets us group effectors together, it > doesn't let > us say (for example) that E1 interacts *directly* with T1 to > phosphorylate > it, but that E1's active site is only exposed when E1 is bound to E2. In > other words, E1's role in the activity can be viewed as "primary", and > E2's > role is secondary (in some sense) but all we can say in the schema is > that > the Complex consisting of E1 and E2 interacts with T1 to phosphorylate > it. > I think that the solution we have now is OK, but it only lets us > represent > the overall action of the entire set of effectors. Let's leave the design as it is for now. Curators are not going to curate interactions data in the short term. We shall come back later with more precise ideas/use cases about them. > > Jonathan > Arnaud |
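The RowSet arrangement discussed in the message above can be sketched roughly as follows. This is only an illustration in SQLite (driven from Python), not the Oracle DDL used for GUS. The column names effector_row_set_id, effector_action_type_id, has_direction and direction_is_known come from the thread; target_row_set_id, the RowSetMember columns, the primary keys and the helper functions are assumptions made for the example.

```python
import sqlite3

# Hypothetical SQLite rendering of the tables named in the thread.
# Anything not mentioned in the messages (keys, RowSetMember columns,
# target_row_set_id) is an assumption for illustration only.
SCHEMA = """
CREATE TABLE RowSet (
    row_set_id INTEGER PRIMARY KEY
);
CREATE TABLE RowSetMember (
    row_set_member_id INTEGER PRIMARY KEY,
    row_set_id INTEGER NOT NULL REFERENCES RowSet(row_set_id),
    table_id   INTEGER NOT NULL,  -- which table the member row lives in
    row_id     INTEGER NOT NULL   -- primary key of the member row
);
CREATE TABLE EffectorActionType (
    effector_action_type_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE Interaction (
    interaction_id          INTEGER PRIMARY KEY,
    effector_row_set_id     INTEGER NOT NULL REFERENCES RowSet(row_set_id),
    target_row_set_id       INTEGER NOT NULL REFERENCES RowSet(row_set_id),
    effector_action_type_id INTEGER NOT NULL
        REFERENCES EffectorActionType(effector_action_type_id),
    has_direction           INTEGER NOT NULL CHECK (has_direction IN (0, 1)),
    direction_is_known      INTEGER CHECK (direction_is_known IN (0, 1))
);
"""


def open_db():
    """Open an in-memory database with foreign keys enforced."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def new_row_set(conn, members):
    """Create a RowSet holding the given (table_id, row_id) pairs."""
    row_set_id = conn.execute("INSERT INTO RowSet DEFAULT VALUES").lastrowid
    conn.executemany(
        "INSERT INTO RowSetMember (row_set_id, table_id, row_id) "
        "VALUES (?, ?, ?)",
        [(row_set_id, t, r) for t, r in members],
    )
    return row_set_id


def add_interaction(conn, effector_set, target_set, action_type_id,
                    has_direction=1, direction_is_known=None):
    """Insert an Interaction between two row sets and return its id."""
    return conn.execute(
        "INSERT INTO Interaction (effector_row_set_id, target_row_set_id, "
        "effector_action_type_id, has_direction, direction_is_known) "
        "VALUES (?, ?, ?, ?, ?)",
        (effector_set, target_set, action_type_id,
         has_direction, direction_is_known),
    ).lastrowid


def effectors(conn, interaction_id):
    """List the (table_id, row_id) pairs that make up the effector set."""
    return conn.execute(
        "SELECT m.table_id, m.row_id "
        "FROM Interaction i "
        "JOIN RowSetMember m ON m.row_set_id = i.effector_row_set_id "
        "WHERE i.interaction_id = ? "
        "ORDER BY m.table_id, m.row_id",
        (interaction_id,),
    ).fetchall()
```

Because the action type sits on Interaction here, it describes the effector set as a whole; moving effector_action_type_id to RowSetMember, as raised in the message, would be the change needed to give each effector its own action type.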
From: Jonathan C. <cra...@sn...> - 2003-01-17 05:41:27
Attachments:
gus-30-schema-changes-2.txt
|
Arnaud -

>> Which DNA/RNA features do you mean (other than those mentioned above)?
>
> The file I sent you should include views on the top of NAFeatureImp
> table. Here the list :

Yes, you're absolutely right; there was a period when I wasn't paying very
close attention to the schema mailing list, and I'm afraid I misplaced a
couple of the files you sent, at least temporarily. I believe I've now
added all the views and tables that you originally proposed, with some
minor modifications to take into account discussions we've had since then.
See the attached text file for a complete list of the changes I've made
this time around.

> Yes we had! So regarding chromosome regions, shall we keep
> TelomereFeature and CentromereFeature ?

No, I think we should use ChromosomeElementFeature instead; I've created
this view based on the ChromosomeElement view you suggested, but with a
couple of additional columns to handle the data currently in
gusdev.TelomereFeature and gusdev.CentromereFeature.

>> At the other extreme, we could continue what we're doing now, i.e.
>> using an ad-hoc classification of features based on the data we
>> actually have available, and just make sure that every feature is
>> tagged with the correct sequence ontology term. Any thoughts?
>
> It makes sense as SO may undergo revisions this year.

OK, as noted in the attachment, I've added sequence_ontology_id to *all*
views of NAFeatureImp and AAFeatureImp.

>>> A controlled vocabulary table with the four attributes you've
>>> mentioned is fine.

Done; it's called ProteinPropertyType, and the schema/contents are
described in the attached list of changes.

>>> As you're going to add an extra attribute sequence_ontology_id to the
>>> NA Features, could you do the same to any AA Features ?

OK, done.

> The way the SignalPeptideFeature is designed makes the annotation of
> localization signal features difficult. We can leave
> SignalPeptideFeature as it is as it fits with SignalP software
> prediction and in the future create a new feature
> LocalizationSignalFeature.

OK, based on our discussion today the only change I've made to
SignalPeptideFeature is to add the sequence_ontology_id, which can be used
to reference the different localization ontology terms that you mentioned.
A column has been added to SequenceOntology to let us store multiple
ontologies (and versions thereof) in the same table. Experimental
evidence, references, and annotator's comments can be linked to
SignalPeptideFeature (or a future LocalizationSignalFeature view) using
DoTS.Evidence.

>>> I reckon they could be merged.

(This comment was in reference to incorporating TM domain features into
the DomainFeature view.) I've added a "number_of_domains" column to
DomainFeature to permit this. We will *not* have a separate view
specifically for TM domain features.

>> I also realized belatedly that I could have left the Interaction table
>> unchanged, rather than introducing specific references to RowSet. This
>> would have allowed us to represent either singleton effectors/targets
>> or set-valued effectors/targets, without having to always join through
>> RowSet in the singleton case. On the other hand, if we do associate
>> some additional information with the RowSets, then the current
>> representation is correct.
>
> It depends if we want to represent many-to-many relationship between
> interaction and members of this interaction. Without the RowSet table,
> we can't assign a set of several effectors/targets, right ? Unless we
> consider that this set of effectors are being part of a complex and act
> as the whole.

It's true that without the RowSet table we can't assign a set of several
effectors or targets. What I was trying to say was that I replaced the
following columns in DoTS.Interaction:

   effector_table_id
   effector_row_id   (or something to that effect)

using instead a single column that references a RowSet:

   effector_row_set_id

However, I could have left the Interaction table unchanged, and used the
effector_table_id and effector_row_id to reference entries in the RowSet
table (in the case where there are multiple effectors.) With this approach
one would have the choice of either using or not using the RowSet table on
a case-by-case basis. I don't think it's too important which way we do
this; on the one hand you save a join when you only need to reference a
single effector/target (using the table_id/row_id approach) but on the
other hand with the row_set_id approach you can write uniform code and
also have an enforceable referential integrity constraint. So barring any
strong objection, I'll leave the table as it is now (i.e., with explicit
references to RowSet, meaning that you always have to have a RowSet even
when the effector or target is a single object.)

> A case we came across here for Tbrucei is nested repeat regions (at the
> DNA level). Each repeat region has coordinates and is annotated with a
> unique repeat unit type. This repeat region can be within a bigger
> repeat region annotated with a different repeat unit type.
> ... which is in other words your suggestion with parent_id as an extra
> attribute ...

I haven't added the parent_id yet, but I'll do so.

> Regarding transposon repeat types, if we have a TransposableElement
> feature and its type is given as an attribute, a repeat feature will
> just be useful to locate the LTRs within a given transposable element.
> Can we keep this functionality ? Then the feature will be simple, just
> repeat_type and parent_id attributes.

Are you saying that we still need the two tables/features, one for
RepeatFeature, the other for RepeatRegionFeature? Could you give me a
specific example of how you would envision using these tables (and also
these tables in conjunction with the TransposableElement view, under the
assumption that they're all equipped with parent_ids)?

> Let's leave the design as it is for now. Curators are not going to
> curate interactions data in the short term. We shall come back later
> with more precise ideas/use cases about them.

Sounds good. Let me know if there's anything I've missed. I'll try to
generate updated SQL scripts tomorrow, and also update the schema browser
so that everyone can review the changes one last time. Cheers,

Jonathan
|
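The nested repeat regions mentioned in the message above (a repeat region lying inside a bigger region with a different repeat unit type) can be illustrated with a self-referencing parent_id. The sketch below is an assumption-laden example in SQLite via Python: the table is a simplified stand-in for a RepeatRegionNAFeature view, the start_min/end_max columns are inlined here rather than kept in a separate location table, and the repeat type names are invented.

```python
import sqlite3


def open_db():
    """Create a simplified, hypothetical repeat-region table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE RepeatRegionNAFeature (
        na_feature_id INTEGER PRIMARY KEY,
        parent_id     INTEGER REFERENCES RepeatRegionNAFeature(na_feature_id),
        repeat_type   TEXT NOT NULL,
        start_min     INTEGER NOT NULL,  -- assumed inline coordinates
        end_max       INTEGER NOT NULL
    );
    """)
    return conn


def enclosing_regions(conn, na_feature_id):
    """Return repeat types from the given region out to the outermost one.

    Follows parent_id upward with a recursive common table expression.
    """
    rows = conn.execute("""
        WITH RECURSIVE chain(na_feature_id, parent_id, repeat_type, depth) AS (
            SELECT na_feature_id, parent_id, repeat_type, 0
            FROM RepeatRegionNAFeature
            WHERE na_feature_id = ?
            UNION ALL
            SELECT r.na_feature_id, r.parent_id, r.repeat_type, c.depth + 1
            FROM RepeatRegionNAFeature r
            JOIN chain c ON r.na_feature_id = c.parent_id
        )
        SELECT repeat_type FROM chain ORDER BY depth
    """, (na_feature_id,)).fetchall()
    return [row[0] for row in rows]


def load_nested_example(conn):
    """Insert one outer region and one region nested inside it."""
    conn.executemany(
        "INSERT INTO RepeatRegionNAFeature VALUES (?, ?, ?, ?, ?)",
        [
            (1, None, "unit_B", 100, 900),  # outer region (invented type)
            (2, 1, "unit_A", 300, 500),     # nested region (invented type)
        ],
    )
```

Each region keeps a single repeat type, and nesting is expressed only through parent_id, so a query for the enclosing regions is a walk up the parent chain.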
From: Arnaud K. <ax...@sa...> - 2003-01-17 16:34:02
|
Hi Jonathan Jonathan Crabtree wrote: >Arnaud - > > > >>>Which DNA/RNA features do you mean (other than those mentioned above)? >>> >>> >>The file I sent you should include views on the top of NAFeatureImp >>table. Here the list : >> >> > >Yes, you're absolutely right; there was a period when I wasn't paying very >close attention to the schema mailing list, and I'm afraid I misplaced a >couple of the files you sent, at least temporarily. I believe I've >now added all the views and tables that you originally proposed, with >some minor modifications to take into account discussions we've had since >then. See the attached text file for a complete list of the changes I've >made this time around. > > > >>Yes we had! So regarding chromosome regions, shall we keep >>TelomereFeature and CentromereFeature ? >> >> > >No, I think we should use ChromosomeElementFeature instead; I've created >this view based on the ChromosomeElement view you suggested, but with a >couple of additional columns to handle the data currently in >gusdev.TelomereFeature and gusdev.CentromereFeature. > > > >>>At >>>the other extreme, we could continue what we're doing now, i.e. using >>>an ad-hoc classification of features based on the data we actually have >>>available, and just make sure that every feature is tagged with the >>>correct sequence ontology term. Any thoughts? >>> >>> >>It makes sense as SO may undergo revisions this year. >> >> > >OK, as noted in the attachment, I've added sequence_ontology_id to *all* >views of NAFeatureImp and AAFeatureImp. > > > >>>>A controlled vocabulary table with the four attributes you've >>>>mentioned is fine. >>>> >>>> > >Done; it's called ProteinPropertyType, and the schema/contents are >described in the attached list of changes. > > > >>>>As you're going to add a extra attribute sequence_ontology_id to the >>>>NA Features, could you do the same to any AA Features ? >>>> >>>> > >OK, done. 
> > > >>The way the SignalPeptideFeature is designed make difficult the >>annotation of localization signal features. We can leave >>SignalPeptideFeature as it is as it fits with SignalP software >>prediction and in the future create a new feature LocalizationSignalFeature. >> >> > >OK, based on our discussion today the only change I've made to >SignalPeptideFeature is to add the sequence_ontology_id, which can be >used to reference the different localization ontology terms that you >mentioned. A column has been added to SequenceOntology to let us store >multiple ontologies (and versions thereof) in the same table. >Experimental evidence, references, and annotator's comments can be linked >to SignalPeptideFeature (or a future LocalizationSignalFeature view) using >DoTS.Evidence. > > A quick question regarding evidences, you're mentioning that the Evidence table will connect Features and Experimental evidences. Where will the latter be stored ? > > >>>>I reckon they could be merged. >>>> >>>> > >(This comment was in reference to incorporating TM domain features into >the DomainFeature view.) I've added a "number_of_domains" column to >DomainFeature to permit this. We will *not* have a separate view >specifically for TM domain features. > > > >>>I also realized belatedly that I could have left the Interaction table >>>unchanged, rather than introducing specific references to RowSet. This >>>would have allowed us to represent either singleton effectors/targets or >>>set-valued effectors/targets, without having to always join through >>>RowSet >>>in the singleton case. On the other hand, if we do associate some >>>additional information with the RowSets, then the current representation >>>is correct. >>> >>> >>It depends if we want to represent many-to-many relationship between >>interaction and members of this interaction. Without the RowSet table, >>we can't assign a set of several effectors/targets, right ? 
Unless we >>consider that this set of effectors are being part of a complex and act >>as the whole. >> >> > >It's true that without the RowSet table we can't assign a set of several >effectors or targets. What I was trying to say was that I replaced the >following rows in DoTS.Interaction-- > effector_table_id > effector_row_id (or something to that effect) > >using instead a single row that references a RowSet: > effector_row_set_id > >However, I could have left the Interaction table unchanged, and used the >effector_table_id and effector_row_id to reference entries in the RowSet >table (in the case where there are multiple effectors.) With this >approach one would have the choice of either using or not using the RowSet >table on a case-by-case basis. I don't think it's too important which way >we do this; on the one hand you save a join when you only need to reference >a single effector/target (using the table_id/row_id approach) but on the >other hand with the row_set_id approach you can write uniform code and >also have an enforceable referential integrity constraint. So barring any >strong objection, I'll leave the table as it is now (i.e., with explicit >references to RowSet, meaning that you always have to have a RowSet even >when the effector or target is a single object.) > > fine, I think this way is more consistent as storing one and storing more than one effectors will be done the same way. > > >>A case we came across here for Tbrucei is nested repeat regions (at the >>DNA level). Each repeat region has coordinates and is annotated with a >>unique repeat unit type. This repeat region can be within a bigger >>repeat region annotated with a different repeat unit type. >>... which is in other words your suggestion with parent_id as an extra >>attribute ... >> >> > >I haven't added the parent_id yet, but I'll do so. 
> > > >>Regarding transposon repeat types, if we have a TransposableElement >>feature and its type is given as an attribute, a repeat feature will >>just be useful to locate the LTRs within a given a transposable element. >>Can we keep this functionality ? Then the feature will be simple, just a >>repeat_type, and a parent_id atributes. >> >> > >Are you saying that we still need the two tables/features, one for >RepeatFeature, the other for RepeatRegionFeature? Could you give me a >specific example of how you would envision using these tables (and also >these tables in conjunction with the TransposableElement view, under the >assumption that they're all equipped with parent_ids)? > > Here two examples of transposable elements annotations, one is from Tbrucei, the other one is a common one in procaryote genomes. The first one in the inclusion of a INGI transposon within an ORF, the RHS gene. The transposon includes two RIME flanking repeats and another ORF. So in GUS, the INGI transposon could be stored as a transposable element feature, attached to a RHS gene feature. The transposable element feature will have three sub features, a gene feature, tagged as a pseudo-gene and two repeat features, which repeat_type is RIME and with a given location. The second example is nested transposable elements in procaryote genomes, ie insertion of a transposable element within another one. Each transposable element can have a similar structure including the following sub features : two flanking Inverted Repeats, a gene and its promoter and/or a promoter, functional on the other strand ! So if there is no repeat feature, the flanking repeats will have to be annotated part of the transposable element feature. Let me know what you think about these. > > >>Let's leave the design as it is for now. Curators are not going to >>curate interactions data in the short term. We shall come back later >>with more precise ideas/use cases about them. >> >> > >Sounds good. 
Let me know if there's anything I've missed. I'll try to >generate updated SQL scripts tomorrow, and also update the schema browser >so that everyone can review the changes one last time. Cheers, > >Jonathan > > > >------------------------------------------------------------------------ > > >-Added nullable 'is_obsolete' column to DoTS.GeneSynonym >-Added and populated DoTS.ProteinPropertyType table (please correct/improve my > protein property descriptions, shown below.) I did not include a source_id column, > because that usually implies a reference to an external database (in conjunction > with an external_database_release_id to specify which database). > > 1 isoelectric point The pH at which the net charge of the entire polypeptide is zero. > 2 molecular mass The mass of the entire polypeptide. > 3 charge The net charge of the entire polypeptide. > 4 average residue mass The average mass of a single residue in the polypeptide chain. > >-Modified DoTS.ProteinProperty table to reference ProteinPropertyType > One question I have regarding these tables is how will the units be specified? > Should I make the "property_value" column a varchar2 column? It may have had > this type originally, and I might have changed it without considering the > consequences. One option would be to specify in the ProteinPropertyType table > what units are to be used, though this is clumsy if there is more than one > choice of units for a given property. > Whatever the unit they're in, they should all be numbers (some would be integer) so we can go for the "number" data type but float or varchar could also be fine! 
>-Created DoTS.SecondaryStructureAAFeature (instead of AASecondaryStructure) >-Created DoTS.TertiaryStructureAAFeature (instead of AATertiaryStructure) >-Created DoTS.ChromosomeElementFeature (instead of ChromosomeElement), with > a few additional columns to handle the data currently in gusdev.TelomereFeature > and gusdev.CentromereFeature >-Added "probability" column to DoTS.DomainFeature. >-Added "number_of_domains" column to DoTS.DomainFeature, so that it can be used > instead of the proposed TransmembraneDomainFeature to represent TM domains. >-Added DoTS.GenomicSequence view, with sequencing_center_contact_id instead of > the proposed free text column, "sequencing_center". >-Added sequencing_center_contact_id to DoTS.NASequenceImp to support this. >-Created DoTS.InflectionPointFeature >-Added columns to ProteinProperty to more closely reflect the original proposal > (e.g. prediction_algorithm_id, is_predicted, review_status_id, source_id) >-Modified DoTS.PostTranslationalModFeature as per Arnaud's original proposal >-Created DoTS.ReplicationFeature (should this be ReplicationOriginFeature?) > I reckon ReplicationOriginFeature would make more sense >-Added "type_of_cut" column to DoTS.RestrictionFragmentFeature >-Created DoTS.RNARegulatoryFeature (instead of RNARegulatory), but omitted the > "evidence" column; shouldn't the Evidence table be used for this purpose? >-Created DoTS.RNASecondaryStructureFeature (instead of RNASecondaryStructure) >-Created DoTS.SpliceSiteFeature >-Created DoTS.TransposableElement >-Added external_database_release_id to any view that has a source_id; these two > fields should always appear together, since by convention they are used to > specify a reference to an external database. (Admittedly this is somewhat > obscure, and we should probably think about using something more obvious.) 
>-Added sequence_ontology_id to AAFeatureImp and all of its views >-Added "ontology_name" column to SequenceOntology to allow us to store multiple > ontologies (na sequence + aa sequence) in the table. We *could* have used > the existing so_version column for this purpose, but I think adding an extra > column is a slightly better idea. Alternatively we could switch to using an > external_database_release_id, which I think we might have done for the GO > terms already. > > > cheers Arnaud |
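One of the options raised in the change list above is to record the units in ProteinPropertyType and keep property_value numeric. A minimal sketch of that option follows, using SQLite from Python rather than Oracle. The four property names come from the thread; the units column, the particular units chosen, the aa_sequence_id column and the helper function are assumptions for illustration.

```python
import sqlite3

# The four names are from the thread; the units are assumed for this sketch.
PROPERTY_TYPES = [
    (1, "isoelectric point", "pH"),
    (2, "molecular mass", "Da"),
    (3, "charge", "elementary charge"),
    (4, "average residue mass", "Da"),
]


def open_db():
    """Create the two tables and load the controlled vocabulary."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
    CREATE TABLE ProteinPropertyType (
        protein_property_type_id INTEGER PRIMARY KEY,
        name  TEXT NOT NULL UNIQUE,
        units TEXT NOT NULL            -- hypothetical column
    );
    CREATE TABLE ProteinProperty (
        protein_property_id      INTEGER PRIMARY KEY,
        aa_sequence_id           INTEGER NOT NULL,
        protein_property_type_id INTEGER NOT NULL
            REFERENCES ProteinPropertyType(protein_property_type_id),
        property_value           REAL NOT NULL
    );
    """)
    conn.executemany(
        "INSERT INTO ProteinPropertyType VALUES (?, ?, ?)", PROPERTY_TYPES
    )
    return conn


def properties_with_units(conn, aa_sequence_id):
    """Return (name, value, units) for every property of one sequence."""
    return conn.execute(
        "SELECT t.name, p.property_value, t.units "
        "FROM ProteinProperty p "
        "JOIN ProteinPropertyType t "
        "  ON t.protein_property_type_id = p.protein_property_type_id "
        "WHERE p.aa_sequence_id = ? "
        "ORDER BY t.name, p.property_value",
        (aa_sequence_id,),
    ).fetchall()
```

With the units stored once per type, a query result can always be read with its unit attached, and several values of the same type (for example from different experiments) remain possible. The drawback noted in the thread still applies: this is clumsy if a property can legitimately be reported in more than one unit.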
From: Jonathan C. <cra...@pc...> - 2003-01-17 19:23:33
|
Arnaud-

> A quick question regarding evidences, you're mentioning that the
> Evidence table will connect Features and Experimental evidences. Where
> will the latter be stored ?

Hopefully others will chime in if I get this wrong... I believe that the
relevant tables are DoTS.Comments (for free text notes/comments entered by
an annotator) and SRes.BibliographicReference (for published experiments.)
However, I don't think that we have a generic table to represent
unpublished laboratory experiments in a structured way. Perhaps we need
some use cases here? We do have your new table for representing RNAi
constructs, but I don't think that we have a corresponding table to
represent the actual RNAi experiment. Do we need/want such a table (either
for RNAi experiments or in general) and, if so, how detailed does it need
to be?

> Here two examples of transposable elements annotations, one is from
> Tbrucei, the other one is a common one in procaryote genomes.
>
> The first one in the inclusion of a INGI transposon within an ORF, the
> RHS gene. The transposon includes two RIME flanking repeats and another
> ORF. So in GUS, the INGI transposon could be stored as a transposable
> element feature, attached to a RHS gene feature. The transposable
> element feature will have three sub features, a gene feature, tagged as
> a pseudo-gene and two repeat features, which repeat_type is RIME and
> with a given location.

So in the "current" schema (meaning that I'm assuming we have only a
single repeat-related view, called RepeatRegionNAFeature, which is the NA
equivalent of RepeatRegionAAFeature), the picture would look like this:

                      <DoTS::GenomicSequence>
            ^                ^          ^             ^
            |                |          |             |
<DoTS::GeneFeature (RHS)>    |          |             |
            ^                |          |             |
            |                |          |             |
<DoTS::TransposableElement (INGI)>      |             |
   ^        ^                           |             |
   |        |                           |             |
   |    2 x <DoTS::RepeatRegionNAFeature (RIME)>      |
   |                                                  |
   ---------------------------<DoTS::GeneFeature (pseudo)>

 -For each feature the leftmost arrow shows the parent_id, the rightmost
  arrow shows the na_sequence_id.
 -All of the features will have a location specified in terms of the
  genomic sequence (because that's what their na_sequence_id references.)
 -I have to create 2 RepeatRegionNAFeatures under my definition, because
  the RIME repeats are not adjacent to one another.
 -Presumably the transposable element is contained in the coding region
  of a single exon, so the parent feature could be an ExonFeature instead
  of a GeneFeature.
 -Note that parent_id is typically used to indicate a part-whole
  relationship, in the sense that the part *must* have a corresponding
  whole (e.g. Exon to Gene). In the above picture and our discussions
  on this topic we've generalized its usage to also encompass the
  concept that one feature "happens to be" part of another, i.e.,
  that its NALocation is strictly within the bounds of its parent's
  NALocation, but that this need not be the case by definition.

And I believe your proposal is for something that looks more like this:

                     <DoTS::GenomicSequence>
            ^                ^     ^      ^         ^
            |                |     |      |         |
<DoTS::GeneFeature (RHS)>    |     |      |         |
            ^                |     |      |         |
            |                |     |      |         |
<DoTS::TransposableElement (INGI)> |      |         |
   ^      ^                        |      |         |
   |      |                        |      |         |
   |    <DoTS::RepeatRegionNAFeature>     |         |
   |            ^                         |         |
   |            |                         |         |
   |        2 x <DoTS::RepeatFeature (RIME)>        |
   |                                                |
   -------------------------<DoTS::GeneFeature (pseudo)>

In other words, the RepeatRegionNAFeature serves only to group the two
RIME repeats (which aren't even immediately adjacent to one another.) Is
this what you had in mind? Or did you mean to make the
RepeatRegionNAFeature a child of the GeneFeature and then make the
TransposableElement a child of the RepeatRegionNAFeature? I'm just not
clear on your definition of "repeat region". Specifically, can a repeat
region contain things that are not repeats, and can it contain more than
one type of repeat? And, if so, how does one assign bounds to the region
in a non-arbitrary way?

> The second example is nested transposable elements in procaryote
> genomes, ie insertion of a transposable element within another one.
> Each transposable element can have a similar structure including the
> following sub features : two flanking Inverted Repeats, a gene and its
> promoter and/or a promoter, functional on the other strand !

I won't try to draw the pictures for this one! In both the current schema
and your proposal I think we have the problem that we have no way of
explicitly representing the relationship between the two flanking inverted
repeats. Apart from that, however, I think that we can handle this case
just as well as the first. You have to create quite a few features, but I
don't think there's any way to avoid that unless we want to come up with
some "exemplar" transposons and use them to classify the instances we
encounter. The promoter/gene that's functional on the opposite strand
would be represented simply as reverse-strand features (i.e., we'd set the
is_reversed flag in their NALocations, but still use their parent_ids to
indicate their place in the nested repeat structure.)

> So if there is no repeat feature, the flanking repeats will have to be
> annotated part of the transposable element feature.
> Let me know what you think about these.

But shouldn't they be part of the transposable element feature? I don't
know the details of this specific type of transposon, but are you trying
to make the distinction between: 1) the core transposon, i.e., the
machinery that enables that part of the genome (encompassing both the
machinery and perhaps some variable-sized flanking regions) to move around
and 2) the "transposed" element, i.e. the core machinery plus whatever
flanking regions happened to be carried along on the element's most recent
trip (the one that brought it to its current location.)?

>>-Modified DoTS.ProteinProperty table to reference ProteinPropertyType
>> One question I have regarding these tables is how will the units be
>> specified? Should I make the "property_value" column a varchar2
>> column? It may have had this type originally, and I might have changed
>> it without considering the consequences. One option would be to
>> specify in the ProteinPropertyType table what units are to be used,
>> though this is clumsy if there is more than one choice of units for a
>> given property.
>>
> Whatever the unit they're in, they should all be numbers (some would be
> integer) so we can go for the "number" data type but float or varchar
> could also be fine!

Right, but the question is how does somebody querying the table know what
a mass of "25" means? Are molecular masses always expressed in the same
units, no matter what? My recollection is that you can sometimes have some
pretty big polypeptides, but I don't know what the convention is.

> I reckon ReplicationOriginFeature would make more sense

OK, I'll make this change.

Jonathan

-- 
Jonathan Crabtree
Center for Bioinformatics, University of Pennsylvania
1406 Blockley Hall, 423 Guardian Drive
Philadelphia, PA 19104-6021
215-573-3115
|
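The first picture in the message above (the "current" schema, with a single repeat-related view) can be written down as rows with parent_id links, which also makes the generalized "happens to be part of" usage checkable. The following is a rough sketch in SQLite via Python. The single NAFeature table, the view_name column, the inline start_min/end_max coordinates and all coordinate values are assumptions; only the feature types, names and parent relationships are taken from the message.

```python
import sqlite3

# (na_feature_id, parent_id, view_name, name, start_min, end_max)
# Parent links follow the first picture; the coordinates are invented.
INGI_EXAMPLE = [
    (1, None, "GeneFeature", "RHS", 1000, 9000),
    (2, 1, "TransposableElement", "INGI", 3000, 8000),
    (3, 2, "RepeatRegionNAFeature", "RIME", 3000, 3250),
    (4, 2, "RepeatRegionNAFeature", "RIME", 7750, 8000),
    (5, 2, "GeneFeature", "pseudo", 3300, 7700),
]


def open_db():
    """Create a simplified feature table and load the INGI example."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE NAFeature (
        na_feature_id INTEGER PRIMARY KEY,
        parent_id     INTEGER REFERENCES NAFeature(na_feature_id),
        view_name     TEXT NOT NULL,     -- hypothetical discriminator
        name          TEXT,
        start_min     INTEGER NOT NULL,  -- assumed inline coordinates
        end_max       INTEGER NOT NULL
    );
    """)
    conn.executemany(
        "INSERT INTO NAFeature VALUES (?, ?, ?, ?, ?, ?)", INGI_EXAMPLE
    )
    return conn


def children(conn, parent_id):
    """Return (view_name, name) of the direct sub features, in map order."""
    return conn.execute(
        "SELECT view_name, name FROM NAFeature "
        "WHERE parent_id = ? ORDER BY start_min",
        (parent_id,),
    ).fetchall()


def outside_parent(conn):
    """Return ids of features whose span is not contained in the parent's."""
    rows = conn.execute(
        "SELECT c.na_feature_id "
        "FROM NAFeature c "
        "JOIN NAFeature p ON p.na_feature_id = c.parent_id "
        "WHERE c.start_min < p.start_min OR c.end_max > p.end_max "
        "ORDER BY c.na_feature_id"
    ).fetchall()
    return [row[0] for row in rows]
```

The containment check here allows a child to share an endpoint with its parent, which is looser than the "strictly within" wording in the message; tightening the comparison operators would give the stricter rule.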
From: Arnaud K. <ax...@sa...> - 2003-01-17 23:34:40
|
Quoting Jonathan Crabtree <cra...@pc...>:

>
> Arnaud-
>
>> Here two examples of transposable elements annotations, one is from
>> Tbrucei, the other one is a common one in procaryote genomes.
>>
>> The first one in the inclusion of a INGI transposon within an ORF, the
>> RHS gene. The transposon includes two RIME flanking repeats and another
>> ORF. So in GUS, the INGI transposon could be stored as a transposable
>> element feature, attached to a RHS gene feature. The transposable
>> element feature will have three sub features, a gene feature, tagged as
>> a pseudo-gene and two repeat features, which repeat_type is RIME and
>> with a given location.
>
> So in the "current" schema (meaning that I'm assuming we have only a
> single repeat-related view, called RepeatRegionNAFeature, which is the
> NA equivalent of RepeatRegionAAFeature), the picture would look like
> this:
>
>                       <DoTS::GenomicSequence>
>             ^                ^          ^             ^
>             |                |          |             |
> <DoTS::GeneFeature (RHS)>    |          |             |
>             ^                |          |             |
>             |                |          |             |
> <DoTS::TransposableElement (INGI)>      |             |
>    ^        ^                           |             |
>    |        |                           |             |
>    |    2 x <DoTS::RepeatRegionNAFeature (RIME)>      |
>    |                                                  |
>    ---------------------------<DoTS::GeneFeature (pseudo)>
>
>  -For each feature the leftmost arrow shows the parent_id, the rightmost
>   arrow shows the na_sequence_id.
>  -All of the features will have a location specified in terms of the
>   genomic sequence (because that's what their na_sequence_id references.)
>  -I have to create 2 RepeatRegionNAFeatures under my definition, because
>   the RIME repeats are not adjacent to one another.
>  -Presumably the transposable element is contained in the coding region
>   of a single exon, so the parent feature could be an ExonFeature instead
>   of a GeneFeature.
>  -Note that parent_id is typically used to indicate a part-whole
>   relationship, in the sense that the part *must* have a corresponding
>   whole (e.g. Exon to Gene). In the above picture and our discussions
>   on this topic we've generalized its usage to also encompass the
>   concept that one feature "happens to be" part of another i.e.,
>   that its NALocation is strictly within the bounds of its parent's
>   NALocation, but that this need not be the case by definition.
>
> And I believe your proposal is for something that looks more like this:
>
>                      <DoTS::GenomicSequence>
>             ^                ^     ^      ^         ^
>             |                |     |      |         |
> <DoTS::GeneFeature (RHS)>    |     |      |         |
>             ^                |     |      |         |
>             |                |     |      |         |
> <DoTS::TransposableElement (INGI)> |      |         |
>    ^      ^                        |      |         |
>    |      |                        |      |         |
>    |    <DoTS::RepeatRegionNAFeature>     |         |
>    |            ^                         |         |
>    |            |                         |         |
>    |        2 x <DoTS::RepeatFeature (RIME)>        |
>    |                                                |
>    -------------------------<DoTS::GeneFeature (pseudo)>
>

My proposal is this representation without the repeat region feature. I
would use the repeat region feature to cluster together a sequence,
whatever the sequence is (even one base, or more), repeated X times, but
it would not be used in this situation.

> In other words, the RepeatRegionNAFeature serves only to group the two
> RIME repeats (which aren't even immediately adjacent to one another.)
> Is this what you had in mind?

I don't think we need to group them with a repeat region feature, as the
transposable element would do it.

> Or did you mean to make the RepeatRegionNAFeature a child of the
> GeneFeature and then make the TransposableElement a child of the
> RepeatRegionNAFeature? I'm just not clear on your definition of "repeat
> region". Specifically, can a repeat region contain things that are not
> repeats,

Yes ! a gene for example !! A repeat region would be used to cluster
tandemly repeated genes. But this should be fine as long as a gene feature
can be attached to a repeat region.

> and can it contain more than one type of repeat?

I think we agree on only one type of repeat unit and if it has more, we
would nest the repeat region features. We haven't come across a repeat
region made of interlaced repeat units here, which would require making
the schema more generic.

> And, if so, how does one assign bounds to the region in a non-arbitrary
> way?
>
>> The second example is nested transposable elements in procaryote
>> genomes, ie insertion of a transposable element within another one.
>> Each transposable element can have a similar structure including the
>> following sub features : two flanking Inverted Repeats, a gene and its
>> promoter and/or a promoter, functional on the other strand !
>
> I won't try to draw the pictures for this one! In both the current
> schema and your proposal I think we have the problem that we have no
> way of explicitly representing the relationship between the two
> flanking inverted repeats.

But we don't need to !?

> Apart from that, however, I think that we can handle this case just as
> well as the first. You have to create quite a few features, but I don't
> think there's any way to avoid that unless we want to come up with some
> "exemplar" transposons and use them to classify the instances we
> encounter. The promoter/gene that's functional on the opposite strand
> would be represented simply as reverse-strand features (i.e., we'd set
> the is_reversed flag in their NALocations, but still use their
> parent_ids to indicate their place in the nested repeat structure.)
>
>> So if there is no repeat feature, the flanking repeats will have to be
>> annotated part of the transposable element feature.
>> Let me know what you think about these.
>
> But shouldn't they be part of the transposable element feature? I don't
> know the details of this specific type of transposon, but are you
> trying to make the distinction between: 1) the core transposon, i.e.,
> the machinery that enables that part of the genome (encompassing both
> the machinery and perhaps some variable-sized flanking regions) to move
> around and 2) the "transposed" element, i.e. the core machinery plus
> whatever flanking regions happened to be carried along on the element's
> most recent trip (the one that brought it to its current location.)?
>

I think we want to represent a transposable element in a given context, ie
at a given location because this insertion may have consequences,
(in)activating a gene or shifting the frame of a gene etc.

A core transposon should be represented as an entity on its own like genes
are.

>
> Jonathan
>
> --
> Jonathan Crabtree
> Center for Bioinformatics, University of Pennsylvania
> 1406 Blockley Hall, 423 Guardian Drive
> Philadelphia, PA 19104-6021
> 215-573-3115
>

Arnaud
|
From: Jonathan C. <cra...@pc...> - 2003-01-23 19:31:23
|
Arnaud- Returning to some slightly older business... >> >> <DoTS::GenomicSequence> >> ^ ^ ^ ^ ^ >> | | | | | >> <DoTS::GeneFeature (RHS)> | | | | >> ^ | | | | >> | | | | | >> <DoTS::TransposableElement (INGI)> | | | >> ^ ^ | | | >> | | | | | >> | <DoTS::RepeatRegionNAFeature> | | >> | ^ | | >> | | | | >> | 2 x <DoTS::RepeatFeature (RIME)> | >> | | >> | | >> ------------------------<DoTS::GeneFeature (pseudo)> >> > > My proposal is this representation without the repeat region feature. I would > see the repeat region feature to cluster together a sequence, whatever the > sequence is (even one base, or more), repeated X times, but not being used in > this situation. Meaning that you would only use the repeat region feature when X > 1, right? I'm suggesting that we combine the two tables, meaning that we would have one uniform representation for all X. I suppose that's probably my strongest argument against the 2-table representation, namely that it seems arbitrary to say that something is only a "repeat region" when it contains > 1 copy of a repeat. Wouldn't such a thing be better described as a tandem repeat? >>region". Specifically, can a repeat region contain things that are not >>repeats, > > Yes ! a gene for example !! A repeat region would be used to cluster tandemly > repeated genes. But this should be fine as long as a gene feature can be > attached to a repeat region. My question wasn't quite correct; I should have asked whether a repeat region can contain things that are not repeated. That is, could you use a repeat region to cluster tandemly repeated genes if those genes were separated by some additional non-repeating sequences. It sounds like the answer is probably "yes", and that your definition of repeat region is simply any region that contains two or more copies of some type of sequence. Is this accurate? 
> I think we want to represent a transposable element in a given context, ie at a > given location because this insertion may have consequences, (in)activating a > gene or shifting the frame of a gene etc. > > A core transposon should be represented as an entity on its own like genes are. OK, I agree, and I think that fits with the current schema (except that we have yet to create a table to represent the transposons independent of their location.) Jonathan -- Jonathan Crabtree Center for Bioinformatics, University of Pennsylvania 1406 Blockley Hall, 423 Guardian Drive Philadelphia, PA 19104-6021 215-573-3115 |
From: Arnaud K. <ax...@sa...> - 2003-01-23 22:23:56
|
Hi Jonathan Quoting Jonathan Crabtree <cra...@pc...>: > > Arnaud- > > Returning to some slightly older business... > > >> > >> <DoTS::GenomicSequence> > >> ^ ^ ^ ^ ^ > >> | | | | | > >> <DoTS::GeneFeature (RHS)> | | | | > >> ^ | | | | > >> | | | | | > >> <DoTS::TransposableElement (INGI)> | | | > >> ^ ^ | | | > >> | | | | | > >> | <DoTS::RepeatRegionNAFeature> | | > >> | ^ | | > >> | | | | > >> | 2 x <DoTS::RepeatFeature (RIME)> | > >> | | > >> | | > >> ------------------------<DoTS::GeneFeature (pseudo)> > >> > > > > My proposal is this representation without the repeat region feature. I > would > > see the repeat region feature to cluster together a sequence, whatever > the > > sequence is (even one base, or more), repeated X times, but not being used > in > > this situation. > > Meaning that you would only use the repeat region feature when X > 1, > right? yes > I'm suggesting that we combine the two tables, meaning that we would have > one > uniform representation for all X. I suppose that's probably my strongest > argument against the 2-table representation, namely that it seems arbitrary > to say that something is only a "repeat region" when it contains > 1 copy > of > a repeat. Wouldn't such a thing be better described as a tandem repeat? Yes, I guess it could, as a repeat region was meant to annotate tandemly repeated DNA sequences. I can see a TandemRepeatFeature in GUS very similar to the proposed RepeatRegionFeature. Are you planning to keep it as a replacement for the RepeatRegionNAFeature? > >>region". Specifically, can a repeat region contain things that are not > >>repeats, > > > > Yes ! a gene for example !! A repeat region would be used to cluster > tandemly > > repeated genes. But this should be fine as long as a gene feature can be > > attached to a repeat region. > > My question wasn't quite correct; I should have asked whether a repeat > region > can contain things that are not repeated.
That is, could you use a repeat > region > to cluster tandemly repeated genes if those genes were separated by some > additional non-repeating sequences. It sounds like the answer is probably > "yes", I think so. > and that your definition of repeat region is simply any region that contains > two > or more copies of some type of sequence. Is this accurate? yes > > I think we want to represent a transposable element in a given context, ie > at a > > given location because this insertion may have consequences, (in)activating > a > > gene or shifting the frame of a gene etc. > > > > A core transposon should be represented as an entity on its own like genes > are. > > OK, I agree, and I think that fits with the current schema (except that we > have > yet to create a table to represent the transposons independent of their > location.) ok, I guess this can wait. I reckon that would mean the Central Dogma side would no longer represent only genes, right? > Jonathan > > -- > Jonathan Crabtree > Center for Bioinformatics, University of Pennsylvania > 1406 Blockley Hall, 423 Guardian Drive Philadelphia, PA 19104-6021 > 215-573-3115 > > Arnaud |
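The parent_id nesting discussed in this exchange can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python with an in-memory SQLite database rather than the GUS Oracle schema, the table na_feature and its columns are hypothetical, simplified stand-ins for DoTS.NAFeatureImp, and the rows loosely follow the INGI/RIME diagram in the variant without a separate repeat region feature.

```python
import sqlite3

# Hypothetical, simplified stand-in for DoTS.NAFeatureImp: one row per feature,
# with parent_id pointing at the enclosing feature (NULL for top-level features).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE na_feature (
        na_feature_id INTEGER PRIMARY KEY,
        parent_id     INTEGER REFERENCES na_feature(na_feature_id),
        feature_type  TEXT NOT NULL,
        name          TEXT
    );
    INSERT INTO na_feature VALUES (1, NULL, 'GeneFeature', 'RHS');
    INSERT INTO na_feature VALUES (2, 1, 'TransposableElement', 'INGI');
    INSERT INTO na_feature VALUES (3, 2, 'RepeatFeature', 'RIME');
    INSERT INTO na_feature VALUES (4, 2, 'RepeatFeature', 'RIME');
    INSERT INTO na_feature VALUES (5, 2, 'GeneFeature', 'pseudo');
""")


def descendants(conn, feature_id):
    """Return (id, type, name, depth) for every feature nested under feature_id."""
    return conn.execute("""
        WITH RECURSIVE sub(na_feature_id, feature_type, name, depth) AS (
            SELECT na_feature_id, feature_type, name, 1
              FROM na_feature
             WHERE parent_id = ?
            UNION ALL
            SELECT f.na_feature_id, f.feature_type, f.name, sub.depth + 1
              FROM na_feature f
              JOIN sub ON f.parent_id = sub.na_feature_id
        )
        SELECT na_feature_id, feature_type, name, depth
          FROM sub
         ORDER BY depth, na_feature_id
    """, (feature_id,)).fetchall()


# Everything nested inside the RHS gene, at any depth.
for row in descendants(conn, 1):
    print(row)
```

With this kind of layout the flanking repeats are found by walking parent_id from the transposable element, so no separate repeat region row is needed when there is no tandem repetition to cluster.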
From: Arnaud K. <ax...@sa...> - 2003-01-20 17:01:02
|
Hi Jonathan Jonathan Crabtree wrote: > > Arnaud- > >> A quick question regarding evidences, you're mentioning that the >> Evidence table will connect Features and Experimental evidences. >> Where will the latter be stored ? > > > Hopefully others will chime in if I get this wrong... I believe that the > relevant tables are DoTS.Comments (for free text notes/comments > entered by > an annotator) and SRes.BibliographicReference (for published > experiments.) > However, I don't think that we have a generic table to represent > unpublished > laboratory experiments in a structured way. Perhaps we need some use > cases > here? We do have your new table for representing RNAi constructs, but I > don't think that we have a corresponding table to represent the actual > RNAi experiment. Do we need/want such a table (either for RNAi > experiments > or in general) and, if so, how detailed does it need to be? Should be fine for now. In the future the evidence design could be extended for assignment of evidence codes, the same way it is done for GO annotations. > >>> -Modified DoTS.ProteinProperty table to reference ProteinPropertyType >>> One question I have regarding these tables is how will the units be >>> specified? >>> Should I make the "property_value" column a varchar2 column? It may >>> have had this type originally, and I might have changed it without >>> considering the consequences. One option would be to specify in the >>> ProteinPropertyType table >>> what units are to be used, though this is clumsy if there is more >>> than one >>> choice of units for a given property. >>> >> Whatever the unit they're in, they should all be numbers (some would >> be integer) so we can go for the "number" data type but float or >> varchar could also be fine! > > > Right, but the question is how does somebody querying the table know what > a mass of "25" means? Are molecular masses always expressed in the same > units, no matter what? 
My recollection is that you can sometimes have > some pretty big polypeptides, but I don't know what the convention is. > If we want to query the "value" attribute it might be better to have it as a number. It doesn't matter for charge and isoelectric point (pH), but you're right regarding the molecular mass: "25" could mean 25 Da but also 25 kDa. Why not, as a convention, always store the molecular mass in Daltons and have the API code do the conversion to kiloDaltons if needed? This way we don't need a "unit" attribute in ProteinPropertyType. > > > Jonathan > Arnaud |
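The convention suggested here (store the mass in Daltons, convert in the API layer) could look something like the sketch below. The helper name, the threshold parameter and the output format are assumptions made for illustration; nothing like this is taken from the GUS code base.

```python
# Hypothetical API-layer helper: the database is assumed to hold molecular
# masses in Daltons only, and display code converts to kiloDaltons on demand.
DALTONS_PER_KILODALTON = 1000.0


def format_molecular_mass(mass_da, kda_threshold=1000.0):
    """Format a mass stored in Daltons, switching to kDa for large values."""
    if mass_da < 0:
        raise ValueError("molecular mass cannot be negative")
    if mass_da >= kda_threshold:
        return f"{mass_da / DALTONS_PER_KILODALTON:.1f} kDa"
    return f"{mass_da:.1f} Da"


print(format_molecular_mass(25))
print(format_molecular_mass(25000))
```

The weak point of this approach is that nothing in the database itself stops someone from loading a value in kDa, since the unit exists only as a convention in the code.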
From: Chris S. <sto...@pc...> - 2003-01-20 19:33:43
|
Hi Arnaud and Jonathan, >>>> -Modified DoTS.ProteinProperty table to reference >>>> ProteinPropertyType >>>> One question I have regarding these tables is how will the units be >>>> specified? >>>> Should I make the "property_value" column a varchar2 column? It >>>> may have had this type originally, and I might have changed it >>>> without considering the consequences. One option would be to >>>> specify in the ProteinPropertyType table >>>> what units are to be used, though this is clumsy if there is more >>>> than one >>>> choice of units for a given property. >>>> >>> Whatever the unit they're in, they should all be numbers (some would >>> be integer) so we can go for the "number" data type but float or >>> varchar could also be fine! >> >> >> Right, but the question is how does somebody querying the table know >> what >> a mass of "25" means? Are molecular masses always expressed in the >> same >> units, no matter what? My recollection is that you can sometimes have >> some pretty big polypeptides, but I don't know what the convention is. >> > If we want to query "value" attribute it might be better to have it as > a number. It doesn't matter for charge and isoelectric point pH but > you're right re. the molecular mass "25" could mean 25 Da but also 25 > kDa. Why not storing as a convention the molecular mass always in > Daltons and then the API code would do the conversion in kiloDaltons > if needed. This way we don't need a "unit" attribute in > ProteinPropertyType. This sounds dangerous and unenforceable. We should add a units field. This can either be a varchar or a foreign key to units stored as Sres:MGEDOntology terms. Chris |
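A units field of the varchar kind could be sketched as below. This is a minimal illustration only, assuming Python with an in-memory SQLite database in place of Oracle; the table and column names are hypothetical simplifications of DoTS.ProteinProperty and ProteinPropertyType, and the unit is placed on the property row so that more than one unit per property type is possible. The foreign-key-to-ontology variant is not shown.

```python
import sqlite3

# Hypothetical, simplified versions of ProteinPropertyType / ProteinProperty.
# The unit is stored explicitly next to each numeric value.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE protein_property_type (
        protein_property_type_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE protein_property (
        protein_property_id      INTEGER PRIMARY KEY,
        aa_sequence_id           INTEGER NOT NULL,
        protein_property_type_id INTEGER NOT NULL
            REFERENCES protein_property_type(protein_property_type_id),
        value                    REAL NOT NULL,
        unit                     TEXT NOT NULL
    );
    INSERT INTO protein_property_type VALUES (1, 'molecular mass');
    INSERT INTO protein_property_type VALUES (2, 'isoelectric point');
    INSERT INTO protein_property VALUES (1, 100, 1, 25.0, 'kDa');
    INSERT INTO protein_property VALUES (2, 101, 1, 25.0, 'Da');
    INSERT INTO protein_property VALUES (3, 100, 2, 6.8, 'pH');
""")


def masses_in_daltons(conn):
    """Return (aa_sequence_id, mass in Da), normalising rows stored in kDa."""
    return conn.execute("""
        SELECT p.aa_sequence_id,
               CASE p.unit WHEN 'kDa' THEN p.value * 1000.0 ELSE p.value END
          FROM protein_property p
          JOIN protein_property_type t
            ON t.protein_property_type_id = p.protein_property_type_id
         WHERE t.name = 'molecular mass'
         ORDER BY p.aa_sequence_id
    """).fetchall()


print(masses_in_daltons(conn))
```

Here the two rows with value 25 are no longer ambiguous, because the query can read the unit instead of relying on a loading convention.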
From: mazz <ma...@sn...> - 2003-01-27 00:26:23
|
Hi Arnaud, I have some questions for you. So with the way the tables complex and interaction are set up now, if a complex participates in an interaction to find this then you have to see if the row_id in complexComponent is also a row_id in row set member of interaction. With the interaction table two things are interacting (why is row set needed)? What did you have in mind for interaction type (protein-DNA? or more detailed) and effector action type (inhibits)? I am confused about why it is not possible to build up sequential interactions using just single interacting components (see below). Then maybe use pathwayinteraction and pathway (even if the pathway just consists of a A binds B which binds C). Or do you want to model biological reactions which seems sequential to me like a pathway ? (This allows >>> us to represent >>> the interaction of a set of objects (the effector) with another set >>> of objects >>> (the target.) Previously the Interaction table could only >>> represent the interaction >>> between a single pair of entities (OK if they happened to be >>> Complexes, for example, >>> but a potential problem in other situations.) but a potential problem in other situations? What are more of these? (Although the current schema lets us group effectors together, it > doesn't let > us say (for example) that E1 interacts *directly* with T1 to > phosphorylate > it, but that E1's active site is only exposed when E1 is bound to E2. In > other words, E1's role in the activity can be viewed as "primary", and > E2's > role is secondary (in some sense) but all we can say in the schema is > that > the Complex consisting of E1 and E2 interacts with T1 to phosphorylate > it.) The above case: E2 interacts with E1 (they also happen to be represented by a complex; but now consider this only an interaction); then E1 interacts with T1 to modify it. E2 affects E1; then E1 affects T1. 
or Protein X (effector) interacts with protein Y (target); protein Y (effector) modifies protein Z (target). I guess I have a problem with the E1-E2 concept (or multiple effectors if one does not affect or interact with a target directly) in the interaction table. I guess I may also think of Complex and Interaction more separately. For example, the TFIID complex has several components; the complex consists of several proteins (protein-protein interactions; complex type - protein) of which there is no known direction (effector-target concept at least for now). This complex with its complexComponents (which we represent) can then interact with a DNA sequence (target). Interaction type (protein-DNA); (effector action type - binds?). Although, in this example, we know that we would also have the TATA-binding protein (TBP; effector) interacting with the DNA sequence target as an entry in interaction (separate from the entry that the complex TFIID can interact with the DNA target). Also the other component interactions ...TBP-associated factor 70 interacts with TBP (direction not known) and so on ... if all the interactions individually are known to define the complex entirely. Joan Arnaud Kerhornou wrote: > Hi Jonathan > > Jonathan Crabtree wrote: > > > > > Arnaud- > > > > Thanks for the feedback; I think we're getting close to agreement here. > > I think so too ! > > >> I have noticed that your changes don't cover the DNA/RNA features. Is > >> there any reason for this ? I know there are quite a lot of them and > >> there might be another way of storing data some information such as > >> telomere or centromere regions, origin of replication, inflection > >> point etc. All these features are covered by Sequence Ontology, so a > >> new ChromosomeElement or ChromosomeRegion feature could be generic > >> enough to cover most of them. > >> Let me know what you think. > > > > > > Which DNA/RNA features do you mean (other than those mentioned above)?
> > The file I sent you should include views on the top of NAFeatureImp > table. Here the list : > > * ChromosomeElement or we can keep CentromereFeature and TelomereFeature > as they are in gusdev - IMPORTANT > * InfectionPointFeature > * ReplicationFeature, for annotated origins of replication > * RNARegulatory - as there is a DNARegulatory feature => regulatory > element at the RNA level > * RNASecondaryStructure > * SpliceSiteFeature > * TransposableElement > > + an extra attribute in RestrictionFragmentFeature, "type_of_cut" > (Sticky or blunt) > + an extra attribute in GeneSynonym, "is_obsolete" > > + a new view on the top of NASequenceImp, "GenomicSequence" instead of > the existing one, ExternalNASequence. > > I can send the files to you if you want. > > > > > It's possible that I misplaced the e-mail or notes where we discussed > > these. Or are you just saying that we will eventually have a view for > > each type of DNA/RNA feature in the Sequence Ontology? I think that > > this is true, although I hadn't planned to make the change immediately, > > since I believe we had agreed on a "transitional" period in which the > > various NAFeature views would first be given a nullable > > sequence_ontology_id > > Yes we had! So regarding chromosome regions, shall we keep > TelomereFeature and CentromereFeature ? > > > and we would then decide how to best rearrange the views to more closely > > match the ontology terms. I haven't added the sequence_ontology_id > > column to the NAFeature views, but I will do so right away. 
We do > > currently have some relevant NAFeature views in gusdev that have not > > been migrated into 3.0: > > > > CentromereFeature > > LowComplexityNAFeature > > ScaffoldGapFeature > > TelomereFeature > > > > I have no objection to merging the telomere and centromere features into > > a single view--along with any other chromosomal regions covered by the > > ontology--although it would mean that we wouldn't have a 1-1 mapping > > between sequence ontology terms and views on NAFeature. I think that > > at one point this was proposed as the eventual goal of the rearrangement. > > Anyway, given that I'm not certain of the plan here, I'm going to add > > the sequence_ontology_id column but leave the views unchanged for now. > > They can easily be changed without interfering with our data migration, > > so their fate doesn't have to be settled immediately. We have yet to > > establish a consistent set of rules for deciding when different types > > of features get grouped into a single view and when they get their own > > views, so this is probably a good opportunity to settle the question > > once and for all. The Sequence Ontology is big enough that we probably > > *don't* want a view for each and every term in the ontology; it would > > make maintenance quite difficult. But we could, for example, create a > > view for every top-level (or second-level) sequence ontology term. > > However, even a relatively high-level feature like "chromosomal region" > > (SO:0000711) looks like it's already a 4th or 5th level feature... > > > At > > the other extreme, we could continue what we're doing now, i.e. using > > an ad-hoc classification of features based on the data we actually have > > available, and just make sure that every feature is tagged with the > > correct sequence ontology term. Any thoughts? > > It makes sense as SO may undergo revisions this year. 
> > > >>> > >>> alter table DOTS.PROTEINPROPERTY add constraint PROTEINPROPERTY_CK01 > >>> check (property_name in ('isoelectric point', 'molecular mass', > >>> 'charge', 'average residue mass')); > >>> > >>> The table allows multiple protein properties of the same type to be > >>> associated with > >>> entries in DoTS.AASequenceImp. Arnaud had suggested originally that > >>> the last property, average residue mass, could actually be an > >>> attribute of the table that stores the protein sequence itself. > >>> However, it seemed that if the molecular mass attribute could have > >>> multiple values (e.g., from different experiments) then > >>> the same should be true of the average residue mass, which is > >>> essentially a derived property. Let me know if you disagree with > >>> this, or think we should create an explicit controlled vocab. for > >>> these 4 properties. > >>> > >>> > >> A controlled vocabulary table with the four attributes you've > >> mentioned is fine. > > > > > > OK, I'll make this change. > > > >>> -Protein features > >>> *Signal peptide features (stored in DoTS.SignalPeptideFeature) > >>> This view exists already, as DoTS.SignalPeptideFeature, but we need > >>> to add the > >>> ability to store curated data, such as targeting information. It > >>> should be straightforward to modify the view to accommodate this, > >>> but I'm not sure exactly > >>> what needs to be stored. Currently we use the view exclusively for > >>> SignalP > >>> predictions, and from what I understand SignalP is only concerned > >>> with predicting > >>> secreted proteins, meaning that we don't currently have any > >>> explicit targeting information. Is this something we could > >>> represent using the GO ontology for cellular localization? Do we > >>> also need some free text columns? Let me know and I'll make > >>> the changes.
All the SignalP-specific columns appear to be > >>> nullable, so we don't > >>> necessarily have to do anything except add the new columns for the > >>> manually curated > >>> information. > >>> > >>> > >> After talking to the curators it appears that GO component supplements > >> targeting information at the feature level but will not be enough. > >> The targeting information is represented by the component ontology in > >> one context i.e. mitochondrial, nuclear, membrane localization but > >> not in the context of the actual residues involved. > >> The actual residues involved in the targeting (or any other > >> phenomena) need to be represented by a protein feature ontology that can > >> be mapped onto specific amino acids of a protein. > >> This ontology is the equivalent of Sequence Ontology (SO) which is > >> meant for DNA features. It is being prepared by Val Wood with input > >> from Swiss-prot. > > > > > > OK, so the idea is that the various signal peptides have been classified > > into named classes that should be represented by some kind of ontology? > > > >> As you're going to add an extra attribute sequence_ontology_id to the > >> NA Features, could you do the same to any AA Features ? > > > > > > This will only work if the new ontology is actually part of the Sequence > > Ontology (or if we use the SequenceOntology table to store both > > ontologies.) > > Do you know if this is the case? It's quite possible, since the SO does > > already cover amino acid features. Otherwise we'll have to create a > > separate AASequenceOntology (or whatever the new ontology is called). > > It is at the moment a different project but it would make sense they > merge in the future.
Just to give you an idea about Localization > Signals, here is a snapshot: > > %localization signal > %N-terminal signal sequence > %nuclear localization signal > %bipartite nuclear localization signal > %etc > %mitochondrial localization sequence > %thylakoid localization signal > %ER retention signal > > The way the SignalPeptideFeature is designed make difficult the > annotation of localization signal features. We can leave > SignalPeptideFeature as it is as it fits with SignalP software > prediction and in the future create a new feature LocalizationSignalFeature. > > > > >>> *Transmembrane domain features (stored in DoTS.PredictedAAFeature) > >>> "PlasmoDB web site shows hydrophobicity graphics, where is it > >>> stored in GUS?" > >>> The hydrophobicity plots are computed dynamically based on the > >>> amino acid sequence. > >>> Transmembrane domains are currently stored in the > >>> PredictedAAFeature view, although > >>> I will probably create a new view for them when I get around to > >>> eliminating PredictedAAFeature. Another possibility would be to > >>> treat TM domains as another > >>> type of domain, and store them in DomainFeature. What do you think > >>> about this? > >>> > >>> > >> I reckon they could be merged. > > > > > > OK, sounds good. > > > >>> *Post-translational modification features (new view: > >>> DoTS:PostTranslationalModFeature) > >>> Has a "type" column to represent the type of modification. It was > >>> also suggested > >>> that we have a column called "modified_by", which would be a > >>> reference to the Interaction table. However, isn't it possible > >>> that the same post-translational > >>> modification (e.g., phosphorylation of a specific amino acid) could > >>> be the result > >>> of one of several Interactions? > >> > >> yes you're right, the effector could be different. In that case the > >> attribute > >> "modified_by" is not useful. 
> >> > >>> This argues for an additional relationship between Interaction and > >>> PostTranslationalModFeature, unless we're OK creating multiple > >>> PostTranslationalModFeatures, identical except for their modified_by > >>> attribute. Comments on this? > >>> > >>> > >> I don't think they should be duplicated as they corresponds to a > >> unique site. This unique feature would > >> be associated with different interaction entries. We might not need > >> an extra table between Interaction and PostTranslationalModFeature > >> though. We still can do the following query : "give me all the > >> interaction entries which target is a PostTranslationalModFeature > >> which id is ...". > >> How does it sound ? > > > > > > We could do this, although one question is whether, semantically > > speaking, > > the "target" of an Interaction should be "the thing to be modified" > > (e.g. an > > unphosphorylated sequence or residue) or "the resulting modification" > > (e.g. > > the feature that represents a phosphorylated residue at the appropriate > > location.) The answer is probably that we just shouldn't worry about it > > and should just do whatever is most convenient on a case-by-case basis. > > To do it "correctly" would be problematic either way. For example, if we > > say that the target is the thing to be modified, then we have to create a > > feature that represents a region of sequence that *could* be modified in > > some way and then create another feature to represent the actual > > modification. > > But if we say that the target is the result of the modification then > > we may > > have to create equally unusual tables/views. For example, if the > > result of > > a given interaction is to degrade a protein, then do we have to create a > > table/object that represents a degraded protein (or "nothing", or > > whatever > > it is that's left after the modification)? 
For now I have no problem > > with > > interpreting the "target" based on context, but in the longer term we may > > want to consider separating the notions of "target prior to modification" > > and either "target after modification" or "effect of modification". > > > > I also realized belatedly that I could have left the Interaction table > > unchanged, rather than introducing specific references to RowSet. This > > would have allowed us to represent either singleton effectors/targets or > > set-valued effectors/targets, without having to always join through > > RowSet > > in the singleton case. On the other hand, if we do associate some > > additional information with the RowSets, then the current representation > > is correct. > > It depends if we want to represent many-to-many relationship between > interaction and members of this interaction. Without the RowSet table, > we can't assign a set of several effectors/targets, right ? Unless we > consider that this set of effectors are being part of a complex and act > as the whole. > > > > >>> *AA repeats (new view: RepeatRegionAAFeature) > >>> I called this view RepeatRegionAAFeature in case we want to have a > >>> similar view > >>> for NASequences. I also created only a single view, instead of > >>> following Arnaud's > >>> original suggestion, which was for both: > >>> > >>> * RepeatRegionFeature as a set of RepeatUnitFeatures, > >>> * RepeatUnitFeature, with the consensus sequence, name and size > >>> > >>> I based the design of this view on that of TandemRepeatFeature, > >>> which we have for > >>> NASequences already. Instead of splitting the consensus sequence, > >>> name, and size > >>> into a separate table, they occupy columns in > >>> RepeatRegionAAFeature. This works > >>> quite well for the tandem repeats we already have (for DNA > >>> sequences.) 
However, if > >>> there is a known set of named amino acid sequence repeats, then it > >>> would probably > >>> make sense to do what Arnaud suggested, and store these in a > >>> separate table (likely named RepeatUnit, not RepeatUnitFeature, > >>> since they would have no unique locations.) Does this sound > >>> reasonable? That is, put the consensus seqs in the > >>> repeat region table itself if they're anonymous, but if they've > >>> been named, then store them in a separate table. Also note that > >>> this view has a reference to RepeatType, although the current > >>> contents of this table are probably applicable only to DNA sequence > >>> repeats (LINEs, SINEs, ALUs, etc.), since I believe that I parsed > >>> them out of RepBase. > >>> > >>> > >> I proposed a separate repeat feature because one may want to annotate > >> a repeat outside a repeat region, for example LTR repeats attached to > >> a given transposable element. These RepeatFeatures or > >> RepeatUnitFeatures can then have a location. > >> The other case is when a repeat region is made of a set of different > >> repeat units. > > > > > > OK, I didn't realize that this was what you were trying to represent. As > > currently conceived, RepeatRegionAAFeature is meant to represent a region > > that contains one or more immediately adjacent copies of the same type > > of (amino acid sequence) repeat. The assumption is also that these > > regions > > will typically be maximal (with respect to the chosen repeat type, > > consensus, > > and max. mismatch, the last of which is not represented directly in the > > table.) We can still represent more complex repeat structures using this > > single table, but the representation is implicit, not explicit (i.e. you > > have to do a query to find out what other repeats lie within the > > bounds of > > the transposon, meaning that there's no easy way to query for all > > transposable > > elements with a particular flanking LTR structure.) 
Do you want to > > come up > > with a 2-table version of what I've done? The use cases aren't clear > > enough > > in my mind yet for me to be able to do it. It seems that the bare > > minimum we > > need is just another column in the RepeatRegionAAFeature, parent_id; > > which > > would let us represent explicitly that a particular repeat is a > > *necessary* > > (versus incidental) component of another NA/AAFeature. Both AAFeatureImp > > and NAFeatureImp already have a parent_id, so this would be a > > straightforward > > change. The queries still might not be terribly efficient, but I > > don't know > > what exactly you wanted to support in terms of queries, versus just > > making > > sure that the representation is sufficiently rich to capture the > > structure. > > A case we came across here for T. brucei is nested repeat regions (at the > DNA level). Each repeat region has coordinates and is annotated with a > unique repeat unit type. This repeat region can be within a bigger > repeat region annotated with a different repeat unit type. > ... which is in other words your suggestion with parent_id as an extra > attribute ... > > Regarding transposon repeat types, if we have a TransposableElement > feature and its type is given as an attribute, a repeat feature will > just be useful to locate the LTRs within a given transposable element. > Can we keep this functionality ? Then the feature will be simple, with just > repeat_type and parent_id attributes. > > > > >> In any case, NA repeats and AA repeats should have the same design. > >> Just the controlled vocabulary representing the types of repeats will > >> be different. > > > > > > Absolutely, yes, although one question is whether AA repeats can have the > > same kind of nested structure that you mention as a possibility for NA > > repeats (the transposon with LTRs). I don't know the answer to this.
> > > >>> -DoTS.Interaction (table modified, dependent tables added) > >>> *Added "has_direction" column, as discussed previously. The idea > >>> here is that > >>> not all interactions (particularly physical ones, e.g., > >>> dimerization) have a > >>> direction. If has_direction == 0, then the value of > >>> direction_is_known can > >>> be ignored. > >>> *Added non-nullable "effector_action_type_id" column, referencing > >>> DoTS.EffectorActionType (a new table.) This column/table > >>> represents the possible > >>> things that an effector can do to a target. For example, the > >>> InteractionType > >>> associated with the Interaction could be "binds to" (e.g., a > >>> promoter region), and > >>> the EffectorActionType for that Interaction could be to either > >>> "inhibit" or "enhance" > >>> expression of the corresponding gene. > >>> *Replaced effector_table_id and effector_row_id with > >>> effector_row_set_id, and > >>> similarly for the target_table_id and target_row_id. This allows > >>> us to represent > >>> the interaction of a set of objects (the effector) with another set > >>> of objects > >>> (the target.) Previously the Interaction table could only > >>> represent the interaction > >>> between a single pair of entities (OK if they happened to be > >>> Complexes, for example, > >>> but a potential problem in other situations.) Now both effector > >>> and target are represented as references to DoTS.RowSet, which in > >>> turn references DoTS.RowSetMember, > >>> which...in turn...references the individual database rows that > >>> comprise the effector > >>> or target. These tables (RowSet and RowSetMember) are essentially > >>> the same as Complex and ComplexComponent, except that they are > >>> totally generic; they can be used to group any set of rows in the > >>> database and they store no additional information.
However, if > >>> there are any additional columns that we can think of (that are > >>> specific to Interactions) then these tables should be replaced by > >>> less generic ones (e.g. InteractingEntitySet or InteractionSet, or > >>> something along those lines.) > >>> > >>> > >> Sounds fine. The only thing I can see is regarding the > >> EffectorActionType. If each effector, member of a RowSet, has a > >> different action type, the attribute, effector_action_type_id, should > >> go in the RowSetMember table. I don't have any example though. > > > > > > OK, I think I'd be inclined to wait until we have some use cases for > > this. > > Although the current schema lets us group effectors together, it > > doesn't let > > us say (for example) that E1 interacts *directly* with T1 to > > phosphorylate > > it, but that E1's active site is only exposed when E1 is bound to E2. In > > other words, E1's role in the activity can be viewed as "primary", and > > E2's > > role is secondary (in some sense) but all we can say in the schema is > > that > > the Complex consisting of E1 and E2 interacts with T1 to phosphorylate > > it. > > I think that the solution we have now is OK, but it only lets us > > represent > > the overall action of the entire set of effectors. > > Let's leave the design as it is for now. Curators are not going to > curate interactions data in the short term. We shall come back later > with more precise ideas/use cases about them. > > > > > Jonathan > > > > Arnaud > > _______________________________________________ > Gusdev-gusdev mailing list > Gus...@li... > https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev |
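Editorial sketch of the parent_id idea for nested repeat regions raised in this message. It is only an illustration under stated assumptions: the table name RepeatFeature and the columns feature_id, start_pos and end_pos are hypothetical, only parent_id and repeat_type come from the thread, and SQLite (through Python's sqlite3 module) stands in for the project's real database. It shows that a single parent_id column is enough to walk from an outer repeat region down to the repeats nested inside it.

```python
import sqlite3


def build_repeat_demo():
    """Create an in-memory table of repeat features linked by parent_id."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Hypothetical, simplified stand-in for a repeat feature view.
        CREATE TABLE RepeatFeature (
            feature_id  INTEGER PRIMARY KEY,
            parent_id   INTEGER REFERENCES RepeatFeature(feature_id),
            repeat_type TEXT NOT NULL,
            start_pos   INTEGER NOT NULL,
            end_pos     INTEGER NOT NULL
        );
        INSERT INTO RepeatFeature VALUES (1, NULL, 'outer_repeat', 1000, 5000);
        INSERT INTO RepeatFeature VALUES (2, 1, 'inner_repeat', 1200, 1800);
        INSERT INTO RepeatFeature VALUES (3, 1, 'inner_repeat', 3000, 3600);
        INSERT INTO RepeatFeature VALUES (4, NULL, 'unrelated_repeat', 9000, 9500);
        INSERT INTO RepeatFeature VALUES (5, 2, 'repeat_unit', 1300, 1400);
        """
    )
    return conn


def nested_repeats(conn, root_id):
    """Return (feature_id, repeat_type, depth) for root_id and everything nested in it."""
    sql = """
        WITH RECURSIVE subtree(feature_id, repeat_type, depth) AS (
            SELECT feature_id, repeat_type, 0
            FROM RepeatFeature
            WHERE feature_id = ?
            UNION ALL
            SELECT r.feature_id, r.repeat_type, s.depth + 1
            FROM RepeatFeature r
            JOIN subtree s ON r.parent_id = s.feature_id
        )
        SELECT feature_id, repeat_type, depth
        FROM subtree
        ORDER BY depth, feature_id
    """
    return conn.execute(sql, (root_id,)).fetchall()


if __name__ == "__main__":
    for row in nested_repeats(build_repeat_demo(), 1):
        print(row)
```

As noted in the thread, queries of this kind may not be especially efficient; the sketch only demonstrates that the representation captures the nesting.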
From: Jonathan C. <cra...@sn...> - 2003-01-27 06:46:18
|
Joan- > So with the way the tables complex and interaction are set up now, if a complex > participates in an interaction > to find this then you have to see if the row_id in complexComponent is also a > row_id in row set member of interaction. Not quite; if the complex is itself the effector or target of the interaction then you'd want to join ComplexComponent.complex_id (not row_id) with RowSetMember.row_id (and you'd also constrain RowSetMember.table_id with the table_id for Complex.) You'd only use the row_id of ComplexComponent if your Complex of interest was itself a component of another Complex that acted as the effector or target. > With the interaction table two things are interacting (why is row set needed)? No; as discussed previously, we decided to extend the Interaction table to represent the interaction of two *sets* of things (where those sets of things cannot be represented as Complexes.) Given that, we had to add something like the RowSet table. > I am confused about why it is not possible to build up sequential interactions > using just single interacting components (see below). I believe that it *is* possible to build up sequential interactions using single interacting components; this is what the Pathway and PathwayInteraction tables allow you to do. Having to create a new RowSet object--even when you have only a single effector/target--does require some extra work, but it's by no means impossible. This is what I was talking about last week when I said that I could have left the Interaction table as it was originally (i.e., with a table_id & row_id for both effector and target.) However, we decided that we might as well replace these with references to RowSet, so that the joins would always be consistent regardless of the number of objects acting as the target/effector of the Interaction. > Then maybe use pathwayinteraction and pathway (even if the pathway just consists > of a A binds B which binds C). 
Or do you want to model biological reactions > which seems sequential to me like a pathway ? The goal is simply to be able to represent interactions between sets of things. If this happens not to be a useful feature then we could consider doing away with it. I'm not crazy about the extra joins that it entails, but at the time it seemed like it would be a reasonable generalization to make (and nobody objected when it was done.) We could consider changing it back, but definitely not until after I've finished the 3.0 migration, since I've already moved the Pathway data into the new Interaction schema. > >>> (the target.) Previously the Interaction table could only > >>> represent the interaction > >>> between a single pair of entities (OK if they happened to be > >>> Complexes, for example, > >>> but a potential problem in other situations.) > > but a potential problem in other situations? What are more of these? I'm not sure that I have any great examples, but since we've been talking about Complexes, what about using Interaction to represent Complex *formation* (e.g. dimerization)? Assuming that we had reason to explicitly represent the formation of a Complex (versus the mere fact of its existence, which is handled by Complex/ComplexComponent), wouldn't this be done with the Interaction table? If it were, then you'd have to be able to support multiple effectors. To represent dimerization, for example, you'd have 2 inputs (effectors) and 1 output (the target.) The effectors would be the same entities referenced by the ComplexComponents and the target would be the Complex itself. This sounds redundant, but if (yet another hypotheticals) you wanted to represent the fact that a second or third protein acted to inhibit the dimerization process (through some as-yet- undetermined mechanism) then you'd need to create the dimerization Interaction so that you could reference it in yet another Interaction (as a target being inhibited by the new protein). 
> (Although the current schema lets us group effectors together, it > > doesn't let > > us say (for example) that E1 interacts *directly* with T1 to > > phosphorylate > > it, but that E1's active site is only exposed when E1 is bound to E2. In > > other words, E1's role in the activity can be viewed as "primary", and > > E2's > > role is secondary (in some sense) but all we can say in the schema is > > that > > the Complex consisting of E1 and E2 interacts with T1 to phosphorylate > > it.) > > The above case: > > E2 interacts with E1 (they also happen to be represented by a complex; but now > consider this only an interaction); then E1 interacts with T1 to modify it. Yes, although if the interaction between E2 and E1 were transient then the result of E2 and E1's interaction would not be a complex, but rather a modified E1. > E2 affects E1; then E1 affects T1. or > > Protein X (effector) interacts with protein Y (target); protein Y (effector ) > modifies protein Z (target). I'm not sure I see the difference between these two alternatives, apart from the fact that the second uses X,Y, and Z instead of E2, E1, and T1?? > I guess I have a problem with the E1-E2 concept (or multiple effectors if one > does not effect or interact with a target directly) in the interaction table. Well, I think complex formation is a good example of a situation in which both effectors interact directly with the target, because they both become part of it. Whether we actually need to represent this is another question. Can anyone else come up with any good examples of multiple- effector/target interactions (that couldn't be easily modeled, as Joan points out, with a series of simpler "single-valued" Interactions)? Jonathan |
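Editorial sketch of the join described at the top of this message: finding the Interactions in which a given Complex is itself the effector or target. It is a simplified illustration, not the actual GUS schema. The columns effector_row_set_id, target_row_set_id, table_id, row_id and complex_id are named in the thread; RowSetMember.row_set_id, the table_id values, and the use of SQLite through Python's sqlite3 module are assumptions made for this example.

```python
import sqlite3

# Hypothetical table_id values; the real ones would come from the database's
# own table catalogue.
COMPLEX_TABLE_ID = 1
PROTEIN_TABLE_ID = 2


def build_demo():
    """Create a tiny in-memory version of the tables under discussion."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE RowSetMember (row_set_id INTEGER, table_id INTEGER, row_id INTEGER);
        CREATE TABLE Interaction (
            interaction_id      INTEGER PRIMARY KEY,
            effector_row_set_id INTEGER,
            target_row_set_id   INTEGER
        );
        CREATE TABLE ComplexComponent (complex_id INTEGER, table_id INTEGER, row_id INTEGER);
        """
    )
    # Complex 10 is made of proteins 100 and 101.
    conn.executemany(
        "INSERT INTO ComplexComponent VALUES (?, ?, ?)",
        [(10, PROTEIN_TABLE_ID, 100), (10, PROTEIN_TABLE_ID, 101)],
    )
    conn.executemany(
        "INSERT INTO RowSetMember VALUES (?, ?, ?)",
        [
            (1, COMPLEX_TABLE_ID, 10),   # row set 1 holds Complex 10
            (2, PROTEIN_TABLE_ID, 200),  # row set 2 holds protein 200
            (3, PROTEIN_TABLE_ID, 10),   # row set 3 holds a *protein* whose row_id is also 10
        ],
    )
    conn.executemany(
        "INSERT INTO Interaction VALUES (?, ?, ?)",
        [(1, 1, 2), (2, 3, 2)],
    )
    return conn


def interactions_with_complex(conn, complex_id):
    """Interactions whose effector or target row set contains the given Complex."""
    sql = """
        SELECT DISTINCT i.interaction_id
        FROM ComplexComponent cc
        JOIN RowSetMember m
          ON m.row_id = cc.complex_id      -- complex_id, not ComplexComponent.row_id
         AND m.table_id = ?                -- constrain to the Complex table
        JOIN Interaction i
          ON m.row_set_id IN (i.effector_row_set_id, i.target_row_set_id)
        WHERE cc.complex_id = ?
        ORDER BY i.interaction_id
    """
    return [row[0] for row in conn.execute(sql, (COMPLEX_TABLE_ID, complex_id))]


if __name__ == "__main__":
    print(interactions_with_complex(build_demo(), 10))
```

The third row set is a deliberate decoy: without the table_id constraint, a protein that happens to share the row_id value 10 would be mistaken for the Complex.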
From: Joan M. <ma...@pc...> - 2003-01-27 16:13:39
|
Hi Jonathan,

> *formation* (e.g. dimerization)? Assuming that we had reason to explicitly represent the formation of a Complex (versus the mere fact of its existence, which is handled by Complex/ComplexComponent), wouldn't this be done with the Interaction table? If it were, then you'd have to be able to support multiple effectors. To represent dimerization, for example, you'd have 2 inputs (effectors) and 1 output (the target.) The effectors would be the same entities referenced by the ComplexComponents and the target would be the Complex itself. This sounds redundant, but if (yet another hypotheticals) you wanted to represent the fact that a second or third protein acted to inhibit the dimerization process (through some as-yet- undetermined mechanism) then you'd need to create the dimerization Interaction so that you could reference it in yet another Interaction (as a target being inhibited by the new protein).

Yes, this is true. Think of modeling all interactions regardless of knowing that they form a dimer complex.

If we take out the effector-target wording in the interaction table and say: entity 1 (effector) interacts with entity 2 (target) to create a dimer; entity 1 can equal entity 2.

Entity 1 interacts with entity 2; now if a third entity inhibits the dimerization between 1 and 2, then entity 3 would need to be able to interact with entity 1 (or 2).

I think the trouble comes in if you had a dimer or a 2-component complex (entity 1 and entity 2 as a complex) and entity 3 could only interact with the complex to disassemble it (or, on the molecular level, you need surfaces of both entity 1 and entity 2 for interaction with entity 3).

I think maybe saying that the complex of 1 and 2 interacts with entity 3 takes care of this, but this uses both the interaction table and the complex table to assign the complex of 1 and 2 as interacting with entity 3.

I think that the terms effector and target are confusing in the interaction table.
Actually when we originally designed the interaction table I am remembering we struggled with these words, but if we now have direction is known (or not known) in the table do we need effector-target? I am not sure about this. Also the line of evidence tables for interaction may assume that you are only adding evidence for a single direct interaction even if there are multiple lines of evidence to support the interaction (yeast 2 hybrid exp., invitro binding exp.). Joan Jonathan Crabtree wrote: > Joan- > > > So with the way the tables complex and interaction are set up now, if a complex > > participates in an interaction > > to find this then you have to see if the row_id in complexComponent is also a > > row_id in row set member of interaction. > > Not quite; if the complex is itself the effector or target of the > interaction then you'd want to join ComplexComponent.complex_id (not row_id) > with RowSetMember.row_id (and you'd also constrain RowSetMember.table_id with > the table_id for Complex.) You'd only use the row_id of ComplexComponent > if your Complex of interest was itself a component of another Complex that > acted as the effector or target. > > > With the interaction table two things are interacting (why is row set needed)? > > No; as discussed previously, we decided to extend the Interaction table > to represent the interaction of two *sets* of things (where those sets of > things cannot be represented as Complexes.) Given that, we had to add > something like the RowSet table. > > > I am confused about why it is not possible to build up sequential interactions > > using just single interacting components (see below). > > I believe that it *is* possible to build up sequential interactions using > single interacting components; this is what the Pathway and PathwayInteraction > tables allow you to do. Having to create a new RowSet object--even when > you have only a single effector/target--does require some extra work, but > it's by no means impossible. 
This is what I was talking about last week > when I said that I could have left the Interaction table as it was > originally (i.e., with a table_id & row_id for both effector and target.) > However, we decided that we might as well replace these with references to > RowSet, so that the joins would always be consistent regardless of the > number of objects acting as the target/effector of the Interaction. > > > Then maybe use pathwayinteraction and pathway (even if the pathway just consists > > of a A binds B which binds C). Or do you want to model biological reactions > > which seems sequential to me like a pathway ? > > The goal is simply to be able to represent interactions between sets of > things. If this happens not to be a useful feature then we could consider > doing away with it. I'm not crazy about the extra joins that it entails, > but at the time it seemed like it would be a reasonable generalization to > make (and nobody objected when it was done.) We could consider changing > it back, but definitely not until after I've finished the 3.0 migration, > since I've already moved the Pathway data into the new Interaction schema. > > > >>> (the target.) Previously the Interaction table could only > > >>> represent the interaction > > >>> between a single pair of entities (OK if they happened to be > > >>> Complexes, for example, > > >>> but a potential problem in other situations.) > > > > but a potential problem in other situations? What are more of these? > > I'm not sure that I have any great examples, but since we've been talking > about Complexes, what about using Interaction to represent Complex > *formation* (e.g. dimerization)? Assuming that we had reason to explicitly > represent the formation of a Complex (versus the mere fact of its existence, > which is handled by Complex/ComplexComponent), wouldn't this be done with > the Interaction table? If it were, then you'd have to be able to support > multiple effectors. 
To represent dimerization, for example, you'd have > 2 inputs (effectors) and 1 output (the target.) The effectors would be > the same entities referenced by the ComplexComponents and the target would > be the Complex itself. This sounds redundant, but if (yet another > hypotheticals) you wanted to represent the fact that a second or third > protein acted to inhibit the dimerization process (through some as-yet- > undetermined mechanism) then you'd need to create the dimerization > Interaction so that you could reference it in yet another Interaction (as > a target being inhibited by the new protein). > > > (Although the current schema lets us group effectors together, it > > > doesn't let > > > us say (for example) that E1 interacts *directly* with T1 to > > > phosphorylate > > > it, but that E1's active site is only exposed when E1 is bound to E2. In > > > other words, E1's role in the activity can be viewed as "primary", and > > > E2's > > > role is secondary (in some sense) but all we can say in the schema is > > > that > > > the Complex consisting of E1 and E2 interacts with T1 to phosphorylate > > > it.) > > > > The above case: > > > > E2 interacts with E1 (they also happen to be represented by a complex; but now > > consider this only an interaction); then E1 interacts with T1 to modify it. > > Yes, although if the interaction between E2 and E1 were transient then the > result of E2 and E1's interaction would not be a complex, but rather a > modified E1. > > > E2 affects E1; then E1 affects T1. or > > > > Protein X (effector) interacts with protein Y (target); protein Y (effector ) > > modifies protein Z (target). > > I'm not sure I see the difference between these two alternatives, apart from > the fact that the second uses X,Y, and Z instead of E2, E1, and T1?? > > > I guess I have a problem with the E1-E2 concept (or multiple effectors if one > > does not effect or interact with a target directly) in the interaction table. 
> > Well, I think complex formation is a good example of a situation in which > both effectors interact directly with the target, because they both become > part of it. Whether we actually need to represent this is another > question. Can anyone else come up with any good examples of multiple- > effector/target interactions (that couldn't be easily modeled, as Joan > points out, with a series of simpler "single-valued" Interactions)? > > Jonathan -- Joan Mazzarelli Computational Biology and Informatics Laboratory Center for Bioinformatics 1429 Blockley Hall University of Pennsylvania Philadelphia, PA 19104 |
From: Arnaud K. <ax...@sa...> - 2003-01-28 15:30:14
|
Hi

Joan Mazzarelli wrote:

>Hi Jonathan,
>
>*formation* (e.g. dimerization)? Assuming that we had reason to explicitly
>represent the formation of a Complex (versus the mere fact of its existence,
>which is handled by Complex/ComplexComponent), wouldn't this be done with
>the Interaction table? If it were, then you'd have to be able to support
>multiple effectors. To represent dimerization, for example, you'd have
>2 inputs (effectors) and 1 output (the target.) The effectors would be
>the same entities referenced by the ComplexComponents and the target would
>be the Complex itself. This sounds redundant, but if (yet another
>hypotheticals) you wanted to represent the fact that a second or third
>protein acted to inhibit the dimerization process (through some as-yet-
>undetermined mechanism) then you'd need to create the dimerization
>Interaction so that you could reference it in yet another Interaction (as
>a target being inhibited by the new protein).
>

I think it makes sense to represent the dimerization as an interaction. This way we can differentiate the situation where an effector modulates the activity of Complex C1 from the situation where another effector modulates the formation of C1, even though one could argue that regulating the formation of C1 is likely to also regulate the activity of C1!

I think in this use case there are two things we want to represent.
* First is that an effector, E3, let's say, activates the formation of Complex C1, made of two proteins, P1 and P2.
* Second is how E3 activates the formation of C1. For example, E3 is a kinase and, by phosphorylating P1, it induces the dimerization of P1 and P2 to form C1.

So P1 has two states: one is phosphorylated, in which it acts as a complex component; the other is unphosphorylated. Would it be worth representing these two states of a protein (or more generally active/inactive states)?

>Yes this is true. Think modeling all interactions regardless of knowing that they
>form a dimer complex.
> >If we take out the effect-target wording in the interaction table and say: >entity 1 (effector) interacts withe entity 2 (target) to create a dimer; entity 1can >equal entity 2 > >Entity 1 interacts with entity 2 ; now if a third entity inhibits the dimerization >between 1 and 2 than entity 3 would need to be able to interact with entity 1 (or 2). > >I think the trouble comes in when if you had a dimer or a 2 component complex (entity >1 and entity as a complex) and the entity 3 could only interact with the complex to >disassemble it (or on the molecular level you need surfaces of both entity 1 and >entity 2 for interaction with entity 3). > >I think maybe saying that the complex of 1 and 2 interacts with entity 3 takes care of >this, but this uses both the interaction table and the complex table to assign the >complex of 1 and 2 as interacting with entity 3. > >I think that the terms effector and target are confusing in the interaction table. >Actually when we originally designed the interaction table I am remembering we >struggled with these words, but if we now have direction is known (or not known) in >the table do we need effector-target? I am not sure about this. >Also the line of evidence tables for interaction may assume that you are only adding >evidence for a single direct interaction even if there are multiple lines of evidence >to support the interaction (yeast 2 hybrid exp., invitro binding exp.). > >Joan > > > Arnaud |
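Editorial sketch of the use case discussed above: dimerization stored as an Interaction (effectors P1 and P2, target Complex C1), which a second Interaction can then reference as its target (E3 activating the formation of C1). This is a simplified illustration only. The RowSet and RowSetMember tables and the effector/target row-set columns follow the thread; the table_id values, the row_set_id column, the free-text effector_action column (the thread uses effector_action_type_id referencing DoTS.EffectorActionType), and SQLite via Python's sqlite3 module are all assumptions.

```python
import sqlite3

# Hypothetical table_id values for the tables whose rows can be set members.
TABLE_ID = {"Protein": 1, "Complex": 2, "Interaction": 3}


def connect():
    """Create simplified in-memory versions of RowSet, RowSetMember and Interaction."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE RowSet (row_set_id INTEGER PRIMARY KEY);
        CREATE TABLE RowSetMember (row_set_id INTEGER, table_id INTEGER, row_id INTEGER);
        CREATE TABLE Interaction (
            interaction_id      INTEGER PRIMARY KEY,
            effector_row_set_id INTEGER,
            target_row_set_id   INTEGER,
            effector_action     TEXT   -- stand-in for effector_action_type_id
        );
        """
    )
    return conn


def new_row_set(conn, members):
    """Create a RowSet holding the given (table name, row_id) pairs; return its id."""
    row_set_id = conn.execute("INSERT INTO RowSet DEFAULT VALUES").lastrowid
    conn.executemany(
        "INSERT INTO RowSetMember VALUES (?, ?, ?)",
        [(row_set_id, TABLE_ID[table], row_id) for table, row_id in members],
    )
    return row_set_id


def add_interaction(conn, effectors, targets, action):
    """Store one Interaction between a set of effectors and a set of targets."""
    effector_set = new_row_set(conn, effectors)
    target_set = new_row_set(conn, targets)
    cur = conn.execute(
        "INSERT INTO Interaction (effector_row_set_id, target_row_set_id, effector_action) "
        "VALUES (?, ?, ?)",
        (effector_set, target_set, action),
    )
    return cur.lastrowid


def effectors_acting_on(conn, table, row_id):
    """Return (action, effector table_id, effector row_id) for Interactions targeting a row."""
    sql = """
        SELECT i.effector_action, e.table_id, e.row_id
        FROM Interaction i
        JOIN RowSetMember t ON t.row_set_id = i.target_row_set_id
        JOIN RowSetMember e ON e.row_set_id = i.effector_row_set_id
        WHERE t.table_id = ? AND t.row_id = ?
        ORDER BY e.table_id, e.row_id
    """
    return conn.execute(sql, (TABLE_ID[table], row_id)).fetchall()


conn = connect()
# P1 (Protein 1) and P2 (Protein 2) dimerize to form C1 (Complex 1).
dimerization = add_interaction(
    conn, [("Protein", 1), ("Protein", 2)], [("Complex", 1)], "forms"
)
# E3 (Protein 3) acts on the dimerization Interaction itself, not on C1.
add_interaction(conn, [("Protein", 3)], [("Interaction", dimerization)], "activates")
```

Querying effectors_acting_on(conn, "Interaction", dimerization) versus effectors_acting_on(conn, "Complex", 1) is what separates "E3 modulates the formation of C1" from "something modulates C1 itself". The sketch does not address how E3 does it (the phosphorylated versus unphosphorylated state of P1), which the thread leaves open.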
From: Joan M. <ma...@pc...> - 2003-01-28 19:16:25
|
Hi,

(I think in this use case there are two things we want to represent.
* First is that an effector, E3, let's say activates the formation of Complex C1, made of two proteins, P1 and P2.
* Second is how E3 activates the formation of C1. For example E3 is a kinase and by phosphorylating P1, it induces the dimerization of P1 and P2 to form C1.)

So just for me: entity 3 (E3) interacts with P1 (unphosphorylated); (phosphorylated) P1 can now interact with P2.

(So P1 has two states, one is phosphorylated in which it acts as a complex component, the other one is unphosphorylated. Would it be worth to represent these two states of a protein (or more generally active/inactive states) ?)

Yes, but I think this gets into something that Interaction does not strictly cover, and the answer requires thinking about GUS proteins. The protein involved in the interaction is not the same; in other words, in protein land, it has a phosphorylated residue which participates in the interaction. So I think in the database we would have to say this is a new "instance" of the protein (I am not sure this is the right word to use): there was an RNA which has a protein associated with it, and then this protein is modified, which changes not its overall amino acid sequence but the "chemical nature" of one of its amino acids (although protein instances (sequences) derived from an RNA can vary depending on the "source").

If we can represent both forms of the protein (phosphorylated and unphosphorylated) somehow, we could use the form which does the interacting as the entity (effector) in the interaction table. But I think this gets into protein areas which we have not discussed in any depth, because we would have to be able to create the feature on the amino acid sequence (i.e., amino acid 23 of this amino acid sequence is phosphorylated). I guess something like: the amino acid residue S at position 200 has been changed to S*. This gets into how to handle post-translational protein modifications.
I think you may have had some discussions with Crabtree on this. Do you currently have a way to do this when annotating, or do you just have this information associated with the protein (e.g., protein X is phosphorylated on residue 34; PubMed reference)?

Joan

Arnaud Kerhornou wrote:
> Hi > > Joan Mazzarelli wrote: > > >Hi Jonathan, > > > >*formation* (e.g. dimerization)? Assuming that we had reason to explicitly > >represent the formation of a Complex (versus the mere fact of its existence, > >which is handled by Complex/ComplexComponent), wouldn't this be done with > >the Interaction table? If it were, then you'd have to be able to support > >multiple effectors. To represent dimerization, for example, you'd have > >2 inputs (effectors) and 1 output (the target.) The effectors would be > >the same entities referenced by the ComplexComponents and the target would > >be the Complex itself. This sounds redundant, but if (yet another > >hypotheticals) you wanted to represent the fact that a second or third > >protein acted to inhibit the dimerization process (through some as-yet- > >undetermined mechanism) then you'd need to create the dimerization > >Interaction so that you could reference it in yet another Interaction (as > >a target being inhibited by the new protein). > > > > > I think It makes sense representing the dimerization by an interaction. > This way we can differenciate that an effector modulates the activity of > Complex C1 from another situation where another effector modulates the > formation of C1, even though one could argue that regulating the > formation of C1 is likely to also regulate the activity of C1! > > I think in this use case there are two things we want to represent. > * First is that an effector, E3, let's say activates the formation of > Complex C1, made of two proteins, P1 and P2. > * Second is how E3 activates the formation of C1. For example E3 is a > kinase and by phosphorylating P1, it induces the dimerization of P1 and > P2 to form C1.
> > So P1 has two states, one is phosphorylated in which it acts as a > complex component, the other one is unphosphorylated. Would it be worth > to represent these two states of a protein (or more generally > active/inactive states) ? > > >Yes this is true. Think modeling all interactions regardless of knowing that they > >form a dimer complex. > > > >If we take out the effect-target wording in the interaction table and say: > >entity 1 (effector) interacts withe entity 2 (target) to create a dimer; entity 1can > >equal entity 2 > > > >Entity 1 interacts with entity 2 ; now if a third entity inhibits the dimerization > >between 1 and 2 than entity 3 would need to be able to interact with entity 1 (or 2). > > > >I think the trouble comes in when if you had a dimer or a 2 component complex (entity > >1 and entity as a complex) and the entity 3 could only interact with the complex to > >disassemble it (or on the molecular level you need surfaces of both entity 1 and > >entity 2 for interaction with entity 3). > > > >I think maybe saying that the complex of 1 and 2 interacts with entity 3 takes care of > >this, but this uses both the interaction table and the complex table to assign the > >complex of 1 and 2 as interacting with entity 3. > > > >I think that the terms effector and target are confusing in the interaction table. > >Actually when we originally designed the interaction table I am remembering we > >struggled with these words, but if we now have direction is known (or not known) in > >the table do we need effector-target? I am not sure about this. > >Also the line of evidence tables for interaction may assume that you are only adding > >evidence for a single direct interaction even if there are multiple lines of evidence > >to support the interaction (yeast 2 hybrid exp., invitro binding exp.). 
> > > >Joan > > > > > > > Arnaud -- Joan Mazzarelli Computational Biology and Informatics Laboratory Center for Bioinformatics 1429 Blockley Hall University of Pennsylvania Philadelphia, PA 19104 |
From: mazz <ma...@sn...> - 2003-01-27 00:31:09
|
Hi Arnaud, I have some questions for you. So with the way the tables complex and interaction are set up now, if a complex participates in an interaction to find this then you have to see if the row_id in complexComponent is also a row_id in row set member of interaction. With the interaction table two things are interacting (why is row set needed)? What did you have in mind for interaction type (protein-DNA? or more detailed) and effector action type (inhibits)? I am confused about why it is not possible to build up sequential interactions using just single interacting components (see below). Then maybe use pathwayinteraction and pathway (even if the pathway just consists of a A binds B which binds C). Or do you want to model biological reactions which seems sequential to me like a pathway ? (This allows >>> us to represent >>> the interaction of a set of objects (the effector) with another set >>> of objects >>> (the target.) Previously the Interaction table could only >>> represent the interaction >>> between a single pair of entities (OK if they happened to be >>> Complexes, for example, >>> but a potential problem in other situations.) but a potential problem in other situations? What are more of these? (Although the current schema lets us group effectors together, it > doesn't let > us say (for example) that E1 interacts *directly* with T1 to > phosphorylate > it, but that E1's active site is only exposed when E1 is bound to E2. In > other words, E1's role in the activity can be viewed as "primary", and > E2's > role is secondary (in some sense) but all we can say in the schema is > that > the Complex consisting of E1 and E2 interacts with T1 to phosphorylate > it.) The above case: E2 interacts with E1 (they also happen to be represented by a complex; but now consider this only an interaction); then E1 interacts with T1 to modify it. E2 affects E1; then E1 affects T1. 
or Protein X (effector) interacts with protein Y (target); protein Y (effector ) modifies protein Z (target). I guess I have a problem with the E1-E2 concept (or multiple effectors if one does not effect or interact with a target directly) in the interaction table. I guess I may also think of Complex and Interaction more separately. For example, the TFIID complex has several components; the complex consists of several proteins (protein-protein interactions; complex type - protein) of which there is no known direction (effector-target concept at least for now). This complex with its complexComponents (which we represent) can than interact with a DNA sequence (target). Interaction type (protein-DNA); (effector action type - binds?). Although, in this example, we know that we would also have the TATA-binding protein (TBP; effector) interacting with the DNA sequence target as an entry in interaction (separate from the entry that the complex TFIID can interact with the DNA target). Also the other component interactions ...TBP-associated factor 70 interacts with TBP .(direction not known) and so on ... if all the interactions individually are known to define the complex entirely. Joan Arnaud Kerhornou wrote: > Hi Jonathan > > Jonathan Crabtree wrote: > > > > > Arnaud- > > > > Thanks for the feedback; I think we're getting close to agreement here. > > I think so too ! > > >> I have noticed that your changes don't cover the DNA/RNA features. Is > >> there any reason for this ? I know there are quite a lot of them and > >> there might be another way of storing data some information such as > >> telomere or centromere regions, origin of replication, inflection > >> point etc. All these features are covered by Sequence Ontology, so a > >> new ChromosomeElement or ChromosomeRegion feature could be generic > >> enough to cover most of them. > >> Let me know what you think. > > > > > > Which DNA/RNA features do you mean (other than those mentioned above)? 
> > The file I sent you should include views on the top of NAFeatureImp > table. Here the list : > > * ChromosomeElement or we can keep CentromereFeature and TelomereFeature > as they are in gusdev - IMPORTANT > * InfectionPointFeature > * ReplicationFeature, for annotated origins of replication > * RNARegulatory - as there is a DNARegulatory feature => regulatory > element at the RNA level > * RNASecondaryStructure > * SpliceSiteFeature > * TransposableElement > > + an extra attribute in RestrictionFragmentFeature, "type_of_cut" > (Sticky or blunt) > + an extra attribute in GeneSynonym, "is_obsolete" > > + a new view on the top of NASequenceImp, "GenomicSequence" instead of > the existing one, ExternalNASequence. > > I can send the files to you if you want. > > > > > It's possible that I misplaced the e-mail or notes where we discussed > > these. Or are you just saying that we will eventually have a view for > > each type of DNA/RNA feature in the Sequence Ontology? I think that > > this is true, although I hadn't planned to make the change immediately, > > since I believe we had agreed on a "transitional" period in which the > > various NAFeature views would first be given a nullable > > sequence_ontology_id > > Yes we had! So regarding chromosome regions, shall we keep > TelomereFeature and CentromereFeature ? > > > and we would then decide how to best rearrange the views to more closely > > match the ontology terms. I haven't added the sequence_ontology_id > > column to the NAFeature views, but I will do so right away. 
We do > > currently have some relevant NAFeature views in gusdev that have not > > been migrated into 3.0: > > > > CentromereFeature > > LowComplexityNAFeature > > ScaffoldGapFeature > > TelomereFeature > > > > I have no objection to merging the telomere and centromere features into > > a single view--along with any other chromosomal regions covered by the > > ontology--although it would mean that we wouldn't have a 1-1 mapping > > between sequence ontology terms and views on NAFeature. I think that > > at one point this was proposed as the eventual goal of the rearrangement. > > Anyway, given that I'm not certain of the plan here, I'm going to add > > the sequence_ontology_id column but leave the views unchanged for now. > > They can easily be changed without interfering with our data migration, > > so their fate doesn't have to be settled immediately. We have yet to > > establish a consistent set of rules for deciding when different types > > of features get grouped into a single view and when they get their own > > views, so this is probably a good opportunity to settle the question > > once and for all. The Sequence Ontology is big enough that we probably > > *don't* want a view for each and every term in the ontology; it would > > make maintenance quite difficult. But we could, for example, create a > > view for every top-level (or second-level) sequence ontology term. > > However, even a relatively high-level feature like "chromosomal region" > > (SO:0000711) looks like it's already a 4th or 5th level feature... > > > At > > the other extreme, we could continue what we're doing now, i.e. using > > an ad-hoc classification of features based on the data we actually have > > available, and just make sure that every feature is tagged with the > > correct sequence ontology term. Any thoughts? > > It makes sense as SO may undergo revisions this year. 
> > > > >>> > >>> alter table DOTS.PROTEINPROPERTY add constraint PROTEINPROPERTY_CK01 > >>> check (property_name in ('isoelectric point', 'molecular mass', > >>> 'charge', 'average residue mass')); > >>> > >>> The table allows multiple protein properties of the same type to be > >>> associated with > >>> entries in DoTS.AASequenceImp. Arnaud had suggested originally that > >>> the last property, average residue mass, could actually be an > >>> attribute of the table that stores the protein sequence itself. > >>> However, it seemed that if the molecular mass attribute could have > >>> multiple values (e.g., from different experiments) then > >>> the same should be true of the average residue mass, which is > >>> essentially a derived property. Let me know if you disagree with > >>> this, or think we should create an explicit controlled vocab. for > >>> these 4 properties. > >>> > >>> > >> A controlled vocabulary table with the four attributes you've > >> mentioned is fine. > > > > > > OK, I'll make this change. > > > >>> -Protein features > >>> *Signal peptide features (stored in DoTS.SignalPeptideFeature) > >>> This view exists already, as DoTS.SignalPeptideFeature, but we need > >>> to add the > >>> ability to store curated data, such as targeting information. It > >>> should be straightforward to modify the view to accommodate this, > >>> but I'm not sure exactly > >>> what needs to be stored. Currently we use the view exclusively for > >>> SignalP > >>> predictions, and from what I understand SignalP is only concerned > >>> with predicting > >>> secreted proteins, meaning that we don't currently have any > >>> explicit targeting information. Is this something we could > >>> represent using the GO ontology for cellular localization? Do we > >>> also need some free text columns? Let me know and I'll make > >>> the changes.
All the SignalP-specific columns appear to be > >>> nullable, so we don't > >>> necessarily have to do anything except add the new columns for the > >>> manually curated > >>> information. > >>> > >>> > >> After talking to the curators it appears that GO component supplements > >> targeting information at the feature level but will not be enough. > >> The targeting information is represented by the component ontology in > >> one context i.e. mitochondrial, nuclear, membrane localization but > >> not in the context of the actual residues involved. > >> The actual residues involved in the targeting (or any other > >> phenomena) need to be represented by a protein feature ontology that can > >> be mapped onto specific amino acids of a protein. > >> This ontology is the equivalent of Sequence Ontology (SO) which is > >> meant for DNA features. It is being prepared by Val Wood with input > >> from Swiss-prot. > > > > > > OK, so the idea is that the various signal peptides have been classified > > into named classes that should be represented by some kind of ontology? > > > >> As you're going to add an extra attribute sequence_ontology_id to the > >> NA Features, could you do the same to any AA Features ? > > > > > > This will only work if the new ontology is actually part of the Sequence > > Ontology (or if we use the SequenceOntology table to store both > > ontologies.) > > Do you know if this is the case? It's quite possible, since the SO does > > already cover amino acid features. Otherwise we'll have to create a > > separate AASequenceOntology (or whatever the new ontology is called). > > It is at the moment a different project but it would make sense they > merge in the future.
Just to give you an idea about Localization > Signals, here is a snapshot: > > %localization signal > %N-terminal signal sequence > %nuclear localization signal > %bipartite nuclear localization signal > %etc > %mitochondrial localization sequence > %thylakoid localization signal > %ER retention signal > > The way the SignalPeptideFeature is designed make difficult the > annotation of localization signal features. We can leave > SignalPeptideFeature as it is as it fits with SignalP software > prediction and in the future create a new feature LocalizationSignalFeature. > > > > >>> *Transmembrane domain features (stored in DoTS.PredictedAAFeature) > >>> "PlasmoDB web site shows hydrophobicity graphics, where is it > >>> stored in GUS?" > >>> The hydrophobicity plots are computed dynamically based on the > >>> amino acid sequence. > >>> Transmembrane domains are currently stored in the > >>> PredictedAAFeature view, although > >>> I will probably create a new view for them when I get around to > >>> eliminating PredictedAAFeature. Another possibility would be to > >>> treat TM domains as another > >>> type of domain, and store them in DomainFeature. What do you think > >>> about this? > >>> > >>> > >> I reckon they could be merged. > > > > > > OK, sounds good. > > > >>> *Post-translational modification features (new view: > >>> DoTS:PostTranslationalModFeature) > >>> Has a "type" column to represent the type of modification. It was > >>> also suggested > >>> that we have a column called "modified_by", which would be a > >>> reference to the Interaction table. However, isn't it possible > >>> that the same post-translational > >>> modification (e.g., phosphorylation of a specific amino acid) could > >>> be the result > >>> of one of several Interactions? > >> > >> yes you're right, the effector could be different. In that case the > >> attribute > >> "modified_by" is not useful. 
> >> > >>> This argues for an additional relationship between Interaction and > >>> PostTranslationalModFeature, unless we're OK creating multiple > >>> PostTranslationalModFeatures, identical except for their modified_by > >>> attribute. Comments on this? > >>> > >>> > >> I don't think they should be duplicated as they correspond to a > >> unique site. This unique feature would > >> be associated with different interaction entries. We might not need > >> an extra table between Interaction and PostTranslationalModFeature > >> though. We still can do the following query : "give me all the > >> interaction entries which target is a PostTranslationalModFeature > >> which id is ...". > >> How does it sound ? > > > > > > We could do this, although one question is whether, semantically > > speaking, > > the "target" of an Interaction should be "the thing to be modified" > > (e.g. an > > unphosphorylated sequence or residue) or "the resulting modification" > > (e.g. > > the feature that represents a phosphorylated residue at the appropriate > > location.) The answer is probably that we just shouldn't worry about it > > and should just do whatever is most convenient on a case-by-case basis. > > To do it "correctly" would be problematic either way. For example, if we > > say that the target is the thing to be modified, then we have to create a > > feature that represents a region of sequence that *could* be modified in > > some way and then create another feature to represent the actual > > modification. > > But if we say that the target is the result of the modification then > > we may > > have to create equally unusual tables/views. For example, if the > > result of > > a given interaction is to degrade a protein, then do we have to create a > > table/object that represents a degraded protein (or "nothing", or > > whatever > > it is that's left after the modification)?
For now I have no problem > > with > > interpreting the "target" based on context, but in the longer term we may > > want to consider separating the notions of "target prior to modification" > > and either "target after modification" or "effect of modification". > > > > I also realized belatedly that I could have left the Interaction table > > unchanged, rather than introducing specific references to RowSet. This > > would have allowed us to represent either singleton effectors/targets or > > set-valued effectors/targets, without having to always join through > > RowSet > > in the singleton case. On the other hand, if we do associate some > > additional information with the RowSets, then the current representation > > is correct. > > It depends if we want to represent many-to-many relationship between > interaction and members of this interaction. Without the RowSet table, > we can't assign a set of several effectors/targets, right ? Unless we > consider that this set of effectors are being part of a complex and act > as the whole. > > > > >>> *AA repeats (new view: RepeatRegionAAFeature) > >>> I called this view RepeatRegionAAFeature in case we want to have a > >>> similar view > >>> for NASequences. I also created only a single view, instead of > >>> following Arnaud's > >>> original suggestion, which was for both: > >>> > >>> * RepeatRegionFeature as a set of RepeatUnitFeatures, > >>> * RepeatUnitFeature, with the consensus sequence, name and size > >>> > >>> I based the design of this view on that of TandemRepeatFeature, > >>> which we have for > >>> NASequences already. Instead of splitting the consensus sequence, > >>> name, and size > >>> into a separate table, they occupy columns in > >>> RepeatRegionAAFeature. This works > >>> quite well for the tandem repeats we already have (for DNA > >>> sequences.) 
However, if > >>> there is a known set of named amino acid sequence repeats, then it > >>> would probably > >>> make sense to do what Arnaud suggested, and store these in a > >>> separate table (likely named RepeatUnit, not RepeatUnitFeature, > >>> since they would have no unique locations.) Does this sound > >>> reasonable? That is, put the consensus seqs in the > >>> repeat region table itself if they're anonymous, but if they've > >>> been named, then store them in a separate table. Also note that > >>> this view has a reference to RepeatType, although the current > >>> contents of this table are probably applicable only to DNA sequence > >>> repeats (LINEs, SINEs, ALUs, etc.), since I believe that I parsed > >>> them out of RepBase. > >>> > >>> > >> I proposed a separate repeat feature because one may want to annotate > >> a repeat outside a repeat region, for example LTR repeats attached to > >> a given transposable element. These RepeatFeatures or > >> RepeatUnitFeatures can then have a location. > >> The other case is when a repeat region is made of a set of different > >> repeat units. > > > > > > OK, I didn't realize that this was what you were trying to represent. As > > currently conceived, RepeatRegionAAFeature is meant to represent a region > > that contains one or more immediately adjacent copies of the same type > > of (amino acid sequence) repeat. The assumption is also that these > > regions > > will typically be maximal (with respect to the chosen repeat type, > > consensus, > > and max. mismatch, the last of which is not represented directly in the > > table.) We can still represent more complex repeat structures using this > > single table, but the representation is implicit, not explicit (i.e. you > > have to do a query to find out what other repeats lie within the > > bounds of > > the transposon, meaning that there's no easy way to query for all > > transposable > > elements with a particular flanking LTR structure.) 
Do you want to > > come up > > with a 2-table version of what I've done? The use cases aren't clear > > enough > > in my mind yet for me to be able to do it. It seems that the bare > > minimum we > > need is just another column in the RepeatRegionAAFeature, parent_id; > > which > > would let us represent explicitly that a particular repeat is a > > *necessary* > > (versus incidental) component of another NA/AAFeature. Both AAFeatureImp > > and NAFeatureImp already have a parent_id, so this would be a > > straightforward > > change. The queries still might not be terribly efficient, but I > > don't know > > what exactly you wanted to support in terms of queries, versus just > > making > > sure that the representation is sufficiently rich to capture the > > structure. > > A case we came across here for Tbrucei is nested repeat regions (at the > DNA level). Each repeat region has coordinates and is annotated with a > unique repeat unit type. This repeat region can be within a bigger > repeat region annotated with a different repeat unit type. > ... which is in other words your suggestion with parent_id as an extra > attribute ... > > Regarding transposon repeat types, if we have a TransposableElement > feature and its type is given as an attribute, a repeat feature will > just be useful to locate the LTRs within a given transposable element. > Can we keep this functionality ? Then the feature will be simple, with just a > repeat_type and a parent_id attribute. > > > > >> In any case, NA repeats and AA repeats should have the same design. > >> Just the controlled vocabulary representing the types of repeats will > >> be different. > > > > > > Absolutely, yes, although one question is whether AA repeats can have the > > same kind of nested structure that you mention as a possibility for NA > > repeats (the transposon with LTRs). I don't know the answer to this.
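The parent_id idea discussed above can be sketched as follows. This is only an illustration, assuming a single simplified Feature table in SQLite (via Python's sqlite3 module) rather than the real NAFeatureImp/AAFeatureImp tables and their views; the table name, the kind, repeat_type, start_pos and end_pos columns, and all coordinates are made up for the example. Only parent_id comes from the thread.

```python
import sqlite3

# Hypothetical, simplified stand-in for a feature table with parent_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Feature (
    feature_id  INTEGER PRIMARY KEY,
    parent_id   INTEGER REFERENCES Feature (feature_id),
    kind        TEXT NOT NULL,      -- e.g. 'TransposableElement' or 'Repeat'
    repeat_type TEXT,               -- controlled-vocabulary term in a real schema
    start_pos   INTEGER NOT NULL,
    end_pos     INTEGER NOT NULL
);
""")

# Made-up rows: a transposable element with two flanking LTRs, and a
# repeat region nested inside a bigger repeat region.
conn.executemany(
    "INSERT INTO Feature VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, None, "TransposableElement", None, 100, 900),
        (2, 1, "Repeat", "LTR", 100, 180),
        (3, 1, "Repeat", "LTR", 820, 900),
        (4, None, "Repeat", "outer unit", 2000, 3000),
        (5, 4, "Repeat", "inner unit", 2200, 2400),
    ],
)


def nested_repeats(feature_id):
    """Return (feature_id, repeat_type) for every repeat nested, at any
    depth, under the given feature, by following parent_id links."""
    return conn.execute(
        """
        WITH RECURSIVE descendants(feature_id) AS (
            SELECT feature_id FROM Feature WHERE parent_id = ?
            UNION ALL
            SELECT f.feature_id
            FROM Feature f
            JOIN descendants d ON f.parent_id = d.feature_id
        )
        SELECT f.feature_id, f.repeat_type
        FROM Feature f
        JOIN descendants d ON d.feature_id = f.feature_id
        WHERE f.kind = 'Repeat'
        ORDER BY f.start_pos
        """,
        (feature_id,),
    ).fetchall()


print(nested_repeats(1))  # the LTRs explicitly attached to the transposable element
print(nested_repeats(4))  # the repeat region nested inside the bigger one
```

With an explicit parent_id the "which repeats belong to this element" question becomes a join on keys instead of a coordinate-overlap search, which is the distinction between necessary and incidental components made above.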
> > > >>> -DoTS.Interaction (table modified, dependent tables added) > >>> *Added "has_direction" column, as discussed previously. The idea > >>> here is that > >>> not all interactions (particularly physical ones, e.g., > >>> dimerization) have a > >>> direction. If has_direction == 0, then the value of > >>> direction_is_known can > >>> be ignored. > >>> *Added non-nullable "effector_action_type_id" column, referencing > >>> DoTS.EffectorActionType (a new table.) This column/table > >>> represents the possible > >>> things that an effector can do to a target. For example, the > >>> InteractionType > >>> associated with the Interaction could be "binds to" (e.g., a > >>> promoter region), and > >>> the EffectorActionType for that Interaction could be to either > >>> "inhibit" or "enhance" > >>> expression of the corresponding gene. > >>> *Replaced effector_table_id and effector_row_id with > >>> effector_row_set_id, and > >>> similarly for the target_table_id and target_row_id. This allows > >>> us to represent > >>> the interaction of a set of objects (the effector) with another set > >>> of objects > >>> (the target.) Previously the Interaction table could only > >>> represent the interaction > >>> between a single pair of entities (OK if they happened to be > >>> Complexes, for example, > >>> but a potential problem in other situations.) Now both effector > >>> and target are represented as references to DoTS.RowSet, which in > >>> turn references DoTS.RowSetMember, > >>> which...in turn...references the individual database rows that > >>> comprise the effector > >>> or target. These tables (RowSet and RowSetMember) are essentially > >>> the same as Complex and ComplexComponent, except that they are > >>> totally generic; they can be used to group any set of rows in the > >>> database and they store no additional information.
However, if > >>> there are any additional columns that we can think of (that are > >>> specific to Interactions) then these tables should be replaced by > >>> less generic ones (e.g. InteractingEntitySet or InteractionSet, or > >>> something along those lines.) > >>> > >>> > >> Sounds fine. The only thing I can see is regarding the > >> EffectorActionType. If each effector, member of a RowSet, has a > >> different action type, the attribute, effector_action_type_id, should > >> go in the RowSetMember table. I don't have any example though. > > > > > > OK, I think I'd be inclined to wait until we have some use cases for > > this. > > Although the current schema lets us group effectors together, it > > doesn't let > > us say (for example) that E1 interacts *directly* with T1 to > > phosphorylate > > it, but that E1's active site is only exposed when E1 is bound to E2. In > > other words, E1's role in the activity can be viewed as "primary", and > > E2's > > role is secondary (in some sense) but all we can say in the schema is > > that > > the Complex consisting of E1 and E2 interacts with T1 to phosphorylate > > it. > > I think that the solution we have now is OK, but it only lets us > > represent > > the overall action of the entire set of effectors. > > Let's leave the design as it is for now. Curators are not going to > curate interactions data in the short term. We shall come back later > with more precise ideas/use cases about them. > > > > > Jonathan > > > > Arnaud |
From: Jonathan C. <cra...@pc...> - 2003-01-15 16:18:32
|
Arnaud- Thanks for the feedback; I think we're getting close to agreement here. > I have noticed that your changes don't cover the DNA/RNA features. Is > there any reason for this ? I know there are quite a lot of them and > there might be another way of storing data some information such as > telomere or centromere regions, origin of replication, inflection point > etc. All these features are covered by Sequence Ontology, so a new > ChromosomeElement or ChromosomeRegion feature could be generic enough to > cover most of them. > Let me know what you think. Which DNA/RNA features do you mean (other than those mentioned above)? It's possible that I misplaced the e-mail or notes where we discussed these. Or are you just saying that we will eventually have a view for each type of DNA/RNA feature in the Sequence Ontology? I think that this is true, although I hadn't planned to make the change immediately, since I believe we had agreed on a "transitional" period in which the various NAFeature views would first be given a nullable sequence_ontology_id and we would then decide how to best rearrange the views to more closely match the ontology terms. I haven't added the sequence_ontology_id column to the NAFeature views, but I will do so right away. We do currently have some relevant NAFeature views in gusdev that have not been migrated into 3.0: CentromereFeature LowComplexityNAFeature ScaffoldGapFeature TelomereFeature I have no objection to merging the telomere and centromere features into a single view--along with any other chromosomal regions covered by the ontology--although it would mean that we wouldn't have a 1-1 mapping between sequence ontology terms and views on NAFeature. I think that at one point this was proposed as the eventual goal of the rearrangement. Anyway, given that I'm not certain of the plan here, I'm going to add the sequence_ontology_id column but leave the views unchanged for now. 
They can easily be changed without interfering with our data migration, so their fate doesn't have to be settled immediately. We have yet to establish a consistent set of rules for deciding when different types of features get grouped into a single view and when they get their own views, so this is probably a good opportunity to settle the question once and for all. The Sequence Ontology is big enough that we probably *don't* want a view for each and every term in the ontology; it would make maintenance quite difficult. But we could, for example, create a view for every top-level (or second-level) sequence ontology term. However, even a relatively high-level feature like "chromosomal region" (SO:0000711) looks like it's already a 4th or 5th level feature... At the other extreme, we could continue what we're doing now, i.e. using an ad-hoc classification of features based on the data we actually have available, and just make sure that every feature is tagged with the correct sequence ontology term. Any thoughts? >> >> alter table DOTS.PROTEINPROPERTY add constraint PROTEINPROPERTY_CK01 check >> (property_name in ('isoelectric point', 'molecular mass', 'charge', 'average residue mass')); >> >> The table allows multiple protein properties of the same type to be associated with >> entries in DoTS.AASequenceImp. Arnaud had suggested originally that the last >> property, average residue mass, could actually be an attribute of the table that >> stores the protein sequence itself. However, it seemed that if the molecular >> mass attribute could have multiple values (e.g., from different experiments) then >> the same should be true of the average residue mass, which is essentially a >> derived property. Let me know if you disagree with this, or think we should >> create an explicit controlled vocab. for these 4 properties. >> >> > A controlled vocabulary table with the four attributes you've mentioned > is fine. OK, I'll make this change. 
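For comparison with the check constraint quoted above, the controlled-vocabulary alternative might look roughly like the following. This is a minimal sketch in SQLite (through Python's sqlite3 module) rather than Oracle; the ProteinPropertyType table name, every column name, and the numeric values are assumptions made for illustration, not the actual GUS 3.0 definitions. Only the four property names come from the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
-- Hypothetical vocabulary table replacing the check constraint.
CREATE TABLE ProteinPropertyType (
    protein_property_type_id INTEGER PRIMARY KEY,
    name                     TEXT NOT NULL UNIQUE
);
-- Hypothetical property table; one sequence may have many rows per type.
CREATE TABLE ProteinProperty (
    protein_property_id      INTEGER PRIMARY KEY,
    aa_sequence_id           INTEGER NOT NULL,
    protein_property_type_id INTEGER NOT NULL
        REFERENCES ProteinPropertyType (protein_property_type_id),
    value                    REAL NOT NULL
);
""")

# The four property names listed in the original check constraint.
PROPERTY_NAMES = [
    "isoelectric point",
    "molecular mass",
    "charge",
    "average residue mass",
]
conn.executemany(
    "INSERT INTO ProteinPropertyType (name) VALUES (?)",
    [(name,) for name in PROPERTY_NAMES],
)


def add_property(aa_sequence_id, name, value):
    """Attach one property value to a sequence, rejecting unknown names."""
    row = conn.execute(
        "SELECT protein_property_type_id FROM ProteinPropertyType WHERE name = ?",
        (name,),
    ).fetchone()
    if row is None:
        raise ValueError(f"unknown protein property: {name!r}")
    conn.execute(
        "INSERT INTO ProteinProperty"
        " (aa_sequence_id, protein_property_type_id, value) VALUES (?, ?, ?)",
        (aa_sequence_id, row[0], value),
    )


# Two made-up molecular mass values for the same sequence, e.g. from
# different experiments, as allowed by the design described above.
add_property(1, "molecular mass", 52340.0)
add_property(1, "molecular mass", 52410.5)
```

The practical difference from the check constraint is that adding a fifth property type becomes an insert into the vocabulary table instead of a schema change.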
>>-Protein features >> *Signal peptide features (stored in DoTS.SignalPeptideFeature) >> This view exists already, as DoTS.SignalPeptideFeature, but we need to add the >> ability to store curated data, such as targeting information. It should be >> straightforward to modify the view to accommodate this, but I'm not sure exactly >> what needs to be stored. Currently we use the view exclusively for SignalP >> predictions, and from what I understand SignalP is only concerned with predicting >> secreted proteins, meaning that we don't currently have any explicit targeting >> information. Is this something we could represent using the GO ontology for cellular >> localization? Do we also need some free text columns? Let me know and I'll make >> the changes. All the SignalP-specific columns appear to be nullable, so we don't >> necessarily have to do anything except add the new columns for the manually curated >> information. >> >> > After talking to the curators it appears that GO component supplements > targeting information at the feature level but will not be enough. > The targeting information is represented by the component ontology in > one context i.e. mitochondrial, nuclear, membrane localization but not > in the context of the actual residues involved. > The actual residues involved in the targeting (or any other phenomena) > need to be represented by a protein feature ontology that can be mapped onto > specific amino acids of a protein. > This ontology is the equivalent of Sequence Ontology (SO) which is meant > for DNA features. It is being prepared by Val Wood with input from > Swiss-prot. OK, so the idea is that the various signal peptides have been classified into named classes that should be represented by some kind of ontology? > As you're going to add an extra attribute sequence_ontology_id to the NA > Features, could you do the same to any AA Features ?
This will only work if the new ontology is actually part of the Sequence Ontology (or if we use the SequenceOntology table to store both ontologies.) Do you know if this is the case? It's quite possible, since the SO does already cover amino acid features. Otherwise we'll have to create a separate AASequenceOntology (or whatever the new ontology is called). >> *Transmembrane domain features (stored in DoTS.PredictedAAFeature) >> "PlasmoDB web site shows hydrophobicity graphics, where is it stored in GUS?" >> The hydrophobicity plots are computed dynamically based on the amino acid sequence. >> Transmembrane domains are currently stored in the PredictedAAFeature view, although >> I will probably create a new view for them when I get around to eliminating >> PredictedAAFeature. Another possibility would be to treat TM domains as another >> type of domain, and store them in DomainFeature. What do you think about this? >> >> > I reckon they could be merged. OK, sounds good. >> *Post-translational modification features (new view: DoTS:PostTranslationalModFeature) >> Has a "type" column to represent the type of modification. It was also suggested >> that we have a column called "modified_by", which would be a reference to the >> Interaction table. However, isn't it possible that the same post-translational >> modification (e.g., phosphorylation of a specific amino acid) could be the result >> of one of several Interactions? >> > yes you're right, the effector could be different. In that case the > attribute > "modified_by" is not useful. > >> This argues for an additional relationship >> between Interaction and PostTranslationalModFeature, unless we're OK creating >> multiple PostTranslationalModFeatures, identical except for their modified_by >> attribute. Comments on this? >> >> > I don't think they should be duplicated as they correspond to a unique > site. This unique feature would > be associated with different interaction entries.
We might not need an > extra table between Interaction and PostTranslationalModFeature though. > We still can do the following query : "give me all the interaction > entries which target is a PostTranslationalModFeature which id is ...". > How does it sound ? We could do this, although one question is whether, semantically speaking, the "target" of an Interaction should be "the thing to be modified" (e.g. an unphosphorylated sequence or residue) or "the resulting modification" (e.g. the feature that represents a phosphorylated residue at the appropriate location.) The answer is probably that we just shouldn't worry about it and should just do whatever is most convenient on a case-by-case basis. To do it "correctly" would be problematic either way. For example, if we say that the target is the thing to be modified, then we have to create a feature that represents a region of sequence that *could* be modified in some way and then create another feature to represent the actual modification. But if we say that the target is the result of the modification then we may have to create equally unusual tables/views. For example, if the result of a given interaction is to degrade a protein, then do we have to create a table/object that represents a degraded protein (or "nothing", or whatever it is that's left after the modification)? For now I have no problem with interpreting the "target" based on context, but in the longer term we may want to consider separating the notions of "target prior to modification" and either "target after modification" or "effect of modification". I also realized belatedly that I could have left the Interaction table unchanged, rather than introducing specific references to RowSet. This would have allowed us to represent either singleton effectors/targets or set-valued effectors/targets, without having to always join through RowSet in the singleton case. 
On the other hand, if we do associate some additional information with the RowSets, then the current representation is correct. >> *AA repeats (new view: RepeatRegionAAFeature) >> I called this view RepeatRegionAAFeature in case we want to have a similar view >> for NASequences. I also created only a single view, instead of following Arnaud's >> original suggestion, which was for both: >> >> * RepeatRegionFeature as a set of RepeatUnitFeatures, >> * RepeatUnitFeature, with the consensus sequence, name and size >> >> I based the design of this view on that of TandemRepeatFeature, which we have for >> NASequences already. Instead of splitting the consensus sequence, name, and size >> into a separate table, they occupy columns in RepeatRegionAAFeature. This works >> quite well for the tandem repeats we already have (for DNA sequences.) However, if >> there is a known set of named amino acid sequence repeats, then it would probably >> make sense to do what Arnaud suggested, and store these in a separate table >> (likely named RepeatUnit, not RepeatUnitFeature, since they would have no unique >> locations.) Does this sound reasonable? That is, put the consensus seqs in the >> repeat region table itself if they're anonymous, but if they've been named, then >> store them in a separate table. Also note that this view has a reference to >> RepeatType, although the current contents of this table are probably applicable >> only to DNA sequence repeats (LINEs, SINEs, ALUs, etc.), since I believe that I >> parsed them out of RepBase. >> >> > I proposed a separate repeat feature because one may want to annotate a > repeat outside a repeat region, for example LTR repeats attached to a > given transposable element. These RepeatFeatures or RepeatUnitFeatures > can then have a location. > The other case is when a repeat region is made of a set of different > repeat units. OK, I didn't realize that this was what you were trying to represent. 
As currently conceived, RepeatRegionAAFeature is meant to represent a region that contains one or more immediately adjacent copies of the same type of (amino acid sequence) repeat. The assumption is also that these regions will typically be maximal (with respect to the chosen repeat type, consensus, and max. mismatch, the last of which is not represented directly in the table.) We can still represent more complex repeat structures using this single table, but the representation is implicit, not explicit (i.e. you have to do a query to find out what other repeats lie within the bounds of the transposon, meaning that there's no easy way to query for all transposable elements with a particular flanking LTR structure.) Do you want to come up with a 2-table version of what I've done? The use cases aren't clear enough in my mind yet for me to be able to do it. It seems that the bare minimum we need is just another column in the RepeatRegionAAFeature, parent_id; which would let us represent explicitly that a particular repeat is a *necessary* (versus incidental) component of another NA/AAFeature. Both AAFeatureImp and NAFeatureImp already have a parent_id, so this would be a straightforward change. The queries still might not be terribly efficient, but I don't know what exactly you wanted to support in terms of queries, versus just making sure that the representation is sufficiently rich to capture the structure. > In any case, NA repeats and AA repeats should have the same design. Just > the controlled vocabulary representing the types of repeats will be > different. Absolutely, yes, although one question is whether AA repeats can have the same kind of nested structure that you mention as a possibility for NA repeats (the transposon with LTRs). I don't know the answer to this. >>-DoTS.Interaction (table modified, dependent tables added) >> *Added "has_direction" column, as discussed previously. 
The idea here is that >> not all interactions (particularly physical ones, e.g., dimerization) have a >> direction. If has_direction == 0, then the value of direction_is_known can >> be ignored. >> *Added non-nullable "effector_action_type_id" column, referencing >> DoTS.EffectorActionType (a new table.) This column/table represents the possible >> things that an effector can do to a target. For example, the InteractionType >> associated with the Interaction could be "binds to" (e.g., a promoter region), and >> the EffectorActionType for that Interaction could be to either "inhibit" or "enhance" >> expression of the corresponding gene. >> *Replaced effector_table_id and effector_row_id with effector_row_set_id, and >> similarly for the target_table_id and target_row_id. This allows us to represent >> the interaction of a set of objects (the effector) with another set of objects >> (the target.) Previously the Interaction table could only represent the interaction >> between a single pair of entities (OK if they happened to be Complexes, for example, >> but a potential problem in other situations.) Now both effector and target are >> represented as references to DoTS.RowSet, which in turn references DoTS.RowSetMember, >> which...in turn...references the individual database rows that comprise the effector >> or target. These tables (RowSet and RowSetMember) are essentially the same as >> Complex and ComplexComponent, except that they are totally generic; they can be >> used to group any set of rows in the database and they store no additional information. >> However, if there are any additional columns that we can think of (that are specific >> to Interactions) then these tables should be replaced by less generic ones (e.g. >> InteractingEntitySet or InteractionSet, or something along those lines.) >> >> > Sounds fine. The only thing I can see is regarding the > EffectorActionType.
If each effector, member of a RowSet, has a > different action type, the attribute, effector_action_type_id, should go > in the RowSetMember table. I don't have any example though. OK, I think I'd be inclined to wait until we have some use cases for this. Although the current schema lets us group effectors together, it doesn't let us say (for example) that E1 interacts *directly* with T1 to phosphorylate it, but that E1's active site is only exposed when E1 is bound to E2. In other words, E1's role in the activity can be viewed as "primary", and E2's role is secondary (in some sense) but all we can say in the schema is that the Complex consisting of E1 and E2 interacts with T1 to phosphorylate it. I think that the solution we have now is OK, but it only lets us represent the overall action of the entire set of effectors. Jonathan -- Jonathan Crabtree Center for Bioinformatics, University of Pennsylvania 1406 Blockley Hall, 423 Guardian Drive Philadelphia, PA 19104-6021 215-573-3115 |
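For readers following the RowSet discussion, here is a minimal sketch of how an Interaction could resolve its effector and target sets through RowSet and RowSetMember. It uses Python's sqlite3 purely for illustration. The column lists are simplified assumptions, not the real GUS definitions, and the table_id and row_id values are made up. Note that effector_action_type_id sits on Interaction, so it describes the whole effector set, which is the limitation raised above.

```python
import sqlite3

# Hypothetical, simplified versions of the tables discussed in the thread;
# the real GUS tables have more columns and were defined for Oracle.
SCHEMA = """
CREATE TABLE RowSet (row_set_id INTEGER PRIMARY KEY);
CREATE TABLE RowSetMember (
    row_set_member_id INTEGER PRIMARY KEY,
    row_set_id INTEGER NOT NULL REFERENCES RowSet(row_set_id),
    table_id INTEGER NOT NULL,  -- which table the member row lives in
    row_id INTEGER NOT NULL     -- primary key of the member row
);
CREATE TABLE EffectorActionType (
    effector_action_type_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE Interaction (
    interaction_id INTEGER PRIMARY KEY,
    effector_row_set_id INTEGER NOT NULL REFERENCES RowSet(row_set_id),
    target_row_set_id INTEGER NOT NULL REFERENCES RowSet(row_set_id),
    effector_action_type_id INTEGER NOT NULL
        REFERENCES EffectorActionType(effector_action_type_id),
    has_direction INTEGER NOT NULL CHECK (has_direction IN (0, 1))
);
"""


def members(conn, interaction_id, role):
    """Return the (table_id, row_id) pairs in the effector or target set."""
    if role not in ("effector", "target"):
        raise ValueError(f"unknown role: {role!r}")
    sql = f"""
        SELECT m.table_id, m.row_id
        FROM Interaction i
        JOIN RowSetMember m ON m.row_set_id = i.{role}_row_set_id
        WHERE i.interaction_id = ?
        ORDER BY m.table_id, m.row_id
    """
    return conn.execute(sql, (interaction_id,)).fetchall()


def demo():
    """Build an in-memory database holding one Interaction: {E1, E2} acts on {T1}."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO RowSet VALUES (?)", [(1,), (2,)])
    # table_id 10 and the row ids are invented for the example.
    conn.executemany(
        "INSERT INTO RowSetMember (row_set_id, table_id, row_id) VALUES (?, ?, ?)",
        [(1, 10, 101), (1, 10, 102), (2, 10, 201)],
    )
    conn.execute("INSERT INTO EffectorActionType VALUES (1, 'phosphorylate')")
    conn.execute("INSERT INTO Interaction VALUES (1, 1, 2, 1, 1)")
    return conn
```

Moving effector_action_type_id to RowSetMember, as Arnaud suggests, would allow a different action per member, at the cost of an extra join whenever the action of the set as a whole is wanted.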
From: Joan M. <ma...@pc...> - 2003-01-15 18:33:15
|
Hi Arnaud,

I have been very busy and have not had the time to follow these thread messages completely, but I have a request for GUS30. Many controlled vocabulary tables have been created, and some of them may not be covered by the existing ontologies (e.g., DoTS::InteractionType, DoTS::EffectorActionType, DoTS::ComplexType, and any others mentioned previously in this thread). Could you provide the terms and definitions that will go into these tables as the controlled vocabularies? This would best be in an XML format representing the table, so that the table can be populated by a plugin. Could you also document these tables using the format required by the documentation plugin (I believe we mentioned this plugin when you were here)? In addition, if Jonathan has created other tables for you, please provide the documentation for those tables as well. If you had already planned to do this then sorry for the push.

Thanks,
Joan

--
Joan Mazzarelli
Computational Biology and Informatics Laboratory
Center for Bioinformatics
1429 Blockley Hall
University of Pennsylvania
Philadelphia, PA 19104
|
From: Jonathan C. <cra...@pc...> - 2003-01-15 19:18:43
Attachments:
gus.phenotype_draft.doc
|
Hi Joan-

Arnaud did supply us with documentation (attached) for the new Phenotype tables, but I just haven't loaded it into the database yet (I've also been quite busy :)) I started working on updating the documentation a couple of days ago, but in the process discovered that there are some invalid rows in core.DatabaseDocumentation that should be corrected first. A query shows that there are 73 rows in this table that reference nonexistent columns in GUS 3.0. For the most part I think that these are relatively minor problems stemming from the fact that the schema has been updated more recently than the documentation. However, there are also a few rows that suggest we need to improve the plugin and/or procedure used to populate this table. For example, the following rows have spaces in the column name (attribute_name), probably because the input files were invalid and the plugin has no restrictions on the format of the attribute_name:

DATABASE_DOCUMENTATION_ID
-------------------------
ATTRIBUTE_NAME
--------------------------------------------------------------------------------
                     1419
bio_material_id fk to LabelledExtract view of BioMaterial

                     1103
bio_source_characteristic_id primary key

                     1120
treatment_id fk to Treatment

                     1374
review_status_id The identifer of the review status

                     1418
assay_id fk to Assay

                     1373
synonym_name The gene symbol

6 rows selected.

Also, as an aside (and not a comment to you in particular), it strikes me that column "documentation" of the form "fk to Table X" and "Primary key" could be generated automatically from the schema. However, comments on foreign keys are useful if they identify the specific subclass (i.e. view) to which the reference is expected to link, or if they explain what the referenced value is used for (if not obvious).

Anyway, since there are still some minor schema changes taking place, I think that next week might be a good time to worry about updating all the documentation, since the database will be locked down for the migration at that point anyway. As for the controlled vocabularies, I think you're right, and we should try to populate these as soon as we can, even if it will be an iterative process in some cases.

Jonathan

--
Jonathan Crabtree
Center for Bioinformatics, University of Pennsylvania
1406 Blockley Hall, 423 Guardian Drive
Philadelphia, PA 19104-6021
215-573-3115
|
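As an illustration of the kind of restriction the loading plugin currently lacks, the sketch below checks that an attribute_name is a single identifier and, where a description has been fused onto it, splits the two apart again. This is not the actual documentation plugin (which is not shown in this thread); the function names and the identifier rule are assumptions made for the example.

```python
import re

# Assumption: a valid column name is one SQL-style identifier with no spaces.
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def split_fused_attribute(attribute_name):
    """Split 'column_name free text' into (column_name, free_text).

    Returns (attribute_name, None) when the value is already a clean
    identifier, and raises ValueError when no leading identifier exists.
    """
    value = attribute_name.strip()
    if IDENTIFIER.match(value):
        return value, None
    parts = value.split(None, 1)
    if not parts or not IDENTIFIER.match(parts[0]):
        raise ValueError(f"cannot recover a column name from {attribute_name!r}")
    return parts[0], parts[1]


def find_invalid(rows):
    """Return the ids of (documentation_id, attribute_name) rows with a bad name."""
    return [doc_id for doc_id, name in rows if not IDENTIFIER.match(name.strip())]
```

Applied to the six rows listed above, a check like find_invalid would flag all of them, and split_fused_attribute would recover, for example, bio_source_characteristic_id together with the stray text "primary key".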
From: mazz <ma...@sn...> - 2003-01-15 23:55:18
|
Hi Jonathan,

Perhaps we can ask Matt to revisit his documentation plugin. There are probably additional changes he will have to make for its use with GUS30 now. Also, I can send Arnaud an example of the XML for a table. We can use the XML to populate the rows of the controlled vocabulary tables (ids, terms (names), and definitions (descriptions)).

Joan
|
From: Arnaud K. <ax...@sa...> - 2003-01-16 15:16:21
|
Hi Joan

I'll get the new controlled vocabularies ready for population. If you're planning to use the UpdateFromXML.pm plugin for populating GUS I should have examples.

Regarding ComplexType, it should be covered by GO component.
Regarding InteractionType, we need to find a controlled vocabulary; I'm not aware of one yet!

cheers
Arnaud
|
From: mazz <ma...@sn...> - 2003-01-19 18:45:13
|
Hi Arnaud, Below is a sample of the XML for a table (e.g. Gene) the plugin will use. The controlled vocabulary table DoTS::EffectorActionType also needs to be populated. I will try to go though and make a list of the new controlled vocabulary tables. Tables such as geneCategory & rnaCategory are tables I created for my planned future annotation tasks. Joan <GUS::Model::DoTS::Gene> <gene_id>10288603</gene_id> <name>test</name> <review_status_id>1</review_status_id> <description>gene desc test</description> <reviewer_summary>test</reviewer_summary> </GUS::Model::DoTS::Gene> Arnaud Kerhornou wrote: Arnaud Kerhornou wrote: > Hi Joan > > I'll get the new controlled vocabularies ready for population. If > you're planning to use the UpdateFromXML.pm plugin for populating GUS > I should have examples. > > Regarding ComplexType it should be covered by GO component. > Regarding InteractionType, we need to find a controlled vocabulary > which I'm not aware of yet ! > > cheers > Arnaud > > mazz wrote: > >> Hi Jonathan, >> >> Perhaps we can ask Matt to revisit his documentation plugin. There >> are probably >> additional changes he will have to make for its use with GUS30 now. >> Also, I can send Arnaud an example of the XML for a table. We can >> use the XML to >> populate the rows of the controlled vocabulary tables (ids, terms >> (names) and >> definitions (descriptions). >> >> >> Joan >> >> Jonathan Crabtree wrote: >> >> >> > Hi Joan- >> > >> > Arnaud did supply us with documentation (attached) for the new >> > Phenotype tables, >> > but I just haven't loaded it into the database yet (I've also been >> > quite busy :)) >> > I started working on updating the documentation a couple of days >> > ago, but in the >> > process discovered that there are some invalid rows in >> > core.DatabaseDocumentation >> > that should be corrected first. A query shows that there are 73 >> > rows in this >> > table that reference nonexistent columns in GUS 3.0. 
>> > For the most part I think that these are relatively minor problems
>> > stemming from the fact that the schema has been updated more recently
>> > than the documentation. However, there are also a few rows that suggest
>> > we need to improve the plugin and/or procedure used to populate this
>> > table. For example, the following rows have spaces in the column name
>> > (attribute_name), probably because the input files were invalid and the
>> > plugin has no restrictions on the format of the attribute_name:
>> >
>> > DATABASE_DOCUMENTATION_ID
>> > -------------------------
>> > ATTRIBUTE_NAME
>> > --------------------------------------------------------------------------------
>> > 1419
>> > bio_material_id fk to LabelledExtract view of BioMaterial
>> >
>> > 1103
>> > bio_source_characteristic_id primary key
>> >
>> > 1120
>> > treatment_id fk to Treatment
>> >
>> > DATABASE_DOCUMENTATION_ID
>> > -------------------------
>> > ATTRIBUTE_NAME
>> > --------------------------------------------------------------------------------
>> > 1374
>> > review_status_id The identifer of the review status
>> >
>> > 1418
>> > assay_id fk to Assay
>> >
>> > 1373
>> > synonym_name The gene symbol
>> >
>> > 6 rows selected.
>> >
>> > Also, as an aside (and not a comment to you in particular), it strikes
>> > me that column "documentation" of the form "fk to Table X" and "Primary
>> > key" could be generated automatically from the schema. However, comments
>> > on foreign keys are useful if they identify the specific subclass (i.e.
>> > view) to which the reference is expected to link, or if they explain
>> > what the referenced value is used for (if not obvious). Anyway, since
>> > there are still some minor schema changes taking place, I think that
>> > next week might be a good time to worry about updating all the
>> > documentation, since the database will be locked down for the migration
>> > at that point anyway. As for the controlled vocabularies, I think you're
>> > right, and we should try to populate these as soon as we can, even if it
>> > will be an iterative process in some cases.
>> >
>> > Jonathan
>> >
>> > --
>> > Jonathan Crabtree
>> > Center for Bioinformatics, University of Pennsylvania
>> > 1406 Blockley Hall, 423 Guardian Drive Philadelphia, PA 19104-6021
>> > 215-573-3115
>> >
>> > ------------------------------------------------------------------------
>> > Name: gus.phenotype_draft.doc
>> > gus.phenotype_draft.doc Type: Winword File (application/msword)
>> > Encoding: base64
|
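Editorial sketch: the missing restriction on attribute_name that Jonathan mentions could be a simple format check applied before a row is loaded. The Python below only illustrates the idea and is not the plugin's actual code; the function names and the identifier pattern are assumptions, and the sample values are the six rows from the query output quoted above.

```python
import re

# Assumption: a valid attribute_name is one bare SQL-style identifier
# (letters, digits, underscores) with no whitespace.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_attribute_name(name):
    """Return True if the whole string is a single identifier."""
    return IDENTIFIER.fullmatch(name) is not None


def split_name_and_comment(value):
    """Split a malformed value into (column name, trailing documentation text)."""
    name, _, comment = value.strip().partition(" ")
    return name, comment.strip()


# The six rows reported by the query, keyed by database_documentation_id.
rows = {
    1419: "bio_material_id fk to LabelledExtract view of BioMaterial",
    1103: "bio_source_characteristic_id primary key",
    1120: "treatment_id fk to Treatment",
    1374: "review_status_id The identifer of the review status",
    1418: "assay_id fk to Assay",
    1373: "synonym_name The gene symbol",
}

# Collect every row whose attribute_name fails the format check.
invalid = {doc_id: value for doc_id, value in rows.items()
           if not is_valid_attribute_name(value)}

for doc_id, value in sorted(invalid.items()):
    name, comment = split_name_and_comment(value)
    print(f"{doc_id}: column={name!r} misplaced text={comment!r}")
```

In each of these rows the text after the first space looks like documentation that ended up in the wrong column, so a cleanup could move it rather than discard it. Whether that is the right repair is a judgment for whoever maintains the table.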
From: Arnaud K. <ax...@sa...> - 2003-01-20 12:59:47
|
Hi Joan

Thanks. Just a quick question, what is Model for ?

<GUS::Model::DoTS::Gene>

Arnaud

mazz wrote:

> Hi Arnaud,
>
> Below is a sample of the XML for a table (e.g. Gene) the plugin will use.
> The controlled vocabulary table DoTS::EffectorActionType also needs to
> be populated.
>
> I will try to go through and make a list of the new controlled vocabulary
> tables.
> Tables such as geneCategory & rnaCategory are tables I created for my
> planned future annotation tasks.
>
> Joan
>
> <GUS::Model::DoTS::Gene>
>   <gene_id>10288603</gene_id>
>   <name>test</name>
>   <review_status_id>1</review_status_id>
>   <description>gene desc test</description>
>   <reviewer_summary>test</reviewer_summary>
> </GUS::Model::DoTS::Gene>
>
> Arnaud Kerhornou wrote:
>
>> Hi Joan
>>
>> I'll get the new controlled vocabularies ready for population. If
>> you're planning to use the UpdateFromXML.pm plugin for populating GUS
>> I should have examples.
>>
>> Regarding ComplexType it should be covered by GO component.
>> Regarding InteractionType, we need to find a controlled vocabulary
>> which I'm not aware of yet !
>>
>> cheers
>> Arnaud
>>
>> mazz wrote:
>>
>>> Hi Jonathan,
>>>
>>> Perhaps we can ask Matt to revisit his documentation plugin. There
>>> are probably additional changes he will have to make for its use with
>>> GUS30 now.
>>> Also, I can send Arnaud an example of the XML for a table. We can
>>> use the XML to populate the rows of the controlled vocabulary tables
>>> (ids, terms (names) and definitions (descriptions)).
|
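Editorial sketch: the sample row XML in this message uses the Perl-style class name as its element name. A minimal way to read such a row in Python is shown below. It calls the standard library's expat bindings directly because, with namespace processing left off, colons are treated as ordinary name characters; namespace-aware parsers may reject a tag like this. The function name parse_row is hypothetical, and nothing here reflects how UpdateFromXML.pm actually works.

```python
import xml.parsers.expat

# The sample row from the message, reproduced verbatim.
SAMPLE = """<GUS::Model::DoTS::Gene>
  <gene_id>10288603</gene_id>
  <name>test</name>
  <review_status_id>1</review_status_id>
  <description>gene desc test</description>
  <reviewer_summary>test</reviewer_summary>
</GUS::Model::DoTS::Gene>"""


def parse_row(xml_text):
    """Return (row_tag, {column: value}) for a single row element."""
    # No namespace_separator argument, so no namespace processing is done.
    parser = xml.parsers.expat.ParserCreate()
    state = {"depth": 0, "table": None, "column": None}
    values = {}

    def start(name, attrs):
        state["depth"] += 1
        if state["depth"] == 1:
            state["table"] = name      # outer element names the table object
        elif state["depth"] == 2:
            state["column"] = name     # child elements are columns
            values[name] = ""

    def end(name):
        if state["depth"] == 2:
            state["column"] = None
        state["depth"] -= 1

    def chars(data):
        # Ignore whitespace between column elements.
        if state["column"] is not None:
            values[state["column"]] += data

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)
    return state["table"], values


table, values = parse_row(SAMPLE)
print(table)
for column, value in values.items():
    print(f"  {column} = {value}")
```

The same shape would apply to a controlled vocabulary row: one outer element naming the table object and one child element per column.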
From: mazz <ma...@sn...> - 2003-01-20 15:22:34
|
Dear Arnaud,

Model is a directory of Steve's new CVS structure under which the DoTS table
Objects (eg Gene) are found.
I do not know why Steve named the directory Model.

Joan

Arnaud Kerhornou wrote:

> Hi Joan
>
> Thanks. Just a quick question, what is Model for ?
>
> <GUS::Model::DoTS::Gene>
>
> Arnaud
>
> _______________________________________________
> Gusdev-gusdev mailing list
> Gus...@li...
> https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev
|
From: Chris S. <sto...@pc...> - 2003-01-20 19:38:12
|
Dear Joan and Arnaud,

The CVS structure should not come into the XML used by the plug-in. It is my
understanding that only the actual schema of the structure:
Database.Namespace.Table.Attribute should be used.

Chris

On Monday, January 20, 2003, at 10:23 AM, mazz wrote:

> Dear Arnaud,
>
> Model is a directory of Steve's new CVS structure under which the DoTS
> table Objects (eg Gene) are found.
> I do not know why Steve named the directory Model.
>
> Joan
>
> Arnaud Kerhornou wrote:
>
>> Hi Joan
>>
>> Thanks. Just a quick question, what is Model for ?
>>
>> <GUS::Model::DoTS::Gene>
>>
>> Arnaud
|
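Editorial sketch: the two naming schemes discussed in this exchange, the Perl-style object name (GUS::Model::DoTS::Gene) and the schema path (Database.Namespace.Table.Attribute), differ only in how the parts are joined and in the extra Model component. The Python below shows one possible mapping between them. Both function names are hypothetical, and using "GUS" as the database component is an assumption made for illustration, not something stated in the thread.

```python
def schema_table_from_class(class_name):
    """Map a class name like 'GUS::Model::DoTS::Gene' to ('DoTS', 'Gene').

    Only the last two components are used, so the code-layout parts
    (such as the Model directory) never reach the schema name.
    """
    parts = class_name.split("::")
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"expected Namespace::Table, got {class_name!r}")
    return parts[-2], parts[-1]


def qualified_attribute(class_name, attribute, database="GUS"):
    """Build a Database.Namespace.Table.Attribute path.

    The default database name is an illustrative assumption.
    """
    namespace, table = schema_table_from_class(class_name)
    return ".".join([database, namespace, table, attribute])


print(schema_table_from_class("GUS::Model::DoTS::Gene"))
print(qualified_attribute("GUS::Model::DoTS::Gene", "gene_id"))
print(schema_table_from_class("DoTS::EffectorActionType"))
```

Taking only the trailing components means the shorter form used earlier in the thread (DoTS::EffectorActionType) and the full class name resolve to schema names the same way.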