From: <am...@pc...> - 2005-11-18 03:01:11
Where is the "canonically correct" place to store simple scalar attributes that pertain to an entire sequence (AA and/or NT), e.g.:

  pI (isoelectric point)
  codon volatility
  codon bias (measured in various ways: CAI, etc.)
  dinucleotide bias (e.g. Karlin's delta*)

My non-GUS "gut" instinct is that these should all reside in weakly-typed DoTS.(NA/AA)SequenceAttribute tables with very simple tag/value schemas; but I'm looking for enlightenment.

Thanks,

-Aaron
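For readers unfamiliar with the tag/value idea proposed above, here is a minimal sketch of what such a weakly-typed attribute table could look like. It uses an in-memory SQLite database purely for illustration; the table and column names (aa_sequence, aa_sequence_attribute, tag, value) are hypothetical and are not taken from the actual GUS schema.

```python
import sqlite3


def make_attribute_store():
    """Create a hypothetical tag/value attribute table (not real GUS DDL)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE aa_sequence (
            aa_sequence_id INTEGER PRIMARY KEY,
            source_id      TEXT NOT NULL
        );
        -- One row per (sequence, tag); every value is stored as text.
        CREATE TABLE aa_sequence_attribute (
            aa_sequence_id INTEGER NOT NULL,
            tag            TEXT NOT NULL,
            value          TEXT NOT NULL,
            PRIMARY KEY (aa_sequence_id, tag)
        );
    """)
    return conn


def set_attribute(conn, aa_sequence_id, tag, value):
    """Store one attribute, overwriting any earlier value for the same tag."""
    conn.execute(
        "INSERT OR REPLACE INTO aa_sequence_attribute VALUES (?, ?, ?)",
        (aa_sequence_id, tag, str(value)),
    )


def get_attributes(conn, aa_sequence_id):
    """Return all attributes of a sequence as a {tag: value} dict of strings."""
    rows = conn.execute(
        "SELECT tag, value FROM aa_sequence_attribute "
        "WHERE aa_sequence_id = ? ORDER BY tag",
        (aa_sequence_id,),
    )
    return dict(rows.fetchall())
```

Note that values come back as strings: new attributes need no schema change, but the database cannot check types or units, which is the trade-off the rest of the thread discusses.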
From: Chris S. <sto...@pc...> - 2005-11-22 18:45:31
Hi Aaron,

I don't think you got an answer to this, so here's my perspective.

The canonically correct, strongly-typed approach would be to create a view of NASequence / AASequence that has the desired attributes for a specific purpose (and the view would be named as such). On the other hand, if these are overlooked attributes that really should be part of any sequence, then we would alter the base NASequence / AASequence tables.

The major argument I can see for weakly-typed tag/values is that there is a long and arbitrary list of such attributes, but I'm not convinced that this is the case. Are the examples you gave attributes that are needed now? Are there others?

Thanks,
Chris
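The strongly-typed alternative described above can be sketched in the same way: the attribute becomes a typed column on the base table, and a purpose-specific, named view exposes it. This is only an illustration under assumed names (aa_sequence, aa_sequence_isoelectric); it is not the real GUS table definition, and SQLite stands in for whatever RDBMS a GUS instance actually runs on.

```python
import sqlite3


def make_typed_schema():
    """Create a hypothetical base table with a typed column plus a named view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE aa_sequence (
            aa_sequence_id    INTEGER PRIMARY KEY,
            source_id         TEXT NOT NULL,
            isoelectric_point REAL          -- typed, nullable attribute
        );
        -- Purpose-specific view: only sequences with a computed pI.
        CREATE VIEW aa_sequence_isoelectric AS
            SELECT aa_sequence_id, isoelectric_point
            FROM aa_sequence
            WHERE isoelectric_point IS NOT NULL;
    """)
    return conn


def sequences_with_pi(conn):
    """Return (aa_sequence_id, isoelectric_point) rows from the view."""
    return conn.execute(
        "SELECT aa_sequence_id, isoelectric_point "
        "FROM aa_sequence_isoelectric ORDER BY aa_sequence_id"
    ).fetchall()
```

Here the value is stored and returned as a number, and the column name documents its meaning, at the cost of a schema change for every new attribute.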
From: <am...@pc...> - 2005-11-22 21:03:46
Quoting Chris Stoeckert <sto...@pc...>:

> if these are overlooked attributes that really should be part of any
> sequence then we would alter the base NASequence/ AASequence tables.

Some are, some aren't. The most notable ones to include in base AASequence are min_molecular_weight and max_molecular_weight (just as you already use min_start and max_start to handle fuzzy locations). isoelectric_point also seems like a "should have".

But "hydropathicity_gravy_score" and "aromaticity_score" are just the tip of a long list of esoteric attributes. Creating a two-column view such as DoTS.AASequenceAromaticity (aa_sequence_id, aromaticity_score) for each of them seems like overkill, and would lead to further proliferation of the schema.

> The major argument I would see for weak typed tag values is that
> there is a long and arbitrary list of such attributes but I'm not
> convinced that this is the case.

You're not convinced because you haven't seen a long and arbitrary list, or because you don't believe one could actually exist?

On a related topic, how should we handle organellar/compartmental targeting predictions? I don't mean signalP predictions, which are easily handled as locatable features, but output from tools such as MitoPred and TargetP, which again provide non-locatable "attributes" of an AASequence. One route would be to associate the GO component term via an IEA evidence code, but then where do the algorithm_id and score(s) go? Alternatively, these become yet more AASequence attributes ...

Thanks,

-Aaron
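To make the two "esoteric" attributes named above concrete, here is a small sketch of how they are commonly computed. It assumes the usual definitions, which the thread itself does not spell out: GRAVY as the mean Kyte-Doolittle hydropathy value over all residues, and aromaticity as the fraction of residues that are Phe, Trp or Tyr. The function names are made up for this example.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AROMATIC = frozenset("FWY")


def gravy_score(protein):
    """Mean hydropathy of a protein sequence (GRAVY).

    Raises ValueError for an empty sequence and KeyError for any
    residue outside the 20 standard one-letter codes.
    """
    residues = protein.upper()
    if not residues:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


def aromaticity_score(protein):
    """Fraction of residues that are Phe, Trp or Tyr."""
    residues = protein.upper()
    if not residues:
        raise ValueError("empty sequence")
    return sum(aa in AROMATIC for aa in residues) / len(residues)
```

Both are single scalars per sequence with no location, which is why they are awkward to model as features and are candidates for either a typed column or a tag/value row.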
From: Chris S. <sto...@pc...> - 2005-11-27 23:31:49
Hi Aaron,

See comments inline.

On Nov 22, 2005, at 4:03 PM, am...@pc... wrote:

> You're not convinced because you haven't seen a long and arbitrary
> list, or because you don't believe one could actually exist?

Both. And I should clarify that this applies to what groups would commonly store (i.e., practical requirements), not to what you (or anyone) can imagine. This may reflect my ignorance of "hydropathicity_gravy_score", and I am happy to be enlightened about why it is practically needed in the core GUS schema. Note that any group can of course add whatever special attributes it likes to its own GUS instance; it is where multiple groups need the same attributes that we want to capture them in core GUS. The downside of a weakly-typed attribute table is the potential for multiple entries for the same thing, unless a controlled vocabulary can be used.

> On a related topic, how should we handle organellar/compartmental
> targetting predictions (i.e. not signalP predictions that are easily
> handled as locatable features, but rather output from such things as
> MitoPred and TargetP that again provide non-locatable "attributes"
> to an AASequence). One route would be to associate the GO component
> term via an IEA evidence code - but then where do the algorithm_id
> and score(s) go? Alternatively, these become yet more AASequence
> attributes ...

Actually, these are easily captured as AAFeatures; AAFeature has a foreign key to AASequence. Locatable features are specified with AALocation, which has a foreign key to AAFeature. In fact, one can use PredictedAAFeature.

This does underscore a semantic distinction between predicted features and sequence attributes. An attribute table can hold anything, of course, and as such places no constraint on what is put there (including things that should be put elsewhere).

Cheers,
Chris
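The duplicate-entry concern raised above (the same quantity stored under several spellings) is what a controlled vocabulary addresses. The sketch below shows one way that could work, assuming a hypothetical attribute_term table that every attribute row must reference; none of these names come from GUS, and SQLite is used only so the example is self-contained.

```python
import sqlite3


def make_vocab_store():
    """Tag/value table whose tags must come from a controlled vocabulary."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE attribute_term (
            term       TEXT PRIMARY KEY,
            definition TEXT NOT NULL
        );
        CREATE TABLE aa_sequence_attribute (
            aa_sequence_id INTEGER NOT NULL,
            term           TEXT NOT NULL REFERENCES attribute_term(term),
            value          REAL NOT NULL,
            PRIMARY KEY (aa_sequence_id, term)
        );
    """)
    conn.execute(
        "INSERT INTO attribute_term VALUES (?, ?)",
        ("isoelectric_point", "pH at which the protein carries no net charge"),
    )
    return conn


def add_attribute(conn, aa_sequence_id, term, value):
    """Insert an attribute; return False if the term is unknown or a duplicate."""
    try:
        conn.execute(
            "INSERT INTO aa_sequence_attribute VALUES (?, ?, ?)",
            (aa_sequence_id, term, value),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this in place, an attempt to store the same quantity under an unregistered alias such as "pI" is rejected instead of silently creating a second entry, at the cost of someone having to curate the term list.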
From: Angel P. <an...@ma...> - 2005-11-28 18:07:12
Hi all,

It was my understanding a while back that GUS was to take an approach that would allow the best of both worlds. Namely, I thought we had agreed to add generic feature and qualifier tables for both NA and AA sequences, allowing us to capture arbitrary feature sets while still promoting the "important" features to the named views.

Yes, generic views are more chado-like, and they pose a risk of over-use and duplication, but in practice I would rather capture all of the features from an external source and possibly duplicate some than miss some. Duplication of this sort would arise, for example, when subsequent releases of the DB are loaded after a feature gets "promoted".

As for whole-sequence attributes, is either of these an acceptable solution?

a) leaving locations blank
b) creating a location that spans the entire sequence for sequence attributes

If not, I don't mind a generic sequence attribute table.

-Angel

--
Angel Pizarro
Director, Bioinformatics Facility
Institute for Translational Medicine and Therapeutics
University of Pennsylvania
806 BRB II/III
421 Curie Blvd.
Philadelphia, PA 19104-6160
P: 215-573-3736
F: 215-573-9004
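The two options for whole-sequence attributes, (a) a feature with no location and (b) a feature whose location spans the full sequence, can be sketched with a simplified feature/location pair of tables. The layout below is an assumption made for illustration (aa_feature, aa_location, algorithm_name and the other column names are hypothetical, and the scores in the usage note are made-up values); it is only loosely modelled on the AAFeature/AALocation relationship described in the thread.

```python
import sqlite3


def make_feature_schema():
    """Simplified, hypothetical sequence/feature/location tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE aa_sequence (
            aa_sequence_id INTEGER PRIMARY KEY,
            length         INTEGER NOT NULL
        );
        CREATE TABLE aa_feature (
            aa_feature_id  INTEGER PRIMARY KEY,
            aa_sequence_id INTEGER NOT NULL,
            name           TEXT NOT NULL,
            algorithm_name TEXT,
            score          REAL
        );
        CREATE TABLE aa_location (
            aa_feature_id INTEGER NOT NULL,
            start_min     INTEGER NOT NULL,
            end_max       INTEGER NOT NULL
        );
    """)
    return conn


def add_feature(conn, aa_sequence_id, name, algorithm_name, score, span=None):
    """Add a feature; span is None (no location), 'whole', or (start, end)."""
    cur = conn.execute(
        "INSERT INTO aa_feature (aa_sequence_id, name, algorithm_name, score) "
        "VALUES (?, ?, ?, ?)",
        (aa_sequence_id, name, algorithm_name, score),
    )
    feature_id = cur.lastrowid
    if span == "whole":
        # Option (b): a location covering residues 1..length.
        (length,) = conn.execute(
            "SELECT length FROM aa_sequence WHERE aa_sequence_id = ?",
            (aa_sequence_id,),
        ).fetchone()
        span = (1, length)
    if span is not None:
        conn.execute(
            "INSERT INTO aa_location VALUES (?, ?, ?)", (feature_id, *span)
        )
    return feature_id


def whole_sequence_features(conn, aa_sequence_id):
    """Names of features that have no location or span the whole sequence."""
    rows = conn.execute(
        """
        SELECT f.name
        FROM aa_feature f
        JOIN aa_sequence s ON s.aa_sequence_id = f.aa_sequence_id
        LEFT JOIN aa_location l ON l.aa_feature_id = f.aa_feature_id
        WHERE f.aa_sequence_id = ?
          AND (l.aa_feature_id IS NULL
               OR (l.start_min = 1 AND l.end_max = s.length))
        ORDER BY f.name
        """,
        (aa_sequence_id,),
    ).fetchall()
    return [name for (name,) in rows]
```

For example, a targeting prediction could be stored with `add_feature(conn, 1, "mitochondrial_targeting", "TargetP", 0.92)` (option a) and a pI value with `span="whole"` (option b). A query then has to treat both conventions as "whole sequence", which suggests a project would want to pick one of the two and apply it consistently.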