From: Angel P. <an...@ma...> - 2005-11-28 18:07:12
Hi all,

It was my understanding a while back that GUS was to take an approach that would allow the best of both worlds. Namely, I thought we had agreed to add generic feature and qualifier tables for both NA and AASequences, allowing us to capture arbitrary feature sets while still promoting the "important" features to the named views.

Yes, generic views are more chado-like and they pose a threat for over-use and duplication, but in practice I would rather capture all of the features from an external source and possibly duplicate some features than miss some. This sort of situation would arise, for example, when subsequent releases of the DB are loaded after a feature gets "promoted".

As for whole sequence attributes, are either of these acceptable solutions?

a) leaving locations blank
b) creating a location that spans the entire sequence for sequence attributes

If not, I don't mind a generic sequence attribute table.

-Angel

Chris Stoeckert wrote:
> Hi Aaron,
> see comments in line.
>
> On Nov 22, 2005, at 4:03 PM, am...@pc... wrote:
>
>> Quoting Chris Stoeckert <sto...@pc...>:
>>
>>> if these are overlooked attributes that really should be part of
>>> any sequence then we would alter the base NASequence/AASequence
>>> tables.
>>
>>
>> Some are, some aren't. The most notable ones to be included in base
>> AASequence are min_molecular_weight and max_molecular_weight (just as
>> you already use min_start and max_start to handle fuzzy locations).
>> isoelectric_point also seems like a "should have".
>>
>> But "hydropathicity_gravy_score" and "aromaticity_score" are just the
>> tip of a long list of esoteric attributes that creating a two-column
>> view called DoTS.AASequenceAromaticity (aa_sequence_id,
>> aromaticity_score) seems like overkill, and would lead to further
>> proliferation of the schema.
>>
>>> The major argument I would see for weak typed tag values is that
>>> there is a long and arbitrary list of such attributes but I'm not
>>> convinced that this is the case.
>>
>>
>> You're not convinced because you haven't seen a long and arbitrary
>> list, or because you don't believe one could actually exist?
>
>
> Both. And I should clarify that this applies to what groups would
> commonly store not what you (or anyone) can imagine (i.e., capture
> practical requirements). This may reflect my ignorance of
> "hydropathicity_gravy_score" and am happy to be enlightened about
> why that is practically needed in the core GUS schema. Note of course
> that any group can of course add whatever special attributes to their
> GUS instance; it is where multiple groups need the same attributes
> that we want to capture this in core GUS. The downside to a weakly
> typed attribute table is the potential to have multiple entries for
> the same thing unless a controlled vocabulary can be used.
>
>> On a related topic, how should we handle organellar/compartmental
>> targetting predictions (i.e. not signalP predictions that are easily
>> handled as locatable features, but rather output from such things as
>> MitoPred and TargetP that again provide non-locatable "attributes" to
>> an AASequence). One route would be to associate the GO component term
>> via an IEA evidence code - but then where do the algorithm_id and
>> score(s) go? Alternatively, these become yet more AASequence
>> attributes ...
>
>
> Actually, these are easily captured as AAFeatures which has a foreign
> key to AASequence. Locatable features are specified with AALocation
> with a foreign key to AAFeature. In fact, one can use
> PredictedAAFeature.
>
> This does underscore a semantic distinction between predicted
> features and sequence attributes. Attributes can be anything of
> course, and as such has no constraint on what is put there (including
> things that should be put elsewhere).
>
> Cheers,
> Chris
>
>> Thanks,
>>
>> -Aaron
>>
>> _______________________________________________
>> Gusdev-gusdev mailing list
>> Gus...@li...
>> https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev

--
Angel Pizarro
Director, Bioinformatics Facility
Institute for Translational Medicine and Therapeutics
University of Pennsylvania
806 BRB II/III
421 Curie Blvd.
Philadelphia, PA 19104-6160
P: 215-573-3736
F: 215-573-9004