Re: [GUSDEV] additional (AA/NA) sequence attributes

Hi Aaron,
See my comments inline.

On Nov 22, 2005, at 4:03 PM, am...@pc... wrote:

> Quoting Chris Stoeckert <sto...@pc...>:
>
>> if these are overlooked attributes that really should be part of
>> any sequence then we would alter the base NASequence/AASequence
>> tables.
>
> Some are, some aren't.  The most notable ones to be included in base
> AASequence are min_molecular_weight and max_molecular_weight (just as
> you already use min_start and max_start to handle fuzzy locations).
> isoelectric_point also seems like a "should have".
>
> But "hydropathicity_gravy_score" and "aromaticity_score" are just the
> tip of a long list of esoteric attributes, for which creating a
> two-column view called DoTS.AASequenceAromaticity (aa_sequence_id,
> aromaticity_score) seems like overkill, and would lead to further
> proliferation of the schema.
>
>> The major argument I would see for weak typed tag values is that
>> there is a long and arbitrary list of such attributes but I'm not
>> convinced that this is the case.
>
> You're not convinced because you haven't seen a long and arbitrary
> list, or because you don't believe one could actually exist?

Both. And I should clarify that this applies to what groups would
commonly store (i.e., practical requirements), not to what you (or
anyone) can imagine. This may reflect my ignorance of
"hydropathicity_gravy_score", and I am happy to be enlightened about
why it is practically needed in the core GUS schema. Note that any
group can of course add whatever special attributes it likes to its
own GUS instance; it is where multiple groups need the same
attributes that we want to capture them in core GUS. The downside to
a weakly typed attribute table is the potential for multiple entries
for the same thing unless a controlled vocabulary is used.
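
To make the duplicate-entry concern concrete, here is a minimal
sketch of a tag/value table whose tags must come from a controlled
vocabulary. It uses Python's built-in sqlite3 module with an in-memory
database. All table and column names (attribute_term,
aa_sequence_attribute, and so on) and the sample value are
hypothetical and simplified for illustration; they are not the actual
GUS schema.

```python
import sqlite3

# Hypothetical, simplified tables for illustration only (not the GUS schema).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE aa_sequence (
    aa_sequence_id INTEGER PRIMARY KEY
);
CREATE TABLE attribute_term (            -- the controlled vocabulary
    attribute_term_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE aa_sequence_attribute (     -- the weakly typed tag/value table
    aa_sequence_id INTEGER NOT NULL
        REFERENCES aa_sequence(aa_sequence_id),
    attribute_term_id INTEGER NOT NULL
        REFERENCES attribute_term(attribute_term_id),
    value TEXT NOT NULL,
    PRIMARY KEY (aa_sequence_id, attribute_term_id)
);
""")

conn.execute("INSERT INTO aa_sequence VALUES (1)")
conn.execute("INSERT INTO attribute_term (name) VALUES ('aromaticity_score')")


def set_attribute(seq_id, name, value):
    """Store a tag/value pair; the tag must already be in the vocabulary."""
    row = conn.execute(
        "SELECT attribute_term_id FROM attribute_term WHERE name = ?",
        (name,),
    ).fetchone()
    if row is None:
        raise KeyError(f"unknown attribute term: {name}")
    conn.execute(
        "INSERT INTO aa_sequence_attribute VALUES (?, ?, ?)",
        (seq_id, row[0], str(value)),
    )


# A vocabulary term is accepted (the value is made up for the example).
set_attribute(1, "aromaticity_score", 0.08)

# A variant spelling of the same concept is rejected instead of silently
# becoming a second entry for the same thing.
try:
    set_attribute(1, "Aromaticity", 0.08)
    rejected = False
except KeyError:
    rejected = True
```

Without the attribute_term lookup, both spellings would be stored as
separate tags, which is exactly the multiple-entries problem described
above.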

> On a related topic, how should we handle organellar/compartmental
> targeting predictions (i.e. not SignalP predictions that are easily
> handled as locatable features, but rather output from such things as
> MitoPred and TargetP that again provide non-locatable "attributes" to
> an AASequence)?  One route would be to associate the GO component term
> via an IEA evidence code - but then where do the algorithm_id and
> score(s) go?  Alternatively, these become yet more AASequence
> attributes ...

Actually, these are easily captured as AAFeatures, which have a foreign
key to AASequence. Locatable features are specified with AALocation,
which has a foreign key to AAFeature. In fact, one can use
PredictedAAFeature.
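
The relationship can be sketched roughly as follows, again with
Python's built-in sqlite3 module. The foreign keys mirror the ones
described above (feature to sequence, location to feature), but the
column names (algorithm_name, description, score, start_min, and so
on) and all sample values are assumptions made for illustration, not
the real GUS column definitions. The point is that a non-locatable
prediction is simply a feature row with no location row attached.

```python
import sqlite3

# Simplified, hypothetical stand-ins for AASequence, AAFeature, AALocation.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE aa_sequence (
    aa_sequence_id INTEGER PRIMARY KEY
);
CREATE TABLE aa_feature (
    aa_feature_id INTEGER PRIMARY KEY,
    aa_sequence_id INTEGER NOT NULL
        REFERENCES aa_sequence(aa_sequence_id),
    algorithm_name TEXT NOT NULL,   -- assumed column
    description TEXT NOT NULL,      -- assumed column
    score REAL                      -- assumed column
);
CREATE TABLE aa_location (
    aa_location_id INTEGER PRIMARY KEY,
    aa_feature_id INTEGER NOT NULL
        REFERENCES aa_feature(aa_feature_id),
    start_min INTEGER, start_max INTEGER,
    end_min INTEGER, end_max INTEGER
);
""")

conn.execute("INSERT INTO aa_sequence VALUES (1)")

# Locatable prediction: a feature row plus a location row (made-up values).
conn.execute(
    "INSERT INTO aa_feature VALUES (1, 1, 'SignalP', 'signal peptide', 0.97)"
)
conn.execute("INSERT INTO aa_location VALUES (1, 1, 1, 1, 22, 22)")

# Non-locatable prediction: a feature row only; the algorithm and score
# still have a home even though there is no location.
conn.execute(
    "INSERT INTO aa_feature VALUES "
    "(2, 1, 'TargetP', 'mitochondrial targeting', 0.85)"
)

# Features without any location row are the non-locatable ones.
unlocated = conn.execute("""
    SELECT f.algorithm_name
    FROM aa_feature f
    LEFT JOIN aa_location l ON l.aa_feature_id = f.aa_feature_id
    WHERE l.aa_location_id IS NULL
""").fetchall()
```

Here unlocated holds only the TargetP row, while the SignalP feature
carries its location separately.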

This does underscore a semantic distinction between predicted
features and sequence attributes. Attributes can of course be
anything, and as such there is no constraint on what is put there
(including things that should be put elsewhere).

Cheers,
Chris

> Thanks,
>
> -Aaron