Re: [GUSDEV] additional (AA/NA) sequence attributes


Hi all,
It was my understanding a while back that GUS was to take an approach 
that would allow the best of both worlds.

Namely, I thought we had agreed to add generic feature and qualifier 
tables for both NASequences and AASequences, allowing us to capture 
arbitrary feature sets while still promoting the "important" features to 
the named views. Yes, generic views are more chado-like and they carry a 
risk of overuse and duplication, but in practice I would rather capture 
all of the features from an external source and possibly duplicate some 
features than miss some. Such duplication would arise, for example, when 
subsequent releases of the DB are loaded after a feature gets "promoted".
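
To make the generic feature/qualifier idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (GenericAAFeature, GenericAAQualifier, feature_key and so on), the PROMOTED_KEYS set, and the sample rows are all hypothetical illustrations, not part of the actual GUS schema. It shows arbitrary features from an external source being captured as key/value rows, and how a feature that already has a named view can be spotted as a possible duplicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE GenericAAFeature (
    aa_feature_id  INTEGER PRIMARY KEY,
    aa_sequence_id INTEGER NOT NULL,
    feature_key    TEXT NOT NULL   -- feature name as given by the external source
);
CREATE TABLE GenericAAQualifier (
    aa_qualifier_id INTEGER PRIMARY KEY,
    aa_feature_id   INTEGER NOT NULL REFERENCES GenericAAFeature(aa_feature_id),
    qualifier_key   TEXT NOT NULL,
    qualifier_value TEXT
);
""")

def load_feature(conn, aa_sequence_id, feature_key, qualifiers):
    """Store one feature plus its qualifiers; return the new feature id."""
    cur = conn.execute(
        "INSERT INTO GenericAAFeature (aa_sequence_id, feature_key) VALUES (?, ?)",
        (aa_sequence_id, feature_key),
    )
    feature_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO GenericAAQualifier (aa_feature_id, qualifier_key, qualifier_value) "
        "VALUES (?, ?, ?)",
        [(feature_id, key, value) for key, value in qualifiers.items()],
    )
    return feature_id

# Hypothetical input: one feature that has a named view, one that does not.
load_feature(conn, 1, "transmembrane_region", {"note": "predicted", "score": "0.93"})
load_feature(conn, 1, "some_unmodelled_feature", {"source": "external_db"})

# Assumption: the set of feature keys already promoted to named views.
PROMOTED_KEYS = {"transmembrane_region"}

feature_keys = [row[0] for row in conn.execute(
    "SELECT feature_key FROM GenericAAFeature ORDER BY aa_feature_id")]
qualifier_count = conn.execute(
    "SELECT COUNT(*) FROM GenericAAQualifier").fetchone()[0]

# Generic rows whose key is already promoted are candidates for duplication.
possible_duplicates = [key for key in feature_keys if key in PROMOTED_KEYS]
```

Nothing is lost at load time, and the duplication risk becomes something a loader could check for rather than a reason to drop data.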

As for whole-sequence attributes, is either of these an acceptable solution?
a) leaving locations blank
b) creating a location that spans the entire sequence for sequence 
attributes
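
A rough sketch of what (a) and (b) would look like side by side, again with Python's sqlite3 module. The Seq and Feature tables, their columns, and the sample rows are hypothetical stand-ins, not the real GUS sequence, feature, and location tables. The attribute names are borrowed from this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Seq (
    seq_id INTEGER PRIMARY KEY,
    length INTEGER NOT NULL
);
CREATE TABLE Feature (
    feature_id INTEGER PRIMARY KEY,
    seq_id     INTEGER NOT NULL REFERENCES Seq(seq_id),
    name       TEXT NOT NULL,
    start_pos  INTEGER,   -- NULL under option (a)
    end_pos    INTEGER    -- NULL under option (a)
);
INSERT INTO Seq (seq_id, length) VALUES (1, 250);

-- Option (a): whole-sequence attribute stored with a blank location.
INSERT INTO Feature (seq_id, name, start_pos, end_pos)
VALUES (1, 'isoelectric_point', NULL, NULL);

-- Option (b): whole-sequence attribute stored with a location spanning 1..length.
INSERT INTO Feature (seq_id, name, start_pos, end_pos)
VALUES (1, 'aromaticity_score', 1, 250);

-- An ordinary located feature, for contrast.
INSERT INTO Feature (seq_id, name, start_pos, end_pos)
VALUES (1, 'signal_peptide', 1, 22);
""")

# Find whole-sequence attributes under either convention.
whole_sequence = [row[0] for row in conn.execute("""
    SELECT f.name
    FROM Feature f
    JOIN Seq s ON s.seq_id = f.seq_id
    WHERE f.start_pos IS NULL
       OR (f.start_pos = 1 AND f.end_pos = s.length)
    ORDER BY f.feature_id
""")]
```

One thing the sketch makes visible: under (b), a genuine feature that happens to span the full sequence cannot be told apart from an attribute, whereas under (a) the blank location is unambiguous.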

If not, I don't mind a generic sequence attribute table.
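
For what it's worth, here is a minimal sketch of such a table, assuming a small controlled vocabulary is used to avoid multiple entries for the same thing. The table names (AttributeType, AASequenceAttribute), the columns, and the numeric values are hypothetical and for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
-- Controlled vocabulary of allowed attribute names.
CREATE TABLE AttributeType (
    attribute_type_id INTEGER PRIMARY KEY,
    name              TEXT NOT NULL UNIQUE
);
-- One row per (sequence, attribute); the composite key blocks repeats.
CREATE TABLE AASequenceAttribute (
    aa_sequence_id    INTEGER NOT NULL,
    attribute_type_id INTEGER NOT NULL
        REFERENCES AttributeType(attribute_type_id),
    value             REAL NOT NULL,
    PRIMARY KEY (aa_sequence_id, attribute_type_id)
);
INSERT INTO AttributeType (attribute_type_id, name) VALUES
    (1, 'hydropathicity_gravy_score'),
    (2, 'aromaticity_score');
""")

# Illustrative values only, not real measurements.
conn.execute("INSERT INTO AASequenceAttribute VALUES (1, 1, -0.42)")
conn.execute("INSERT INTO AASequenceAttribute VALUES (1, 2, 0.08)")

# An attribute type outside the vocabulary is rejected by the foreign key.
try:
    conn.execute("INSERT INTO AASequenceAttribute VALUES (1, 99, 1.0)")
    unknown_type_rejected = False
except sqlite3.IntegrityError:
    unknown_type_rejected = True

# Storing the same attribute twice for one sequence is rejected by the primary key.
try:
    conn.execute("INSERT INTO AASequenceAttribute VALUES (1, 2, 0.10)")
    repeat_rejected = False
except sqlite3.IntegrityError:
    repeat_rejected = True

stored = conn.execute("SELECT COUNT(*) FROM AASequenceAttribute").fetchone()[0]
```

Adding a new esoteric attribute then means adding one vocabulary row rather than a new view.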
-Angel

Chris Stoeckert wrote:

> Hi Aaron,
> see comments in line.
>
> On Nov 22, 2005, at 4:03 PM, am...@pc... wrote:
>
>> Quoting Chris Stoeckert <sto...@pc...>:
>>
>>> if these are overlooked attributes that really should be part of any
>>> sequence then we would alter the base NASequence/AASequence tables.
>>
>>
>> Some are, some aren't. The most notable ones to include in the base
>> AASequence table are min_molecular_weight and max_molecular_weight (just
>> as you already use min_start and max_start to handle fuzzy locations).
>> isoelectric_point also seems like a "should have".
>>
>> But "hydropathicity_gravy_score" and "aromaticity_score" are just the tip
>> of a long list of esoteric attributes. Creating a two-column view called
>> DoTS.AASequenceAromaticity (aa_sequence_id, aromaticity_score) for each
>> of them seems like overkill, and would lead to further proliferation of
>> the schema.
>>
>>> The major argument I would see for weakly typed tag values is that
>>> there is a long and arbitrary list of such attributes, but I'm not
>>> convinced that this is the case.
>>
>>
>> You're not convinced because you haven't seen a long and arbitrary list,
>> or because you don't believe one could actually exist?
>
>
> Both. And I should clarify that this applies to what groups would
> commonly store, not what you (or anyone) can imagine (i.e., capture
> practical requirements). This may reflect my ignorance of
> "hydropathicity_gravy_score", and I am happy to be enlightened about why
> that is practically needed in the core GUS schema. Note that any group
> can of course add whatever special attributes it likes to its GUS
> instance; it is where multiple groups need the same attributes that we
> want to capture this in core GUS. The downside to a weakly typed
> attribute table is the potential to have multiple entries for the same
> thing unless a controlled vocabulary can be used.
>
>> On a related topic, how should we handle organellar/compartmental
>> targeting predictions (i.e. not SignalP predictions that are easily
>> handled as locatable features, but rather output from such things as
>> MitoPred and TargetP that again provide non-locatable "attributes" to an
>> AASequence)? One route would be to associate the GO component term via
>> an IEA evidence code, but then where do the algorithm_id and score(s) go?
>> Alternatively, these become yet more AASequence attributes ...
>
>
> Actually, these are easily captured as AAFeatures, which have a foreign
> key to AASequence. Locatable features are specified with AALocation,
> which has a foreign key to AAFeature. In fact, one can use
> PredictedAAFeature.
>
> This does underscore a semantic distinction between predicted features
> and sequence attributes. Attributes can be anything, of course, and as
> such there is no constraint on what is put there (including things that
> should be put elsewhere).
>
> Cheers,
> Chris
>
>> Thanks,
>>
>> -Aaron
>>
>> _______________________________________________
>> Gusdev-gusdev mailing list
>> Gus...@li...
>> https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev

-- 

Angel Pizarro
Director, Bioinformatics Facility
Institute for Translational Medicine and Therapeutics
University of Pennsylvania
806 BRB II/III
421 Curie Blvd.
Philadelphia, PA 19104-6160

P: 215-573-3736
F: 215-573-9004