Re: [Gmod-schema] Re: Chado Questions


On Nov 23, 2005, at 8:10 AM, Scott Cain wrote:

> Ben,
>
> I cc'ed this to the schema mailing list, both because I want my
> response archived and looked at by other people to make sure I am
> describing the use of feature_relationship and featureloc correctly.
>
> Scott
>
>
> On Wed, 2005-11-23 at 10:41 -0500, Ben Faga wrote:
>> On Wed, 2005-11-23 at 10:13, Scott Cain wrote:
>>> Hi Ben,
>>>
>>> I hope I can help, because you're not likely to get much response
>>> from the schema mailing list until after the holiday.
>>>
>>> I'm not sure how to answer the map question.  The most obvious thing
>>> is via the feature_relationship table, but since the actual meaning
>>> of the word 'map' is not clear to me, I'm not sure f_r would work.
>>> The relationships in f_r are typically 'part_of' (for
>>> gene/mRNA/exon), but could easily be something else.  For instance,
>>> could a map set be a feature like a chromosome?

I don't know what a map set is, but it doesn't sound like a feature, so
it shouldn't go in the feature table.

Are we talking CMap integration here? This sounds like a job for the
neglected map module, which may need some work...

>>> For chromosomes, denoting containment is a little different: you
>>> don't use f_r, but give the feature_id of the chromosome in the
>>> featureloc table as featureloc.srcfeature_id.  There is no reason
>>> that a given feature can't have more than one featureloc entry to
>>> different srcfeatures.  You just give a different featureloc.rank to
>>> the different locations (with a rank of 0 being the 'standard'
>>> location, i.e., on the chromosome).

You would use different locgroups, with locgroup > 0 indicating a
redundant localisation at a different level in the assembly
(this is all in the schema SQL).
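For illustration, here is a minimal sketch of the locgroup arrangement using Python's built-in sqlite3 module. The tables are a hypothetical cut-down of Chado's feature and featureloc. They keep only the columns discussed in this thread, and they use a text type column where real Chado uses a cvterm reference. The feature names and coordinates are made up. The exon's direct location is on a contig (locgroup 0), and its derived, redundant location on the chromosome uses locgroup 1. The CHECK and UNIQUE constraints are meant to mirror Chado's fmin <= fmax rule and its uniqueness on (feature_id, locgroup, rank).

```python
import sqlite3

# Hypothetical, cut-down versions of Chado's feature and featureloc tables.
# Real Chado has more columns and uses type_id (a cvterm) instead of text.
SCHEMA = """
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL,
    type       TEXT NOT NULL
);
CREATE TABLE featureloc (
    featureloc_id INTEGER PRIMARY KEY,
    feature_id    INTEGER NOT NULL REFERENCES feature (feature_id),
    srcfeature_id INTEGER REFERENCES feature (feature_id),
    fmin          INTEGER,
    fmax          INTEGER,
    strand        INTEGER,
    locgroup      INTEGER NOT NULL DEFAULT 0,
    rank          INTEGER NOT NULL DEFAULT 0,
    CHECK (fmin <= fmax),
    UNIQUE (feature_id, locgroup, rank)
);
"""

def build_db():
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO feature VALUES (?, ?, ?)", [
        (1, "chr1", "chromosome"),
        (2, "contig42", "contig"),
        (3, "exon7", "exon"),
    ])
    db.executemany(
        "INSERT INTO featureloc"
        " (feature_id, srcfeature_id, fmin, fmax, strand, locgroup, rank)"
        " VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            (2, 1, 5000, 15000, 1, 0, 0),  # contig placed on the chromosome
            (3, 2, 100, 250, 1, 0, 0),     # exon: direct location on the contig
            (3, 1, 5100, 5250, 1, 1, 0),   # exon: redundant chromosome location
        ],
    )
    return db

def locations(db, uniquename, locgroup):
    """Return (srcfeature, fmin, fmax, strand) for one feature and locgroup."""
    return db.execute(
        """SELECT src.uniquename, fl.fmin, fl.fmax, fl.strand
           FROM featureloc AS fl
           JOIN feature AS f   ON f.feature_id = fl.feature_id
           JOIN feature AS src ON src.feature_id = fl.srcfeature_id
           WHERE f.uniquename = ? AND fl.locgroup = ?
           ORDER BY fl.rank""",
        (uniquename, locgroup),
    ).fetchall()

db = build_db()
print(locations(db, "exon7", 0))  # [('contig42', 100, 250, 1)]
print(locations(db, "exon7", 1))  # [('chr1', 5100, 5250, 1)]
```

The exon's chromosome coordinates can be derived from its contig location plus the contig's own placement. That derivability is what makes the locgroup 1 row redundant.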

>> You've confused me.  It seems a little backwards to me.  I would think
>> that you would use the featureloc table for things like gene/mRNA/exon
>> to place them on the sequence and the feature relationship table for
>> correspondences.
>
> The featureloc table does describe how features are mapped to the
> chromosome, but merely having overlapping coordinates is not
> sufficient for one feature to be related to another.  For example, an
> exon could lie within the boundaries of a gene and not belong to that
> gene (because it is part of another gene).  The f_r table is used for
> defining what is part of what.

This is correct.

Though in fact you could use f_r to store overlap relationships
between features; you would just provide a different relationship type.
I don't think you'd ever want to do this, however, as you can get the
overlap relation dynamically from the featureloc table or any of
the overlap views.
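The difference between overlap and part_of can be sketched the same way. This is a hypothetical, simplified example. It uses sqlite3 and made-up features, and it attaches the exon directly to its gene rather than through an mRNA as a real gene model would. The overlap is computed on the fly from featureloc in interbase coordinates: two intervals on the same srcfeature overlap when each one's fmin is less than the other's fmax. Strand is ignored here. The exon overlaps both genes, but only feature_relationship says which one it belongs to.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT, type TEXT);
CREATE TABLE featureloc (feature_id INTEGER, srcfeature_id INTEGER,
                         fmin INTEGER, fmax INTEGER, strand INTEGER,
                         locgroup INTEGER DEFAULT 0, rank INTEGER DEFAULT 0);
-- Real Chado uses type_id (a cvterm) rather than a text type column.
CREATE TABLE feature_relationship (subject_id INTEGER, object_id INTEGER, type TEXT);
""")
db.executemany("INSERT INTO feature VALUES (?, ?, ?)", [
    (1, "chr1", "chromosome"),
    (10, "geneA", "gene"),
    (20, "geneB", "gene"),
    (21, "exonB1", "exon"),
])
db.executemany(
    "INSERT INTO featureloc (feature_id, srcfeature_id, fmin, fmax, strand)"
    " VALUES (?, ?, ?, ?, ?)",
    [(10, 1, 1000, 9000, 1),    # geneA spans a large interval
     (20, 1, 4000, 6000, -1),   # geneB is nested inside geneA
     (21, 1, 4200, 4500, -1)])  # exonB1 lies inside both genes' bounds
# Only this row records which gene the exon belongs to.
db.execute("INSERT INTO feature_relationship VALUES (21, 20, 'part_of')")

def overlapping_genes(uniquename):
    """Genes whose locgroup-0/rank-0 location overlaps the named feature."""
    rows = db.execute("""
        SELECT g.uniquename
        FROM feature AS f
        JOIN featureloc AS fl ON fl.feature_id = f.feature_id
                             AND fl.locgroup = 0 AND fl.rank = 0
        JOIN featureloc AS gl ON gl.srcfeature_id = fl.srcfeature_id
                             AND gl.locgroup = 0 AND gl.rank = 0
                             AND gl.fmin < fl.fmax AND fl.fmin < gl.fmax
        JOIN feature AS g ON g.feature_id = gl.feature_id AND g.type = 'gene'
        WHERE f.uniquename = ?
        ORDER BY g.uniquename""", (uniquename,))
    return [r[0] for r in rows]

def parent_genes(uniquename):
    """Genes the named feature is part_of, according to feature_relationship."""
    rows = db.execute("""
        SELECT o.uniquename
        FROM feature AS s
        JOIN feature_relationship AS fr ON fr.subject_id = s.feature_id
                                       AND fr.type = 'part_of'
        JOIN feature AS o ON o.feature_id = fr.object_id
        WHERE s.uniquename = ?""", (uniquename,))
    return [r[0] for r in rows]

print(overlapping_genes("exonB1"))  # ['geneA', 'geneB']
print(parent_genes("exonB1"))       # ['geneB']
```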

> 'Correspondences' is a CMap concept that ties common features on
> separate maps together, right?  I don't think you would use
> feature_relationship for that at all--I think it would be encapsulated
> in featureloc (see below).
>>
>> Maybe because I confused you first.  I'll start over.
>>
>> A chromosome is a map (or an assembly is a map); basically, the base
>> sequence is a map (in this case).  There must be a way in Chado to
>> group assemblies (or chromosomes) by their origin.
>>
>> For instance, let's say I have the human genome sequence from NCBI,
>> but I also have an old version of the UCSC genome from years ago.
>> How do I query Chado to give me only UCSC assemblies?
>
> Assuming your database is populated correctly (for whatever definition
> of 'correctly' applies), this would be distinguished in the featureloc
> table.  One of those mappings would be the 'default' (let's say it is
> the NCBI mapping), and so the featureloc.rank would be 0 and the

locgroup!

> srcfeature_id would be the feature_id for the NCBI chromosome.  Then,
> the UCSC assembly would be some other rank and the srcfeature_id would
> be the feature_id of the UCSC assembly.  To get only UCSC assemblies,
> you would make sure that the srcfeature_id is in the set of feature_ids
> that are UCSC assembly feature_ids.
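The filter Scott describes comes down to an IN clause on featureloc.srcfeature_id. This sketch rests on a few assumptions. How the set of UCSC chromosome feature_ids is found is left to the caller, through a hypothetical src_ids argument. The tables are cut down to the columns the query needs. rank and locgroup are omitted because the filter does not use them.

```python
import sqlite3

def features_on_sources(db, src_ids):
    """Return (uniquename, srcfeature_id, fmin, fmax) for every featureloc
    whose srcfeature_id is in src_ids (e.g. the UCSC chromosome features)."""
    ids = sorted(src_ids)
    if not ids:
        return []
    placeholders = ", ".join("?" for _ in ids)
    sql = f"""SELECT f.uniquename, fl.srcfeature_id, fl.fmin, fl.fmax
              FROM featureloc AS fl
              JOIN feature AS f ON f.feature_id = fl.feature_id
              WHERE fl.srcfeature_id IN ({placeholders})
              ORDER BY f.uniquename, fl.srcfeature_id"""
    return db.execute(sql, ids).fetchall()

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT);
CREATE TABLE featureloc (feature_id INTEGER, srcfeature_id INTEGER,
                         fmin INTEGER, fmax INTEGER);
""")
db.executemany("INSERT INTO feature VALUES (?, ?)",
               [(1, "ncbi_chr1"), (2, "ucsc_chr1"), (3, "geneX")])
# One gene with one location on each assembly's chromosome.
db.executemany("INSERT INTO featureloc VALUES (?, ?, ?, ?)",
               [(3, 1, 100, 900), (3, 2, 150, 950)])

ucsc_ids = {2}  # hypothetical: however the UCSC chromosome features are identified
print(features_on_sources(db, ucsc_ids))  # [('geneX', 2, 150, 950)]
```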

This works, but in general isn't advised. With two distinct assemblies
you are going to end up with different features. It makes more sense to
simply have two versions of your features, one on each assembly, or
you'll get into huge problems. This is a tricky issue, but as far as
I'm aware no one is using Chado to keep multiple versions of features
on multiple versions of assemblies and/or assemblies from different
sources.

Localisation at different levels of the assembly (e.g. contigs vs.
scaffolds vs. chromosomes) is fine, however; I believe TIGR do this.
> Is this any better?
>
>>
>>> Finally, in featureloc, fmin is never greater than fmax.
>>> featureloc.strand (-1, 0, 1) indicates direction.  Yes, even with
>>> polypeptides (though you could easily leave strand as 0 for a
>>> polypeptide that isn't mapped to a chromosome).
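Chado stores locations in interbase (zero-based, between-base) coordinates with fmin <= fmax. Data that signals reverse orientation by listing start after end therefore has to be normalized on load. This is a minimal sketch, assuming the input is 1-based and inclusive, with start > end meaning the minus strand. to_featureloc and Loc are hypothetical names.

```python
from typing import NamedTuple

class Loc(NamedTuple):
    fmin: int    # interbase start (0-based)
    fmax: int    # interbase end; always >= fmin
    strand: int  # +1 or -1 here, as in featureloc.strand

def to_featureloc(start: int, end: int) -> Loc:
    """Convert a 1-based, inclusive (start, end) pair, where start > end
    signals the reverse strand, into interbase fmin/fmax plus strand."""
    if start < 1 or end < 1:
        raise ValueError("1-based coordinates must be >= 1")
    strand = -1 if start > end else 1
    lo, hi = min(start, end), max(start, end)
    # 1-based base `lo` is the span between interbase positions lo-1 and lo.
    return Loc(fmin=lo - 1, fmax=hi, strand=strand)

print(to_featureloc(101, 250))  # Loc(fmin=100, fmax=250, strand=1)
print(to_featureloc(250, 101))  # Loc(fmin=100, fmax=250, strand=-1)
```

Direction lives only in strand, so the two calls differ only in that field.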

I'm confused by this: if the polypeptide isn't located relative to
anything, then there would be no featureloc row in which to indicate
strand?

We should be careful to differentiate between 'mapping' (which in the
Chado sense includes cytogenetic mapping, RH mapping and linkage mapping,
but excludes sequence-based localization) and featurelocs.

>> That answers my question, thanks.
>>
>>> I hope that helps a little bit.
>>
>> It does.  Thank you.
>>
>> mwz
>>>
>>>
>>> On Wed, 2005-11-23 at 01:48 -0500, Ben Faga wrote:
>>>> Hey Scott,
>>>>
>>>> I'm hoping that you can help me with a couple of questions (or point
>>>> me to the mailing list).
>>>>
>>>> In CMap we organize "maps" into "map sets"; is there anything
>>>> similar in Chado?  How do you distinguish something like chromosomes
>>>> from different assemblies?
>>>>
>>>> Also, in featureloc, can fmin be greater than fmax?  Is "strand" how
>>>> you store the direction (even with proteins)?

Strand is for direction. A protein whose location is projected onto the
genome has a directionality with respect to that genome, though features
located relative to a protein would all have +1 directionality.
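The point about direction can be shown with a small projection. A feature located on a protein (strand +1, residue coordinates) is mapped onto the genome through the polypeptide's own featureloc, and it picks up the polypeptide's strand. This sketch relies on strong simplifying assumptions. The polypeptide occupies one contiguous genomic block with no introns, that block is exactly three nucleotides per residue, and the names Loc and project_to_genome are hypothetical.

```python
from typing import NamedTuple

class Loc(NamedTuple):
    fmin: int
    fmax: int
    strand: int

def project_to_genome(on_protein: Loc, protein_on_genome: Loc) -> Loc:
    """Map an interbase location in protein (residue) coordinates onto the
    genome, assuming one contiguous, intron-free coding block of exactly
    3 nucleotides per residue (a simplification for illustration)."""
    if on_protein.strand != 1:
        raise ValueError("locations relative to a protein are expected on +1")
    if protein_on_genome.strand not in (1, -1):
        raise ValueError("polypeptide must be oriented on the genome")
    length_nt = protein_on_genome.fmax - protein_on_genome.fmin
    if length_nt % 3 != 0:
        raise ValueError("genomic span is not a whole number of codons")
    if not 0 <= on_protein.fmin <= on_protein.fmax <= length_nt // 3:
        raise ValueError("protein location does not fit the genomic span")
    start_nt, end_nt = 3 * on_protein.fmin, 3 * on_protein.fmax
    if protein_on_genome.strand == 1:
        return Loc(protein_on_genome.fmin + start_nt,
                   protein_on_genome.fmin + end_nt, 1)
    # Minus strand: residue 0 sits at the genomic fmax end.
    return Loc(protein_on_genome.fmax - end_nt,
               protein_on_genome.fmax - start_nt, -1)

domain = Loc(10, 20, 1)  # residues 10..20 (interbase) of the protein
print(project_to_genome(domain, Loc(1000, 1300, 1)))   # Loc(fmin=1030, fmax=1060, strand=1)
print(project_to_genome(domain, Loc(1000, 1300, -1)))  # Loc(fmin=1240, fmax=1270, strand=-1)
```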

>>>>
>>>> Thanks,
>>>>
>>>> mwz
>>>>
>>>>
>>
> --  
> -----------------------------------------------------------------------
> Scott Cain, Ph. D.                                          ca...@cs...
> GMOD Coordinator (http://www.gmod.org/)                      216-392-3087
> Cold Spring Harbor Laboratory
>
> _______________________________________________
> Gmod-schema mailing list
> Gmo...@li...
> https://lists.sourceforge.net/lists/listinfo/gmod-schema