From: chris m. <cj...@fr...> - 2005-11-23 19:02:45
On Nov 23, 2005, at 8:10 AM, Scott Cain wrote:

> Ben,
>
> I cc'ed this to the schema mailing list, both because I want my response
> archived and looked at by other people to make sure I am describing the
> use of feature_relationship and featureloc correctly.
>
> Scott
>
>
> On Wed, 2005-11-23 at 10:41 -0500, Ben Faga wrote:
>> On Wed, 2005-11-23 at 10:13, Scott Cain wrote:
>>> Hi Ben,
>>>
>>> I hope I can help, because you're not likely to get much response from
>>> the schema mailing list until after the holiday.
>>>
>>> I'm not sure how to answer the map question. The most obvious thing is
>>> via the feature_relationship table, but since the actual meaning of the
>>> word 'map' is not clear to me, I'm not sure f_r would work. The
>>> relationships in f_r are typically 'part_of' (for gene/mRNA/exon), but
>>> could easily be something else. For instance, could a map set be a
>>> feature like a chromosome?

I don't know what a map set is, but it doesn't sound like a feature so
it shouldn't go in the feature table

are we talking cmap integration here? This sounds like a job for the
neglected map module, which may need some work...

>>> For chromosomes, denoting containment is a little different: you don't
>>> use f_r, but give the feature_id of the chromosome in the featureloc
>>> table as featureloc.srcfeature_id. There is no reason that a given
>>> feature can't have more than one featureloc entry to different
>>> srcfeatures. You just give a different featureloc.rank to the different
>>> locations (with a rank of 0 being the 'standard' location, ie, on the
>>> chromosome).

You would use different locgroups, with locgroup>0 indicating a
redundant localisation at a different level in the assembly (this is
all in the schema SQL)

>> You've confused me. It seems a little backwards to me.
>> I would think that you would use the featureloc table for things like
>> gene/mRNA/exon to place them on the sequence and the feature
>> relationship table for correspondences.
>
> The featureloc table does describe how features are mapped to
> chromosome, but merely having overlapping coordinates is not sufficient
> for one feature to be related to another. For example, an exon could
> lie within the boundaries of a gene and not belong to that gene (because
> it is part of another gene). The f_r table is used for defining what is
> part of what.

This is correct. In fact you could use f_r to store overlaps
relationships between features; you would just provide a different
relationship type. I don't think you'd ever want to do this, however,
as you can get the overlaps relation out dynamically using the
featureloc table or any of the overlap views.

> 'Correspondences' is a CMap concept that ties common features on
> separate maps together, right? I don't think you would use
> feature_relationship for that at all--I think it would be encapsulated
> in featureloc (see below).
>>
>> Maybe because I confused you first. I'll start over.
>>
>> A chromosome is a map (or an assembly is a map), basically the base
>> sequence is a map (in this case). There must be a way in Chado to group
>> assemblies (or chromosomes) by their origin.
>>
>> For instance, let's say I have the human genome sequence from NCBI but I
>> also have an old version of the UCSC genome from years ago. How do I
>> query chado to give me only UCSC assemblies?
>
> Assuming your database is populated correctly (for whatever definition
> of 'correctly' applies), this would be distinguished in the featureloc
> table. One of those mappings would be the 'default' (let's say it is the
> NCBI mapping), and so the featureloc.rank would be 0 and the

locgroup!

> srcfeature_id would be the feature_id for the NCBI chromosome.
> Then, the UCSC assembly would be some other rank and the srcfeature_id
> would be the feature_id of the UCSC assembly. To get only UCSC
> assemblies, you would make sure that the srcfeature_id is in the set of
> feature_ids that are UCSC assembly feature_ids.

This works, but in general isn't advised. With two distinct assemblies
you are going to end up with different features. It makes more sense to
simply have two versions of your features, one on each assembly, or
you'll get into huge problems. This is a tricky issue, but as far as
I'm aware no one is using chado to keep multiple versions of features
on multiple versions of assemblies and/or assemblies from different
sources.

Localisation at different levels of the assembly (eg contigs vs
scaffolds vs chromosomes) is fine, however - I believe TIGR do this.

> Is this any better?
>
>>
>>> Finally, in featureloc, fmin is never greater than fmax.
>>> featureloc.strand (-1,0,1) indicates direction. Yes even with
>>> polypeptides (though you could easily leave strand as 0 for a
>>> polypeptide that isn't mapped to a chromosome).

I'm confused by this - if the polypeptide isn't located relative to
anything, then there would be no featureloc row in which to indicate
strand?

We should be careful to differentiate between 'mapping' (which in the
chado sense includes cytogenetic mapping, rh mapping, linkage mapping,
but excludes sequence-based localization) and featurelocs.

>> That answers my question, thanks.
>>
>>> I hope that helps a little bit.
>>
>> It does. Thank you.
>>
>> mwz
>>>
>>>
>>> On Wed, 2005-11-23 at 01:48 -0500, Ben Faga wrote:
>>>> Hey Scott,
>>>>
>>>> I'm hoping that you can help me with a couple questions (or point me to
>>>> the mailing list).
>>>>
>>>> In CMap we organize "maps" into "map sets", is there anything similar in
>>>> Chado? How do you distinguish something like chromosomes from different
>>>> assemblies?
>>>>
>>>> Also, in featureloc, can fmin be greater than fmax?
>>>> Is "strand" how you store the direction (even with proteins)?

strand is for direction. A protein with its location projected onto the
genome has a directionality w.r.t that genome - though features located
relative to a protein would all have +1 directionality

>>>>
>>>> Thanks,
>>>>
>>>> mwz
>>>>
>>>>
>>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                        ca...@cs...
> GMOD Coordinator (http://www.gmod.org/)                    216-392-3087
> Cold Spring Harbor Laboratory
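
A minimal sketch of the featureloc versus feature_relationship distinction discussed in this thread, using Python's sqlite3 rather than a real Chado instance. Everything here is an assumption made for illustration: the three tables are cut-down stand-ins (real Chado types features and relationships through type_id keys into cvterm, and exons are normally part_of an mRNA rather than a gene), and the feature names, coordinates, and helper functions are hypothetical.

```python
import sqlite3

# Simplified stand-ins for three Chado tables. Plain-text "type" columns
# replace Chado's cvterm foreign keys; this is an assumption for brevity.
SCHEMA = """
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL,
    type       TEXT NOT NULL
);
CREATE TABLE featureloc (
    feature_id    INTEGER NOT NULL REFERENCES feature(feature_id),
    srcfeature_id INTEGER NOT NULL REFERENCES feature(feature_id),
    fmin          INTEGER NOT NULL,
    fmax          INTEGER NOT NULL,
    strand        INTEGER NOT NULL CHECK (strand IN (-1, 0, 1)),
    locgroup      INTEGER NOT NULL DEFAULT 0,
    rank          INTEGER NOT NULL DEFAULT 0,
    CHECK (fmin <= fmax)
);
CREATE TABLE feature_relationship (
    subject_id INTEGER NOT NULL REFERENCES feature(feature_id),
    object_id  INTEGER NOT NULL REFERENCES feature(feature_id),
    type       TEXT NOT NULL
);
"""


def build_db():
    """Create an in-memory database holding hypothetical example data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO feature VALUES (?, ?, ?)", [
        (1, "chr1", "chromosome"), (2, "ctg1", "contig"),
        (3, "geneA", "gene"), (4, "geneB", "gene"),
        (5, "exonA1", "exon"), (6, "exonB1", "exon"),
    ])
    # Columns: feature_id, srcfeature_id, fmin, fmax, strand, locgroup, rank
    conn.executemany("INSERT INTO featureloc VALUES (?, ?, ?, ?, ?, ?, ?)", [
        (3, 1, 100, 500, 1, 0, 0),   # geneA on chr1, primary location
        (3, 2, 0, 400, 1, 1, 0),     # geneA on ctg1, redundant (locgroup 1)
        (4, 1, 300, 900, -1, 0, 0),  # geneB on chr1, reverse strand
        (5, 1, 120, 200, 1, 0, 0),   # exonA1 inside geneA
        (6, 1, 350, 450, -1, 0, 0),  # exonB1 inside geneA's span too
    ])
    conn.executemany("INSERT INTO feature_relationship VALUES (?, ?, ?)", [
        (5, 3, "part_of"),  # exonA1 part_of geneA
        (6, 4, "part_of"),  # exonB1 part_of geneB
    ])
    return conn


def overlapping_exons(conn, gene):
    """Exons whose primary location overlaps the gene's (computed on the fly)."""
    rows = conn.execute("""
        SELECT e.uniquename
        FROM feature g
        JOIN featureloc gl ON gl.feature_id = g.feature_id
        JOIN featureloc el ON el.srcfeature_id = gl.srcfeature_id
                          AND el.locgroup = gl.locgroup
                          AND el.fmin < gl.fmax AND el.fmax > gl.fmin
        JOIN feature e ON e.feature_id = el.feature_id
        WHERE g.uniquename = ? AND e.type = 'exon' AND gl.locgroup = 0
        ORDER BY e.uniquename""", (gene,)).fetchall()
    return [r[0] for r in rows]


def parts_of(conn, gene):
    """Features explicitly declared part_of the gene in feature_relationship."""
    rows = conn.execute("""
        SELECT s.uniquename
        FROM feature_relationship fr
        JOIN feature s ON s.feature_id = fr.subject_id
        JOIN feature o ON o.feature_id = fr.object_id
        WHERE o.uniquename = ? AND fr.type = 'part_of'
        ORDER BY s.uniquename""", (gene,)).fetchall()
    return [r[0] for r in rows]


def located_on(conn, src):
    """Features that have any featureloc row against the given srcfeature."""
    rows = conn.execute("""
        SELECT DISTINCT f.uniquename
        FROM featureloc fl
        JOIN feature f ON f.feature_id = fl.feature_id
        JOIN feature s ON s.feature_id = fl.srcfeature_id
        WHERE s.uniquename = ?
        ORDER BY f.uniquename""", (src,)).fetchall()
    return [r[0] for r in rows]
```

With this toy data, overlapping_exons(conn, "geneA") picks up both exons because exonB1 happens to lie inside geneA's span, while parts_of(conn, "geneA") returns only exonA1. That is the reason coordinates alone are not used to decide what belongs to what. The second geneA row shows a redundant localisation at another assembly level kept apart by locgroup rather than rank.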