From: chris m. <cj...@fr...> - 2005-11-23 20:41:38
Hi Ben,

Your description makes sense. I think we can have a fairly well-defined
schema mapping. I'm thinking something like this:

  map        = feature [where feature_id in (select srcfeature_id from featureloc)]
  mapfeature = feature JOIN featureloc
  mapset     = map JOIN cvterm [using type_id] LEFT JOIN dbxref JOIN db

Under these definitions, you can have maps within maps (a feature can
have identity as both a mapfeature and a map).

I'd say don't worry too much about the chado map module just now. AFAIK
no one has any data in it; it may be in need of some tweaking, and that
may end up being influenced by cmap.

On Nov 23, 2005, at 12:21 PM, Ben Faga wrote:

> Wow, I'm impressed by how quickly everyone responded. Thank you.
>
> Backing up and explaining what I'm doing sounds like a good
> suggestion.
>
> The idea is to write a script that pulls data from Chado and puts it
> into the CMap schema (for use with CMap). Then, install postgres
> triggers to keep the CMap schema updated. So, basically I'm looking
> for all the places in Chado where data that can be displayed in CMap
> is stored.
>
> I know that this is going to get complicated since there isn't a
> direct table-to-table correspondence between the two schemas.
>
> Here's a little background:
>
> Map:
> In CMap, a map is anything that can be represented by a line. A map
> can be a sequence contig, an FPC, a genetic map. For that matter, it
> can be a gene, or a transcript or a protein (but no one currently
> uses it that way).
>
> So in that respect, I'll eventually have to dip my fingers into all
> of the corners of Chado to grab data. Right now though, I'm focusing
> on the sequence module since that is the most mature (I think) and I
> know how to use the loaders for it.
>
> Map Set:
> In CMap, a map set is made up of "maps" with a common type (sequence,
> fpc, etc) and some reason to group them such as being from the same
> assembly or "experiment" (as Andrew put it).
>
> It might very well be likely that this will be defined in different
> ways for different things. For instance, Allen's mention of the
> dbxref.version would work for many things but it might not work for
> data in the maps module, so I would have to create a different way to
> create map sets from that module.
>
> Put another way, I could just put the "maps" from the same module and
> with the same type in their own set. So, all rows in the feature
> table with a cvterm of "contig" would get grouped in a set, unless
> they have a dbxref with the appropriate data or unless they meet some
> other criteria.
>
> Features:
> CMap features are a completely different concept than that of Chado.
> A cmap feature is basically a range on a "map" with some info. So, in
> the context of the sequence module, anything with a featureloc can be
> a cmap feature.
>
> This actually brings some interesting possible uses for cmap. If you
> had protein info, such as domains, you could make the proteins into
> "maps" and then set the domains as features and then display
> corresponding proteins based on domain. I know there are better
> programs for this but this is just a thought.
>
> Correspondences:
> I believe Andrew has a correct view of correspondences in CMap.
> Basically, it's a "this feature corresponds with this feature"
> format.
>
> That said, I'm not looking for a table to hold correspondences for
> me. I'm looking for any place in chado where correspondence-like
> information is stored.
>
> Well that's that on background. That was long. I'm surprised you are
> still reading this.
>
> Thanks,
>
> Ben
>
>
> On Wed, 2005-11-23 at 14:15, chris mungall wrote:
>> On Nov 23, 2005, at 8:59 AM, Andrew D. Farmer wrote:
>>
>>> Hello-
>>> I'm not an expert in Chado, but I've dabbled a little with it and
>>> more with CMap, so I'm going to throw my 2c in for what they are
>>> worth.
>>>
>>> 1) On the "correspondence" question, I think that the primary
>>> difference between the two systems is that Chado sees the "feature"
>>> as a primary entity that may have many different locations in
>>> different coordinate systems (maps); that is, the featureloc
>>> contains the relationship of one conceptual entity to one or more
>>> different coordinate systems (srcfeature). So, for example, a
>>> pairwise alignment is represented as a single feature with one
>>> location on the query sequence and one location on the background
>>> sequence.
>>> CMap, on the other hand, sees a feature as "belonging" to one and
>>> only one map (it has no true identity independent of the map);
>>> features on different maps are related by correspondences, as
>>> opposed to a normalization of the feature into a single entity with
>>> multiple locations. To represent a pairwise alignment in CMap you'd
>>> probably use one feature on the query, one feature on the
>>> background with a correspondence to link them.
>>
>> I think they are actually in quite close correspondence here. All
>> chado features have zero or one primary featurelocs (with
>> locgroup=0; multiple featurelocs differing only in rank are for
>> alignments).
>>
>> You *could* add secondary featurelocs to other assemblies but this
>> isn't recommended.
>>
>>> My sense is that the Chado approach is appropriate when the data
>>> manager knows a priori about the identity of the features and can
>>> control their normalization, whereas the CMap approach probably
>>> makes more sense when the assertion of correspondences is a post
>>> hoc conjecture based on something like name-matching.
>>>
>>> 2) I'm not sure CMap's "Map set" concept has a clear analog in
>>> Chado; the "Map set" is essentially just a grouping of all the maps
>>> that have been produced from a single "experiment" (e.g.
>>> the linkage groups from a genetic map or contigs from FPC physical
>>> map). So they don't ultimately resolve to a single coordinate
>>> system, they are distinct coordinate systems defined within the
>>> context of a single application of a mapping protocol. If you
>>> wanted to list all chromosomes from the UCSC assembly in Chado, how
>>> would you do it? Maybe through all UCSC top level features
>>> (chromosomes) having a common relationship to a single
>>> "publication"?
>>
>> If people really want to store multiple versions of an assembly or
>> different assemblies of the same genome in one chado db, we need to
>> come up with some Best Practices for the various scenarios that will
>> arise. AFAIK nobody has needed this so far..
>>
>> The main difference between chado and CMap is that map doesn't
>> really correspond to anything outside the map module in Chado, so it
>> follows that there will be no correspondence with map set. Chado
>> featurelocs shouldn't be overloaded with non-sequence based
>> localizations - Chado is only generic up to a point!
>>
>> Can we back up here and provide some context - is the goal here to
>> interoperate between cmap and chado?
>>
>>> Does this help at all, muddy the waters further, or expose my total
>>> misunderstanding of things??
>>>
>>> Andrew Farmer
>>>
>>> On Wed, 23 Nov 2005, Scott Cain wrote:
>>>
>>>> Ben,
>>>>
>>>> I cc'ed this to the schema mailing list, both because I want my
>>>> response archived and looked at by other people to make sure I am
>>>> describing the use of feature_relationship and featureloc
>>>> correctly.
>>>>
>>>> Scott
>>>>
>>>>
>>>> On Wed, 2005-11-23 at 10:41 -0500, Ben Faga wrote:
>>>>> On Wed, 2005-11-23 at 10:13, Scott Cain wrote:
>>>>>> Hi Ben,
>>>>>>
>>>>>> I hope I can help, because you're not likely to get much
>>>>>> response from the schema mailing list until after the holiday.
>>>>>>
>>>>>> I'm not sure how to answer the map question.
>>>>>> The most obvious thing is via the feature_relationship table,
>>>>>> but since the actual meaning of the word 'map' is not clear to
>>>>>> me, I'm not sure f_r would work. The relationships in f_r are
>>>>>> typically 'part_of' (for gene/mRNA/exon), but could easily be
>>>>>> something else. For instance, could a map set be a feature like
>>>>>> a chromosome?
>>>>>>
>>>>>> For chromosomes, denoting containment is a little different: you
>>>>>> don't use f_r, but give the feature_id of the chromosome in the
>>>>>> featureloc table as featureloc.srcfeature_id. There is no reason
>>>>>> that a given feature can't have more than one featureloc entry
>>>>>> to different srcfeatures. You just give a different
>>>>>> featureloc.rank to the different locations (with a rank of 0
>>>>>> being the 'standard' location, ie, on the chromosome).
>>>>> You've confused me. It seems a little backwards to me. I would
>>>>> think that you would use the featureloc table for things like
>>>>> gene/mRNA/exon to place them on the sequence and the feature
>>>>> relationship table for correspondences.
>>>>
>>>> The featureloc table does describe how features are mapped to
>>>> chromosome, but merely having overlapping coordinates is not
>>>> sufficient for one feature to be related to another. For example,
>>>> an exon could lie within the boundaries of a gene and not belong
>>>> to that gene (because it is part of another gene). The f_r table
>>>> is used for defining what is part of what.
>>>>
>>>> 'Correspondences' is a CMap concept that ties common features on
>>>> separate maps together, right? I don't think you would use
>>>> feature_relationship for that at all--I think it would be
>>>> encapsulated in featureloc (see below).
>>>>>
>>>>> Maybe because I confused you first. I'll start over.
>>>>>
>>>>> A chromosome is a map (or an assembly is a map), basically the
>>>>> base sequence is a map (in this case). There must be a way in
>>>>> Chado to group assemblies (or chromosomes) by their origin.
>>>>>
>>>>> For instance, let's say I have the human genome sequence from
>>>>> NCBI but I also have an old version of the UCSC genome from years
>>>>> ago. How do I query chado to give me only UCSC assemblies?
>>>>
>>>> Assuming your database is populated correctly (for whatever
>>>> definition of 'correctly' applies), this would be distinguished in
>>>> the featureloc table. One of those mappings would be the 'default'
>>>> (let's say it is the NCBI mapping), and so the featureloc.rank
>>>> would be 0 and the srcfeature_id would be the feature_id for the
>>>> NCBI chromosome. Then, the UCSC assembly would be some other rank
>>>> and the srcfeature_id would be the feature_id of the UCSC
>>>> assembly. To get only UCSC assemblies, you would make sure that
>>>> the srcfeature_id is in the set of feature_ids that are UCSC
>>>> assembly feature_ids.
>>>>
>>>> Is this any better?
>>>>
>>>>>
>>>>>> Finally, in featureloc, fmin is never greater than fmax.
>>>>>> featureloc.strand (-1,0,1) indicates direction. Yes even with
>>>>>> polypeptides (though you could easily leave strand as 0 for a
>>>>>> polypeptide that isn't mapped to a chromosome).
>>>>> That answers my question, thanks.
>>>>>
>>>>>> I hope that helps a little bit.
>>>>>
>>>>> It does. Thank you.
>>>>>
>>>>> mwz
>>>>>>
>>>>>>
>>>>>> On Wed, 2005-11-23 at 01:48 -0500, Ben Faga wrote:
>>>>>>> Hey Scott,
>>>>>>>
>>>>>>> I'm hoping that you can help me with a couple questions (or
>>>>>>> point me to the mailing list).
>>>>>>>
>>>>>>> In CMap we organize "maps" into "map sets", is there anything
>>>>>>> similar in Chado? How do you distinguish something like
>>>>>>> chromosomes from different assemblies?
>>>>>>>
>>>>>>> Also, in featureloc, can fmin be greater than fmax? Is "strand"
>>>>>>> how you store the direction (even with proteins)?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> mwz
>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>> --
>>>
>>> Andrew Farmer
>>> ad...@nc...
>>> (505) 995-4464
>>> Database Administrator/Software Developer
>>> National Center for Genome Resources
>>>
>>> ---
>>> "To live in the presence of great truths and eternal laws,
>>> to be led by permanent ideals-
>>> that is what keeps a man patient when the world ignores him,
>>> and calm and unspoiled when the world praises him."
>>> -Balzac
>>> ---
>>>
>>> _______________________________________________
>>> Gmod-schema mailing list
>>> Gmo...@li...
>>> https://lists.sourceforge.net/lists/listinfo/gmod-schema
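
For anyone who wants to try the map / mapfeature / mapset definitions from the top of this message, here is a rough sketch of them as SQL views, run through Python's sqlite3 so it can be executed without a Chado install. The tables are heavily cut-down stand-ins for the real Chado ones, and the column subset, the join keys used for dbxref and db, the sample rows, and the helper name build_demo are all assumptions made for illustration. Nothing here comes from the Chado distribution or from CMap itself.

```python
import sqlite3

# Cut-down stand-ins for the Chado tables named in the thread.
# Only the columns the views need are kept (an assumption, not the full schema).
SCHEMA = """
CREATE TABLE db      (db_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dbxref  (dbxref_id INTEGER PRIMARY KEY, db_id INTEGER,
                      accession TEXT, version TEXT);
CREATE TABLE cvterm  (cvterm_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, dbxref_id INTEGER,
                      name TEXT, type_id INTEGER);
CREATE TABLE featureloc (featureloc_id INTEGER PRIMARY KEY, feature_id INTEGER,
                         srcfeature_id INTEGER, fmin INTEGER, fmax INTEGER,
                         strand INTEGER);

-- map = feature [where feature_id in (select srcfeature_id from featureloc)]
CREATE VIEW map AS
  SELECT * FROM feature
  WHERE feature_id IN (SELECT srcfeature_id FROM featureloc);

-- mapfeature = feature JOIN featureloc
CREATE VIEW mapfeature AS
  SELECT f.feature_id, f.name, fl.srcfeature_id AS map_id,
         fl.fmin, fl.fmax, fl.strand
  FROM feature f
  JOIN featureloc fl ON fl.feature_id = f.feature_id;

-- mapset = map JOIN cvterm [using type_id] LEFT JOIN dbxref JOIN db
-- Both dbxref and db are LEFT JOINed here so that maps without a dbxref
-- are kept; that is a choice made for this sketch.
CREATE VIEW mapset AS
  SELECT m.feature_id AS map_id, t.name AS map_type,
         d.name AS db_name, x.version
  FROM map m
  JOIN cvterm t ON t.cvterm_id = m.type_id
  LEFT JOIN dbxref x ON x.dbxref_id = m.dbxref_id
  LEFT JOIN db d ON d.db_id = x.db_id;
"""


def build_demo():
    """Create an in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO db VALUES (1, 'UCSC')")
    conn.execute("INSERT INTO dbxref VALUES (1, 1, 'chr1', 'hg17')")
    conn.executemany("INSERT INTO cvterm VALUES (?, ?)",
                     [(1, "chromosome"), (2, "gene"), (3, "exon")])
    conn.executemany("INSERT INTO feature VALUES (?, ?, ?, ?)",
                     [(1, 1, "chr1", 1),        # chromosome with a dbxref
                      (2, None, "geneA", 2),    # gene located on chr1
                      (3, None, "exon1", 3)])   # exon located on geneA
    conn.executemany("INSERT INTO featureloc VALUES (?, ?, ?, ?, ?, ?)",
                     [(1, 2, 1, 1000, 5000, 1),
                      (2, 3, 2, 0, 200, 1)])
    return conn


if __name__ == "__main__":
    conn = build_demo()
    for view in ("map", "mapfeature", "mapset"):
        print(view, conn.execute(f"SELECT * FROM {view}").fetchall())
```

In the sample data geneA is located on chr1 and also serves as the srcfeature for exon1, so it shows up in both the map and mapfeature views, which is the "maps within maps" case. Grouping the mapset rows by map_type plus db_name/version would be one way to get CMap-style map sets, though how well that fits real data is untested here.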