From: Scott C. <ca...@cs...> - 2005-02-22 19:40:42
On Tue, 2005-02-22 at 11:00 -0800, Chris Mungall wrote:
>
> On Tue, 22 Feb 2005, Scott Cain wrote:
>
> > On Tue, 2005-02-22 at 08:13 -0800, Chris Mungall wrote:
> > > > >
> > > > > A cv.name should never be "Ad hoc" because there will be collisions
> > > > > between cvs
> > > >
> > > > Fair enough. Would you prefer 'local'?
> > >
> > > Sorry, I was being cryptic. What I meant was, even these "ad hoc"
> > > ontologies must have some kind of name that communicates the nature of the
> > > cvterms within them. It seemed like you were planning on lumping all
> > > ad-hoc ontologies together, which has a high likelihood of producing
> > > collisions on cvterm unique keys.
> > >
> > > It sounds like "Ad hoc:synonym" is actually from a cv of property types
> > > that can be attached using featureprop et al. This is very definitely not
> > > an ad-hoc ontology, it's crucial that featureprop types have their own cv
> > > and are defined
> >
> > Actually, I only have a few things in 'local'. They are things that are
> > fundamental to making chado work (or the gff loader work), like score
> > and synonym. There are several other ad hoc ontologies that have more
>
> There is nothing gff-specific about the concept of 'score'. There needs to
> be an ontology of score types (possibly as part of a larger statistical
> term cv). The score column in a gff maps to the most generic score term in
> this ontology, since gff doesn't let you distinguish between score types.
>
> Clearly this ontology won't materialize overnight, so I suggest for now
> you create an on-the-fly cv called "score" or "program_output", with one
> term in it called "score". If anyone is up for it we can create a mini-obo
> file with a few more specific terms in it.

Well, the real problem with score is that it ought to map to somewhere
in analysisfeature, but we can't do that because we don't know what kind
of score it is.
If we scrapped the current analysisfeature and replaced it with a
table that has score and score_type columns (for example), this would
work better, because then we could do as you suggest and just give GFF
scores the most generic form of score, which would allow users to go
back and alter the type if they know it (or provide it on the command
line when loading a given file). The problem with this is that, in
order for it to stay normalized, the score would have to go in a new
table (analysisfeature_prop?), since you could have multiple types of
scores for a given analysisfeature (which is presumably why there are
multiple columns in analysisfeature to begin with).

>
> > descriptive names like 'property type', and 'Statistical terms'. I
> > think this arrangement makes as much sense as FlyBase chado's use of the
> > synonym type ontology with exactly one term in it: synonym.
>
> GO has typed synonyms (exact, narrower_than, etc). All inherit from a
> generic "synonym" which corresponds to the cvterm that is in the current
> fb chado. I can generate an obo file of synonym types for you if you like.
>
> cvterm is really not intended as a dumping ground for homeless strings.
> The whole point of using cvterm for things such as feature types as well
> as things like GO is to allow the chado model to be extensible,
> interoperable, well-defined etc.
>
> > Darn it, as soon as I wrote that last sentence, I was reminded of how
> > it's not true: Lincoln was complaining to me a few months ago about
> > WormBase's lack of typed synonyms. He said it would be nice to have
> > 'GenBank synonym', 'Swissprot synonym', etc. I happily pointed out that
> > chado could easily do that.
>
> I'm not sure if I see these as synonym types. We'd want a way to use the
> db table here. Can you give an example of a GenBank synonym? Things such
> as genbank qualifiers and genbank feature types would go in a genbank cv.

Well, I'm not a WormBase person, but I am reasonably sure what he was
referring to was the fact that they need to tie accession numbers from
different databases to a given feature in WormBase. I think you are
probably right about how to do it in chado, though, since those
accessions should be tied to a feature via feature_dbxref.

>
> I'm convinced all these so-called ad-hoc CVs can be given homes that will
> later mature into full-fledged well-defined stable CVs

That is no doubt true--and my putting them in 'local' for the time being
doesn't really cause any long term problems (like by early choices
getting calcified in place--it seems to me that these terms can be
fairly fluid for some time to come without much harm).

>
> > Scott
> >
> >
>
--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                          ca...@cs...
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory
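
For readers following the schema discussion, here is a minimal sketch of the normalized layout floated above: an on-the-fly "score" cv with a generic "score" term, and scores stored one row per type in a separate table. It uses Python's sqlite3 purely for illustration. Apart from the names mentioned in the thread (cv, cvterm, analysisfeature, analysisfeature_prop, score, score_type), every column and helper name here is an assumption, not the actual chado DDL, and the type names "raw_score" and "significance" in the usage note are made up.

```python
import sqlite3

# Simplified, hypothetical tables; real chado tables have more columns.
SCHEMA = """
CREATE TABLE cv (
    cv_id INTEGER PRIMARY KEY,
    name  TEXT NOT NULL UNIQUE
);
CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY,
    cv_id     INTEGER NOT NULL REFERENCES cv(cv_id),
    name      TEXT NOT NULL,
    UNIQUE (cv_id, name)          -- term names only need to be unique per cv
);
CREATE TABLE analysisfeature (
    analysisfeature_id INTEGER PRIMARY KEY,
    feature_name       TEXT NOT NULL
);
CREATE TABLE analysisfeature_prop (
    analysisfeature_id INTEGER NOT NULL
        REFERENCES analysisfeature(analysisfeature_id),
    score_type_id      INTEGER NOT NULL REFERENCES cvterm(cvterm_id),
    score              REAL NOT NULL,
    UNIQUE (analysisfeature_id, score_type_id)  -- one score per type
);
"""


def open_sketch_db():
    """Create an in-memory database holding the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def term_id(conn, cv_name, term_name):
    """Return the cvterm_id for (cv, term), creating both rows if needed."""
    conn.execute("INSERT OR IGNORE INTO cv (name) VALUES (?)", (cv_name,))
    cv_id = conn.execute(
        "SELECT cv_id FROM cv WHERE name = ?", (cv_name,)
    ).fetchone()[0]
    conn.execute(
        "INSERT OR IGNORE INTO cvterm (cv_id, name) VALUES (?, ?)",
        (cv_id, term_name),
    )
    return conn.execute(
        "SELECT cvterm_id FROM cvterm WHERE cv_id = ? AND name = ?",
        (cv_id, term_name),
    ).fetchone()[0]


def add_score(conn, analysisfeature_id, score, score_type="score"):
    """Store a score; GFF scores default to the generic 'score' term."""
    conn.execute(
        "INSERT INTO analysisfeature_prop "
        "(analysisfeature_id, score_type_id, score) VALUES (?, ?, ?)",
        (analysisfeature_id, term_id(conn, "score", score_type), score),
    )


def retype_score(conn, analysisfeature_id, old_type, new_type):
    """Change a score's type later, once the user knows what it was."""
    conn.execute(
        "UPDATE analysisfeature_prop SET score_type_id = ? "
        "WHERE analysisfeature_id = ? AND score_type_id = ?",
        (
            term_id(conn, "score", new_type),
            analysisfeature_id,
            term_id(conn, "score", old_type),
        ),
    )


def scores_for(conn, analysisfeature_id):
    """Return {score type name: score} for one analysisfeature row."""
    rows = conn.execute(
        "SELECT cvterm.name, p.score FROM analysisfeature_prop AS p "
        "JOIN cvterm ON cvterm.cvterm_id = p.score_type_id "
        "WHERE p.analysisfeature_id = ?",
        (analysisfeature_id,),
    ).fetchall()
    return dict(rows)
```

A loader could call add_score(conn, 1, 42.5) for an untyped GFF score, and a user could later run retype_score(conn, 1, "score", "raw_score") and add a second row of another type for the same analysisfeature. That second row is the case that a single score/score_type column pair on analysisfeature itself could not hold.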