Re: [Gmod-schema] Re: unique dbxref_id on cvterm


On Tue, 22 Feb 2005, Scott Cain wrote:

> On Tue, 2005-02-22 at 11:00 -0800, Chris Mungall wrote:
> >
> > On Tue, 22 Feb 2005, Scott Cain wrote:
> >
> > > On Tue, 2005-02-22 at 08:13 -0800, Chris Mungall wrote:
> > > > > >
> > > > > > A cv.name should never be "Ad hoc" because there will be collisions
> > > > > > between cvs
> > > > >
> > > > > Fair enough.  Would you prefer 'local'?
> > > >
> > > > Sorry, I was being cryptic. What I meant was, even these "ad hoc"
> > > > ontologies must have some kind of name that communicates the nature of the
> > > > cvterms within them. It seemed like you were planning on lumping all
> > > > ad-hoc ontologies together, which has a high likelihood of producing
> > > > collisions on cvterm unique keys.
> > > >
> > > > It sounds like "Ad hoc:synonym" is actually from a cv of property types
> > > > that can be attached using featureprop et al. This is very definitely not
> > > > an ad-hoc ontology, it's crucial that featureprop types have their own cv
> > > > and are defined
> > >
> > > Actually, I only have a few things in 'local'.  They are things that are
> > > fundamental to making chado work (or the gff loader work), like score
> > > and synonym.  There are several other ad hoc ontologies that have more
> >
> > There is nothing gff-specific about the concept of 'score'. There needs to
> > be an ontology of score types (possibly as part of a larger statistical
> > term cv). The score column in a gff maps to the most generic score term in
> > this ontology, since gff doesn't let you distinguish between score types.
> >
> > Clearly this ontology won't materialize overnight, so I suggest for now
> > you create an on-the-fly cv called "score" or "program_output", with one
> > term in it called "score". If anyone is up for it we can create a mini-obo
> > file with a few more specific terms in it.
>
> Well, the real problem with score is that it ought to map to somewhere
> in analysisfeature, but we can't do that because we don't know what kind
> of score it is.  If we scrapped the current analysisfeature and replaced
> it with a table that has a score and score_type column (for example),
> this would work better, because then we could do as you suggest and just
> give GFF scores the most generic form of score, which would allow users
> to go back and alter the type if they know (or provide it on the command
> line when loading a given file).  The problem with this is that, in
> order for it to stay normalized, the score would have to go in a new
> table (analysisfeature_prop?), since you could have multiple types of
> scores for a given analysisfeature (which is presumably why there are
> multiple columns in analysisfeature to begin with).

The GFF score corresponds to analysisfeature.rawscore; as the docs state,
this is the "native" scoring system used by a program, and no semantics
are imposed on it. If you happen to know that the score type is a
bitscore, then you can populate this column too.
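A minimal sketch of that mapping, assuming a loader that reads GFF3's nine tab-separated columns (the function name gff_score_to_rawscore is hypothetical): the sixth column is passed through unchanged as the raw score, and "." (no score) becomes None, i.e. NULL in the database.

```python
from typing import Optional


def gff_score_to_rawscore(gff_line: str) -> Optional[float]:
    """Return GFF column 6 as a float for analysisfeature.rawscore.

    Hypothetical helper: no score semantics (bitscore, e-value, ...) are
    inferred; '.' means the score is undefined and maps to None.
    """
    columns = gff_line.rstrip("\n").split("\t")
    if len(columns) != 9:
        raise ValueError("expected 9 tab-separated GFF columns")
    score = columns[5]
    return None if score == "." else float(score)


line = "chr1\tblastn\tmatch\t100\t200\t87.5\t+\t.\tID=m1"
print(gff_score_to_rawscore(line))  # 87.5
```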

The idea is that the analysisfeature score types would map to suitable
upper-level terms in some yet-to-be-defined program/program-output
ontology.

The analysisfeature score columns are something of an exception to the
whole chado design philosophy. To be consistent, we really should have
just used featureprops. However, the ability to have the scores available
as floats to SQL queries is incredibly useful.

One could perhaps argue the same for other kinds of featureprop.
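To illustrate why float columns matter, here is a small sketch using Python's built-in sqlite3 module with heavily simplified stand-ins for featureprop (whose value column is text in chado) and analysisfeature.rawscore. The table layouts are assumptions, not the full chado DDL. Scores stored as text sort lexicographically, so "100.0" lands before "9.0", while the float column sorts numerically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-ins: most chado columns (type_id, rank, analysis_id, ...)
# are omitted. featureprop.value is text; rawscore is a float column.
conn.execute("CREATE TABLE featureprop (feature_id INTEGER, value TEXT)")
conn.execute("CREATE TABLE analysisfeature (feature_id INTEGER, rawscore REAL)")

scores = {1: 9.0, 2: 10.5, 3: 100.0}
for feature_id, score in scores.items():
    conn.execute("INSERT INTO featureprop VALUES (?, ?)", (feature_id, str(score)))
    conn.execute("INSERT INTO analysisfeature VALUES (?, ?)", (feature_id, score))

# Text values sort character by character: '10.5' < '100.0' < '9.0'
text_order = [r[0] for r in conn.execute(
    "SELECT feature_id FROM featureprop ORDER BY value")]
# Float values sort numerically: 9.0 < 10.5 < 100.0
float_order = [r[0] for r in conn.execute(
    "SELECT feature_id FROM analysisfeature ORDER BY rawscore")]
print(text_order, float_order)  # [2, 3, 1] [1, 2, 3]
```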

Another way we could have done this is to have different featureprop
tables at the physical layer: featureprop_float, featureprop_int,
featureprop_text, etc. The logical layer would provide one relation,
featureprop, as a view or materialized view over the underlying
featureprop_<type> tables. Applications that require SQL ordering over
numeric featureprop values and such could cut beneath the main
presentation layer and go to the optimisation layer. In fact, there is
nothing to stop anyone instantiating their own chado in this way (they'd
have to write trigger code and views or view-materializers, of course).
It could also be implemented in the converse direction, with materialized
views such as featureprop_float over featureprop.

This was the way chado was always meant to be: a simple, generic top
layer, with other presentation layers available.

Given that the DBMS code to do all this is still at the TODO/alpha stage,
the analysisfeature score columns are a reasonable, if ugly, compromise
between genericness and utility.
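The featureprop_<type> layering described above can be sketched in SQLite. This is a hypothetical layout with only float and text tables, and it omits the trigger code that would be needed to make writes through the featureprop view work; it only shows generic reads through the logical view alongside a numeric query that goes straight to the typed table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Physical layer: one table per value type (hypothetical layout).
conn.executescript("""
CREATE TABLE featureprop_float (feature_id INTEGER, type_id INTEGER, value REAL);
CREATE TABLE featureprop_text  (feature_id INTEGER, type_id INTEGER, value TEXT);

-- Logical layer: a single featureprop relation presented as a view.
CREATE VIEW featureprop AS
    SELECT feature_id, type_id, CAST(value AS TEXT) AS value FROM featureprop_float
    UNION ALL
    SELECT feature_id, type_id, value FROM featureprop_text;
""")
# Writes go directly to the typed tables (no insert triggers in this sketch).
conn.executemany("INSERT INTO featureprop_float VALUES (?, ?, ?)",
                 [(1, 10, 9.0), (2, 10, 100.0)])
conn.execute("INSERT INTO featureprop_text VALUES (?, ?, ?)", (1, 20, "putative"))

# Generic access goes through the view...
all_props = conn.execute(
    "SELECT feature_id, type_id, value FROM featureprop "
    "ORDER BY feature_id, type_id").fetchall()
# ...while numeric queries cut beneath it to the typed table.
high_scoring = [r[0] for r in conn.execute(
    "SELECT feature_id FROM featureprop_float WHERE type_id = 10 AND value > 50")]
```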

> > > descriptive names like 'property type', and 'Statistical terms'.  I
> > > think this arrangement makes as much sense as FlyBase chado's use of the
> > > synonym type ontology with exactly one term in it: synonym.
> >
> > GO has typed synonyms (exact, narrower_than, etc). All inherit from a
> > generic "synonym" which corresponds to the cvterm that is in the current
> > fb chado. I can generate an obo file of synonym types for you if you like.
> >
> > cvterm is really not intended as a dumping ground for homeless strings.
> > The whole point of using cvterm for things such as feature types as well
> > as things like GO is to allow the chado model to be extensible,
> > interoperable, well-defined etc.
> >
> > > Darn it, as soon as I wrote that last sentence, I was reminded of how
> > > it's not true: Lincoln was complaining to me a few months ago about
> > > WormBase's lack of typed synonyms.  He said it would be nice to have
> > > 'GenBank synonym', 'Swissprot synonym', etc.  I happily pointed out that
> > > chado could easily do that.
> >
> > I'm not sure if I see these as synonym types. We'd want a way to use the
> > db table here. Can you give an example of a GenBank synonym? Things such
> > as genbank qualifiers and genbank feature types would go in a genbank cv.
>
> Well, I'm not a WormBase person, but I am reasonably sure what he was
> referring to was the fact that they need to tie accession numbers from
> different databases to a given feature in WormBase.  I think you are
> probably right about how to do it in chado, though, since those
> accessions should be tied to a feature via feature_dbxref.

yep
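For the accession case, a simplified sketch of the db/dbxref/feature_dbxref path in SQLite. The columns are trimmed to the essentials, and the feature name and accession strings are made-up placeholders, not real GenBank or SwissProt records.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified subset of the chado tables; placeholder data only.
conn.executescript("""
CREATE TABLE db (db_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE dbxref (dbxref_id INTEGER PRIMARY KEY, db_id INTEGER REFERENCES db,
                     accession TEXT, UNIQUE (db_id, accession));
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT UNIQUE);
CREATE TABLE feature_dbxref (feature_id INTEGER REFERENCES feature,
                             dbxref_id INTEGER REFERENCES dbxref);
INSERT INTO db (db_id, name) VALUES (1, 'GenBank'), (2, 'SwissProt');
INSERT INTO feature (feature_id, uniquename) VALUES (1, 'gene-1');
INSERT INTO dbxref (dbxref_id, db_id, accession)
    VALUES (1, 1, 'ACC0001'), (2, 2, 'ACC0002');
INSERT INTO feature_dbxref VALUES (1, 1), (1, 2);
""")

# Each external accession is qualified by its db, so no "GenBank synonym"
# cvterm is needed.
xrefs = conn.execute("""
    SELECT db.name, dbxref.accession
    FROM feature
    JOIN feature_dbxref USING (feature_id)
    JOIN dbxref USING (dbxref_id)
    JOIN db USING (db_id)
    WHERE feature.uniquename = 'gene-1'
    ORDER BY db.name
""").fetchall()
print(xrefs)  # [('GenBank', 'ACC0001'), ('SwissProt', 'ACC0002')]
```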

> > I'm convinced all these so-called ad-hoc CVs can be given homes that will
> > later mature into full-fledged well-defined stable CVs
>
> That is no doubt true--and my putting them in 'local' for the time being
> doesn't really cause any long term problems (like by early choices
> getting calcified in place--it seems to me that these terms can be
> fairly fluid for some time to come without much harm).

OK, fair enough, but calcification can happen more quickly than you'd
imagine. If I want to write code that generates a feature report
including certain featureprops, then I have to hardcode either the
cv.name/cvterm.name pair or the dbxref. Before you know it, you can't
modify the cvterms without fearing you'll break code.

I see the featureprop cvterms et al. as the second schema layer. It's
more flexible than the main relational layer, but even so, you're
changing the schema if you change the featureprop cvterms.
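A small sketch of how that calcification happens, using simplified cv, cvterm and featureprop tables in SQLite. The report function report_props is hypothetical and hard-codes cv.name = 'local' and cvterm.name = 'synonym'. Renaming the cv to a better-defined name (the name 'synonym_type' here is also hypothetical) makes the report silently return nothing.

```python
import sqlite3


def report_props(conn, feature_id):
    # Hard-coded cv and cvterm names: renaming either one silently
    # changes what this report returns.
    return [r[0] for r in conn.execute("""
        SELECT fp.value
        FROM featureprop fp
        JOIN cvterm t ON t.cvterm_id = fp.type_id
        JOIN cv ON cv.cv_id = t.cv_id
        WHERE fp.feature_id = ? AND cv.name = 'local' AND t.name = 'synonym'
    """, (feature_id,))]


conn = sqlite3.connect(":memory:")
# Simplified tables; chado's real cvterm unique key also includes is_obsolete.
conn.executescript("""
CREATE TABLE cv (cv_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, cv_id INTEGER, name TEXT,
                     UNIQUE (cv_id, name));
CREATE TABLE featureprop (feature_id INTEGER, type_id INTEGER, value TEXT);
INSERT INTO cv VALUES (1, 'local');
INSERT INTO cvterm VALUES (1, 1, 'synonym');
INSERT INTO featureprop VALUES (1, 1, 'abc-1');
""")

before = report_props(conn, 1)  # ['abc-1']
# Giving the term a proper home is, in effect, a schema change:
conn.execute("UPDATE cv SET name = 'synonym_type' WHERE cv_id = 1")
after = report_props(conn, 1)   # [] -- the report no longer finds the property
```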

> > > Scott
> > >
> > >
> >
>