From: Jim Hu <ji...@ta...> - 2007-04-16 00:12:52
Hi Don,

Thanks for the input. It sounds like I need to be studying SO/SOFA more than FlyBase, then. Which is actually easier from my pov. Once we build ours, I'm sure the ontology will evolve away from how we model things and we'll have our own historical issues. But at least we'll start closer to level 2 compliance.

I'll have to look at your GMODTools chado output software.

Jim

On Apr 15, 2007, at 6:54 PM, Don Gilbert wrote:

> Jim,
>
> Here are a few points to keep in mind in your studies
> of Chado structure and SO/SOFA relations:
>
> Existing chado databases are often structured by history and
> necessity more than current best ontology information, due to cost of
> updating data structures. FlyBase's chado structure over the last few
> years has not been current with SO ontologies and relations, in some
> cases.
>
> You observed: ".. sample data from Flybase it seems like it's common
> [for FlyBase data managers] to use part_of when mRNAs...". FlyBase
> also uses an obsolete SO term "so", and other historical terms.
>
> The mapping between SO's feature relations and Chado's feature
> tables is not one-to-one. SO should be biologically correct while
> Chado aims for computational usefulness. Storage of features
> among the various tables in the database is guided by database needs.
> However, use of vocabulary terms in a chado database should reflect
> the ontology schema rather than the database schema.
>
> On this,
> " I'm working on mapping various E. coli annotation sets to Chado.."
>
> it is helpful to follow existing examples, but not too far if they
> don't match current ontology relations. The main reason in following
> example databases would be to keep software working. Software *should
> be* adaptable to use different ontology relations.
>
> At one point Chado software had 5 or 6 different choices for what
> people called the SO or SOFA ontology in the CV table, and that wasn't
> enough. This may still be the case, and gives you a simple idea where
> lack of standard practices affects the software.
>
> The GMODTools chado output software I've written has various
> choices for configuring ontology terms and gene model structures
> for different database sources. E.g. yeast chado lacks mRNA features
> between gene and protein levels.
>
> -- Don Gilbert
> -- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
> -- gil...@in...--http://marmot.bio.indiana.edu/

=====================================
Jim Hu
Associate Professor
Dept. of Biochemistry and Biophysics
2128 TAMU
Texas A&M Univ.
College Station, TX 77843-2128
979-862-4054