From: Don G. <gil...@cr...> - 2007-04-15 22:55:00
Jim,

Here are a few points to keep in mind as you study Chado structure and SO/SOFA relations.

Existing Chado databases are often structured more by history and necessity than by current best ontology practice, because updating data structures is costly. Over the last few years FlyBase's Chado structure has, in some cases, not kept current with SO terms and relations. You observed: ".. sample data from Flybase it seems like it's common [for FlyBase data managers] to use part_of when mRNAs...". FlyBase also uses an obsolete SO term "so", and other historical terms.

The mapping between SO's feature relations and Chado's feature tables is not one-to-one. SO should be biologically correct, while Chado aims for computational usefulness. How features are stored among the various tables in the database is guided by database needs. However, the vocabulary terms used in a Chado database should reflect the ontology schema rather than the database schema.

On your point, "I'm working on mapping various E. coli annotation sets to Chado..": it is helpful to follow existing examples, but not so closely that you copy relations that don't match the current ontology. The main reason to follow example databases is to keep software working. Software *should be* adaptable to different ontology relations. At one point Chado software had 5 or 6 different choices for what people called the SO or SOFA ontology in the CV table, and that wasn't enough. This may still be the case, and it gives you a simple idea of where a lack of standard practices affects the software.

The GMODTools Chado output software I've written has various choices for configuring ontology terms and gene model structures for different database sources. For example, yeast Chado lacks mRNA features between the gene and protein levels.

--
Don Gilbert
d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
gil...@in... -- http://marmot.bio.indiana.edu/
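As a rough illustration of making software adaptable to different ontology names and gene model layouts, here is a minimal Python sketch. Everything in it is hypothetical: the candidate CV names, the relation names (part_of, derives_from), the `GeneModelConfig` fields and the helper functions are illustrative assumptions, not GMODTools' actual options or any database's real settings. Feature relationships are modeled Chado-style as (subject, relation, object) tuples, where the subject is the child feature (e.g. an mRNA that is part_of its gene).

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# Hypothetical candidate names a Chado instance might use for the SO/SOFA
# cv row; the real names vary from database to database.
SO_CV_CANDIDATES = ["sequence", "SO", "so", "SOFA"]

Edge = Tuple[str, str, str]  # (subject_id, relation, object_id): child -> parent


@dataclass(frozen=True)
class GeneModelConfig:
    """Hypothetical per-source settings (not GMODTools' real options)."""
    transcript_to_gene: str  # relation linking mRNA (subject) to gene (object)
    protein_to_parent: str   # relation linking protein to its mRNA or gene
    has_mrna_level: bool     # False for sources with no mRNA features


def pick_so_cv(available: Iterable[str],
               candidates: List[str] = SO_CV_CANDIDATES) -> Optional[str]:
    """Return the first candidate SO cv name present in this database."""
    present = set(available)
    for name in candidates:
        if name in present:
            return name
    return None


def children(parent: str, relation: str, edges: Iterable[Edge]) -> List[str]:
    """Subjects linked to `parent` by `relation`."""
    return [s for (s, r, o) in edges if o == parent and r == relation]


def proteins_of_gene(gene: str, edges: List[Edge], cfg: GeneModelConfig) -> List[str]:
    """Collect protein features under a gene, following the configured layout."""
    if cfg.has_mrna_level:
        # gene <- mRNA <- protein
        proteins: List[str] = []
        for mrna in children(gene, cfg.transcript_to_gene, edges):
            proteins.extend(children(mrna, cfg.protein_to_parent, edges))
        return proteins
    # No mRNA layer: proteins hang directly off the gene.
    return children(gene, cfg.protein_to_parent, edges)


# Illustrative configurations and toy data (all names hypothetical).
WITH_MRNA = GeneModelConfig("part_of", "derives_from", has_mrna_level=True)
NO_MRNA = GeneModelConfig("part_of", "derives_from", has_mrna_level=False)

edges_a = [("mRNA1", "part_of", "geneA"), ("prot1", "derives_from", "mRNA1")]
edges_b = [("prot2", "derives_from", "geneB")]

print(proteins_of_gene("geneA", edges_a, WITH_MRNA))  # ['prot1']
print(proteins_of_gene("geneB", edges_b, NO_MRNA))    # ['prot2']
print(pick_so_cv(["relationship", "SO"]))             # SO
```

The traversal code stays the same for both layouts; only the configuration changes, which keeps per-source quirks such as a missing mRNA layer or a differently named SO cv out of the core logic.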