From: Arnaud K. <ax...@sa...> - 2002-07-04 18:14:22
Hi Chris

Sounds good. See below for some comments.

Chris Stoeckert wrote:
> Hi All,
> Here are the phenotype tables I mentioned during the conference call.
> I still need to look at the rest of the schema to see how best to link
> it to gene sequences but welcome suggestions.

We would use it to annotate alleles. I guess a new table is needed with the allele name, the degree of dominance, mutation data, complementation data ...

>
> Also, Angel has posted a picture of a proposed schema for capturing
> mass spec data at:
> http://www.cbil.upenn.edu/downloads/Proteomics/schema/
> Next steps are to have people look it over and try to fit available
> data into it.
>
> Finally, I will write up my notes of our conference call and send to
> Marie-Adele to edit and pass around (if she agrees ;-) ).
>
> Cheers,
> Chris
>
> Phenotype: A linking table in the spirit of GOBO or PATO constructing
> phenotypic descriptions from orthogonal ontologies using nounal phrases.
> phenotype_id
> pato_attribute_id
> anatomy_id
> cell_type_id        can point to the Anatomy table
> dev_stage_id
> description         varchar 255   text description of what the terms
>                                   represent in aggregate

At the moment, as pombe, Trypanosoma and Leish are unicellular, the orthogonal ontologies we would use are GO process and GO component, as well as the Parasite life cycle ontology being developed here by Mat. Berriman.

>
> PATOAttribute: A
> pato_attribute_id
> term_name                  varchar 255
> term_synonym               varchar 255
> version                    varchar 50
> parent_pato_attribute_id
> term_type                  varchar 20   "attribute",
>                                         "attribute value"

So to summarize, a phenotype is made from a set of ontology terms: there is a many-to-many relationship between each of these orthogonal ontologies and the Phenotype table. It also has a set of attributes, and each attribute is associated with a set of attribute values.

cheers
Arnaud

Note this is based on Michael Ashburner's draft.

----------

On the representation of phenotypic data in genome databases - pato.ontology.
A discussion paper.
Version 2.0 22 February 2002.
Michael Ashburner, FlyBase.

pato: phenotype & trait ontology.

This paper assumes some knowledge of the work of the Gene Ontology Consortium - see www.geneontology.org.

A major conceptual and practical problem that faces all genomic databases is how to describe mutant phenotypes (I include in this term epigenetic phenotype due to RNAi, for example, and the more general class of traits, in the sense this word is used by plant and animal breeders). Some databases are, I know, struggling to find a solution to this problem; these include FlyBase, Gramene, the Mouse Genome Database, and others.

Traditionally, phenotypes have been described as free text, for example in some FlyBase tables and in the Mouse Locus Catalog. This is great, but is obviously a poor solution if the data are to be analysed efficiently by computer (e.g. by a search engine). I am not wholly convinced that we can ever get away from free text, at the very least as a "gloss" on a more systematic representation of these data. Nevertheless, the effort to represent these data systematically is very much worthwhile, and is again - I maintain - an effort that it is far better we do collaboratively, arriving at common standards, than independently. If this works it also has the consequence that rigorous queries across databases could be made (as now with GO).

The core of my proposal is that phenotypic data can be represented as qualifications of descriptive nouns or nounal phrases. A typical descriptive noun would be "eye" or "leaf" or "wing development". For each class of such nouns there will be a finite (I hope) set of relevant attributes. For these examples attributes could include: shape, color, size; for "leaf" they could include: thickness, pilosity; for "wing development" they could include "abnormal". For each attribute there will be a finite (I hope) set of values. For the attribute color, for example, values would include: black, white, green.
For each value there will be a finite (I hope) set of qualifiers; for green these may include dark, light.

Using these three semantic classes we have, I think, the basis for the systematic description of phenotypes. The only extra need is to be able to describe the assay by means of which the phenotypes (i.e. values) were determined and the conditions, both environmental and genetic, under which the assay was performed. I will not consider assays and their conditions in this paper, since these are very much a concern of the MGED group and we should share their work (see www.mged.org).

I see pato as being used to annotate (a) alleles and (b) "strains" or "lines" of an organism (e.g. for crop plants or possibly mouse strains). I should say that I do not regard pato as the place for what (albeit loosely) we normally consider as attributes of alleles, e.g. recessive, dominant, cold-sensitive, although pato should be designed to store phenotypic data that is conditional on an environmental condition (or background genotype).

I am also aware that keeping pato in a flat file may be a challenge and that a more sophisticated mechanism (DAML+OIL?) may be needed.

The purpose of sending this out at this preliminary stage is to get feedback, both at the conceptual and detailed levels. I thank John Walshaw, Suzi Lewis, Pankaj Jaiswal, Judy Blake and my Cambridge FlyBase curators (Rachel Drysdale, Gillian Milburn, Chihiro Yamada and Aubrey de Grey) for input & ideas.

I have made _no_ attempt to make pato.ontology v.1.0 anything like complete - especially with respect to attribute values. We are experimenting now with re-casting some FlyBase annotation using pato.ontology, and will report.

Please feel free to redistribute this to anyone who might give feedback. Distribution list at end of this file. If you never want to be bothered by me again, send me an email to that effect.

Michael.

The nouns/nounal phrases.
-------------------------
These will be terms from an orthogonal ontology, for example the GO biological_process ontology, the GO cellular_component ontology or a species-specific anatomy ontology (see go/anatomy for the Drosophila and Arabidopsis anatomy ontologies; mouse, worm and Monocot anatomies are on their way).

The attributes.
---------------
The hardest job will be to make a classification of attributes. A very first step is given below. We will then need a syntax for declaring for which nouns/nounal phrases a particular attribute (and its child attributes) can be used. For example, attributes of size are obviously relevant to anatomical structures, but not to a process such as "flight behavior".

Their values.
-------------
The second task is to determine the list of values any attribute may have. I have shown a few of these in pato.ontology.

Their qualifiers.
-----------------
Finally, we would need to determine the qualifiers that are relevant (allowed) for any particular attribute. Thus the qualifier "fully" would be relevant to a fertility attribute, but not to an attribute of size.

I present first a "schema" of the relationships between these concepts:

                                            !antonym              !antonym
                                                |                     |
                                            {mayhave}               {has}
                                                |                     |
concept <-{usedfor}--%attribute ---{has}---> <value ---{has}---> .qualifier
                          |                     |
                   {determinedby}
                          |                 {mayhave}
                          |                     |
                          |                   units
                          |
                        assay ---{constrainedby}---> conditions ---{oftype}---> [MGED]environmental&|genetic

attribute|value|qualifier ---{have}---> synonyms.

concept is a concept (term) from another (orthogonal) ontology.

In the draft pato.ontology: % is for attributes, < is for values of attributes
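
As a rough illustration of the concept / attribute / value / qualifier relationships described above, here is a minimal Python sketch. Nothing in it comes from the draft itself: the class name Attribute, the function describe, the concept classes "anatomy" and "process", and the output format are all assumptions made for the example. It only shows how the {usedfor} and {has} constraints could be checked before a phenotype description is accepted.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Attribute:
    """Hypothetical stand-in for a pato attribute (not a structure from the draft)."""
    name: str
    used_for: Set[str]  # classes of concept this attribute may qualify
    # Maps each allowed value to the qualifiers allowed for that value.
    values: Dict[str, Set[str]] = field(default_factory=dict)


def describe(concept: str, concept_class: str, attribute: Attribute,
             value: str, qualifier: Optional[str] = None) -> str:
    """Check the three constraints, then build a simple text description."""
    if concept_class not in attribute.used_for:
        raise ValueError(f"{attribute.name} cannot be used for a {concept_class}")
    if value not in attribute.values:
        raise ValueError(f"{value} is not a value of {attribute.name}")
    if qualifier is not None and qualifier not in attribute.values[value]:
        raise ValueError(f"{qualifier} is not a qualifier of {value}")
    prefix = f"{qualifier} " if qualifier else ""
    return f"{concept}: {attribute.name} = {prefix}{value}"


# Values and qualifiers taken from the examples in the text above.
color = Attribute("color", {"anatomy"},
                  {"black": set(), "white": set(), "green": {"dark", "light"}})
# Size applies to anatomical structures only; its values are left empty here.
size = Attribute("size", {"anatomy"})

print(describe("eye", "anatomy", color, "green", "dark"))
# prints: eye: color = dark green
```

With these definitions, describe("flight behavior", "process", size, "large") raises ValueError, which mirrors the point that an attribute of size is not relevant to a process.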