From: chris m. <cj...@fr...> - 2006-09-01 04:07:13
Just some additional context for other people - as I mentioned in a previous email, the NCBO is developing a database called OBD which will be the data counterpart to OBO. In its first incarnation we'll be focusing on linking model organism phenotype data to human diseases, but the overall scope of OBD will be much wider. One of the grand aims is seamless integration of disparate kinds of data spanning multiple domains and levels of granularity. Not a simple task, but one which we hold will be simplified by the application of solid ontological principles. This means we have to be clear about exactly what in reality the rows and columns in our databases refer to - this is why Nicole is interested in test data that includes individual phenotype instantiations in individual organisms, and MOD data that covers phenotypes generalised over multiple individuals in multiple experiments. Much of what follows pertains specifically to OBD.

Comments below.

On Aug 31, 2006, at 3:12 PM, Nicole Washington wrote:

> Hi Barry,
>
> Thanks for the feedback...
>
>>> id= patient1001
>>> E= patient* ^ instance_of(DOID:my_favorite_disease)

I think this is trying to overload pheno-syntax too much - see below

>> there are two entities here:
>> the patient, a certain individual, [E1]
>> this patient has a certain disease, say diabetes [Q1]
>>
>> Human persons are not equal to diseases. They have diseases (which
>> are qualities, as e.g. elevated temperature is a quality).
>
> i was only considering diseases as entities, not as qualities,
> since the diseases are defined as having phenotypic qualities.
> are they both?

We're very interested in the relationship between diseases and their phenotypes, but I think we have to be careful about _defining_ a disease by its phenotypes. Perhaps we should expand this to the obo-disease list?

> so, wait, a disease is a quality? does that mean that the disease
> ontology is another "ontology of qualities" like PATO is?

good question.
Either a disease is a kind of quality that inheres in an organism, or something similar, like a condition. Either way, pheno-syntax can be used to represent it. You aren't forced to break a complex quality down into the bearer entity and the quality; you can use a single ID from an ontology of complex pre-coordinated terms, such as DO or MP. All 3 examples below are valid pheno-syntax representations of specific kinds of qualities:

P= DOID:000001 /* osteoporosis -- fake ID from disease ontology, which doesn't actually have this term */

P= MP:0000001 /* osteoporosis -- fake ID from mammalian phenotype ontology */

E= FMA:Bone
Q= PATO:0000001 /* low mass */

(the P tag isn't explained in the summary yet, but it's there in the grammar)

>> but there is also a further entity [E2], which is the instance of the
>> diabetes (quality) this patient has
>>
>>> Q= PATO:variability
>>
>> can you define 'variability'?
>
> what prompted this discussion was the description in OMIM of a
> particular mutation leading to a disease which had "differential
> inter- and intra-familial variability", and then they went on to
> describe this variability, which usually was detailed as the age at
> which the patients died...some were young, some were middle-aged, but
> all were less-than the average population.

I think there is a danger of conflating two kinds of variability - the variability of a particular quality in a particular individual - such as a certain patient's fluctuating temperature - and a statistical variability between individuals or between populations. The former is within the domain of an ontology of qualities like PATO, but we should have other ways of recording the latter. This is related to penetrance, but what we want to describe is something more general.

First let's look at how to represent the shortened lifespan of an individual organism, or the general shortened lifespan of a collection of like individuals (eg in a family).
I would suggest:

E= BP:living
Q= PATO:arrested

This may raise some eyebrows, as there is no GO organismal process that covers the ensemble of all processes that contribute to the activities of an organism from birth through death (ie living) - though perhaps there should be.

>>> E2= DOID:sudden_death
>>> Q= age
>>> M= 12 years
>>
>> what is M?
>
> measurement

E= BP:living
Q= PATO:arrested
M= 12 years

is fine

>>> id= patient1002
>>> E= patient* ^ instance_of(DOID:my_favorite_disease)
>>> Q= PATO:variability
>>> E2= DOID:sudden_death
>>> Q= age
>>> M= 82 years
>>>
>>> here, the E2 elaborates what the variability is in the E.
>>>
>>> [*] note that "patient" is not a term in an ontology, but i'm
>>> proposing we could create it (which ontology?). perhaps there might
>>> also be an entity "patient population" that could be used to
>>> describe a particular fraction of the population, which would be
>>> further elaborated on with the "expressivity=" term.
>>
>> I am assuming that we annotate data about entities which are
>> instances, hence each patient will have some unique instance
>> identifiers.
>
> I think i was grappling with this a little bit... how to represent
> patient1001 as an instance. based on what you said, patient1001 is an
> instance of homo sapien. got it.

yep. In the MOD world, poor individual flies and yeast cells generally don't warrant identifiers. Identifiers may be attached to instances of collections but these may not make it beyond the lab into the literature so they're not recorded in MOD databases. In a database like OBD that cares about the instances involved and the relationships between them we will assign internal database IDs.
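
To make the E/Q/M record above concrete, here is a minimal Python sketch of how such a record might be held in code. It is an illustration only: the real pheno-syntax grammar is not reproduced in this thread, so the one-tag-per-line layout, the class name PhenoRecord and the function parse_record are all assumptions, and tags such as id= or E2= from the quoted examples are deliberately not handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PhenoRecord:
    """One pheno-syntax style record (hypothetical structure, not the real grammar)."""
    entity: Optional[str] = None       # E= the entity that bears the quality
    quality: Optional[str] = None      # Q= the quality itself
    measurement: Optional[str] = None  # M= measurement, kept as raw text
    phenotype: Optional[str] = None    # P= a single pre-coordinated term


# Only the four tags mentioned in the thread are recognised here.
TAG_TO_FIELD = {"E": "entity", "Q": "quality", "M": "measurement", "P": "phenotype"}


def parse_record(text: str) -> PhenoRecord:
    """Parse 'TAG= value' lines into a PhenoRecord; reject unknown tags."""
    record = PhenoRecord()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        tag, sep, value = line.partition("=")
        tag = tag.strip()
        if not sep or tag not in TAG_TO_FIELD:
            raise ValueError(f"unrecognised line: {line!r}")
        setattr(record, TAG_TO_FIELD[tag], value.strip())
    return record


# The shortened-lifespan example from the message.
example = parse_record("""
E= BP:living
Q= PATO:arrested
M= 12 years
""")
print(example)
```

The measurement is left as an unparsed string because the thread does not say how units are meant to be encoded.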
>> I do not think that we need to add 'patient' to PATO
>> (though we may need to add terms like 'hospitalization', 'treatment',
>> and so on, to express qualities of relevant individual human beings)
>
> i guess because i was thinking of "patient" as an entity, i didn't
> think it appropriate for PATO, but some other ontology.

Humans are indeed entities, but pheno-syntax is really just for recording the entity that directly bears the quality - this entity is typically part_of (directly or indirectly) the whole organism, except in some cases where the bearer of the quality is the whole organism (eg whole organism, small)

>>> maybe there needs to be a relation "diagnosed_with", such that we
>>> can define:
>>
>> 'Diagnosed_with' is not an ontologically coherent relation -- there
>> are many human beings diagnosed with diseases from which they do not
>> in fact suffer, and many disease-instances in human beings which are
>> never diagnosed.
>>
>
> ok, i see that.

OBD has a particular way of representing things such as diagnoses and evidence that separates the representation of the state of the world from the representation of the statements positing the state of the world, so that processes such as diagnoses and annotation don't get mixed up with biological processes.

So we would represent relationships between entities like this:

1. John has_quality q999
2. q999 instance_of bighairitis
3. p53 functions_in DNA_repair

And annotations and diagnoses like this:

4. Diagnosis10001 posits 1,2
5. Annotation20002 posits 3

(with additional provenance/evidence/experiments attached)

>>> patient1001 = instance_of^(patient^diagnosed_with
>>> (my_favorite_disease))
>>>
>> Instead of using terms like
>> patient^diagnosed_with(my_favorite_disease), you should just use the
>> disease terms themselves, and then rely on PATO to give you the
>> has_quality link to the disease.
>> Simpler is better.
>> Barry
>
> so, if a disease can be a quality and/or an entity, can this be
> accurately portrayed in our graph representation?

For these more complicated cases, it's actually easier to use the OBD internal graph representation (essentially a collection of nodes representing instances and types/kinds/universals, and a collection of links representing the relationships between them)

> i'm not sure if i accurately described my original problem. let me
> try again.
>
> usually when we talk about flies or fish or yeast we're talking about
> large populations and not instances.

actually, in that case we're talking about an instance of a population :)

> however, there is often interesting variability from person to person
> classified with the same disease. how do we capture that?

Good question. The simplest way is to represent every individual organism (including patients). However, this may not be practical for a number of reasons. I think in what follows we're causing additional difficulties by trying to combine generalised phenotype to disease associations (insofar as we recognise a distinction) with representation of individual organisms.

> there is a disease "my_favorite_disease" that has various qualities
> (big hair, absent nose, extra toes).

so what exactly do we mean by having these qualities? Some of these may be accidental or causal - big hair may be a result of certain sociogeographical factors. For certain disorders such as polydactyly, the extra digits are a defining, necessary and sufficient condition.

> anyone who is diagnosed with mutation1 of this disease has these
> phenotypes. however, some doctor found it interesting to note that
> his favorite patient, John, had 10 extra toes, whereas his other
> siblings only had 2 extra toes. some other doctor noticed 5 unrelated
> cases where if a person had 8 extra toes, they also had 8 extra
> fingers.
Ignoring trivial matters such as anonymising these curious individuals and the amount of time someone would need to record this in an EHR or annotation, we could record this as follows:

First we would have a representation of the disease "MFD", then a representation of the qualities "big hair", "extra toes", "no nose" (by post-coordinated EQ or pre-coordinated MP style phenotype qualities, it doesn't matter), and then links between them - could be ontological links, perhaps even causal links, or statistical associations.

Then separately we would have IDs for John, any siblings, or individuals noted by other doctors; we would have a separate (internal) ID for each instantiation of MFD in each person, and links between the person instance ID and the disease instance ID. We'd have a link between the disease instance ID and the ID for MFD. We would also have separate internal IDs for each quality instantiation - ie an ID for the quality of having 10 extra toes, and links between John and his 10-extra-toes quality, and so on.

We can explicitly link each phenotype instance to the associations between the generalised qualities and MFD, representing some expert's opinion that the former is evidence for the latter. Or we can also leave this sort of thing to the kind of advanced statistical mining we'd want to do on the database.

Now if we're unwilling or unable to be so specific, we would just use some other mechanism for recording the variability in the occurrence of certain qualities in a certain subset of individuals (partitioned by their genotype, or by the presence or absence of a disease -- the former would be penetrance).

> the question i'm wrestling with is how do we document these special
> phenotypes for the special individuals diagnosed with
> my_favorite_disease?
> this is why i started asking about instances. i don't think we
> necessarily want to attribute the special cases *directly* to the
> class of my-favorite-disease....
> there might be something special about that particular patient,
> unrelated to my-favorite-disease, that led him to have 10 toes
> (perhaps he was later discovered to have ten-toes-disease).... so
> i'm thinking we want these variabilities to be attributed to the
> individual patients not the disease-as-a-whole. or do we want the
> phenotypic variabilities attributed to instances of the disease rather
> than instances of humans?

I think to instances of humans, or instances of groups of humans

cheers
chris

> i hope i defined my dilemma better, but it might be more confusing.
>
> n
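
As an illustration of the separation described earlier in this message (links that describe the world versus statements that posit those links), here is a minimal Python sketch. It is not OBD's actual schema: the class names Link and Graph, the method names add, posit and posited_by, and the use of list positions as internal link IDs are all assumptions made for the example. The identifiers John, q999, bighairitis, Diagnosis10001 and Annotation20002 come from the message itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Link:
    """One relationship between two nodes (instances or types)."""
    subject: str
    relation: str
    target: str


@dataclass
class Graph:
    """Hypothetical store: links about the world, plus who posits which links."""
    links: List[Link] = field(default_factory=list)
    posits: Dict[str, List[int]] = field(default_factory=dict)

    def add(self, subject: str, relation: str, target: str) -> int:
        """Record a link and return its internal ID (its list position)."""
        self.links.append(Link(subject, relation, target))
        return len(self.links) - 1

    def posit(self, source: str, *link_ids: int) -> None:
        """Record that a diagnosis or annotation posits the given links."""
        self.posits.setdefault(source, []).extend(link_ids)

    def posited_by(self, source: str) -> List[Link]:
        """Return the links a given diagnosis or annotation posits."""
        return [self.links[i] for i in self.posits.get(source, [])]


g = Graph()
s1 = g.add("John", "has_quality", "q999")
s2 = g.add("q999", "instance_of", "bighairitis")
s3 = g.add("p53", "functions_in", "DNA_repair")

# Diagnoses and annotations point at links; they are not links themselves.
g.posit("Diagnosis10001", s1, s2)
g.posit("Annotation20002", s3)

for link in g.posited_by("Diagnosis10001"):
    print(link.subject, link.relation, link.target)
```

Keeping the posits table apart from the links means a diagnosis can later be disputed or withdrawn without touching the description of John and his qualities, which is the point of the separation.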