From: Michel D. <mic...@gm...> - 2011-04-22 13:48:49
Hi,

Here are my comments regarding the SBML Level 3 Package proposal on annotation: http://precedings.nature.com/documents/5610/version/1

A. Syntax and semantics of containers and collections

Containers (bag, seq, alt) specify groups whose members may be ordered (seq) or unordered (bag, alt), and which may contain duplicates (bag, seq) or only unique members (alt). Collections (list) specify closed groups that can contain only the specified members. There are several problems with using these constructs for SBML annotations:

0 - The relation goes from the species to a group that has particular members, rather than to the members themselves. I don't believe this is desirable, because it both dilutes and confuses the semantics of the relation.

1 - These constructs are seldom (if ever) used in the linked data community. Their semantics are better suited to listing items in forms or surveys, where you actually want to order the items (e.g. HTML ordered/unordered lists) or restrict the value choices (e.g. radio buttons).

2 - These constructs require a different kind of SPARQL query to get at the values. Instead of asking for the value of a subject-predicate pattern (e.g. :a :p ?y), one now has to ask for the members of a container (e.g. using Jena, :a :p [ rdfs:member ?y ]); without an application-specific shortcut, the query gets significantly uglier.

3 - These constructs are not supported in OWL (they appear as elements of the language's syntax, but are not something modelers can use).

4 - I don't think the cited examples are valid in the context of the stated intent.

a. Figure 6 shows two annotations linked through the "is" property to a glucose species. Given the intended semantics of "is", and given that ChEBI and KEGG are two different resources, it seems to me that what is actually meant is that this species corresponds to (represents) the physical entities denoted by the ChEBI and KEGG identifiers, and that these should not be disjoint types.
I really see two distinct statements (in Turtle):

    :meta_glc bqbiol:is <urn:miriam:obo.chebi:CHEBI%3A17234> .
    :meta_glc bqbiol:is <urn:miriam:kegg.compound:C00234> .

b. Figure 7 shows the annotation of a calcium-calmodulin complex, where the intent is to state that the complex is composed of calcium and that the complex is composed of calmodulin. The mereological nature of this statement requires neither a conjunction (which would imply that the value is a member of both types) nor a disjunction (which would imply that the value is a member of one type or the other); rather, it should be treated as a set of separate statements:

    :cacam bqbiol:hasPart <urn:miriam:uniprot:P62158> .
    :cacam bqbiol:hasPart <urn:miriam:kegg.compound:C00076> .

c. Figure 8 shows how bag elements can be separated, but I question the need to involve a bag at any level here.

d. Negative statements: OWL 2 already provides syntax for negative object property assertions.

B. Predicates and qualifiers

1. "To satisfy RDF, predicates should be nouns": knowledge representation languages such as RDF/OWL are agnostic about the characters used in a symbol, save those that are reserved as elements of the language. The naming of entities is entirely up to the modeler and has no bearing on how a tool interprets the data. Thus, the choice between nouns and verbs is entirely within the modeler's control. IMHO, verb predicates more accurately convey the nature of the relation, and improve the quality of tools that work with (controlled) natural language expressions. Ultimately, this is a style choice.

2. The new predicates listed in the appendix are not described, so I cannot offer an opinion on their merit or on whether they are intrinsically different relations. However, judging by the names alone, I very much doubt that this direction will actually benefit you.

C. Provenance

1. Statements about attributes

While I find the use of XPath expressions to identify the parts of an XML element that one wants to refer to very necessary and interesting, I fear that the name may not be sufficiently unique, and that in a large RDF graph of such annotations they would get jumbled up.

2. Statements about statements

The problem with the reification plan is that internal identifiers (rdf:ID) are never guaranteed to remain the same, so they cannot be considered linkable data. Annotations would therefore have to be created and maintained in that same file, which precludes others from commenting on them. Another solution is to consider OWL 2's annotation construct, which is strongly typed to reify statements and may be assigned a stable URI. It also allows additional semantics to be attached to an annotation.

3. Options

In both 1 and 2 above, I would suggest using MIRIAM identifiers for both models and their components. I would also suggest investigating the provenance ontology [ http://trdf.sourceforge.net/provenance/ns.html ].

D. N-ary relations (referred to as non-binary relations)

N-ary relations are problematic for many reasons, ranging from restrictions on the number and type of their arguments to the decidability of reasoning. For these and other reasons, OWL 2 supports only binary object properties (so the proposed annotations would be incompatible with an OWL knowledge base), which forces one to adopt more principled, modular patterns when designing expressive ontologies that can be reasoned over. I would therefore recommend thinking carefully about the nature of the entities and the relations that hold between them.

The use case is "Hexokinase 2 is modified by phosphoserine in position 158".
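Whatever the intended reading of the use case, the binary-only constraint can be made concrete with a minimal Python sketch (Python 3.9+). It assumes a graph is nothing more than a list of (subject, predicate, object) tuples, and every name in it (:Modification, :modifies, :has-residue, :at-position) is hypothetical. It shows the generic intermediate-node pattern, where the relation itself becomes a node so that each participant hangs off it through an ordinary two-place property; it is not the variant-based model discussed in this message.

```python
# Minimal sketch: the graph is a plain list of (subject, predicate, object)
# tuples. All :names below are hypothetical, chosen only for illustration.
from itertools import count

Triple = tuple[str, str, str]

_fresh = count(1)  # counter for blank-node-style identifiers


def to_binary(relation_type: str, roles: dict[str, str]) -> list[Triple]:
    """Encode one n-ary statement as a fresh node plus binary triples.

    The node is typed with `relation_type`; each (role, value) pair becomes
    an ordinary two-place property of that node.
    """
    node = f"_:n{next(_fresh)}"
    triples: list[Triple] = [(node, "rdf:type", relation_type)]
    for role, value in roles.items():
        triples.append((node, role, value))
    return triples


# "Hexokinase 2 is modified by phosphoserine in position 158" has three
# participants, so it cannot be a single (subject, predicate, object) triple.
graph = to_binary(":Modification", {
    ":modifies": ":hexokinase-2",
    ":has-residue": ":phosphoserine",
    ":at-position": '"158"^^xsd:int',
})

# Every statement is now binary and could be written out as Turtle.
assert all(len(t) == 3 for t in graph)
for s, p, o in graph:
    print(s, p, o, ".")
```

Running it prints four Turtle-like lines that share the subject _:n1. The cost of this pattern is one extra node per statement; the benefit is that nothing in the graph needs more than two arguments.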
First, the use case is badly worded; I think it refers to one of the following:

1 - there exists a variant of hexokinase 2 that contains a phosphorylated serine at position 158 (surely hexokinase 2 and modified hexokinase 2 are two different kinds);

2 - there exists a process that phosphorylates hexokinase 2's serine at position 158 (so there is a regular hexokinase 2 plus phosphate as input, and a phosphorylated hexokinase 2 as output).

To express (1), we might state (in Turtle):

    :hexokinase-2-PS158 rdf:type :protein ;
        :is-variant-of :hexokinase-2 ;
        :has-proper-part [
            rdf:type :phosphoserine ;
            :has-attribute [
                rdf:type :sequence-position ;
                :has-value "158"^^xsd:int
            ]
        ] .

In this way we keep to binary relations, and we now have new types whose relations are appropriate to them. Modelling can thus be better controlled, and new types with additional restrictions can evolve along the same design pattern. This pattern follows what we are doing with the Semanticscience Integrated Ontology (SIO): http://code.google.com/p/semanticscience/wiki/SIO

I hope the above helps in refining the proposal so that it reflects more powerful design patterns for the RDF/OWL Semantic Web languages.

m.

--
Michel Dumontier
Associate Professor of Bioinformatics
Carleton University
http://dumontierlab.com