
Talk:BioPAXRules

From biopax

(Difference between revisions)
(How (whether) to implement auto-fix in SharedUnificationXrefRule?: new section)
(SharedUnificationXrefRule won't fix (I did not find a reliable/safe auto-fix solution; - let data providers fix this issue manually))
 
Line 49: Line 49:
--[[User:Rodche|Rodche]] 16:14, 5 November 2010 (UTC)
-
- == How (whether) to implement auto-fix in SharedUnificationXrefRule? ==
-
- I am going to implement what's in the following Java comment:
-
- /**
-  * Checks whether more than one object has the same
-  * UnificationXref, which violates {@link UnificationXref} semantics,
-  * unless these objects are, in fact, equivalent
-  * (even though their URIs are different).
-  *
-  * Although this rule is easy to check, auto-fixing is not that straightforward!
-  *
-  * We could replace the {@link UnificationXref} xref with a new {@link RelationshipXref}
-  * (with the same db/id) for a pair of parent objects that,
-  * when the problematic unification xref is removed/replaced:
-  * - have different/incompatible BioPAX types - may fix (unsafe), but it may
-  * be ok, e.g., for CVs;
-  * - are equivalent, i.e., they were also equivalent before - do nothing;
-  * other rules may merge them;
-  * - are not equivalent but used to be equivalent - do fix for sure;
-  * - are not equivalent, were not equivalent, but have the same type - fix (unsafe);
-  * It becomes hard to tell how to fix if there are more than two parents
-  * and various types (though in rare cases)...
-  *
-  * Therefore, an alternative, easy, and quite safe approach would be to consider
-  * only those parents that have the same type and additional unification xrefs
-  * (which, by the way, is not good); and this class implements such a "fix" (which,
-  * e.g., tackles a situation when a data provider accidentally added an internal "group"
-  * unification xref, or "homology" or "hit", to multiple protein references, which
-  * come from different organisms and/or also have different UniProt AC unification
-  * xrefs there).
-  *
-  * @author rodche
-  */
- @Component public class SharedUnificationXrefRule extends AbstractRule<UnificationXref> {
- //etc...
-
- Any comments or ideas?
- --[[User:Rodche|Rodche]] 23:41, 16 March 2012 (UTC)

Current revision as of 00:03, 19 March 2012


SequenceModificationVocabulary

In BioPAX L3, a SequenceModificationVocabulary controlled-vocabulary object can be the value of the modificationType property of ModificationFeature and CovalentBindingFeature objects. These features are attached either to an EntityReference (e.g., ProteinReference) via the entityFeature property, or to a PhysicalEntity (e.g., Protein or DnaRegion) via the feature or notFeature properties. So there are different contexts for SequenceModificationVocabulary, and different ontology terms apply.

Protein

The rule says: use terms from "biological feature" (is it MI:0252, under MI:0116 "feature type"?). biopax-validator v1.01 uses the following constraint: term names from MI:0118 and all its children; MI:0120 and all its children ("post translation modification", which is now an obsolete term). We probably have to use PSI-MOD (MOD) now, e.g., terms under MOD:01156 and MOD:01157, because of recent changes in PSI-MI (the "post translation modification" terms were made obsolete or deleted).

Nucleic Acid

Sequence Ontology (SO) is currently recommended; biopax-validator v1.01 uses the following constraint: term names from SO:1000132 and all its children; SO:0001059 and all its children.
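
To illustrate what a "term X and all its children" constraint amounts to, here is a minimal sketch, not the validator's actual code. It assumes that "children" means all descendants via is_a links and that the ontology is available as a simple term-to-parents map. The class and method names and the "EX:" term IDs are hypothetical placeholders; only the MI:0118 and MI:0120 roots come from the text above.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CvTermConstraintSketch {

    // Root terms quoted above for the protein context.
    static final Set<String> PROTEIN_ROOTS = Set.of("MI:0118", "MI:0120");

    // Toy is_a hierarchy (term -> direct parents).
    // The "EX:" IDs are placeholders, not real ontology terms.
    static final Map<String, List<String>> TOY_PARENTS = Map.of(
            "EX:0001", List.of("MI:0118"),
            "EX:0002", List.of("EX:0001"),
            "EX:0003", List.of("EX:0009"));

    /** Returns true if termId is one of the roots or a (transitive) descendant of one. */
    static boolean isAllowed(String termId, Set<String> roots,
                             Map<String, List<String>> parentsOf) {
        Deque<String> toVisit = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        toVisit.push(termId);
        while (!toVisit.isEmpty()) {
            String current = toVisit.pop();
            if (roots.contains(current)) {
                return true;
            }
            // Walk up to the parents, visiting each term only once.
            if (seen.add(current)) {
                for (String parent : parentsOf.getOrDefault(current, List.of())) {
                    toVisit.push(parent);
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("EX:0002", PROTEIN_ROOTS, TOY_PARENTS)); // true
        System.out.println(isAllowed("EX:0003", PROTEIN_ROOTS, TOY_PARENTS)); // false
    }
}
```

The same check would apply to the nucleic acid context by swapping in the SO roots (SO:1000132, SO:0001059) and an SO hierarchy.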

PS:

http://www.obofoundry.org/

BioPAXRules#3.4

Rodche 20:43, 14 May 2010 (UTC)


Xref's idVersion

Currently, to check the format of an xref's id (it does not try to check whether the identifier is actually present in the database), the BioPAX Validator uses, where possible, the regular expression from the Miriam resource (http://www.ebi.ac.uk/miriam/main/export/; a standard for external references that is slowly taking over others, including the terms from MI "database citation"...). So validator warnings like "...xref's id is Q5BJF6-3, whereas it should probably be just Q5BJF6, and idVersion property be used" simply mean that Q5BJF6-3 did not match Miriam's UniProt template:

<datatype id="MIR:00000005" pattern="^([A-N,R-Z][0-9][A-Z][A-Z, 0-9][A-Z, 0-9][0-9])|([O,P,Q][0-9][A-Z, 0-9][A-Z, 0-9][A-Z, 0-9][0-9])$"> <name>UniProt</name> etc...

That's why it is suggested to move the isoform part ("3") to another BioPAX property, idVersion. There is no other standard way (in BioPAX L3) at the moment; we may think of idVersion as "idIsoform" in such cases, and this should work most of the time. Anyway, if you are certain about your identifiers, you can always ignore these validator warnings (which probably just shifts the decision to users integrating data from different BioPAX files).

Other options would be:

  1. think of the idVersion as "idIsoform" (explained above :))
  2. suggest that Miriam (UniProt) modify the template (both http://www.uniprot.org/uniprot/Q5BJF6#Q5BJF6-3 and http://www.uniprot.org/uniprot/Q5BJF6-3 work...)
  3. modify BioPAX specification...
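
Option 1 above (treating idVersion as the isoform number) could look roughly like the sketch below. It is only an illustration under stated assumptions, not the validator's implementation: the class name, the split method, and the suffix pattern are hypothetical, and the UniProt pattern is copied as quoted above from the Miriam export.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XrefIdSplitSketch {

    // UniProt pattern as quoted above from the Miriam export (MIR:00000005).
    static final Pattern UNIPROT = Pattern.compile(
            "^([A-N,R-Z][0-9][A-Z][A-Z, 0-9][A-Z, 0-9][0-9])"
            + "|([O,P,Q][0-9][A-Z, 0-9][A-Z, 0-9][A-Z, 0-9][0-9])$");

    // Hypothetical helper pattern: "<base>-<digits>", e.g. an isoform suffix.
    static final Pattern WITH_SUFFIX = Pattern.compile("^(.+)-(\\d+)$");

    /** Returns {id, idVersion}; idVersion is null when nothing was split off. */
    static String[] split(String rawId) {
        if (UNIPROT.matcher(rawId).matches()) {
            return new String[] {rawId, null};
        }
        Matcher m = WITH_SUFFIX.matcher(rawId);
        // Split only if the part before the dash is itself a valid accession.
        if (m.matches() && UNIPROT.matcher(m.group(1)).matches()) {
            return new String[] {m.group(1), m.group(2)};
        }
        // Unrecognized format: leave the id unchanged.
        return new String[] {rawId, null};
    }

    public static void main(String[] args) {
        String[] parts = split("Q5BJF6-3");
        System.out.println("id=" + parts[0] + " idVersion=" + parts[1]);
    }
}
```

Whether such a split should ever be applied automatically, rather than only suggested in a warning, is a separate question; the sketch just shows the mechanics.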

--Rodche 14:47, 5 November 2010 (UTC)


BioPAX Element ID

This is also discussed at

--Rodche 16:14, 5 November 2010 (UTC)
