The current implementation for extracting synonym tag information from an OBO entry does not conform to the OBO specification. Specifically, a synonym tag is defined by four fields: the "value", the optional "scope", the optional "type", and the "cross reference" list (which may be empty). The current implementation of SynonymTagValueHandler supports only three of these fields. The CHEBI ontology in OBO format uses all four, so the OBO parser fails to extract synonym information whenever all four fields are populated (e.g., the InChI designation for a compound).
The solution is to refine the SynonymTagValueHandler module by replacing the current regular expression with one that captures all four fields as groups:
Pattern valuePattern = Pattern.compile("\"([^\"]*)\"\\s*([^\\s]*)\\s*([^\\s]*)\\s*\\[([^\\]]*)\\]");
and then redefining the group indices as follows:
private static final int VALUE_GROUP = 1;
private static final int SCOPE_GROUP = 2;
private static final int TYPE_GROUP = 3;
private static final int XREF_GROUP = 4;
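A minimal, self-contained sketch of how the proposed pattern and group constants fit together. The class name `SynonymTagDemo`, the `parse` helper, and the sample input strings are hypothetical illustrations, not the actual SynonymTagValueHandler code; the sketch assumes the handler receives the tag value with the leading `synonym:` already stripped.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SynonymTagDemo {
    // The pattern proposed above: quoted value, optional scope, optional type,
    // then a bracketed (possibly empty) cross-reference list.
    private static final Pattern VALUE_PATTERN = Pattern.compile(
            "\"([^\"]*)\"\\s*([^\\s]*)\\s*([^\\s]*)\\s*\\[([^\\]]*)\\]");

    private static final int VALUE_GROUP = 1;
    private static final int SCOPE_GROUP = 2;
    private static final int TYPE_GROUP = 3;
    private static final int XREF_GROUP = 4;

    /** Hypothetical helper: returns {value, scope, type, xref}; absent fields come back as "". */
    static String[] parse(String tagValue) {
        Matcher m = VALUE_PATTERN.matcher(tagValue);
        if (!m.find()) {
            throw new IllegalArgumentException("Not a synonym tag value: " + tagValue);
        }
        return new String[] {
            m.group(VALUE_GROUP), m.group(SCOPE_GROUP),
            m.group(TYPE_GROUP), m.group(XREF_GROUP)
        };
    }

    public static void main(String[] args) {
        String[] samples = {
            "\"InChI=1S/H2O/h1H2\" RELATED InChI [ChEBI:]", // illustrative four-field value
            "\"water\" EXACT []",                          // scope only, empty xref list
        };
        for (String s : samples) {
            System.out.println(String.join(" | ", parse(s)));
        }
    }
}
```

For the first sample all four groups are populated (value, `RELATED`, `InChI`, `ChEBI:`). For the second, regex backtracking lets the type group match the empty string, so the missing type and the empty xref list both come back as empty strings rather than causing the match to fail.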
The field currently treated as the Type becomes the Scope, and the new Type field is added to the OWL ontology as an additional AnnotationProperty.
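A rough sketch of the remapping, given the four strings captured by the groups above. It does not use the OWL API; annotations are modelled as a plain map, and the keys (`synonymScope`, `synonymType`, `synonymXref`) and the `toAnnotations` method are hypothetical stand-ins for whatever AnnotationProperty IRIs the real translation uses. The scope check relies on the OBO scope keywords EXACT, BROAD, NARROW and RELATED.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class SynonymMapping {
    // Scope keywords defined by the OBO format for synonym tags.
    private static final Set<String> SCOPES = Set.of("EXACT", "BROAD", "NARROW", "RELATED");

    /**
     * Hypothetical mapping from parsed synonym groups to annotation key/value pairs.
     * Empty optional fields are simply omitted.
     */
    static Map<String, String> toAnnotations(String value, String scope, String type, String xref) {
        if (!scope.isEmpty() && !SCOPES.contains(scope)) {
            throw new IllegalArgumentException("Unknown synonym scope: " + scope);
        }
        Map<String, String> out = new LinkedHashMap<>();
        out.put("synonym", value);
        if (!scope.isEmpty()) out.put("synonymScope", scope); // what the old code called the "type"
        if (!type.isEmpty()) out.put("synonymType", type);    // the new, additional annotation
        if (!xref.isEmpty()) out.put("synonymXref", xref);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(toAnnotations("InChI=1S/H2O/h1H2", "RELATED", "InChI", "ChEBI:"));
    }
}
```

Validating the scope against the fixed keyword set means a value that lands in the scope group but is not a scope keyword (as a type would be under the old three-field reading) is reported instead of being silently recorded as a scope.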