From: Jennifer L. V. <ven...@st...> - 2017-07-19 22:44:23
Hi Ignazio,

On Jul 15, 2017, at 1:45 AM, Ignazio Palmisano <ipa...@gm...> wrote:

On 15 July 2017 at 00:36, Jennifer Leigh Vendetti <ven...@st...> wrote:

Hi Ignazio,

I’m working on an application [1] where we accept ontology submissions from end users and load them into an RDF store. Our process follows these steps:

1) User submits an ontology (in OWL, OBO, or SKOS format).
2) We load the ontology with version 4.3.1 of the OWL API to make sure there are no parsing errors.
3) We add some ontology annotations.
4) We save the ontology in RDF/XML format (using the OWL API).
5) We use the Raptor RDF parser [2] to serialize the ontology to ntriples in order to load the ontology data into 4store [3].

We have an ontology saved by the OWL API that Raptor fails to parse because it encounters what it claims to be an illegal blank node identifier:

    Illegal rdf:nodeID value '_:genid25'
    rapper: Failed to parse file

As far as I can tell _: is the standard start of a blank node identifier, in RDF/XML as well as N-Triples.

https://www.w3.org/TR/REC-rdf-syntax/#section-Nodes
https://www.w3.org/TR/n-triples/#BNodes

The ontology that was originally submitted to us doesn’t contain any blank nodes. The blank nodes are added only after we save the ontology with the OWL API. I’ve pared down the original ontology in order to have a test case (downloadable from here [4]).
If you execute the following code with this ontology:

    import java.io.File;
    import org.semanticweb.owlapi.apibinding.OWLManager;
    import org.semanticweb.owlapi.formats.RDFXMLDocumentFormat;
    import org.semanticweb.owlapi.model.*;

    // Load the pared-down test ontology, then save it again as RDF/XML.
    File source = new File("ordo_orphanet_part.owl");
    OWLOntologyManager manager = OWLManager.createOWLOntologyManager();
    OWLOntology sourceOntology = manager.loadOntology(IRI.create(source));
    File output = new File("out.xrdf");
    manager.saveOntology(sourceOntology, new RDFXMLDocumentFormat(), IRI.create("file:" + output.getAbsolutePath()));

… we end up with an axiom with a blank node identifier in the output file:

    <owl:Annotation>
        <owl:annotatedSource>
            <owl:Axiom rdf:nodeID="_:genid2">
                <owl:annotatedSource rdf:resource="http://www.orpha.net/ORDO/Orphanet_10"/>
                <owl:annotatedProperty rdf:resource="http://www.geneontology.org/formats/oboInOwl#hasDbXref"/>
                <owl:annotatedTarget rdf:datatype="http://www.w3.org/2001/XMLSchema#string">ICD-10:Q98.8</owl:annotatedTarget>
                <obo:ECO_0000218 rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Attributed</obo:ECO_0000218>
            </owl:Axiom>
        </owl:annotatedSource>
        <owl:annotatedProperty rdf:resource="http://purl.obolibrary.org/obo/ECO_0000218"/>
        <owl:annotatedTarget rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Attributed</owl:annotatedTarget>
        <obo:ECO_0000218 rdf:datatype="http://www.w3.org/2001/XMLSchema#string">NTBT (narrower term maps to a broader term)</obo:ECO_0000218>
    </owl:Annotation>

… whereas the original ontology doesn’t have this:

    <owl:Annotation>
        <owl:annotatedSource>
            <owl:Axiom>
                <owl:annotatedSource rdf:resource="http://www.orpha.net/ORDO/Orphanet_10"/>
                <owl:annotatedProperty rdf:resource="http://www.geneontology.org/formats/oboInOwl#hasDbXref"/>
                <owl:annotatedTarget rdf:datatype="http://www.w3.org/2001/XMLSchema#string">ICD-10:Q98.8</owl:annotatedTarget>
                <obo:ECO_0000218 rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Attributed</obo:ECO_0000218>
            </owl:Axiom>
        </owl:annotatedSource>
        <owl:annotatedProperty rdf:resource="http://purl.obolibrary.org/obo/ECO_0000218"/>
        <owl:annotatedTarget rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Attributed</owl:annotatedTarget>
        <obo:ECO_0000218 rdf:datatype="http://www.w3.org/2001/XMLSchema#string">NTBT (narrower term maps to a broader term)</obo:ECO_0000218>
    </owl:Annotation>

The need for a blank node id arises from the need to identify the same axiom appearing multiple times - e.g., multiple annotations. This might not be essential in this ontology (I haven't looked into the specific case yet), but it's quite hard to tell when it can be safely skipped. Skipping it a priori causes very interesting bugs in ontologies with nested annotations.

There currently is no setting to disable this output - there's only a setting to always output node ids, even when the node appears only once in the ontology.

One possible workaround: OWLAPI has an NTriple output mode (through a Sesame writer) using NTriplesDocumentFormat as an output format. This would enable you to output directly to NTriples (I don't know if you've tried this already; if it has failed in some way I'd like to know which issues you found).

I wrote some test code to output this ontology to ntriples using the NTriplesDocumentFormat, and it outputs without errors.

I perhaps should have mentioned in my first message that we’re running into this problem in BioPortal with some regularity. We save ontologies in RDF/XML format with the OWL API and end up with files that contain blank node identifiers of the form “_:genidN”, where N is a number. Then our code for loading ontologies into our system fails, since the rapper utility won’t process anything with this blank node identifier syntax. In all past cases, I’ve been able to find something invalid in the original ontology source file, e.g. invalid values entered for object properties, or something of the like. Once the invalidities were fixed, the occurrences of “_:genidN” went away. This is the first ontology, however, where I can’t find anything about the OWL that seems invalid.
Hence, I finally decided to post on the list and ask about whether or not this blank node identifier syntax is valid.

From what I can tell, the Raptor RDF parser project is no longer active. I initially tried contacting them with regard to the blank node identifier syntax, but never received a response.

At any rate, I will probably investigate the workaround you suggest above as one possibility for addressing the issues we’re having in BioPortal.

Best,
Jennifer

Cheers,
I.

Trying to use Raptor to serialize the output file to ntriples results in the error I mentioned above:

    rapper --input rdfxml --output ntriples out.xrdf > data.triples
    rapper: Parsing URI file:///Users/jvendetti/Development/Examples/ontologies/ordo/11/out.xrdf with parser rdfxml
    rapper: Serializing with serializer ntriples
    rapper: Error - URI file:///Users/jvendetti/Development/Examples/ontologies/ordo/11/out.xrdf:166 - Illegal rdf:nodeID value '_:genid2'
    rapper: Failed to parse file out.xrdf rdfxml content
    rapper: Parsing returned 44 triples

If I remove the ‘:’ character from the blank node identifier, Raptor will serialize the output file without errors. I wasn’t familiar with what the legal syntax is for blank node identifiers, so I went looking for documentation. I’ve had a hard time finding anything that indicates a standard for what the identifier should contain. Do you see any issue with the identifiers that the OWL API generates?

At any rate, I’d like to find a way to get this ontology to load in our system. I’m wondering if you know of a way we could adjust our OWL API code to avoid generation of these blank nodes in our output file? Or, perhaps configure what characters the blank node identifiers can contain?
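For reference, the manual fix described above (removing the ‘:’ from each blank node identifier before handing the file to rapper) could be automated as a post-processing step. The RDF/XML grammar defines rdf:nodeID values as XML NCNames, which cannot contain a colon, and that would explain why rapper rejects '_:genid2' but accepts '_genid2'. What follows is only a rough sketch, not anything provided by the OWL API or Raptor: the class name NodeIdSanitizer and its sanitize method are made up for illustration, and the code assumes the file is UTF-8, that the RDF namespace is bound to the prefix "rdf", and that no other node in the file already uses an id such as "_genid2".

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;

// Hypothetical helper (not part of the OWL API or Raptor): rewrites
// rdf:nodeID="_:genidN" to rdf:nodeID="_genidN" in an RDF/XML document.
class NodeIdSanitizer {

    // Group 1 is the quote character, group 2 is the identifier after "_:".
    // Assumption: the RDF namespace is bound to the prefix "rdf".
    private static final Pattern PREFIXED_NODE_ID =
            Pattern.compile("rdf:nodeID\\s*=\\s*([\"'])_:([^\"']*)\\1");

    // Drops only the ':' so every "_:x" identifier becomes "_x".
    // Identifiers without the "_:" prefix are left untouched.
    public static String sanitize(String rdfXml) {
        return PREFIXED_NODE_ID.matcher(rdfXml).replaceAll("rdf:nodeID=$1_$2$1");
    }

    // Usage: java NodeIdSanitizer <input.rdf> <output.rdf>
    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java NodeIdSanitizer <input> <output>");
            return;
        }
        Path in = Paths.get(args[0]);
        Path out = Paths.get(args[1]);
        // Assumption: the RDF/XML file is encoded as UTF-8.
        String xml = new String(Files.readAllBytes(in), StandardCharsets.UTF_8);
        Files.write(out, sanitize(xml).getBytes(StandardCharsets.UTF_8));
    }
}
```

Running this on out.xrdf and then pointing rapper at the rewritten file would mirror the manual edit. Because every identifier is renamed the same way, references to the same blank node within the file stay consistent with each other.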
Many thanks,
Jennifer

[1] http://bioportal.bioontology.org/
[2] http://librdf.org/raptor/rapper.html
[3] https://github.com/4store/4store
[4] https://stanford.box.com/s/uj5qia52y3fpy86v0cd7r9clon98o72d

_______________________________________________
Owlapi-developer mailing list
Owl...@li...
https://lists.sourceforge.net/lists/listinfo/owlapi-developer