
#68 Replace inhouse RDF parsers and writers with OpenRDF Rio

open
nobody
None
5
2014-08-20
2012-06-15
No

As a first step towards resolving the current issues with the in-house RDF parsers and writers, I implemented a basic patch that allows any OpenRDF RDFHandler to be used to collect the RDF triples representing an OWLOntology. This includes all of the RDFWriter implementations, along with any custom RDFHandlers, such as StatementCollector, that do not actually write their output to either an OutputStream or a Writer.

The patch can be viewed in the sesame-rio branch of my GitHub repository [1]. It is based on top of the patches for bugs 3516742 (duplicated-rdf-classes branch) and 3521488 (service-provider-interface branch).

[1] https://github.com/ansell/owlapi/compare/service-provider-interface...sesame-rio
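
The handler idea in the description can be sketched roughly as follows. This is only a minimal illustration, not the code in the patch: SketchTriple, SketchHandler, LineWritingHandler, CollectingHandler and SketchRenderer are hypothetical stand-ins invented for this example, and they are not the OpenRDF or OWLAPI types. The point is that one rendering routine can drive either a handler that serialises or a handler that only collects triples in memory.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an RDF statement; NOT an OpenRDF or OWLAPI class.
final class SketchTriple {
    final String subject;
    final String predicate;
    final String object;

    SketchTriple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

// Hypothetical callback interface: the renderer only knows about this type.
interface SketchHandler {
    void handle(SketchTriple triple);
}

// Handler that serialises each triple as one N-Triples-style line (IRIs only).
final class LineWritingHandler implements SketchHandler {
    private final StringBuilder out;

    LineWritingHandler(StringBuilder out) {
        this.out = out;
    }

    @Override
    public void handle(SketchTriple t) {
        out.append('<').append(t.subject).append("> <")
           .append(t.predicate).append("> <")
           .append(t.object).append("> .\n");
    }
}

// Handler that keeps the triples in memory and never serialises them.
final class CollectingHandler implements SketchHandler {
    final List<SketchTriple> triples = new ArrayList<>();

    @Override
    public void handle(SketchTriple t) {
        triples.add(t);
    }
}

// Stand-in for rendering an ontology: emits two fixed example triples.
final class SketchRenderer {
    static void render(SketchHandler handler) {
        String rdfType = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
        String rdfsDatatype = "http://www.w3.org/2000/01/rdf-schema#Datatype";
        handler.handle(new SketchTriple(
                "http://example.com/owl/families/majorAge", rdfType, rdfsDatatype));
        handler.handle(new SketchTriple(
                "http://example.com/owl/families/minorAge", rdfType, rdfsDatatype));
    }
}

final class HandlerSketchDemo {
    public static void main(String[] args) {
        StringBuilder text = new StringBuilder();
        SketchRenderer.render(new LineWritingHandler(text));
        System.out.print(text);

        CollectingHandler collector = new CollectingHandler();
        SketchRenderer.render(collector);
        System.out.println("collected triples: " + collector.triples.size());
    }
}
```

The renderer never needs to know whether its output ends up in a document or in a list, which is the property the patch relies on.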

Discussion

  • Ignazio Palmisano

    I am thinking of setting up a branch to experiment with big API level changes, and this is definitely a candidate. I'll see what can be included in the regular API.

     
  • Peter Ansell

    Peter Ansell - 2012-06-16

    The logic in RioRenderer.render(RDFResource node) is broken. It does not emit some triples right now but I am not sure what the reason is yet.

     
  • Matthew Horridge

    Does this refer to the RDF/XML parser? If so, why does it need replacing?

     
  • Peter Ansell

    Peter Ansell - 2012-06-26

    The RDF/XML parser can stay if people wish to use it, but there would be no need to reimplement any RDF parsers or writers specifically for OWLAPI if there was a generic, pluggable, parser and writer interface. In my case I would like to do it for Rio, but once it is done for Rio it should be easy to port to any other RDF library that outputs RDF triples as it parses a document.

    It can only be a good thing to make it easier to maintain the RDF sections of OWLAPI. The current RDF/XML and Turtle parsers in OWLAPI are quite messy and prone to breakages. If anything changes in the near future with RDF-1.1 just around the corner and the Turtle specification going through change as it moves towards standardisation, it would be much easier to be able to offload that responsibility to one or more dedicated RDF libraries.

     
  • Peter Ansell

    Peter Ansell - 2012-06-26

    In addition, it would make it unnecessary to output statements to a serialisation. It would also make it unnecessary to import statements from a concrete serialisation, as Sesame Rio allows an RDFHandler to store the statements in memory instead of writing them out to an OutputStream or Writer. It also allows imports from statements that are already in memory, so there would be a two-way bridge between OWLAPI and the RDF library without going through a serialisation.
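
    The two-way bridge described above might look something like the sketch below. It is only an assumption-laden illustration: MemoryBridgeSketch and its methods are hypothetical names, statements are represented as plain strings, and none of this is Sesame Rio or OWLAPI code. It shows statements being exported to an in-memory sink and imported again without any document being written or parsed.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical sketch; statements are plain strings purely for illustration.
final class MemoryBridgeSketch {

    // "Export" side: push every statement of a model to an arbitrary sink.
    static void exportStatements(Set<String> model, Consumer<String> sink) {
        for (String statement : model) {
            sink.accept(statement);
        }
    }

    // "Import" side: build a model from statements that are already in memory.
    static Set<String> importStatements(Iterable<String> statements) {
        Set<String> model = new LinkedHashSet<>();
        for (String statement : statements) {
            model.add(statement);
        }
        return model;
    }

    // Round trip that never touches an OutputStream, Writer or parser.
    static Set<String> roundTrip(Set<String> model) {
        List<String> inMemory = new ArrayList<>();
        exportStatements(model, inMemory::add);
        return importStatements(inMemory);
    }

    public static void main(String[] args) {
        Set<String> model = new LinkedHashSet<>();
        model.add("majorAge rdf:type rdfs:Datatype");
        model.add("minorAge rdf:type rdfs:Datatype");
        System.out.println(roundTrip(model).equals(model));
    }
}
```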

     
  • Matthew Horridge

    Thanks for the clarification. The RDF/XML parser was written by a 3rd party (originally from the KAON 1 tool suite), and, at the time, it was *significantly* faster than the alternatives. I wonder how it compares now.

    The outputting is rather messy - from memory, AbstractRDFTranslator (or whatever it's called) sort of does this, but it could be factored out much more nicely.

     
  • Peter Ansell

    Peter Ansell - 2012-06-28

    After running the test cases through the Rio Turtle and RDF/XML parsers, there seem to be a few bugs in the test suite, mostly surrounding the use of blank nodes in Turtle. I ignored the tests that rely on testBlankNodes.ttl and testBlankNodes2.ttl, as they don't seem to be valid Turtle documents. I also ignored the test that relies on testBlankNodesAssertions.ttl, as it fails if the blank nodes are serialised in a different order, which doesn't seem to be required by either the RDF or Turtle specs.

    There is also a known bug with the Rio RDF/XML parser where it trims whitespace at the start and end of literals (see http://www.openrdf.org/issues/browse/SES-879), which causes the LiteralWithNewLineTestCase to fail.
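
    To illustrate why trimming breaks a round-trip test on a literal with newlines, here is a small sketch. It does not reproduce the Rio parser code; LiteralTrimSketch and its methods are hypothetical names that only model the two behaviours (trimming versus preserving the lexical value).

```java
// Hypothetical sketch of the failure mode; not taken from Rio or OWLAPI.
final class LiteralTrimSketch {

    // Models a parser that strips leading and trailing whitespace from a literal.
    static String trimmingParse(String lexicalValue) {
        return lexicalValue.trim();
    }

    // Models a parser that returns the literal's lexical value unchanged.
    static String preservingParse(String lexicalValue) {
        return lexicalValue;
    }

    // A round-trip check passes only if the parsed value equals the original.
    static boolean roundTrips(String original, String parsed) {
        return original.equals(parsed);
    }

    public static void main(String[] args) {
        String literal = "\nline one\nline two\n";
        System.out.println(roundTrips(literal, trimmingParse(literal)));
        System.out.println(roundTrips(literal, preservingParse(literal)));
    }
}
```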

    There are only two failures that I am worried about because I don't understand them.

    For both RDF/XML and Turtle, the SWRLRuleTestCase adds a single extra axiom to the ontology, and it doesn't seem to mean anything.

    testRDFXML(org.semanticweb.owlapi.api.test.SWRLRuleTestCase): Add axiom: AnnotationAssertion(<http://www.w3.org/2003/11/swrl#Builtin> <http://www.owlapi#myBuiltIn> <http://www.w3.org/2003/11/swrl#Builtin>)(..)
    testTurtle(org.semanticweb.owlapi.api.test.SWRLRuleTestCase): Add axiom: AnnotationAssertion(<http://www.w3.org/2003/11/swrl#Builtin> <http://www.owlapi#myBuiltIn> <http://www.w3.org/2003/11/swrl#Builtin>)(..)

    The most worrying failure is the PrimerRDFXMLRoundTrippingTestCase, where the primer is not interpreted/parsed accurately. I added a new PrimerTurtleRoundTrippingTestCase based on the Turtle version of the primer, and it fails with the Sesame Turtle parser in the same way that the RDF/XML test fails, so it is not necessarily a Sesame Rio issue. There is a case where an error is ignored and the OWLRDFConsumer.getErrorEntity function is called to substitute an IRI for the error.

    The axiom that the RDF/XML test reports as removed is the following, but this may partly be a red herring, as the parser recovers temporarily using getErrorEntity and doesn't actually fail to parse the document in either case:
    testRDFXML(org.semanticweb.owlapi.api.test.PrimerRDFXMLRoundTrippingTestCase): Rem axiom: DatatypeDefinition(<http://example.com/owl/families/majorAge> DataIntersectionOf(<http://org.semanticweb.owlapi/error#Error2> DataComplementOf(<http://example.com/owl/families/minorAge>) ))(..)

    The RDF/XML stack trace when it reaches the getErrorEntity function is:
    RioOWLRDFConsumerAdapter(OWLRDFConsumer).getErrorEntity(EntityType<E>) line: 1953
    RioOWLRDFConsumerAdapter(OWLRDFConsumer).generateAndLogParseError(EntityType<E>, IRI) line: 2014
    RioOWLRDFConsumerAdapter(OWLRDFConsumer).translateClassExpressionInternal(IRI) line: 1971
    RioOWLRDFConsumerAdapter(OWLRDFConsumer).translateClassExpression(IRI) line: 1943
    TPEquivalentClassHandler(AbstractTripleHandler).translateClassExpression(IRI) line: 140
    TPEquivalentClassHandler.translateEquivalentClasses(IRI, IRI, IRI) line: 110
    TPEquivalentClassHandler.handleTriple(IRI, IRI, IRI) line: 92
    RioOWLRDFConsumerAdapter(OWLRDFConsumer).handle(IRI, IRI, IRI) line: 1316
    OWLRDFConsumer$1.handleResourceTriple(IRI, IRI, IRI) line: 1455
    RioOWLRDFConsumerAdapter(OWLRDFConsumer).iterateResourceTriples(ResourceTripleIterator<E>) line: 2434
    RioOWLRDFConsumerAdapter(OWLRDFConsumer).endModel() line: 1453
    RioOWLRDFConsumerAdapter.endRDF() line: 64
    RDFXMLParser.parse(InputSource) line: 261
    RDFXMLParser.parse(InputStream, String) line: 209
    RioParserImpl.parse(OWLOntologyDocumentSource, OWLOntology, OWLOntologyLoaderConfiguration) line: 193
    ParsableOWLOntologyFactory.loadOWLOntology(OWLOntologyDocumentSource, OWLOntologyFactory$OWLOntologyCreationHandler, OWLOntologyLoaderConfiguration) line: 219
    OWLOntologyManagerImpl.loadOntology(IRI, OWLOntologyDocumentSource, OWLOntologyLoaderConfiguration) line: 746
    OWLOntologyManagerImpl.loadOntologyFromOntologyDocument(OWLOntologyDocumentSource) line: 694
    OWLOntologyManagerImpl.loadOntologyFromOntologyDocument(InputStream) line: 710
    PrimerRDFXMLRoundTrippingTestCase(AbstractFileRoundTrippingTestCase).createOntology() line: 68
    PrimerRDFXMLRoundTrippingTestCase(AbstractRoundTrippingTestCase).setUp() line: 74

    The slightly different Turtle stack trace when it reaches the getErrorEntity function is:
    RioOWLRDFConsumerAdapter(OWLRDFConsumer).getErrorEntity(EntityType<E>) line: 1953
    RioOWLRDFConsumerAdapter(OWLRDFConsumer).generateAndLogParseError(EntityType<E>, IRI) line: 2014
    RioOWLRDFConsumerAdapter(OWLRDFConsumer).translateDataRange(IRI) line: 1843
    DataRangeListItemTranslator.translate(IRI) line: 64
    DataRangeListItemTranslator.translate(IRI) line: 50
    OptimisedListTranslator<O>.translateList(IRI, List<O>) line: 87
    OptimisedListTranslator<O>.translateList(IRI) line: 138
    OptimisedListTranslator<O>.translateToSet(IRI) line: 149
    RioOWLRDFConsumerAdapter(OWLRDFConsumer).translateToDataRangeSet(IRI) line: 2050
    RioOWLRDFConsumerAdapter(OWLRDFConsumer).translateDataRange(IRI) line: 1782
    TPEquivalentClassHandler.translateEquivalentDataRanges(IRI, IRI, IRI) line: 102
    TPEquivalentClassHandler.handleTriple(IRI, IRI, IRI) line: 95
    RioOWLRDFConsumerAdapter(OWLRDFConsumer).handle(IRI, IRI, IRI) line: 1316
    OWLRDFConsumer$1.handleResourceTriple(IRI, IRI, IRI) line: 1455
    RioOWLRDFConsumerAdapter(OWLRDFConsumer).iterateResourceTriples(ResourceTripleIterator<E>) line: 2434
    RioOWLRDFConsumerAdapter(OWLRDFConsumer).endModel() line: 1453
    RioOWLRDFConsumerAdapter.endRDF() line: 64
    TurtleParser.parse(Reader, String) line: 194
    RioParserImpl.parse(OWLOntologyDocumentSource, OWLOntology, OWLOntologyLoaderConfiguration) line: 189
    ParsableOWLOntologyFactory.loadOWLOntology(OWLOntologyDocumentSource, OWLOntologyFactory$OWLOntologyCreationHandler, OWLOntologyLoaderConfiguration) line: 219
    OWLOntologyManagerImpl.loadOntology(IRI, OWLOntologyDocumentSource, OWLOntologyLoaderConfiguration) line: 746
    OWLOntologyManagerImpl.loadOntologyFromOntologyDocument(OWLOntologyDocumentSource) line: 694
    PrimerTurtleRoundTrippingTestCase(AbstractOWLAPITestCase).roundTripOntology(OWLOntology, OWLOntologyFormat) line: 214
    PrimerTurtleRoundTrippingTestCase(AbstractRoundTrippingTestCase).testTurtle() line: 104
    PrimerTurtleRoundTrippingTestCase.testTurtle() line: 82

    Were there any changes necessary to the current RDF/XML and Turtle parsers to get the http://example.com/owl/families/majorAge section of the primer document to parse correctly?

     
  • Ignazio Palmisano

    I'll investigate this for 3.5

     
  • Peter Ansell

    Peter Ansell - 2012-10-10

    Thanks for looking into this for the next release. I will try to clean this up today and tomorrow, along with the other substantial patches, and rebase each of them directly against the latest trunk.

    What would you like the strategy to be with regard to adding the necessary dependency on the sesame-model and sesame-rio-api modules? Would you prefer that they only be attached to an owlapi-sesame module, or do you mind having those modules (which are quite small jar files) as dependencies of owlapi-api?

    I will try to encourage the Sesame team to get their next release into Maven Central [1]. However, until then we would need to manually deploy the modules we depend on to Maven Central ourselves as third-party artifacts [2].

    [1] http://www.openrdf.org/issues/browse/SES-875
    [2] https://docs.sonatype.org/display/Repository/Uploading+3rd-party+Artifacts+to+The+Central+Repository

     
