Obtaining RDF graph from HTML using ontology

jaxvy
2010-02-25
2013-05-13
  • jaxvy
    2010-02-25

    Hi,

    I am new to Aperture. While reading the wiki page on RDF usage, I learned that Aperture uses the NIE ontology framework to extract metadata from a given HTML page. I was therefore wondering whether Aperture provides the API components needed to use a different ontology (in the same format/structure as NIE, of course) to extract different metadata from a given HTML page. My aim is to extract RDF triples from a given web page according to a given ontology. Any help is appreciated. Thanks.

     
  • Soh Guan Hoe
    2010-02-26

    Hi, I cannot find the button to create a new topic, so I am borrowing your thread to tag on. I am also looking for the Javadoc for org.ontoware.*

    I also cannot find the source code for those org.ontoware.* Java classes. Does that mean this package is closed source?

    Thanks.

     
  • Antoni Mylka
    2010-02-26

    1. Sorry, there is no way to customize which ontology is used. What you can do is extract the triples to an in-memory model, and later convert them to your ontology.

    If I had to do it quickly, I would hack together a little piece of code that creates two hashmaps. In one, the keys would be NIE properties and the values would be the corresponding properties from your ontology; the other would do the same for classes. Then I would do something like

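    // map: NIE property URI -> corresponding property URI in your ontology
    for (URI key : map.keySet()) {
        ClosableIterator<? extends Statement> it = model.findStatements(Variable.ANY, key, Variable.ANY);
        try {
            while (it.hasNext()) {
                Statement st = it.next();
                // copy the triple, swapping the NIE predicate for the mapped one
                resultModel.addStatement(st.getSubject(), map.get(key), st.getObject());
            }
        } finally {
            it.close();
        }
    }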

    … and another similar loop for classes. It would look for (?x, rdf:type, nie:WhateverClass) and insert (?x, rdf:type, youront:YourClass) into the target model.

    Such an approach is hacky, but it would probably work. Obviously the code example above is pseudocode and will require some cleanup before the compiler can accept it, but you should get the idea.
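
    As a rough illustration of the two-map idea (properties plus classes), here is a minimal, self-contained sketch in plain Java. It deliberately does not use RDF2Go or Aperture: the Triple record and the translate method are hypothetical stand-ins for Statement and Model, and the prefixed names (ex:, youront:) are just illustrative strings. Triples with no entry in either map are dropped, which matches what the loops above would do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class OntologyMappingSketch {

    static final String RDF_TYPE = "rdf:type";

    // Hypothetical stand-in for an RDF statement; not an RDF2Go type.
    record Triple(String subject, String predicate, String object) {}

    // Copies triples into a new list, rewriting mapped predicates and
    // mapped rdf:type objects. Triples without a mapping are skipped.
    static List<Triple> translate(List<Triple> source,
                                  Map<String, String> propertyMap,
                                  Map<String, String> classMap) {
        List<Triple> result = new ArrayList<>();
        for (Triple t : source) {
            if (RDF_TYPE.equals(t.predicate())) {
                // (?x, rdf:type, nie:Class) -> (?x, rdf:type, youront:Class)
                String targetClass = classMap.get(t.object());
                if (targetClass != null) {
                    result.add(new Triple(t.subject(), RDF_TYPE, targetClass));
                }
            } else {
                // (?x, nie:prop, ?y) -> (?x, youront:prop, ?y)
                String targetProperty = propertyMap.get(t.predicate());
                if (targetProperty != null) {
                    result.add(new Triple(t.subject(), targetProperty, t.object()));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Triple> source = List.of(
            new Triple("ex:page1", "rdf:type", "nie:WhateverClass"),
            new Triple("ex:page1", "nie:title", "Home"),
            new Triple("ex:page1", "nie:unmapped", "ignored"));
        Map<String, String> propertyMap = Map.of("nie:title", "youront:name");
        Map<String, String> classMap = Map.of("nie:WhateverClass", "youront:YourClass");

        for (Triple t : translate(source, propertyMap, classMap)) {
            System.out.println(t.subject() + " " + t.predicate() + " " + t.object());
        }
    }
}
```

    With the real library, the list would be replaced by the source and target models, and the strings by URI objects; the control flow should stay the same.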

     
  • Soh Guan Hoe
    2010-03-02

    Hi, thanks a lot for the Javadoc URL. I was wondering whether Aperture has any intention of moving from SourceForge to Apache. When one looks for an open source solution, Apache is usually one of the must-visit websites. I also believe your project could perhaps merge with Tika to deliver a truly open source crawler and extraction framework.

    So, instead of building its own index and search, which Apache Lucene already does very well, Aperture could perhaps become its own top-level project at Apache, with Tika developers contributing.

    All in all, I am very satisfied with Aperture. I do hope the XML file type detection feature will be ready soon, though, as XML content is everywhere nowadays. This may already be available in Tika, though I have yet to try it out.