
#66 Wrong Extraction for # resources

open-later
5
2012-03-17
2011-06-08
No

The extraction framework doesn't resolve hash links that point to a section of the same Wikipedia page.

Example:
When you execute the following SPARQL query:

SELECT ?o WHERE {
<http://dbpedia.org/resource/Love_Actually> <http://dbpedia.org/ontology/starring> ?o
}

you get:
http://dbpedia.org/resource/%23Cast

However, this resource violates the DBpedia ontology, since the range of the dbpedia-owl:starring property should be a Person.
In fact, if you go to the corresponding Wikipedia page (http://en.wikipedia.org/wiki/Love_Actually) you will see that in the Infobox there is a hash link to the "Cast" section in the same page. There you can find a list of the actual actors.
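
To spot affected triples in query results, one option is to flag object URIs whose local name contains an encoded hash. The following is only a minimal sketch in Python (not part of the extraction framework, which is written in Scala); the function name and the prefix constant are assumptions made for illustration.

```python
from urllib.parse import unquote

# Assumed resource namespace, taken from the URIs quoted in this ticket.
RESOURCE_PREFIX = "http://dbpedia.org/resource/"


def has_section_fragment(uri: str) -> bool:
    """Return True if the resource's local name contains a '#' once decoded.

    Such URIs come from same-page section links (e.g. '#Cast') rather than
    from links to real pages, so they cannot denote a Person.
    """
    if not uri.startswith(RESOURCE_PREFIX):
        return False
    local_name = unquote(uri[len(RESOURCE_PREFIX):])
    return "#" in local_name
```

Applied to the result above, `has_section_fragment("http://dbpedia.org/resource/%23Cast")` is true, while an ordinary resource URI with no encoded hash is not flagged.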

A similar problem happens with the resource http://dbpedia.org/resource/11%2709%2201_September_11 .
Here a lot of objects refer to wrong resources, instead of the actual ones.

When I discussed this with Pablo Mendes on the developer mailing list, he said the problem is probably in this file: http://dbpedia.hg.sourceforge.net/hgweb/dbpedia/extraction_framework/file/dec26907888b/core/src/main/scala/org/dbpedia/extraction/mappings/InfoboxExtractor.scala.

Discussion

  • Max Jakob

    Max Jakob - 2011-06-08
    • assigned_to: nobody --> maxjakob
     
  • Max Jakob

    Max Jakob - 2011-08-02

    I made an intermediate fix: now the framework extracts
    res:Love_Actually ont:starring res:Love_Actually%23Cast
    instead of
    res:Love_Actually ont:starring res:%23Cast

    This has the advantage that http://dbpedia.org/resource/%23Cast is no longer the shared value for many movies.
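
The idea behind this kind of rewrite can be sketched as follows. This is an illustration in Python, not the framework's Scala code; `resolve_link_target` is a hypothetical helper, and the URI encoding is simplified compared with whatever rules the framework actually applies.

```python
from urllib.parse import quote

# Assumed resource namespace, taken from the URIs quoted in this ticket.
RESOURCE_PREFIX = "http://dbpedia.org/resource/"


def resolve_link_target(page_title: str, link_target: str) -> str:
    """Map a wiki link target to a resource URI.

    A target that starts with '#' points to a section of the current page,
    so it is prefixed with the page title before encoding.
    """
    if link_target.startswith("#"):
        link_target = page_title + link_target
    # Simplified encoding: spaces become underscores, everything outside
    # the unreserved set (including '#') is percent-encoded.
    name = link_target.strip().replace(" ", "_")
    return RESOURCE_PREFIX + quote(name, safe="")
```

With this sketch, `resolve_link_target("Love Actually", "#Cast")` yields a URI ending in `Love_Actually%23Cast`, so section links from different pages no longer collapse into one resource.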

     
  • Max Jakob

    Max Jakob - 2011-08-02
    • assigned_to: maxjakob --> nobody
     
  • Max Jakob

    Max Jakob - 2011-08-02

    The final solution would obviously be to navigate to the respective SectionNode and parse the list. The object has to be directly after the '* '. Note that this would require some re-factoring in the ObjectParser and the other data parsers.
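
A rough sketch of that approach, written in Python over raw wikitext rather than against the framework's SectionNode and ObjectParser classes (whose interfaces are not shown in this ticket). The function name and regular expressions are assumptions; templates, nested lists, and subsections are not handled.

```python
import re

# '== Title ==' style headings; the same number of '=' on both sides.
HEADING = re.compile(r"^(=+)\s*(.*?)\s*\1\s*$")
# A list item whose link comes directly after '* '; captures the link target
# up to the first ']', '|' or '#'.
FIRST_LINK = re.compile(r"^\*\s*\[\[([^\]|#]+)")


def links_in_section(wikitext: str, section: str) -> list[str]:
    """Return the leading link target of each list item in the named section."""
    targets = []
    in_section = False
    for line in wikitext.splitlines():
        heading = HEADING.match(line)
        if heading:
            # Simplification: any heading, including a subsection, ends the section.
            in_section = heading.group(2) == section
            continue
        if in_section:
            item = FIRST_LINK.match(line)
            if item:
                targets.append(item.group(1).strip())
    return targets
```

For a page whose "Cast" section contains `* [[Hugh Grant]] as ...`, the function returns `Hugh Grant` for that item; list items that do not start with a link are skipped, in line with the requirement that the object directly follow the '* '.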

     
  • Christopher Sahnwaldt

    • assigned_to: nobody --> jcsahnwaldt
    • status: open --> open-fixed
     
  • Christopher Sahnwaldt

    DBpedia URIs with %23 don't really make sense, do they? We'll see what we can do. Actually extracting lists from sections seems like a major effort for which we probably won't have time.

     
  • Christopher Sahnwaldt

    • labels: 973128 --> 2975619
    • status: open-fixed --> open-later
     
  • Christopher Sahnwaldt

    • labels: 2975619 -->
     
  • Christopher Sahnwaldt

    • milestone: --> Mapping-based extractor
    • labels: --> Feature Request
     
  • Roberto Mirizzi

    Roberto Mirizzi - 2012-03-17

    I agree with you that DBpedia hash URIs used in this way are not very informative.
    The main problem is that more and more Wikipedia pages adopt this technique: when the data is too big to fit in an Infobox, a hash link to another section of the same page is used instead.

     