
#53 How can I extract data from list pages on Wikipedia?

Milestone: Other extractors
Status: open-accepted
Priority: 2
Updated: 2012-03-17
Created: 2011-02-03
Private: No

What exactly I want to do is:

**input**: Wikipedia XML dump

**output**: a list of triples like this:

<http://dbpedia.org/resource/Lists_of_computer_languages> <http://dbpedia.org/ontology/wikiListOf> <http://dbpedia.org/resource/C_(programming_language)> .

<http://dbpedia.org/resource/Lists_of_computer_languages> <http://dbpedia.org/ontology/wikiListOf> <http://dbpedia.org/resource/Java_(programming_language)> .

...

<http://dbpedia.org/resource/List_of_XML_markup_languages> <http://dbpedia.org/ontology/wikiListOf> <http://dbpedia.org/resource/AdsML> .

<http://dbpedia.org/resource/List_of_XML_markup_languages> <http://dbpedia.org/ontology/wikiListOf> <http://dbpedia.org/resource/Agricultural_Ontology_Service> .

We have already set up and customised the DBpedia extraction framework, but I think it would be difficult to configure the framework to extract this data. I was shocked to find that the extraction framework does not have an extractor for this!
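To make the requested mapping concrete, here is a minimal standalone sketch in Python. It is not part of the DBpedia extraction framework and does not use its API. It assumes that a list page is any page whose title starts with "List of" or "Lists of", that each list entry is the first wikilink on a bulleted line, and that resource URIs can be built with a simplified encoding (spaces to underscores, then percent-encoding). The names `list_triples`, `resource_uri`, and `SAMPLE_DUMP` are hypothetical; the `wikiListOf` predicate is copied from the example triples above.

```python
import io
import re
import xml.etree.ElementTree as ET
from urllib.parse import quote

RESOURCE = "http://dbpedia.org/resource/"
# Predicate copied from the example triples in the request.
WIKI_LIST_OF = "http://dbpedia.org/ontology/wikiListOf"

# Assumption: a list entry is the first wikilink on a bulleted line,
# e.g. "* [[Target|label]] ...". Tables and nested templates are ignored.
LIST_ITEM_LINK = re.compile(r"^\*+[ \t]*\[\[([^\]\[|#\n]+)", re.MULTILINE)


def resource_uri(title):
    """Build a resource URI with a simplified encoding (an assumption)."""
    name = title.strip().replace(" ", "_")
    name = name[0].upper() + name[1:]  # wiki titles start with a capital
    return RESOURCE + quote(name, safe="()',:")


def local_name(tag):
    """Strip the XML namespace that the dump puts on every tag."""
    return tag.rsplit("}", 1)[-1]


def list_triples(dump_file):
    """Yield one N-Triples line per linked entry of each list page."""
    title, text = None, ""
    for _event, elem in ET.iterparse(dump_file, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            if title and title.startswith(("List of ", "Lists of ")):
                subject = resource_uri(title)
                for match in LIST_ITEM_LINK.finditer(text):
                    target = match.group(1).strip()
                    if target:
                        yield "<%s> <%s> <%s> ." % (
                            subject, WIKI_LIST_OF, resource_uri(target))
            title, text = None, ""
            elem.clear()  # release the page's content once processed


# Tiny hand-written stand-in for a dump, for illustration only.
SAMPLE_DUMP = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
<page><title>List of XML markup languages</title><revision><text>See [[XML]].
* [[AdsML]]
* [[Agricultural Ontology Service|AOS]]: vocabulary service
</text></revision></page>
<page><title>XML</title><revision><text>* [[Markup language]]</text></revision></page>
</mediawiki>"""

if __name__ == "__main__":
    for triple in list_triples(io.BytesIO(SAMPLE_DUMP)):
        print(triple)
```

On a real dump, `list_triples` can be given the path of the decompressed XML file instead of the in-memory sample. The title prefix test and the bullet pattern are the weak points of this sketch: many list pages keep their entries in tables or templates, which is where most of the real effort would go.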

Discussion

  • Max Jakob

    Max Jakob - 2011-06-10

    moved from Bugs to Feature Requests

     
  • Christopher Sahnwaldt

    We would have to extend TableMapping. Major effort, probably no time for that. Sorry.
    But it's open source, so if you find a way to do this, please submit a patch!
    Feel free to ask for help on dbpedia-discussion@lists.sourceforge.net

     
  • Christopher Sahnwaldt

    • priority: 5 --> 2
    • labels: --> 2978818
    • assigned_to: nobody --> jcsahnwaldt
    • status: open --> open-accepted
     
  • Christopher Sahnwaldt

    • milestone: --> 2702491
    • labels: 2978818 -->
     
  • Christopher Sahnwaldt

    • labels: --> 954756
     
  • Christopher Sahnwaldt

    • milestone: 2702491 --> Other extractors
    • labels: 954756 --> Feature Request
     