From: Dimitris K. <kon...@in...> - 2014-09-25 14:01:49
Hi Krzysztof,

(adding the dev list in cc)

This has been on my to-do list for some time. It breaks down into two cases:
a) inner templates that can be mapped from the mappings wiki
b) minor formatting templates

For (a): at the moment we only extract information from top-level templates. What we can do is change the mapping extractor so that it extracts information from all templates that have a mapping. I think this would work, but we have to test it and check for undesired output. The idea here is to map the inner templates to very high-level classes (probably owl:Thing) in order not to break the typing mechanism. Something in this direction (not exactly) is the Authority control mapping [1]: although it is not nested, it is mapped to dbo:Agent so that it does not interfere with the possible main infobox template of its page, which could be either a Person or an organization. This could also solve some of the problems we still have in the extraction of Commons. In this case "templateProperty = 1" could be used, as well as ConditionalMappings [2]. I think this would give a very big boost in extracted data, but I am not sure about side effects. Any opinions?

For (b): an easy option is to handle very common templates in the code. This is what we did with Commons and some templates in English [3], but it does not scale well. Moving this functionality to the mappings wiki will be difficult because each case is quite different, but everything is possible ;)

Cheers,
Dimitris

[1] http://mappings.dbpedia.org/index.php/Mapping_en:Authority_control
[2] http://mappings.dbpedia.org/index.php/Template:ConditionalMapping
[3] https://github.com/jimkont/extraction-framework/blob/live_features/core/src/main/scala/org/dbpedia/extraction/config/transform/TemplateTransformConfig.scala#L76-115

On Wed, Sep 24, 2014 at 2:49 PM, Krzysztof Wecel <K....@ki...> wrote:
> Hi Dimitris,
>
> can you advise on possibilities of extraction of nested templates in
> DBpedia?
> Details in my e-mail below.
>
>
> Best regards,
> Krzysztof
>
>
>
> -------- Original Message --------
> Subject: Re: DBpedia - sophisticated extraction
> Date: Wed, 24 Sep 2014 12:35:29 +0200
> From: Alexandru Todor <to...@in...>
> Reply-To: to...@in...
> To: Krzysztof Wecel <K....@ki...>
> References: <542...@ki...>
>
>
>
> Hi Krzysztof,
>
> DBpedia can't handle nested templates, it's been an issue for years.
> Regarding the Commons extraction, you should ask this question on the
> mailing list, maybe Dimitris has more info on this issue.
>
> Cheers,
> Alexandru
>
>
> On 09/24/2014 07:24 AM, Krzysztof Wecel wrote:
> > Hi,
> >
> > I've quite a challenging template to extract. What I have found is that
> > you somehow managed to overcome license extraction problem from Commons
> > (mentioned by Dimitris), which looks similar to my problem.
> >
> > There are two issues:
> > 1. I have an embedded template
> > 2. The template is using positions of attributes, not names.
> >
> > For the second I assume one can use index instead of name:
> > "templateProperty = 1" (though it does not seem to work)
> >
> > Please let me know if the following is possible to extract using current
> > extraction framework:
> >
> > {{Super infobox
> > |type = DK
> > |country = PL
> > {{Legend|red|Highway 5}}
> > |points =
> > {{ABC|ok|A|1|e=175}}
> > {{ABC|ok|A|1|e=86}}
> > {{ABC|ok|K|91}}
> > {{XYZ|ok|WA|0|[[Oxygen]]|A|1|86}}
> > ...
> > }}
> >
> >
> >
> > Best regards,
> > Krzysztof
> >
> >
>
>
> --
> Dr Krzysztof Wecel http://kie.ue.poznan.pl/en/member/krzysztof-wecel
> Department of Information Systems
> Poznan University of Economics, al. Niepodleglosci 10, 61-875 Poznan
> K....@ki... Tel:+48(61)854-3632 Fax:+48(61)854-3633
>
>

--
Dimitris Kontokostas
Department of Computer Science, University of Leipzig
Research Group: http://aksw.org
Homepage:http://aksw.org/DimitrisKontokostas
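
To make option (a) from the reply above more concrete, here is a minimal Python sketch of "extract from every template that has a mapping, not only the top-level one", run on the example from the quoted message (without the elided "..." line). It is only an illustration under stated assumptions: it is not code from the extraction framework (which is written in Scala), the helper names (split_top_level, find_templates, walk) and the MAPPED set are hypothetical, and the parser handles just balanced {{...}} and [[...]] pairs. Unnamed parameters are numbered "1", "2", ... in order of appearance, which is the index that "templateProperty = 1" would refer to.

```python
# Hypothetical illustration only; not the DBpedia extraction framework's code.
WIKITEXT = """{{Super infobox
|type = DK
|country = PL
{{Legend|red|Highway 5}}
|points =
{{ABC|ok|A|1|e=175}}
{{ABC|ok|A|1|e=86}}
{{ABC|ok|K|91}}
{{XYZ|ok|WA|0|[[Oxygen]]|A|1|86}}
}}"""

# Assumption: the set of templates that have a mapping in the mappings wiki.
# "Legend" is left out on purpose, as a minor formatting template (case b).
MAPPED = {"Super infobox", "ABC", "XYZ"}


def split_top_level(body):
    """Split on '|' characters that are not inside a nested {{...}} or [[...]]."""
    parts, depth, start, i = [], 0, 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            i += 2
        elif pair in ("}}", "]]"):
            depth -= 1
            i += 2
        elif body[i] == "|" and depth == 0:
            parts.append(body[start:i])
            start = i + 1
            i += 1
        else:
            i += 1
    parts.append(body[start:])
    return parts


def find_templates(text):
    """Return the inner text of each outermost {{...}} found in `text`."""
    found, depth, start, i = [], 0, 0, 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "{{":
            if depth == 0:
                start = i + 2
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            if depth == 0:
                found.append(text[start:i])
            i += 2
        else:
            i += 1
    return found


def walk(text, depth=0):
    """Yield (depth, name, params) for every template, inner ones included."""
    for body in find_templates(text):
        name, *raw_params = split_top_level(body)
        params, position = {}, 0
        for part in raw_params:
            key, sep, value = part.partition("=")
            if sep and "{{" not in key and "[[" not in key:
                params[key.strip()] = value.strip()    # named parameter
            else:
                position += 1                          # positional parameter
                params[str(position)] = part.strip()
        yield depth, name.strip(), params
        for value in params.values():                  # descend into nested templates
            yield from walk(value, depth + 1)


# Keep only templates that have a (hypothetical) mapping, at any nesting depth.
extracted = [t for t in walk(WIKITEXT) if t[1] in MAPPED]

for depth, name, params in extracted:
    print("  " * depth + name, params)
```

In this sketch the inner ABC and XYZ templates are picked up alongside the top-level infobox, while Legend is skipped because it has no entry in MAPPED. How the inner results would be typed (for example as owl:Thing) and linked back to the page resource is left open here, as it is in the reply.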