From: Pablo M. <pab...@gm...> - 2011-09-27 11:40:27
Dimitris,

Cool! Maybe we could test Sweble first as the new AbstractExtractor, since that seems to be the weakest link. If it works there, it could then be gradually introduced into the core to replace SimpleWikiParser.

Alessio, if you take up the challenge, please keep us updated on your progress on dbp...@li... (by the way, shall we move this discussion there?)

Cheers,
Pablo

On Tue, Sep 27, 2011 at 1:19 PM, Dimitris Kontokostas <ji...@gm...> wrote:
> Some more info on the (current) abstract extraction process:
> You will have to install a locally modified MediaWiki and load the
> Wikipedia dumps (after you clean them with the script).
> The detailed process is described here:
>
> http://dbpedia.hg.sourceforge.net/hgweb/dbpedia/dbpedia/file/945c24bdc54c/abstractExtraction
>
> > I will also dare to give another idea. The guys behind Sweble
> > (http://sweble.org/) claim it is very thorough, and there seems to be a lot
> > of activity behind it.
>
> This could be a new approach for the framework, not only for abstracts
> but also to replace the SimpleWikiParser.
> I think the current parser is LL, and maybe we could switch to an LR
> parser to better handle recursive syntax.
> I haven't checked out Sweble yet, but we could look into it.
>
> Cheers,
> Dimitris