CSV summaries compatibility

Jairo
2012-02-06
2013-05-30
  • Jairo
    2012-02-06

    Hello,

    I'm processing the Spanish DBpedia dumps, but OpenNLP 1.5 has no sentence detection model for Spanish.
    So I'm extracting the CSV data with Parse::MediaWikiDump in Wikipedia Miner 1.1.
    Is it possible to use the CSV files obtained with version 1.1 in Wikipedia Miner 1.2, with the Berkeley database?

    Thanks in advance!

    Jairo Sarabia

     
  • David Milne
    2012-02-26

    Hi Jairo,

    Looking at this list of models, it seems that people simply don't bother training separate tokenizers and sentence detectors for Spanish. That is odd, because they do train more complex things, like person taggers. I can only guess that the English models work well enough that separate training isn't necessary. What happens if you just use the English models?