CSV summaries compatibility

  • Jairo

    Jairo - 2012-02-06


    I'm processing Spanish DBpedia dumps, but there is no Spanish sentence detection model for OpenNLP 1.5.
    So I'm extracting the CSV data with Parse::MediaWikiDump in Wikipedia Miner 1.1.
    Is it possible to use the CSV files obtained with version 1.1 in Wikipedia Miner 1.2 with the Berkeley database?

    Thanks in advance!

    Jairo Sarabia

  • David Milne

    David Milne - 2012-02-26

    Hi Jairo,

    Looking at this list of models, it seems that people simply don't bother training separate tokenizers and sentence detectors for Spanish. That is really odd, because they do train more complex things, like person taggers. I can only guess that the English models work well enough that separate training isn't necessary. What happens if you just use the English models?


