Corpus file formats for Maxent

Help
Soana
2008-07-02
2013-04-11
  • Soana

    Soana - 2008-07-02

    Hi everyone, I'm very new to OpenNLP.

    I need to understand how to create models for the Italian language, so I need to know which file format the training data should be in to create a new model.
    I have looked at a lot of websites but really cannot find any tips!
    Do I need an Italian corpus to train the models?
    If I do need a corpus, which style of annotation should I use in it?

    Thx a lot.

     
    • Thomas Morton

      Thomas Morton - 2008-07-10

      Hi,
      There is a samples directory in the maxent package that walks through how to set up training files for maxent in general.  Nothing there is specific to Italian, since the format depends on what type of classification you are doing and for what kind of task.  The one thing to look out for is the encoding: set it explicitly on your reader so that the Italian text is read with the correct encoding scheme and your text-based features don't get corrupted.

      Hope this helps...Tom 
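
      To illustrate the encoding point above, here is a minimal sketch in plain Java. It assumes the training file is stored as UTF-8; the class name EncodingAwareReader and the helper readLines are hypothetical names made up for this example, not part of the maxent package. The idea is only to pass the charset explicitly when constructing the reader instead of relying on the platform default. The resulting Reader is the kind of object you would then hand to whatever training code you use.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only (not part of maxent).
public class EncodingAwareReader {

    // Reads all lines of a file, decoding the bytes with the given charset
    // rather than the platform default.
    public static List<String> readLines(Path file, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Write a small Italian sample as UTF-8 (assumed file encoding).
        // Unicode escapes keep this source file itself encoding-independent.
        Path file = Files.createTempFile("italian-sample", ".txt");
        Files.write(file, "citt\u00e0 perch\u00e9 universit\u00e0".getBytes(StandardCharsets.UTF_8));

        // Decoding with the matching charset preserves the accented letters.
        List<String> correct = readLines(file, StandardCharsets.UTF_8);
        // Decoding with a mismatched charset silently corrupts them.
        List<String> corrupted = readLines(file, StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 read:      " + correct.get(0));
        System.out.println("ISO-8859-1 read: " + corrupted.get(0));

        Files.delete(file);
    }
}
```

      If the second read were used to build features, every accented word would turn into a different string than the one in the corpus, which is the kind of corruption described above.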

       
