Hi everyone, I'm very new to OpenNLP.
I need to understand how to create models for Italian, so I need to know what file format the training data should be in to create a new model.
I have looked at a lot of websites, but I really cannot find any tips!
Do I need an Italian corpus to train models?
If I do need a corpus, which annotation style should the corpus use?
Thanks a lot.
Hi,
There is a samples directory in the maxent package that shows how to set up training files for the maxent package in general. There is nothing Italian-specific there, since the format depends on what kind of classification you are doing and for which task. The one thing to watch out for is that you'll need to set the encoding on your reader so that it reads the Italian text with the correct encoding scheme and your text-based features don't get corrupted.
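To illustrate the encoding point, here is a minimal sketch in plain Java (no OpenNLP classes). It assumes a hypothetical training-file layout with one event per line: whitespace-separated features with the outcome label as the last token. Check the maxent samples directory for the layout your task actually needs. It also assumes the corpus is saved as UTF-8. The class and method names (`ItalianEventReader`, `readEvents`, `parseLine`) are made up for illustration. The key detail is passing `StandardCharsets.UTF_8` to `InputStreamReader` explicitly instead of relying on the platform default charset, which is what `FileReader` uses on older JVMs.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ItalianEventReader {

    /** One training event: its contextual features plus the outcome label. */
    static final class Event {
        final String outcome;
        final List<String> features;

        Event(String outcome, List<String> features) {
            this.outcome = outcome;
            this.features = features;
        }
    }

    /** Parses "feat1 feat2 ... outcome"; the last token is assumed to be the outcome. */
    static Event parseLine(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length < 2) {
            throw new IllegalArgumentException("Need at least one feature and an outcome: " + line);
        }
        String outcome = tokens[tokens.length - 1];
        List<String> features = Arrays.asList(Arrays.copyOf(tokens, tokens.length - 1));
        return new Event(outcome, features);
    }

    /** Reads events, decoding the bytes explicitly as UTF-8 rather than the platform default. */
    static List<Event> readEvents(InputStream in) throws IOException {
        List<Event> events = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.trim().isEmpty()) { // skip blank lines
                    events.add(parseLine(line));
                }
            }
        }
        return events;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical training line: features "parola=perché" and "prec=è", outcome "SI".
        // Unicode escapes keep this source file independent of the compiler's encoding.
        byte[] data = "parola=perch\u00e9 prec=\u00e8 SI\n".getBytes(StandardCharsets.UTF_8);
        for (Event e : readEvents(new ByteArrayInputStream(data))) {
            System.out.println(e.features + " -> " + e.outcome);
        }
        // Decoding the same UTF-8 bytes as ISO-8859-1 turns each accented letter into two
        // junk characters, so the features would no longer match at prediction time.
        System.out.println(new String(data, StandardCharsets.ISO_8859_1).trim());
    }
}
```

If your corpus is stored in another encoding (for example ISO-8859-1), pass that charset instead. What matters is that training and prediction decode the text the same way.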
Hope this helps...Tom