I tried using KH Coder to analyse a simple Italian text file.
The only sentence in the file is the following: "Una parte consistente di Italia vota politici che poi disprezza". Everything seems to work fine. The number of words is counted properly, but I got a strange result in the generated word frequency list (Excel file). The words are truncated, so instead of "italia" the file shows "ital". Is there a particular setting I can modify to solve this issue?
Another quick question: is there any way to add part-of-speech information for Italian?
I'd like to have a list of categories like the one for English, such as adjective, noun, etc.
Thank you for the post!
KH Coder performs stemming or lemmatization during pre-processing, so "ital" is the expected result.
Currently KH Coder has no option to disable stemming / lemmatization, nor to recognize POS in Italian. Sorry for any inconvenience this may cause.
As of now, the only way is to download the source code of KH Coder and modify it yourself. Bypassing the stemming process is relatively easy if you know Perl. To recognize POS, you would have to integrate an Italian POS tagger into KH Coder, or build an Italian model file for the Stanford POS Tagger.
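To illustrate why stemmed output looks truncated, here is a minimal Python sketch (Python rather than Perl, purely for readability). The `toy_stem` function is a hypothetical toy that only strips trailing vowels; it is not the algorithm KH Coder or Snowball actually uses, and it merely happens to give "ital" for "italia".

```python
# Toy suffix stripper: NOT the Snowball algorithm, only an illustration
# of why a stemmed word list shows truncated forms.
VOWELS = "aeiou"

def toy_stem(word, min_len=3):
    """Lowercase the word and strip trailing vowels, keeping at least min_len characters."""
    stem = word.lower()
    while len(stem) > min_len and stem[-1] in VOWELS:
        stem = stem[:-1]
    return stem

sentence = "Una parte consistente di Italia vota politici che poi disprezza"
stems = [toy_stem(w) for w in sentence.split()]
print(stems)
```

A real stemmer applies language-specific suffix rules, so its output will differ from this toy for many words.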
Thank you Koichi!! Now it is much clearer. Sorry about my ignorance of linguistics; this is something I was checking for a friend of mine who is doing a PhD in linguistics. I had just tested the program with a generic phrase; now even the English version makes sense.
What would it take from me to implement Finnish for KH Coder?
Because KH Coder is written in Perl, you need to write some Perl code to add Finnish support.
The pre-processing of KH Coder goes like this:
1. Sentence Splitting: Splitting the paragraph into sentences
2. Tokenizing: Splitting the sentence into words
3. Lemmatization or Stemming of words
For step 3, you can use the Snowball stemmer to stem Finnish words. Or you can simply skip this step and treat the words in their inflected forms. So this step is relatively easy.
But for steps 1 and 2, I think you will have to write your own code to perform the tasks. (If you find Perl modules for those tasks, you can just use them.)
And please note that KH Coder currently cannot store letters with accents, so accents will be dropped automatically.
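As a rough sketch of the three steps, here is how such a pipeline could look. It is written in Python only for illustration (KH Coder itself is Perl), and the function names, the regular expressions, and the optional `stem` callback are all assumptions of this example, not KH Coder's actual code. The splitting rules are deliberately naive and would need refinement for abbreviations, numbers and so on.

```python
import re

def split_sentences(paragraph):
    """Step 1: naive rule, a sentence ends at '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [p for p in parts if p]

def tokenize(sentence):
    """Step 2: take runs of word characters (Unicode-aware in Python 3) as words."""
    return re.findall(r"\w+", sentence)

def preprocess(paragraph, stem=None):
    """Run steps 1-3; step 3 is skipped when no stem function is supplied."""
    result = []
    for sentence in split_sentences(paragraph):
        words = [w.lower() for w in tokenize(sentence)]
        if stem is not None:
            words = [stem(w) for w in words]
        result.append(words)
    return result

text = "Hyvää huomenta! Minä asun Helsingissä."
tokens = preprocess(text)
print(tokens)
```

Passing `stem=None` corresponds to skipping step 3 and keeping the words in their inflected forms; a stemming function could be plugged in later without touching steps 1 and 2.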
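To get a feeling for what dropping accents does to Finnish text, the following Python sketch removes combining marks after Unicode decomposition. This is only an assumption about the visible effect; how KH Coder actually drops accents internally may differ, and `drop_accents` is a hypothetical helper.

```python
import unicodedata

def drop_accents(text):
    """Decompose characters (NFD) and discard the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(drop_accents("hyvää päivää"))
```

Note that in Finnish "ä" and "ö" are separate letters, so different words such as "sää" and "saa" end up as the same string once accents are dropped.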