
Please provide a pointer to a paper explaining the operation and parameters used in each algorithm

Anonymous
2013-12-23
2013-12-24

• Anonymous
2013-12-23

Can someone please explain how each algorithm in the tool works? In particular, which parameters does each algorithm use to create its output? For example, what parameters are used in multidimensional scaling?

• Anonymous
2013-12-23

When we choose multidimensional scaling, what precisely is the parameter being measured for each word? Is it the number of occurrences of a word (per paragraph or per document)?

When we choose SOM, what precisely is being used to classify words? Is it, again, the number of occurrences? Precisely how are the numbers of occurrences computed?

Do all the word algorithms in KH Coder use the number of occurrences as the statistic they operate on? I understand the algorithms in general, but I would like to know precisely which metric each algorithm uses.

Thanks.

• HIGUCHI Koichi
2013-12-24

Basically, KH Coder uses the number of occurrences, but the counts are transformed in different ways depending on the options you choose.

> When we choose multidimensional scaling, what precisely is the parameter being measured for each word? Is it the number of occurrences of a word (per paragraph or per document)?

It depends on which distance measure you choose. If you choose “Jaccard,” the counts are transformed into 0 or 1: every count greater than 1 becomes 1, because the Jaccard coefficient can handle only binary (0-1) values.
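A minimal sketch of the Jaccard case, assuming each word is represented as a vector of raw counts with one entry per paragraph or document. The function names are hypothetical, not KH Coder internals, and how the coefficient is turned into a distance for MDS is not stated in this thread. Counts are binarized, then the coefficient is the number of units containing both words divided by the number containing either.

```python
def binarize(counts):
    """Map every count >= 1 to 1 and keep 0 as 0 (presence/absence only)."""
    return [1 if c > 0 else 0 for c in counts]

def jaccard(counts_a, counts_b):
    """Jaccard coefficient between two words' occurrence vectors.

    Each vector holds one count per paragraph or document; after
    binarization only presence (1) or absence (0) matters.
    """
    a, b = binarize(counts_a), binarize(counts_b)
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0

# Hypothetical counts of two words across five documents.
word_a = [3, 0, 1, 2, 0]
word_b = [1, 1, 0, 5, 0]
print(jaccard(word_a, word_b))  # 2 shared / 4 with either word -> 0.5
```

Note that a word appearing 3 times and a word appearing once in the same document count identically here, which is exactly the information Jaccard discards.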

If you choose “Cosine,” KH Coder uses F / L * 1000 as the value, where
F = number of occurrences of the word in the paragraph or document,
L = length of the paragraph or document (number of words).
So F / L * 1000 means “how many times the word appears per 1,000 words.”
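A minimal sketch of the “Cosine” input, assuming each word becomes a vector of F / L * 1000 values with one entry per paragraph or document. The helper names are hypothetical, and how KH Coder converts the similarity into a distance for MDS is not described in this thread.

```python
import math

def per_thousand(counts, lengths):
    """Convert raw counts F into F / L * 1000 for each unit.

    counts[i]  = occurrences of the word in paragraph/document i (F)
    lengths[i] = number of words in paragraph/document i (L)
    """
    return [f / l * 1000 if l else 0.0 for f, l in zip(counts, lengths)]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

lengths = [200, 500, 1000]                    # hypothetical document lengths
word_a = per_thousand([2, 5, 10], lengths)    # about 10 per 1,000 words everywhere
word_b = per_thousand([4, 10, 20], lengths)   # about 20 per 1,000 words everywhere
print(cosine(word_a, word_b))  # parallel vectors: 1 up to floating-point rounding
```

Dividing by L keeps long documents from dominating simply because they contain more words.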

If you choose “Euclid,” the F / L * 1000 values are standardized into z-scores (word-wise). Without z-scores, we may simply find a group of high-frequency words and a group of low-frequency words. We are interested in word occurrence patterns, not frequencies, so we use z-scores to normalize frequencies.
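A minimal sketch of why the word-wise z-score matters for “Euclid,” assuming the inputs are already F / L * 1000 values per unit. The thread does not say whether the population or sample standard deviation is used; the population version below is an assumption. With the same number of units for every word, the choice only rescales all distances by the same constant factor.

```python
import math

def z_scores(values):
    """Standardize one word's F / L * 1000 values across units (word-wise)."""
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation (assumption; not stated in the thread).
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    if sd == 0:
        return [0.0] * n  # a word with a constant rate carries no pattern
    return [(x - mean) / sd for x in values]

def euclid(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

# Hypothetical per-1,000-word rates: same pattern, very different frequency.
frequent = [40.0, 10.0, 40.0, 10.0]
rare = [4.0, 1.0, 4.0, 1.0]
print(euclid(frequent, rare))                      # large, driven by frequency
print(euclid(z_scores(frequent), z_scores(rare)))  # 0.0, identical pattern
```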

> When we choose SOM, what precisely is being used to classify words? Is it, again, the number of occurrences? Precisely how are the numbers of occurrences computed?

SOM uses Euclidean distance, so it uses the same values as the “Euclid” option in multidimensional scaling: z-scores of F / L * 1000.
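A minimal, generic SOM sketch (fixed learning rate, Gaussian neighbourhood on the grid). This is not KH Coder's implementation, which the thread does not describe. It only illustrates where the Euclidean distance on each word's z-scored F / L * 1000 vector enters: choosing the best-matching unit. The map size, `rate`, and `radius` values are hypothetical.

```python
import math

def euclid(u, v):
    """Euclidean distance, used to compare a word vector with node weights."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def best_matching_unit(nodes, x):
    """Return the (row, col) of the node whose weight vector is closest to x."""
    return min(nodes, key=lambda pos: euclid(nodes[pos], x))

def train_step(nodes, x, rate=0.5, radius=1.0):
    """One SOM update: pull the BMU and its grid neighbours toward x.

    rate and radius are illustrative values, not KH Coder settings.
    """
    br, bc = best_matching_unit(nodes, x)
    for (r, c), w in list(nodes.items()):
        grid_dist2 = (r - br) ** 2 + (c - bc) ** 2
        influence = math.exp(-grid_dist2 / (2 * radius ** 2))
        nodes[(r, c)] = [wi + rate * influence * (xi - wi) for wi, xi in zip(w, x)]
    return (br, bc)

# A hypothetical 2x2 map over four units; each word's input vector would be
# its z-scored F / L * 1000 values across those units.
nodes = {
    (0, 0): [1.0, -1.0, 1.0, -1.0],
    (0, 1): [-1.0, 1.0, -1.0, 1.0],
    (1, 0): [0.0, 0.0, 0.0, 0.0],
    (1, 1): [1.0, 1.0, -1.0, -1.0],
}
word = [0.9, -1.1, 1.0, -0.8]
print(best_matching_unit(nodes, word))  # (0, 0)
train_step(nodes, word)
```

After many such steps, words with similar occurrence patterns end up mapped to nearby nodes, regardless of how frequent they are.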

Last edit: HIGUCHI Koichi 2013-12-24

Anonymous