I've been trying to figure out how the co-occurrence networks in KH Coder are calculated. I've looked at the other posts in this forum and I have looked at the relevant academic papers. So far I've turned up a few different articles such as "KeyGraph: Automatic Indexing by Co-occurrence Graph based on Building Construction Metaphor" and "Keyword Extraction using Word Co-occurrence". Are these similar to the algorithm used in KH Coder? If not, what algorithm is used? I need to pin down the exact algorithm used to cite in my methods section.
I understand generally how co-occurrence works, but there seem to be a number of different algorithms used to calculate it.
Thank you for your post!
I think KH Coder uses fairly standard algorithms to create co-occurrence networks, but the details are documented only in the manual, which is written in Japanese. I am sorry for the inconvenience. I will try to explain them in English here.
First, the co-occurrence network is a common technique in quantitative content analysis, and content analysis is a very common technique for analyzing media messages in sociology. You can refer to the following literature:
In the options window, you can select words by term frequency (TF), document frequency (DF), and part of speech (POS).
KH Coder uses the Jaccard coefficient to measure the strength of co-occurrence, and the 60 strongest co-occurrences are drawn as network edges. You can change this number (60) in the options window.
For the Jaccard coefficient, you may consult this book. It explains the property that the Jaccard coefficient ignores 0-0 pairs, that is, cases in which neither of the two words appears.
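As an illustration only, the Jaccard calculation described above can be sketched in Python. This is not KH Coder's own code. The function names and the sample data are hypothetical, and the unit of co-occurrence is assumed here to be a whole document.

```python
from collections import defaultdict
from itertools import combinations


def jaccard(units_a, units_b):
    """Jaccard coefficient of two sets of unit ids: |A & B| / |A | B|."""
    a, b = set(units_a), set(units_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def cooccurrence_jaccard(documents):
    """Return {(word1, word2): Jaccard coefficient} for every word pair.

    `documents` is a list of word lists (assumption: one list per document).
    Documents containing neither word of a pair (the 0-0 case) never enter
    the intersection or the union, so they do not affect the result.
    """
    occurrences = defaultdict(set)
    for doc_id, words in enumerate(documents):
        for word in set(words):
            occurrences[word].add(doc_id)

    similarities = {}
    for w1, w2 in combinations(sorted(occurrences), 2):
        similarities[(w1, w2)] = jaccard(occurrences[w1], occurrences[w2])
    return similarities
```

For example, with the four documents ["cat", "dog"], ["cat", "dog", "fish"], ["cat"] and ["bird"], "cat" and "dog" appear together in two documents and at least one of them appears in three, so their coefficient is 2/3. The fourth document contains neither word and is ignored.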
Now we have nodes and edges, so we can make a network. Please note that KH Coder will not draw nodes (words) that don't have any edges.
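The edge selection step can be sketched roughly as follows. This is a simplified Python illustration, not KH Coder's implementation. The function name `build_network` is hypothetical, and two details are assumptions made for the sketch: pairs with a coefficient of 0 are never drawn, and ties are broken alphabetically.

```python
def build_network(similarities, max_edges=60):
    """Keep the strongest co-occurrences as edges and derive the node list.

    `similarities` maps (word1, word2) tuples to a co-occurrence strength
    such as the Jaccard coefficient. Words that end up with no edge are
    simply absent from the returned node list.
    """
    # Strongest first; alphabetical order breaks ties (assumption).
    ranked = sorted(similarities.items(), key=lambda item: (-item[1], item[0]))
    # Pairs that never co-occur are skipped (assumption).
    edges = [(pair, strength) for pair, strength in ranked if strength > 0]
    edges = edges[:max_edges]
    nodes = sorted({word for pair, _ in edges for word in pair})
    return nodes, edges
```

Called with `max_edges=2` on a small set of pairs, only the two strongest pairs survive, and any word that appears only in weaker pairs is left out of the node list.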
We use the Fruchterman-Reingold algorithm to determine the positions of the nodes (words). KH Coder uses the igraph package for R for the actual calculation. You may cite this article.
Random walks (walktrap.community):
You can also consult the manual of igraph because KH Coder uses igraph to perform these community detections. http://cran.r-project.org/web/packages/igraph/igraph.pdf
It's just the number of edges the node has.
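Counting edges per node can be sketched in a few lines of Python. This is only an illustration with a hypothetical function name. KH Coder itself relies on igraph for its centrality values.

```python
from collections import Counter


def degree_centrality(edges):
    """Return {word: number of edges attached to that word}.

    `edges` is an iterable of (word1, word2) pairs, one per network edge.
    """
    degree = Counter()
    for word1, word2 in edges:
        degree[word1] += 1
        degree[word2] += 1
    return dict(degree)
```

With the edges ("a", "b") and ("a", "c"), the word "a" has degree 2 and the other two words have degree 1.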
Eigenvector (evcent):
Again, you can also consult the manual of igraph because KH Coder uses igraph to calculate these centrality values.
By the way, blue means low centrality, white means medium centrality and pink means high centrality.
Please feel free to post further questions.
Thank you! I appreciate the inclusion of more sources so that I may further read about this on my own. You have been very patient and helpful. Cheers!
Hi, I was nearly disappointed, but I was lucky to find this forum. Thank you for the explanation of co-occurrence, Mr. Higuchi. My understanding of the software has improved, since I don't understand Japanese. Thank you so much.
Hi, thank you for the post.
Good luck with your analysis.
You can analyze Japanese, English, French, German, Italian, Portuguese and Spanish text with KH Coder.
Is it limited to these languages because of the fonts available in KH Coder, or for some other reason? Can we import texts in other languages as long as they are in Unicode?
KH Coder is not ready for Unicode. It can handle only Japanese characters and alphabetic characters, so it cannot handle Chinese, Korean or any other non-alphabetic script.
KH Coder also uses Lingua::Sentence to split sentences and Snowball for stemming, so it can handle any language that is supported by both of them.
Thank you for your quick reply.
How about Vietnamese (which uses alphabetic characters), as shown in the following:
Có một sự thật mà ai cũng công nhận, đấy là: một người đàn ông có một tài sản khá hẳn sẽ muốn có một người vợ. Dù cho người ta chỉ biết rất ít về cảm nghĩ hay quan điểm của người đàn ông như thế, khi anh ta đến cư ngụ trong vùng, sự thật ấy đã in sâu vào đầu óc của những gia đình sống xung quanh, đến nỗi họ xem người đàn ông này là tài sản hợp pháp của cô con gái này hay cô con gái kia của họ.
Vào một ngày, bà Bennet nói với chồng mình:
Ông thân yêu, ông có biết tin đã có người thuê Netherfield Park chưa?
I would appreciate it very much if you could help me test long Vietnamese texts, which I will send to you by email.
Thank you for your support.
KH Coder is currently not ready for Vietnamese.
KH Coder always applies stemming or lemmatization to the data. To handle Vietnamese, you would need to bypass stemming / lemmatization, but KH Coder currently does not have this option, so you would have to modify the program to do this.
Also, you have to remove all accents from your data.
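One possible way to remove the accents before loading the data is a small preprocessing script. The sketch below uses Python's standard `unicodedata` module. It is not part of KH Coder, the function name `strip_accents` is hypothetical, and the explicit handling of the letter "đ" is an addition made here because that letter has no Unicode decomposition.

```python
import unicodedata


def strip_accents(text):
    """Return `text` with combining diacritical marks removed.

    The text is decomposed (NFD) so that each accented letter becomes a
    base letter followed by combining marks, and the marks are dropped.
    """
    decomposed = unicodedata.normalize("NFD", text)
    without_marks = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    # "đ" / "Đ" are separate letters, not "d" plus a mark, so map them by hand.
    return without_marks.replace("đ", "d").replace("Đ", "D")
```

For example, `strip_accents("Có một sự thật")` gives "Co mot su that". Note that this discards the tone and vowel distinctions of Vietnamese, so different words may become identical after conversion.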
Sorry for the inconvenience.
Hi. Could I use KH Coder to analyze Italian text?
Yes, you can.
Select "Stemming with Snowball" and "Italian" in the settings window.
How can I analyse the content of a book? I just have a list of words, and I want to check their frequency and percentage.
You need text data of the book. Maybe you have to type the whole book into a text file.
Hi, could I use KH Coder to analyze Polish texts?
No, you can't. Sorry for the inconvenience.
Currently, you can analyze Japanese, English, French, German, Italian, Portuguese and Spanish text with KH Coder. Also, Catalan, Chinese (simplified), Korean, Russian and Slovenian language data can be analyzed with the latest alpha version.
It doesn't support other languages.