Co-occurrence Networks

Anonymous
2013-04-30
2016-08-25
  • Anonymous

    Anonymous - 2013-04-30

    Hello Koichi,

    I've been trying to figure out how the co-occurrence networks in KH Coder are calculated. I've looked at the other posts in this forum and I have looked at the relevant academic papers. So far I've turned up a few different articles such as "KeyGraph: Automatic Indexing by Co-occurrence Graph based on Building Construction Metaphor" and "Keyword Extraction using Word Co-occurrence". Are these similar to the algorithm used in KH Coder? If not, what algorithm is used? I need to pin down the exact algorithm used to cite in my methods section.

    I understand generally how co-occurrence works, but there seem to be a number of different algorithms used to calculate it.

    Thanks again,
    Nathan

     
    Last edit: Anonymous 2015-01-13
  • HIGUCHI Koichi

    HIGUCHI Koichi - 2013-04-30

    Thank you for your post!

    KH Coder uses fairly standard algorithms to create co-occurrence networks. However, the details are documented only in the manual, which is written in Japanese. I am sorry for the inconvenience. I will try to explain them in English here.

    0. Co-occurrence network

    First, the co-occurrence network is a common technique in quantitative content analysis, and content analysis is a very common technique for analyzing media messages in sociology. You can refer to the following literature:

    • Osgood, C.E., 1959, "The Representational Model and Relevant Research Methods," I. de S. Pool ed., Trends in Content Analysis. Urbana, IL: University of Illinois Press.
    • Danowski, J. A., 1993, "Network analysis of message content," W. D. Richards Jr. & G. A. Barnett eds., Progress in Communication Sciences IV, Norwood, NJ: Ablex, 197-221.

    1. Selection of Words (Nodes)

    In the options window, you can select words by term frequency (TF), document frequency (DF) and part of speech (POS).
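As a rough illustration of selecting words by TF and DF, here is a minimal Python sketch. It assumes the documents are already tokenized; the function name `select_words` and the thresholds `min_tf` and `min_df` are hypothetical and are not KH Coder's actual option names. POS filtering is left out because it needs a tagger.

```python
from collections import Counter

def select_words(docs, min_tf=1, min_df=1):
    """Return words whose term and document frequencies meet the minimums.

    docs: list of token lists, one list per document (or sentence/paragraph).
    """
    tf = Counter()  # total number of occurrences of each word
    df = Counter()  # number of documents that contain each word
    for tokens in docs:
        tf.update(tokens)
        df.update(set(tokens))
    return sorted(w for w in tf if tf[w] >= min_tf and df[w] >= min_df)

docs = [
    ["network", "word", "word"],
    ["network", "edge"],
    ["word", "edge", "edge"],
]
selected = select_words(docs, min_tf=3, min_df=2)
print(selected)
```

Here "network" occurs only twice in total, so it is dropped when `min_tf=3`.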

    2. Calculation of Co-occurrences (Edges)

    KH Coder uses the Jaccard coefficient to calculate the strength of co-occurrence. The 60 strongest co-occurrences are drawn as network edges. You can change this number (60) in the options window.

    For the Jaccard coefficient, you may consult the book below. It explains a key property of the coefficient: it ignores 0-0 pairs, that is, cases in which neither word appears.

    • Romesburg, H. C. (1984) Cluster Analysis for Researchers, Belmont, CA: Lifetime Learning Publications

    Now we have nodes and edges, so we can make a network. Please note that KH Coder will not draw nodes (words) that don't have any edges.
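The edge calculation can be sketched as follows. For two words, the Jaccard coefficient is the number of documents containing both words divided by the number of documents containing at least one of them. This is only an illustration in Python, not KH Coder's code: the function name `jaccard_edges` is hypothetical, each document is assumed to be a set of words, and ties at the cut-off are broken alphabetically, which is an assumption of this sketch.

```python
from itertools import combinations

def jaccard_edges(docs, words, top_n=60):
    """Rank word pairs by Jaccard coefficient and keep the top_n strongest."""
    # Map each word to the set of document indices in which it appears.
    occurs = {w: {i for i, d in enumerate(docs) if w in d} for w in words}
    scored = []
    for a, b in combinations(sorted(words), 2):
        both = len(occurs[a] & occurs[b])    # documents containing both words
        either = len(occurs[a] | occurs[b])  # documents containing at least one
        # Documents containing neither word (0-0 pairs) never enter the formula.
        if both > 0:
            scored.append((both / either, a, b))
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    return [(a, b, j) for j, a, b in scored[:top_n]]

docs = [
    {"tea", "cup"},
    {"tea", "cup", "milk"},
    {"tea", "milk"},
    {"rain"},
]
edges = jaccard_edges(docs, ["tea", "cup", "milk", "rain"], top_n=2)
# Only words that appear in at least one edge become nodes.
nodes = sorted({w for a, b, _ in edges for w in (a, b)})
print(edges)
print(nodes)
```

In this toy data "rain" never co-occurs with another word, so it has no edge and is not drawn.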

    3. Layout of Nodes

    We use the Fruchterman-Reingold algorithm to determine the positions of the nodes (words). KH Coder uses the igraph package for R to perform the actual calculation. You may cite this article:

    • Fruchterman, T. M. J. & Reingold, E. M. (1991) "Graph Drawing by Force-directed Placement," Software - Practice and Experience, 21(11):1129-1164.
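To give a feel for what a force-directed layout does, here is a simplified pure-Python sketch of the idea in the Fruchterman-Reingold paper. It is not igraph's implementation and will not reproduce KH Coder's layouts; the function name, the default parameters and the geometric cooling schedule are assumptions made for illustration.

```python
import math
import random

def fruchterman_reingold(nodes, edges, iterations=50, size=1.0, seed=0):
    """Return {node: (x, y)} computed by a basic force-directed layout."""
    rng = random.Random(seed)
    pos = {v: [rng.uniform(0, size), rng.uniform(0, size)] for v in nodes}
    k = size / math.sqrt(len(nodes))  # ideal distance between nodes
    temp = size / 10                  # largest allowed move per iteration
    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Repulsion: every pair of nodes pushes apart with force k^2 / d.
        for i, v in enumerate(nodes):
            for u in nodes[i + 1:]:
                dx = pos[v][0] - pos[u][0]
                dy = pos[v][1] - pos[u][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[v][0] += dx / d * f
                disp[v][1] += dy / d * f
                disp[u][0] -= dx / d * f
                disp[u][1] -= dy / d * f
        # Attraction: connected nodes pull together with force d^2 / k.
        for v, u in edges:
            dx = pos[v][0] - pos[u][0]
            dy = pos[v][1] - pos[u][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[v][0] -= dx / d * f
            disp[v][1] -= dy / d * f
            disp[u][0] += dx / d * f
            disp[u][1] += dy / d * f
        # Move each node by at most temp and keep it inside the frame.
        for v in nodes:
            dx, dy = disp[v]
            d = math.hypot(dx, dy)
            if d > 0:
                step = min(d, temp)
                pos[v][0] = min(size, max(0.0, pos[v][0] + dx / d * step))
                pos[v][1] = min(size, max(0.0, pos[v][1] + dy / d * step))
        temp *= 0.95  # cool down so that the layout settles
    return {v: tuple(p) for v, p in pos.items()}

nodes = ["tea", "cup", "milk", "sugar"]
edges = [("tea", "cup"), ("tea", "milk"), ("milk", "sugar")]
layout = fruchterman_reingold(nodes, edges)
for word, (x, y) in layout.items():
    print(word, round(x, 3), round(y, 3))
```

The result depends on the random starting positions, so a fixed `seed` is used to make the sketch repeatable.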

    4a. Community Detection

    Betweenness (edge.betweenness.community):

    • M Newman and M Girvan (2004) "Finding and evaluating community structure in networks," Physical Review E 69, 026113

    Random walks (walktrap.community):

    Modularity (fastgreedy.community):

    You can also consult the igraph manual, because KH Coder uses igraph to perform community detection. http://cran.r-project.org/web/packages/igraph/igraph.pdf

    4b. Centrality

    Betweenness:

    • Freeman, L.C. (1979). Centrality in Social Networks I: Conceptual Clarification. Social Networks, 1, 215-239.
    • Ulrik Brandes, A Faster Algorithm for Betweenness Centrality. Journal of Mathematical Sociology 25(2):163-177, 2001.

    Degree:
    It's just the number of edges the node has.

    Eigenvector (evcent):

    • Bonacich, P. (1987). Power and Centrality: A Family of Measures. American Journal of Sociology, 92, 1170-1182.

    Again, you can also consult the manual of igraph because KH Coder uses igraph to calculate these centrality values.
    http://cran.r-project.org/web/packages/igraph/igraph.pdf

    By the way, blue means low centrality, white means medium centrality and pink means high centrality.
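Of the centrality measures above, degree is simple enough to sketch directly. The following Python snippet only illustrates the definition (the number of edges attached to a node); KH Coder itself obtains these values from igraph, and the function name here is hypothetical.

```python
from collections import Counter

def degree_centrality(edges):
    """Count the number of edges attached to each node."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return dict(degree)

edges = [("tea", "cup"), ("tea", "milk"), ("milk", "sugar")]
degree = degree_centrality(edges)
print(degree)
```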

    Please feel free to post further questions.

     
    Last edit: HIGUCHI Koichi 2013-04-30
  • Anonymous

    Anonymous - 2013-04-30

    Thank you! I appreciate the inclusion of more sources so that I may further read about this on my own. You have been very patient and helpful. Cheers!

     
    Last edit: Anonymous 2015-01-16
  • Anonymous

    Anonymous - 2013-05-23

    Hi, I was nearly disappointed, but I was lucky to find this forum. Thank you for the explanation of co-occurrence, Mr. Higuchi. My understanding of the software has improved, since I don't understand Japanese. Thank you so much.

     
    Last edit: Anonymous 2014-12-31
  • HIGUCHI Koichi

    HIGUCHI Koichi - 2013-05-23

    Hi, thank you for the post.
    Good luck with your analysis.

     
  • Anonymous

    Anonymous - 2014-01-17

    You can analyze Japanese, English, French, German, Italian, Portuguese and Spanish text with KH Coder.
    Is it limited to these languages because of the fonts available in KH Coder, or for some other reason? Can we import texts in other languages as long as they are in Unicode?
    Thank you.
    Brandon

     
    Last edit: Anonymous 2015-01-24
  • HIGUCHI Koichi

    HIGUCHI Koichi - 2014-01-17

    Hi.

    KH Coder is not ready for Unicode. It can handle only Japanese characters and the alphabet, so it cannot handle Chinese, Korean or any other non-alphabetic characters.

    Also, KH Coder uses Lingua::Sentence to split text into sentences and Snowball for stemming. KH Coder can handle any language that is supported by both of them.

     
    Last edit: HIGUCHI Koichi 2014-01-17
  • Anonymous

    Anonymous - 2014-01-17

    Thank you for your quick reply.
    How about Vietnamese, which uses alphabetic characters, as shown in the following:
    Có một sự thật mà ai cũng công nhận, đấy là: một người đàn ông có một tài sản khá hẳn sẽ muốn có một người vợ. Dù cho người ta chỉ biết rất ít về cảm nghĩ hay quan điểm của người đàn ông như thế, khi anh ta đến cư ngụ trong vùng, sự thật ấy đã in sâu vào đầu óc của những gia đình sống xung quanh, đến nỗi họ xem người đàn ông này là tài sản hợp pháp của cô con gái này hay cô con gái kia của họ.
    Vào một ngày, bà Bennet nói với chồng mình:
    Ông thân yêu, ông có biết tin đã có người thuê Netherfield Park chưa?
    I would appreciate it very much if you could help me test long Vietnamese texts, which I will send to you by email.
    Thank you for your support.
    Brandon

     
    Last edit: Anonymous 2014-12-27
  • HIGUCHI Koichi

    HIGUCHI Koichi - 2014-01-17

    Hi.

    KH Coder is currently not ready for Vietnamese.

    [1] KH Coder always applies stemming or lemmatization to the data. For Vietnamese, stemming / lemmatization should be bypassed, but KH Coder currently has no option for this, so you would have to modify the program.

    [2] Also, you have to remove all accents from your data.
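One possible way to remove accents before importing the data is sketched below in Python, using the standard `unicodedata` module. This is only an assumption about how the preprocessing could be done outside KH Coder; the function name is hypothetical. Note that the Vietnamese letter đ has no Unicode decomposition, so the sketch maps it by hand.

```python
import unicodedata

def strip_accents(text):
    """Remove diacritical marks, leaving unaccented base letters."""
    # The letters đ/Đ do not decompose under NFD, so replace them explicitly.
    text = text.replace("đ", "d").replace("Đ", "D")
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks and keep everything else.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents("Có một sự thật mà ai cũng công nhận"))
```

Removing accents merges Vietnamese words that differ only in tone or vowel marks, so the results should be interpreted with care.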

    Sorry for the inconvenience.

     
    Last edit: HIGUCHI Koichi 2014-01-17
  • Anonymous

    Anonymous - 2014-08-21

    Hi. Could I use KH Coder to analyze Italian text?

     
    Last edit: Anonymous 2015-02-08
  • HIGUCHI Koichi

    HIGUCHI Koichi - 2014-08-22

    Hi,

    Yes, you can.

    Select "Stemming with Snowball" and "Italian" in the settings window.

     
  • Anonymous

    Anonymous - 2016-02-04

    Hi
    How can I analyse the content of a book? I just have a list of words, and I want to check their frequency and percentage.

    Best regards

     
    • HIGUCHI Koichi

      HIGUCHI Koichi - 2016-08-25

      Hi,

      You need the text data of the book. You may have to type the whole book into a text file.
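Once the text is available, counting a list of words is straightforward. The Python sketch below is independent of KH Coder and is only an illustration: the function name is hypothetical, the tokenizer is a simple regular expression for English letters, and no stemming is applied, so "dog" and "dogs" are counted separately.

```python
import re
from collections import Counter

def word_frequencies(text, targets):
    """Return {word: (count, percent of all tokens)} for the target words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    result = {}
    for word in targets:
        n = counts[word.lower()]
        percent = 100.0 * n / total if total else 0.0
        result[word] = (n, percent)
    return result

text = "The cat saw the dog. The dog ran."
freq = word_frequencies(text, ["the", "dog", "bird"])
print(freq)
```

To analyse a whole book, the string `text` would be read from the text file instead.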

      Best,

       
      Last edit: HIGUCHI Koichi 2016-08-25
  • Anonymous

    Anonymous - 2016-08-25

    Hi, could I use KH Coder to analyze Polish texts?

    Regards

     
    • HIGUCHI Koichi

      HIGUCHI Koichi - 2016-08-25

      No, you can't. Sorry for the inconvenience.

      Currently, you can analyze Japanese, English, French, German, Italian, Portuguese and Spanish text with KH Coder. Also, Catalan, Chinese (simplified), Korean, Russian and Slovenian language data can be analyzed with the latest alpha version.

      It doesn't support other languages.

      Best,

       

