Questions about Modularity

Anonymous
2013-04-24
2013-04-26

  • Anonymous
    2013-04-24

    Hi ~

    First off, thanks for making this tool available!

    I will soon be entering graduate school and have already begun to explore what KH Coder can offer me as a student of sociology. In fact, I am planning to use KH Coder for the content analysis in my thesis. I have some background in statistics and linear algebra, but have not yet taken any courses on graph theory, so in that respect I am a beginner. I have a question about how modularity is calculated in KH Coder.

    I have found this explanation of modularity cited in a number of papers: http://www.pnas.org/content/103/23/8577.full

    Is this similar to what is underlying the network modularity calculations in KH Coder?

     
    Last edit: Anonymous 2013-04-24
  • HIGUCHI Koichi
    2013-04-24

    Hello from Japan.

    KH Coder uses the "igraph" package of R to detect communities, so you can consult the igraph manual here for details:
    http://cran.r-project.org/web/packages/igraph/igraph.pdf

    KH Coder uses the igraph functions "edge.betweenness.community" (edge betweenness), "fastgreedy.community" (modularity), and "walktrap.community" (random walks).

    According to p. 89 of the igraph manual, you can find the definition of modularity in this paper:
    http://arxiv.org/abs/cond-mat/0408187
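To make the definition concrete, here is a minimal pure-Python sketch of Newman's modularity Q = (1/2m) Σ_ij [A_ij − k_i k_j / (2m)] δ(c_i, c_j) for a simple undirected graph. This is only an illustration of the formula from the paper above, not KH Coder's or igraph's actual implementation (both use far more efficient algorithms).

```python
def modularity(edges, communities):
    """Newman modularity of a partition.

    edges: list of undirected (u, v) pairs, simple graph, no self-loops.
    communities: dict mapping each node to its community label.
    """
    m = len(edges)                      # total number of edges
    degree = {}
    for u, v in edges:                  # k_i = degree of node i
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1

    q = 0.0
    nodes = list(degree)
    for i in nodes:                     # sum over all ordered pairs (i, j)
        for j in nodes:
            if communities[i] != communities[j]:
                continue                # delta(c_i, c_j) = 0
            a_ij = sum(1 for u, v in edges if {u, v} == {i, j})
            q += a_ij - degree[i] * degree[j] / (2 * m)
    return q / (2 * m)

# Two triangles joined by a single bridge edge; partitioning them into
# their natural communities gives Q = 5/14 ~ 0.357.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
comms = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(modularity(edges, comms))
```

The double loop makes this O(n²·m), which is fine for a toy graph but not for real data; the igraph functions above optimize or approximate this quantity with much better complexity.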

    Best regards.

     

  • Anonymous
    2013-04-25

    Hello~

    You've already been very helpful but I have a few more questions.

    The first couple are about the 'export word-context matrix' files. What are the numbers that are in the output file? Is this a matrix of the Jaccard coefficients for co-occurring words? Is this a type of bigram matrix? Or, is this something else? Also, please explain what is meant by 'context'.

    The next question is about editing the R output file from a co-occurrence network. I have loaded the R source file into the included R package as per the online instructions, but where do I find the files 'edit_network1.r' and 'edit_network2.r'? Perhaps I have not configured KH Coder correctly?

    Thanks for all of your work and for the extra helping hand!

    *Edited: Clarification of question.

     
    Last edit: Anonymous 2013-04-25
  • HIGUCHI Koichi
    2013-04-26

    Thank you for your post!

    In a "word-context matrix," each row is the "context vector" of one word, so the number of rows equals the number of words.

    To make the "context vector" of word_i, we use word_1, word_2, word_3, ..., word_n. The vector is (e_1, e_2, e_3, ..., e_n), where e_k is the mean frequency of word_k in the documents that include word_i.

    Suppose you use this matrix to perform a cluster analysis of, for example, the top 150 most frequent words. You are then using the appearance patterns of not only those 150 words but of all n words. You could use the information of over 5,000 words to cluster 150 words, and thus exploit the information in low-frequency words. Generally speaking, many specific and valuable words are low-frequency words, so I think it may be useful to exploit their appearance patterns. (If you use a simple document-term matrix to cluster the top 150 words, you can only use the information of those 150 words.)
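The construction above can be sketched in a few lines of Python. This is only an illustration of the stated definition (row i holds the mean frequency of every word over the documents containing word i), not KH Coder's actual export code, and the tiny `dtm` below is made-up data.

```python
def word_context_matrix(dtm):
    """Build a word-context matrix from a document-term matrix.

    dtm: list of rows, one per document; dtm[d][k] is the frequency
    of word k in document d. Returns one context vector per word.
    """
    n_words = len(dtm[0])
    ctx = []
    for i in range(n_words):
        # Documents in which word_i appears at least once.
        docs = [row for row in dtm if row[i] > 0]
        if not docs:
            ctx.append([0.0] * n_words)  # word never appears
            continue
        # e_k = mean frequency of word_k over those documents.
        ctx.append([sum(row[k] for row in docs) / len(docs)
                    for k in range(n_words)])
    return ctx

# Toy document-term matrix: 3 documents x 3 words.
dtm = [[1, 0, 2],
       [0, 1, 1],
       [1, 1, 0]]
print(word_context_matrix(dtm))
```

Word 0 appears in documents 0 and 2, so its context vector averages those two rows: [1.0, 0.5, 1.0]. Clustering then compares these rows instead of raw document-term columns.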

    Please open the "tutorial_en\using_R" folder to find the file "edit_network1.r."

     

  • Anonymous
    2013-04-26

    Thanks so much! Is there a procedure for seeing more of what is going on behind the scenes of KH Coder within R? I know R is used as a back end for the co-occurrence networks, but I would like to get a grasp of what calculations are happening in R. Is it as simple as saving the co-occurrence network as an R file, and then opening it with the included R package and running through it line by line?

    Maybe this is too advanced a question for here, and something I will be able to figure out in time. If it's not too much trouble, though, any tips to point me in the right direction would help. Perhaps you could point me to some resources or literature that would be a good starting point for learning more?

     
    Last edit: Anonymous 2013-04-26
  • HIGUCHI Koichi
    2013-04-26

    "Is it as simple as saving the co-occurrence network as an R file, and then opening it with the included R package and running through it line-by-line?"

    Well, yes, you can do this. "R Source" files are just text files containing R commands, so you can open them with any text editor and copy and paste the commands into the R console.

    But this way you will get not just a grasp but the full details of the statistical processing, and the R command files of KH Coder are somewhat long and complicated...

    You can try looking into "R Source" files or just ask questions here :)

     
    Last edit: HIGUCHI Koichi 2013-04-26

