multiple words

  • Anonymous


    Anonymous - 2013-08-21


    I am creating a correspondence analysis with some data (I posted the previous question about multiple documents as well).

    I am quite new to this type of software, so more and more questions arise. In my txt document, I have propositions coming from cognitive maps. For example: "climate change requires concrete measures". When I create the correspondence analysis, the words "climate" and "change" appear separately. I would like them to be treated as a set of words ("climate change", "decision making" and so on).

    I have seen the coding option of the program but I am not sure if that is the solution. Please, what should I add to the txt document so these words appear as a set?

    Again, sorry if the question is a bit obvious, but I am a complete newbie :)

    Thanks in advance!

  • HIGUCHI Koichi

    HIGUCHI Koichi - 2013-08-21

    Thank you for your post!
    It is a very important question.

    Basically we have two or three choices.

    [1] We can pick up any arbitrary string, like “decision making,” as a word. To do this, go to [Pre-Processing] [Select Words to Analyze] in the menu bar and enter the strings in the “force pick up” text field.

    [2] As you wrote, we can use coding rules to pick up multi-word concepts. You can use coding rules like these:

    *a word

    * both words
    decision and ( making or make )

    * words in close positions 1 (the distance between words <= 15 words)
    near(decision-make) or near(decision-making)

    * words in close positions 2 (in the same sentence and the distance between words <= 15 words)
    near(decision-make)[b] or near(decision-making)[b]

    * words in close positions 3 (in the same sentence and the distance between words <= 3 words)
    near(decision-make)[b3] or near(decision-making)[b3]

    [3] Just don’t care about it.
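To make the idea behind the near(...) rules in [2] more concrete, here is a minimal Python sketch of a "within N words" check. It is only an illustration, not KH Coder's implementation: the function names `tokenize` and `near`, the simple regular-expression tokenizer, and the way distance is counted are all assumptions, and it matches surface forms only.

```python
import re

def tokenize(sentence):
    """Lowercase a sentence and split it into simple word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())

def near(tokens, first, second, window=15):
    """Return True if `first` and `second` occur within `window` tokens of each other."""
    pos_first = [i for i, t in enumerate(tokens) if t == first]
    pos_second = [i for i, t in enumerate(tokens) if t == second]
    # Compare every occurrence of one word with every occurrence of the other.
    return any(abs(i - j) <= window for i in pos_first for j in pos_second)

# Made-up example sentence: "make" and "decision" are 3 tokens apart.
tokens = tokenize("We must make a clear decision before the end of the year")
print(near(tokens, "decision", "make"))            # True
print(near(tokens, "decision", "make", window=2))  # False
```

Applying the check to one sentence at a time, as above, roughly corresponds to adding the "same sentence" restriction.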

    Now, my recommendation in this case is [3] or [2].

    Choice [3] is actually not a bad option in word-level analysis, because the data may contain not only “decision making” but also “make decision,” “make our decisions,” and similar forms. If you pick up only “decision making” as a word, you drop the other forms and therefore lose some characteristics of the data from your analysis. So it can be better to leave “decision” and “making” as separate words. And if you find that “decision” and “making” are plotted very close together in the correspondence analysis, you can infer that these words are often used together.

    You can confirm this inference by making co-occurrence networks of words. If the words are connected in the network, it means that they are often used together in the same paragraphs/sentences. If co-occurrence networks are not enough and you want to be very sure, then choice [2], coding rules, would be very useful. You can count concepts like “decision making” with coding rules.
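As a rough illustration of what "used together in the same sentences" means, the following Python sketch counts how many sentences each pair of words shares. It is an assumption-laden simplification: the function name `cooccurrence_counts` and the sample sentences are made up, and raw counts are used here only for simplicity, without implying that this is the measure the software uses to draw its networks.

```python
import re
from collections import Counter
from itertools import combinations

def cooccurrence_counts(sentences):
    """Count, for each pair of words, the number of sentences containing both."""
    counts = Counter()
    for sentence in sentences:
        # Use a sorted set so each pair is counted once per sentence,
        # always in alphabetical order.
        words = sorted(set(re.findall(r"[a-z']+", sentence.lower())))
        counts.update(combinations(words, 2))
    return counts

# Made-up sample data.
sentences = [
    "Climate change requires concrete measures",
    "Decision making on climate change is slow",
    "Good decision making requires data",
]
counts = cooccurrence_counts(sentences)
print(counts[("change", "climate")])    # 2
print(counts[("decision", "making")])   # 2
print(counts[("climate", "decision")])  # 1
```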

    (In other words, without coding rules you can count only words, not concepts. For example, a correspondence analysis of words can only tell you about words, and when you interpret it, you have to make inferences about concepts. It is also important to check the original text using KWIC when you interpret results. In any case, without coding rules the analysis stays at the word level. To perform a concept-level analysis, you need coding rules. But I don't mean that you should always make coding rules. Word-level analysis + inferences about concepts + KWIC would be enough in certain cases.)

    So why did I implement choice [1]? I think we need a function to pick up certain proper nouns. For example, there is a character called “red shirt” in the novel “Botchan.” In this case, it can be better to pick up “red shirt” as a word.
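The effect of choice [1] can be pictured with a small Python sketch that keeps listed phrases together as single tokens. This is not how KH Coder does it internally; the function name `tokenize_with_phrases` and the regular-expression tokenizer are assumptions made for illustration.

```python
import re

def tokenize_with_phrases(text, phrases):
    """Tokenize `text`, keeping each phrase in `phrases` as a single token."""
    # Try longer phrases first so they win over shorter overlapping ones.
    ordered = sorted(phrases, key=len, reverse=True)
    alternatives = [r"\b" + re.escape(p.lower()) + r"\b" for p in ordered]
    # Fall back to single words when no listed phrase matches.
    alternatives.append(r"[a-z']+")
    pattern = re.compile("|".join(alternatives))
    return pattern.findall(text.lower())

phrases = ["climate change", "decision making"]
print(tokenize_with_phrases("Climate change requires concrete measures", phrases))
# ['climate change', 'requires', 'concrete', 'measures']
```

Note that only the exact listed string is merged, so a variant such as “make decision” would still be split into two words, which is the limitation described above.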

    Well, did I make myself clear enough? If not, don’t hesitate to post more questions. Also please note that although I have some recommendations as a developer, the choice is always yours.

    Best regards,

    Last edit: HIGUCHI Koichi 2013-08-21
  • Anonymous


    Anonymous - 2013-08-26

    Thanks a lot for the answer! Since my sample of data is not very large, I decided to go for choice number 1. It allows me to visualize the sets of words as I wanted... so excellent!

    Thanks for the great software and support!


