Grammar or syntax pattern search

2012-05-26
2012-10-29
  • Hello. I am very impressed by the features of KH Coder.
    I am currently interested in doing research on grammar or syntax pattern
    searching, and I wonder how it works in KH Coder. Could you show me some
    examples of how it can be done, so I can see how complicated it is? For
    example, what if I want to find all progressive-tense sentences, e.g. "John
    is reading," "John was reading," "John will be reading," "John has been
    reading," "John had been reading," ...

    Many thanks

     
  • HIGUCHI Koichi
    2012-05-26

    Hello. Thank you for your post!

    Strictly speaking, KH Coder doesn't have grammar or syntax pattern search
    functions. I am sorry for the inconvenience.

    But you can try pattern search functions of KH Coder.

    Before you begin, please check whether you have specified the word "be" as
    a stop word. If you have, KH Coder classifies "be" as "OTHER," and you have
    to change the settings so that "OTHER" words are used. Go to
    "Pre-Processing" "Select Words to Analyze" in the menubar, check "OTHER,"
    and click "OK." (By default, KH Coder ignores "OTHER" words.)

    Now you can try a pattern search in the "KWIC Concordance" window.

    1. Go to "Tools" "Words" "Search Words" in the menubar
    2. Input "read" and click the "Search" button
    3. Find and double-click a line such as "reading VBG"

    The KWIC Concordance window is now open, showing every "VBG" form of
    "read" in your data, that is, every occurrence of "reading."

    BTW, about the names like "VBG," please consult this page:
    http://www.computing.dcu.ie/~acahill/tagset.html

    Now you can find every "reading" in your data, but I guess that is not
    enough: there should be a form of "be" in front of "reading." Is this
    correct? If so, follow this procedure:

    1. Click "Additional Options" in the KWIC window
    2. Select "L1" as "Position" of "Condition 1"
    3. Input "be" as "Word" of "Condition 1"
    4. Click "OK"
    5. Click "Search" again

    Now you can find every "be" + "reading" in the data. You may want to choose
    "L1-5" instead of "L1," so that cases where other words come between "be"
    and "reading" are also matched.
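
    For readers who want to see the logic outside the GUI, the "L1" and "L1-5"
    conditions can be approximated with a short Python sketch. This is not
    KH Coder's own code: the function name kwic_left_condition, the
    (word, lemma, tag) token layout, and the hand-tagged toy sentences are all
    assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical token layout: (surface form, lemma, POS tag).
Token = Tuple[str, str, str]


def kwic_left_condition(
    tokens: List[Token],
    node_word: str,
    left_lemma: str,
    window: int = 1,
    context: int = 3,
) -> List[Tuple[int, str]]:
    """Return (index, context string) for each node_word that has
    left_lemma among the `window` tokens immediately to its left."""
    hits = []
    for i, (word, _lemma, _tag) in enumerate(tokens):
        if word.lower() != node_word:
            continue
        left = tokens[max(0, i - window):i]
        if any(lemma == left_lemma for _w, lemma, _t in left):
            start = max(0, i - context)
            snippet = " ".join(w for w, _l, _t in tokens[start:i + context + 1])
            hits.append((i, snippet))
    return hits


# Hand-tagged toy data (an assumption, not real tagger output).
tokens: List[Token] = [
    ("Reading", "read", "VBG"), ("helps", "help", "VBZ"), (".", ".", "."),
    ("John", "John", "NNP"), ("is", "be", "VBZ"), ("reading", "read", "VBG"),
    (".", ".", "."),
    ("John", "John", "NNP"), ("has", "have", "VBZ"), ("been", "be", "VBN"),
    ("quietly", "quietly", "RB"), ("reading", "read", "VBG"), (".", ".", "."),
]

hits_l1 = kwic_left_condition(tokens, "reading", "be", window=1)
hits_l1_5 = kwic_left_condition(tokens, "reading", "be", window=5)

print(len(hits_l1), "hit(s) with L1")
print(len(hits_l1_5), "hit(s) with L1-5")
```

    With "L1" only "is reading" matches; widening the window to five tokens
    also picks up "been quietly reading." Note that this sketch does not stop
    at sentence boundaries, so a wide window can match a "be" that belongs to
    the previous sentence.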

    Best regards,

     
    Last edit: HIGUCHI Koichi 2012-11-11
  • Dear K.H.,
    Thank you very much for your time and your response. It is very helpful. I
    still have some questions.

    About the "VBG" form of "read": yes, that is what I meant. But what if I
    want to count the frequency of, or build a concordance for, any V-ing form
    preceded by any form of "be"?
    For example, listing all sentences that match the following sets:

    AllBeForms + Ving:
    where AllBeForms: {is, are, was, been…} (a finite set)
    and
    where Ving: {reading, writing, eating, singing…} (an open set that depends
    on the input corpus texts)

    So that I can search the pattern and frequency of not just
    "AllBeForms+Reading", but also "AllBeForms+Ving".

    I think I would probably need POS information, and I saw the POS option in
    the KWIC feature.

    So I was wondering how I can use the POS information for this kind of
    search (or whether there are other ways). I also have a question about how
    the POS option is applied: is it through a tagged corpus, a lookup in an
    internal dictionary, or some other method?

    If the corpus is tagged, how is it POS-tagged? Is it tagged by KH Coder
    itself, or by an external tagger?
    And if it is tagged by KH Coder, how accurate are the POS tags?

    And finally, how do I cite this in my research paper? (E.g., is there a
    document that supports the tagging accuracy for my research results?)

    Many Thanks.

     
  • HIGUCHI Koichi
    2012-05-31

    Hello. Thank you for your post!

    AllBeForms + Ving:
    where AllBeForms: {is, are, was, been…} (a finite set)
    and
    where Ving: {reading, writing, eating, singing…} (an open set that depends
    on the input corpus texts)

    Let's use the "KWIC Concordance" function of KH Coder again.

    1. Go to "Tools" "Words" "KWIC Concordance" in the menubar
    2. Type "VBG" as "Conj." and click "Search"

    Now you can find every Ving/VBG form in the data. Add one more condition to
    find AllBeForms + Ving. It's the same procedure as last time.

    1. Click "Additional Options" in the KWIC window
    2. Select "L1" as "Position" of "Condition 1"
    3. Input "be" as "Word" of "Condition 1"
    4. Click "OK"
    5. Click "Search" again

    See the number beside "Hits:" to check the frequency.
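
    The same search can be sketched in Python to show what the "Hits:" number
    corresponds to. This is only an illustration under stated assumptions, not
    KH Coder's implementation: the function name count_be_plus_ving, the
    (word, lemma, tag) token layout, and the hand-tagged toy data are all
    hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical token layout: (surface form, lemma, POS tag).
Token = Tuple[str, str, str]


def count_be_plus_ving(tokens: List[Token], window: int = 1) -> Counter:
    """Count VBG tokens that have the lemma "be" among the `window`
    tokens immediately to their left, keyed by lower-cased word form."""
    counts: Counter = Counter()
    for i, (word, _lemma, tag) in enumerate(tokens):
        if tag != "VBG":
            continue
        left = tokens[max(0, i - window):i]
        if any(lemma == "be" for _w, lemma, _t in left):
            counts[word.lower()] += 1
    return counts


# Hand-tagged toy data (an assumption, not real tagger output).
tokens: List[Token] = [
    ("John", "John", "NNP"), ("is", "be", "VBZ"), ("reading", "read", "VBG"),
    (".", ".", "."),
    ("Mary", "Mary", "NNP"), ("was", "be", "VBD"), ("writing", "write", "VBG"),
    (".", ".", "."),
    ("They", "they", "PRP"), ("have", "have", "VBP"), ("been", "be", "VBN"),
    ("eating", "eat", "VBG"), (".", ".", "."),
    ("Singing", "sing", "VBG"), ("is", "be", "VBZ"), ("fun", "fun", "NN"),
    (".", ".", "."),
]

counts = count_be_plus_ving(tokens)
total_hits = sum(counts.values())

print("Hits:", total_hits)
for word, n in counts.most_common():
    print(word, n)
```

    Matching on the lemma "be" rather than on individual word forms is what
    lets one condition cover "is," "was," and "been" at once. "Singing" is
    tagged VBG here but is not counted, because no form of "be" precedes it.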

    If the corpus is tagged, how is it POS-tagged? Is it tagged by KH Coder
    itself, or by an external tagger?

    It is tagged by the Stanford POS Tagger:
    http://nlp.stanford.edu/software/tagger.shtml

    The Windows package of KH Coder includes the Stanford POS Tagger and its
    "left3words-wsj-0-18" model.
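
    To give a feel for what tagged text looks like, here is a minimal Python
    sketch that splits a line of "word_TAG" items into (word, tag) pairs. It
    assumes an underscore separator, which depends on the tagger version and
    settings, and the sample line is hand-written rather than real tagger
    output. The function name parse_tagged is hypothetical, and the sketch
    says nothing about how KH Coder stores tags internally.

```python
from typing import List, Tuple


def parse_tagged(line: str, sep: str = "_") -> List[Tuple[str, str]]:
    """Split whitespace-separated "word<sep>TAG" items into (word, tag)
    pairs. The split is done from the right so that a word containing
    the separator keeps it."""
    pairs = []
    for item in line.split():
        word, tag = item.rsplit(sep, 1)
        pairs.append((word, tag))
    return pairs


# Hand-written sample line (an assumption about the output format).
tagged_line = "John_NNP has_VBZ been_VBN reading_VBG ._."

pairs = parse_tagged(tagged_line)
vbg_words = [word for word, tag in pairs if tag == "VBG"]

print(pairs)
print(vbg_words)
```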

    About accuracy, please consult the tagger's web site and papers. According
    to their paper (Kristina Toutanova, Dan Klein, Christopher Manning, and
    Yoram Singer 2003), it appears to be better than 95%.

    Best regards.

     
    Last edit: HIGUCHI Koichi 2012-11-11

