
#70 Collocation based on addkey detecting node word as well

open
Client (37)
5
2014-08-18
2007-02-13
No

I ran a search for "any" word with the tag GE (in FLOB).

I then ran a collocation search, and the top item listed was "'s", i.e. the node word itself.

This can't be right, can it?

Query text:

<pos><word>_</word><poscode key="POS">GE</poscode></pos>

<pos><word>_</word><poscode key="POS">GE</poscode></pos> -> 's

This also affects the scope. If the scope is L2 R2, then for

AAA BBB 's CCC DDD

AAA, BBB and CCC are in scope but DDD isn't, although as far as I can see it should be.

Discussion

  • Andrew Hardie

    Andrew Hardie - 2007-02-13

    Logged In: YES
    user_id=1460495
    Originator: YES

    As a follow-up, here are the first few lines of the collocation list for the search given above:

    <listCollocs left="2" right="2">
    <colloc seq="1" freq="5602" score="424.1"><word>'s</word></colloc>
    <colloc seq="2" freq="876" score="181.8"><word>'</word></colloc>
    <colloc seq="3" freq="68" score="31.3"><word>st</word></colloc>
    <colloc seq="4" freq="48" score="23.5"><word>king</word></colloc>
    <colloc seq="5" freq="17" score="23.5"><word>stalnaker</word></colloc>
    <colloc seq="6" freq="68" score="22.9"><word>britain</word></colloc>
    <colloc seq="7" freq="23" score="22.7"><word>citizen</word></colloc>
    <colloc seq="8" freq="17" score="22.4"><word>behn</word></colloc>
    <colloc seq="9" freq="66" score="21.4"><word>father</word></colloc>
    <colloc seq="10" freq="71" score="20.7"><word>mother</word></colloc>
    <colloc seq="11" freq="29" score="19.1"><word>charter</word></colloc>
    <colloc seq="12" freq="14" score="18.3"><word>schongauer</word></colloc>

    And here are the same number of lines derived from a Word Query search for "'" or "'s" tagged GE, which should in theory give exactly the same results:

    <listCollocs left="2" right="2">
    <colloc seq="1" freq="70" score="32.3"><word>st</word></colloc>
    <colloc seq="2" freq="49" score="24.0"><word>king</word></colloc>
    <colloc seq="3" freq="17" score="23.5"><word>stalnaker</word></colloc>
    <colloc seq="4" freq="68" score="22.9"><word>britain</word></colloc>
    <colloc seq="5" freq="23" score="22.7"><word>citizen</word></colloc>
    <colloc seq="6" freq="17" score="22.4"><word>behn</word></colloc>
    <colloc seq="7" freq="67" score="21.8"><word>father</word></colloc>
    <colloc seq="8" freq="73" score="21.4"><word>mother</word></colloc>
    <colloc seq="9" freq="29" score="19.1"><word>charter</word></colloc>
    <colloc seq="10" freq="14" score="18.3"><word>schongauer</word></colloc>
    <colloc seq="11" freq="77" score="17.7"><word>children</word></colloc>
    <colloc seq="12" freq="93" score="17.5"><word>mr</word></colloc>

    This second list was based on the following query:

    <or><pos><word>'</word><poscode key="POS">GE</poscode></pos><pos><word>'s</word><poscode key="POS">GE</poscode></pos></or>

    Note that the difference is not just the presence or absence of "'s" at the top of the list. The frequencies of some other collocates also differ (e.g. st 68 vs 70, father 66 vs 67, mother 71 vs 73), and the total number of collocates on the list is larger in the second case.
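    To check the differences systematically rather than by eye, the two lists can be compared word by word. The Python sketch below parses abbreviated excerpts of the two lists above (a closing `</listCollocs>` tag is added so the fragments parse) and reports every collocate whose frequency differs or that appears in only one list. The function names `parse_collocs` and `diff_freqs` are hypothetical helpers, not part of the client.

```python
import xml.etree.ElementTree as ET

def parse_collocs(xml_text):
    """Map each collocate word to its freq attribute in a <listCollocs> fragment."""
    root = ET.fromstring(xml_text)
    return {c.findtext("word"): int(c.get("freq")) for c in root.iter("colloc")}

def diff_freqs(a, b):
    """Return {word: (freq_in_a, freq_in_b)} for words whose counts differ.

    A value of None means the word is absent from that list.
    """
    return {
        word: (a.get(word), b.get(word))
        for word in sorted(a.keys() | b.keys())
        if a.get(word) != b.get(word)
    }

# Excerpt of the list from the GE query on "_" (the addkey-based search).
addkey_list = parse_collocs("""<listCollocs left="2" right="2">
<colloc seq="1" freq="5602" score="424.1"><word>'s</word></colloc>
<colloc seq="2" freq="876" score="181.8"><word>'</word></colloc>
<colloc seq="3" freq="68" score="31.3"><word>st</word></colloc>
<colloc seq="4" freq="48" score="23.5"><word>king</word></colloc>
<colloc seq="6" freq="68" score="22.9"><word>britain</word></colloc>
<colloc seq="9" freq="66" score="21.4"><word>father</word></colloc>
<colloc seq="10" freq="71" score="20.7"><word>mother</word></colloc>
</listCollocs>""")

# Excerpt of the list from the <or> query on "'" / "'s" tagged GE.
or_query_list = parse_collocs("""<listCollocs left="2" right="2">
<colloc seq="1" freq="70" score="32.3"><word>st</word></colloc>
<colloc seq="2" freq="49" score="24.0"><word>king</word></colloc>
<colloc seq="4" freq="68" score="22.9"><word>britain</word></colloc>
<colloc seq="7" freq="67" score="21.8"><word>father</word></colloc>
<colloc seq="8" freq="73" score="21.4"><word>mother</word></colloc>
</listCollocs>""")

for word, (f1, f2) in diff_freqs(addkey_list, or_query_list).items():
    print(f"{word}: {f1} vs {f2}")
```

    Collocates with identical counts in both lists (such as britain) are left out of the report, so only the discrepancies remain.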

  • Tony Dodd

    Tony Dodd - 2007-02-28

    Logged In: YES
    user_id=1036552
    Originator: NO

    If you look at the solution to an allpos query, you'll see that the hit covers the POS tag but not the word itself. I'm surprised no one has complained about this: not only does it have the side effect on collocations mentioned here, it also looks strange. I'm slightly worried that fixing it may break sequence queries with allpos yet again, but I'll take a look.
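    The diagnosis above can be illustrated with a small model. The Python sketch below assumes, hypothetically (this is not the client's actual code), that the collocation window is taken from the node's token range as a half-open interval [start, end), with `left` and `right` giving the L and R scope. If the hit stops before the word, the node effectively has zero width, and that alone reproduces both reported symptoms: "'s" is counted as its own R1 collocate, and DDD drops out of an L2 R2 scope.

```python
def collocation_window(tokens, node_start, node_end, left, right):
    """Return the tokens within `left` positions before and `right` positions
    after the node, where the node occupies tokens[node_start:node_end]."""
    before = tokens[max(0, node_start - left):node_start]
    after = tokens[node_end:node_end + right]
    return before + after

tokens = ["AAA", "BBB", "'s", "CCC", "DDD"]

# Expected: the node "'s" is token 2, so it covers [2, 3) and is excluded.
print(collocation_window(tokens, 2, 3, 2, 2))  # ['AAA', 'BBB', 'CCC', 'DDD']

# Hypothesised bug: the hit covers only the tag, so its range is [2, 2).
# "'s" then falls into R1 and DDD is pushed out of the R2 scope.
print(collocation_window(tokens, 2, 2, 2, 2))  # ['AAA', 'BBB', "'s", 'CCC']
```

    Under this model, the fix is simply to make the hit's end position include the word, so that the R span starts counting after it.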