words in more than one family

2016-04-14
  • Colin Batchelor

    Colin Batchelor - 2016-04-14

    I am clearly missing something, but I can't fathom what it is. I am writing a grammar with words which are ambiguous between parts of speech and tccg is only picking up one POS. A minimal .ccg file which shows the behaviour is:
    --8<--
    family N { entry: n;}
    family V { entry: s\n/n; }
    family ADJ { entry: n/n; }

    word dog:N;
    word dog:V;
    word dog:ADJ;
    --8<--
    the micro-lexicon.xml file looks like this:
    --8<--
    <?xml version="1.0" encoding="UTF-8"?>
    <ccg-lexicon name="micro.ccg" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="../lexicon.xsd">
    <family name="N" pos="N" closed="true">
    <entry name="Entry-1">
    <atomcat type="n">
    <fs/>
    </atomcat>
    </entry>
    <member stem="dog"/>
    </family>
    <family name="V" pos="V" closed="true">
    <entry name="Entry-1">
    <complexcat>
    <atomcat type="s">
    <fs/>
    </atomcat>
    <slash dir="\\" mode="&lt;"/>
    <atomcat type="n">
    <fs/>
    </atomcat>
    <slash dir="/" mode="&gt;"/>
    <atomcat type="n">
    <fs/>
    </atomcat>
    </complexcat>
    </entry>
    <member stem="dog"/>
    </family>
    <family name="ADJ" pos="ADJ" closed="true">
    <entry name="Entry-1">
    <complexcat>
    <atomcat type="n">
    <fs/>
    </atomcat>
    <slash dir="/" mode="&gt;"/>
    <atomcat type="n">
    <fs/>
    </atomcat>
    </complexcat>
    </entry>
    <member stem="dog"/>
    </family>
    </ccg-lexicon>
    --8<--
    and the results of some test parsing look like this:
    --8<--
    tccg> dog
    1 parse found.

    Parse: n/n

    (lex) dog :- n/n

    tccg> dog dog
    1 parse found.

    Parse: n/n

    (lex) dog :- n/n
    (lex) dog :- n/n
    (>B) dog dog :- n/n

    tccg> dog dog dog
    1 parse found.

    Parse: n/n

    (lex) dog :- n/n
    (lex) dog :- n/n
    (lex) dog :- n/n
    (>B) dog dog :- n/n
    (>B) dog dog dog :- n/n

    tccg> dog dog dog dog
    1 parse found.

    Parse: n/n

    (lex) dog :- n/n
    (lex) dog :- n/n
    (lex) dog :- n/n
    (lex) dog :- n/n
    (>B) dog dog :- n/n
    (>B) dog dog dog :- n/n
    (>B) dog dog dog dog :- n/n

    --8<--

    I am at a loss to see how to get openccg to recognize other types for dog. What am I missing?

    Many thanks,
    Colin.

     
  • Michael White

    Michael White - 2016-04-14

    Hello Colin

    Unfortunately, this looks to me like a bug in the ccg2xml compiler. Most of the work I've been involved with has used the native XML grammar format directly, so you may be the first person to find this bug. I just tried replicating it using "object" as a noun and a verb in the tinytiny.ccg grammar. In the compiled morph.xml file, the POS for "object" as a noun is mistakenly listed as "V" (the same as for the verbal entries).
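    To check a compiled morph.xml for this symptom, a small script can list which POS values each word ends up with. The following is only a sketch: it assumes morph.xml contains `entry` elements carrying `word` and `pos` attributes, and the function name `pos_by_word` is made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def pos_by_word(root):
    """Map each word to the set of POS tags found on its <entry> elements.

    Assumes (not verified against ccg2xml output) that entries look like
    <entry word="object" pos="N" .../>.
    """
    table = defaultdict(set)
    for entry in root.iter("entry"):
        word = entry.get("word")
        pos = entry.get("pos")
        if word is not None and pos is not None:
            table[word].add(pos)
    return dict(table)


if __name__ == "__main__":
    import sys

    # Usage: python check_morph.py morph.xml
    for path in sys.argv[1:]:
        for word, tags in sorted(pos_by_word(ET.parse(path).getroot()).items()):
            print(word, sorted(tags))
```

    A word that was declared in several families but shows only one POS here has probably hit the bug.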

    A quick workaround is to come up with a unique prefix for each distinct POS in the .ccg file -- e.g., using "n_object" for the noun usage of "object" -- then writing a simple script that runs after ccg2xml and rewrites "n_object" as "object" in the compiled morph.xml and lexicon.xml files.
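    A post-processing script of that kind might look roughly like the sketch below, which turns a prefixed form such as "n_object" back into "object". The prefixes `n_`, `v_` and `adj_` are hypothetical choices, and the sketch assumes the prefixed forms appear only in `word="..."` and `stem="..."` attributes of the compiled files, so adjust both to match your grammar.

```python
import re
import sys
from pathlib import Path

# Hypothetical prefixes; use whatever you put in front of words in the .ccg file.
PREFIXES = ("n_", "v_", "adj_")

# Matches word="n_dog" or stem="n_dog", capturing the attribute name and
# the bare form. Other attributes (pos, name, ...) are left untouched.
_PATTERN = re.compile(
    r'\b(word|stem)="(?:%s)([^"]+)"' % "|".join(re.escape(p) for p in PREFIXES)
)


def strip_pos_prefixes(xml_text):
    """Return xml_text with the POS prefixes removed from word/stem values."""
    return _PATTERN.sub(r'\1="\2"', xml_text)


def rewrite_file(path):
    """Rewrite one compiled grammar file in place."""
    p = Path(path)
    p.write_text(strip_pos_prefixes(p.read_text(encoding="utf-8")), encoding="utf-8")


if __name__ == "__main__":
    # Usage: python strip_prefixes.py morph.xml lexicon.xml
    for name in sys.argv[1:]:
        rewrite_file(name)
```

    It uses plain text substitution rather than an XML parser so that the rest of each file stays byte-for-byte as ccg2xml wrote it.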

    Of course it would be better to look into and fix the bug in ccg2xml, for which a volunteer would be great. I don't believe the author of the compiler, Ben Wing, is likely to be able to do this, regrettably.

    Mike

     
    • Colin Batchelor

      Colin Batchelor - 2016-04-14

      Hello Mike,

      That works! I hadn't thought of looking in the morph.xml file. tccg is now reporting 3 parses for "dog".

      I'll have a quick look at ccg2xml to see whether anything obvious leaps out.

      Many thanks!
      Colin.

       
