Michael White - 2003-11-05

Here's a reply I'm reposting here.  -Mike

-------

Hi Luke,

Your question addresses a pretty subtle point actually.

First, let me note that all the cem-* grammars are part of a cross-linguistic study of how to handle ergativity via a parametric lexicon in CCG, which Cem Bozsahin is carrying out with Mark Steedman.  As such, these grammars are probably not the best ones to look at first.  I'd say the worldcup one is the best starting point currently.  We're actually missing a tiny intro grammar and gentle introduction to OpenCCG grammars -- Geert-Jan Kruijff started working on such an intro this summer, but didn't end up having time to complete it.  I'll attach the draft, in case it's useful.

Now, on to your question.  The reason these parses fail is that the category for 'and' that you've listed is for sentential coordination, not NP coordination, and this grammar doesn't happen to have the right category for NP coordination (again, see the worldcup grammar for the right category).  So, why is this the wrong category for NP coordination?  Mainly, it's because NP coordination requires a different semantics, crucially involving variable binding, as the EWNLG paper or journal article submission should explain.  Technically though, it looks like the sentential coord category for 'and' should still produce a parse -- the reason it doesn't is as follows.  First, note that the dollar (stack) variables are coindexed, so the category is actually

and :- S$1\S$1/S$1

This means that the stacks of args under S get unified across all three occurrences of $1, and in particular, after type-raising, the NP args for John and Mary will get unified.  But of course, we don't want the John and Mary NPs to get unified, b/c they should have their own discourse referents.  What actually happens is that the unification succeeds, but then the LF fails a well-formedness check in the hylo package.
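To make the point about coindexing concrete, here is a rough Python sketch of the effect.  It is a toy model only, not OpenCCG code: the names (Referents, unify_stacks, lf_is_well_formed) are all made up for illustration, and the "no referent may carry two names" rule is just an assumed stand-in for whatever the real check in the hylo package does.

```python
# Toy model of coindexed $1 stacks; every name here is hypothetical,
# and none of this is the actual OpenCCG or hylo implementation.

class Referents:
    """Minimal union-find over discourse-referent names."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def unify_stacks(stack_a, stack_b, refs):
    """Unify two argument stacks position by position.

    Each argument is a (slash, category, referent) triple.  Returns False
    on a slash or category clash; otherwise merges the referents pairwise
    and returns True.
    """
    if len(stack_a) != len(stack_b):
        return False
    for (slash_a, cat_a, ref_a), (slash_b, cat_b, ref_b) in zip(stack_a, stack_b):
        if slash_a != slash_b or cat_a != cat_b:
            return False
        refs.union(ref_a, ref_b)
    return True


def lf_is_well_formed(predications, refs):
    """Assumed stand-in check: no referent may carry two different names."""
    seen = {}
    for ref, name in predications:
        root = refs.find(ref)
        if seen.setdefault(root, name) != name:
            return False
    return True


# Type-raised John is S/(S\NP_j) and type-raised Mary is S/(S\NP_m),
# so each contributes a one-argument stack under S.
john_stack = [("/", "S\\NP", "j")]
mary_stack = [("/", "S\\NP", "m")]
predications = [("j", "John"), ("m", "Mary")]

refs = Referents()
# Because $1 is shared, the two stacks must unify with each other.
unified = unify_stacks(john_stack, mary_stack, refs)
# Unification goes through, but j and m are now the same referent.
well_formed = lf_is_well_formed(predications, refs)
print(unified, well_formed)
```

In this sketch the stacks unify without complaint, and the problem only shows up afterwards, when the merged referent ends up carrying both names.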

In general, if you turn on the display of features in tccg, you can often figure out where a derivation is failing, but in this case it might still be a mystery b/c the derivation isn't cut off until the LF well-formedness check.

Finally, as for where to list such questions -- I think the Help forum, accessible from the project page (http://openccg.sourceforge.net/), is probably the most appropriate place, so I'll copy this reply there.  (There is also a developers forum, which has some traffic on it already, but you need to be a developer to access that one.)

-Mike

-----Original Message-----
From: Luke Sean Zettlemoyer
Sent: 05 November 2003 04:52
To: Michael White
Cc: Jason Baldridge
Subject: RE: OpenCCG inquiry

I got the most recent version.  I will definitely contribute anything that
I produce that might be useful.

I already have a specific question.  The parser is not producing a parse
that I expect.  The situation occurs in the grammar in the cem-english
directory (and probably others).  These three lexical entries are
involved:

and :- S$\S$/S$
John :- np
Mary :- np

I expected two parses for 'John and Mary' where 'John' and 'Mary' are each
type raised and then combined by 'and'.  However, the parser fails.

It is interesting to note that 'and Mary' parses to (S/(S\NP))\(S/(S\NP))
and (S\(S/NP))\(S\(S/NP)) as I would expect.  Also, 'John' parses to
(S/(S\NP)) and (S\(S/NP)).

I must be missing something obvious...

Thanks,
Luke

PS.  If there is a more appropriate place for me to post / send questions
(like an email list), please let me know.