Compact Disjunctive LF ?

Help
Anonymous
2010-09-12
2013-04-08

  • Anonymous
    2010-09-12

    I am trying to create a compact version of the resulting LFs in the parses of the CCG Parse.java class.

    By default, those get unpacked into a fully flattened list of parses, but I am trying to achieve a single <lf> output which will contain a compact and disjunctive representation of the results.

    I have already found some infrastructure that could be useful, namely the HyloHelper "compact" and "append" methods, but I am still struggling with figuring out how to interact with the CKY table represented by parse/Chart.java, so that I can start compacting at the right moment of processing. I have seen that there are papers online showing off results with a similar CCG extension, but I have never found traces of the code anywhere online. Could anyone be able to give me some tips?

    Thanks,
    Deyan

     
  • Michael White
    Michael White
    2010-09-13

    Hello Deyan

    There is code to support generating from disjunctive logical forms (DLFs), but the creation of DLFs to date has been assumed to take place externally to OpenCCG (eg in a sentence planner).

    That said, it would be interesting to add the capability to create DLFs from the parse chart.  I'm curious, what is your intended usage?

    In principle, creating a DLF from the parse chart is conceptually straightforward - just add 'or' nodes for the 'or' branches in the and-or graph that the chart implicitly represents.  In practice, there will be a number of issues to address in implementing this capability (and unfortunately it's not high on my 'to do' list right now).  First, note that the LFs will have to be re-constructed bottom up, with the disjunctions in them.  Also, the unifications resulting from the rule applications will have to be re-applied to the DLF under construction.  Finally, the compact and convert routines will likely need to be modified to support inputs containing disjunctions.

    If you're interested in trying to implement this capability, I can answer more specific implementation questions.

    -Mike

     

  • Anonymous
    2010-09-23

    Hello Mike,

    Thank you for your helpful reply! I am afraid that the implications of the disjunctions being present during the parse process are a bit too sever for me to handle in the time-frame I had originally intended to fit. As you said, working with LFs already in the chart probably requires extensions of many (all?) components of the parsing process.

    My use case is a bit different from typical CCG usages, I am trying to parse linearized mathematical expressions into semantic trees, where the disjunctive compact form would allow me to convert the output to an explicit representation of the present ambiguities, which can in turn be further disambiguated by external tools. What is essential for my problem is that:

    1. I preserve all disambiguation my grammar can give me
    2. I don't multiply out ambiguity into multiple parses, but I localize it to the greatest extent possible. Of course, localizing should not be done at the expense of 1.

    If I achieve this I can directly move the ambiguous result into a standard XML language for representing mathematics (Content MathML) and almost immediately achieve searching on ambiguous mathematical expressions. That is one of the targets of my Master's Thesis, which I am currently working on. Any suggestions on what is the easiest way to get this outcome would be very welcome, as I don't think I will have the time to learn so much of OpenCCG as to make all necessary extensions to the Parser components. A tentative solution I am thinking of is to post-process the sequence of parses with my own disjunctive compactification, but that is a highly non-trivial problem in itself.

     
  • I don't know anything else about the specific domain you are working with, but if it is a non-human grammar of mathematical expressions, OpenCCG might be overkill. Have you looked into parser generators like Antlr?

    Having said that, you should look at work by Magdalena Wolska, using OpenCCG to work with mathematical expressions. Various papers available on her publications page:

    http://www.coli.uni-saarland.de/~magda/publications.html

    Jason

     

  • Anonymous
    2010-09-24

    I am not fully convinced it would be an overkill, since mathematical expressions in full generality come back and bit you in unexpectedly complex ways. I have already invested some work in the OpenCCG approach and will try to push that further, at least for now. If things turn out to be a dead-end street I would retrace my steps and start with something like Antlr indeed.

    Thank you to the pointer to Magdalena, actually she has been kind enough to offer support to my work already. So I am lucky to have valuable help as far as the grammar is concerned, my bottleneck at the moment is getting this representation bit out of the way, hence I am trying to write my own compactifier for disjunctive LFs.

    Thanks for the tips,
    Deyan

     
  • Michael White
    Michael White
    2010-09-25

    Hello Deyan

    Thanks for the background on your project.  I think you may be
    right that it will be easier to create a disjunctive LF from
    multiple parses than doing so directly from the chart, at least
    in terms of avoiding getting too far into openccg internals.

    I've thought about this a bit and it seems that it may not be too
    hard to come up with something reasonable.  The way to do it
    might be to start with the flattened LFs for each parse, matching
    up the elementary predications (EPs) that are equal, and creating
    a disjunct for the ones that differ.  I'll illustrate by example
    below, though I'm sure there will be some further details to
    resolve to turn this into an algorithm.  If you pursue this
    direction, I'd like to hear how it goes.

    Note btw that as you've probably already discovered, for each
    parse there is a distinct sign, whose LF is accessible via the
    category (ie get the cat then get the LF).  As shown in
    TextCCG.java, the LF can be 'compacted' and 'converted', ie
    compacted from flat form into hierarchical form, with nominals
    instantiated as atoms.  You can do this and then flatten the
    result to get a flat list of EPs with nominal atoms.  (Another
    option might be to just skip the compaction step.)

    Ok, as an example, let's look at the classic 5-way ambiguity in
    'A man saw a girl on the hill with a telescope' (ignoring
    determiners and tense).  In flat form, the 5 readings can be
    represented as follows, where the common part is shown just once:

    @e(see) ^ @e(<Arg0>m ^ @e(<Arg1>g) ^ @m(man) ^ @g(girl) ^
    @o(on) ^ @o(<Arg1>h) ^ @h(hill) ^
    @w(with) ^ @w(<Arg1>t) ^ @t(telescope) ^

    1. … @e(<Mod>o) ^ @h(<Mod>w)
    2. … @g(<Mod>o) ^ @g(<Mod>w)
    3. … @g(<Mod>o) ^ @h(<Mod>w)
    4. … @e(<Mod>o) ^ @e(<Mod>w)
    5. … @g(<Mod>o) ^ @e(<Mod>w)

    So, if you take readings 1 and 2, recognize the common parts, and
    disjoin the parts that differ, you get:

    1..2: … (@e(<Mod>o) ^ @h(<Mod>w)) v (@g(<Mod>o) ^ @g(<Mod>w))

    Now, when we get to the 3rd reading, we could just add a 3rd
    disjunct.  But, we'll get a more compact DLF if we recognize that
    the final two EPs can be partially merged with the 2nd disjunct,
    ie by recognizing that @g(<Mod>o) is in common, and disjoining
    @h(<Mod>w):

    1..3: … (@e(<Mod>o) ^ @h(<Mod>w)) v
              (@g(<Mod>o) ^ (@g(<Mod>w) v @h(<Mod>w)))

    Similarly, the 4th reading can be integrated by partially merging
    with the first disjunct:

    1..4: … (@e(<Mod>o) ^ (@h(<Mod>w) v @e(<Mod>w))) v
              (@g(<Mod>o) ^ (@g(<Mod>w) v @h(<Mod>w)))

    Finally, the 5th reading can be integrated by partially merging
    with the 2nd disjunct again:

    1..5: … (@e(<Mod>o) ^ (@h(<Mod>w) v @e(<Mod>w))) v
              (@g(<Mod>o) ^ (@g(<Mod>w) v @h(<Mod>w) v @e(<Mod>w)))

    At this point the DLF represents all five readings.  The
    conjoined part could then be compacted for better readability, I
    imagine.

    In the general case, it may be that incrementally merging in this
    way does not produce an optimally compact DLF, but that may not
    be too important in the scheme of things.

    -Mike