cutting down on realization overheads

Help
Raveesh
2009-11-06
2013-04-08
  • Raveesh

    Raveesh - 2009-11-06

    Hi Mike and others,

    By now you might be aware that I am *extending* a DotCCG grammar for prosody. The changes are in place, but realization has become too slow, and at times it even runs out of memory. The reasons are that, with these prosodic enhancements:

    1. the size of the lexicon has grown by a factor of F, where F = (number of pitch accents) x (semantic features). That is, whereas the earlier grammar had only one lexical entry for "box", it now has "box", "box@H*", "box@L*", … and for each of these we have all possible permutations of semantic features such as INFO, KONTRAST, OWNER, COMMITMENT (H* vs L*). I know, Mike, I have no other choice but to bring LF-valued features into DotCCG. This is my plan for the weekend.
    2. the syntactic categories have multiplied, for verbs for example. What was earlier something like s\!np/adj now has a prosodic equivalent, i.e. s\!np/(s/!adj), for handling boundary tones.
    3. as a consequence of 2, there are additional type-changing rules, i.e. besides having adj:=> n/^n for adjectives I now have adj:=> (s/!adj). The same goes for pp and np complements of verbs.
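
To make the growth factor in point 1 concrete, here is a minimal Python sketch that enumerates the expanded entries for a single stem. The accent inventory, the feature values and the `expand_entries` helper are all hypothetical, chosen only for illustration; they are not taken from the actual grammar.

```python
from itertools import product

# Hypothetical inventories, for illustration only (not the real grammar's values).
PITCH_ACCENTS = ["H*", "L*"]
FEATURE_VALUES = {
    "INFO": ["theme", "rheme"],
    "KONTRAST": ["+", "-"],
}


def expand_entries(stem, accents, feature_values):
    """Pair every accented form of `stem` with every combination of feature values."""
    names = sorted(feature_values)
    combos = list(product(*(feature_values[name] for name in names)))
    entries = []
    for accent in accents:
        for combo in combos:
            entries.append((f"{stem}@{accent}", dict(zip(names, combo))))
    return entries


entries = expand_entries("box", PITCH_ACCENTS, FEATURE_VALUES)
print(len(entries))  # 2 accents x (2 x 2) feature combinations = 8
```

Under these assumed inventories a single stem already yields 8 accented entries on top of the unaccented one, and every further feature multiplies that count again.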

    So, all in all, the realization task is burdened with many more lexical entries, multiple families and type-changing rules. Are there any guidelines, parameter settings or anything else to cut this overhead down? I can imagine doing prosody-specific filtering in the realizer code, but that would not be the right approach! Suggestions, comments and criticism are invited.

    thanks,
    Raveesh

     
  • Michael White

    Michael White - 2009-11-07

    The category for predicative adjectives is usually s\np. It would probably be somewhat more efficient to have multiple categories in the lexicon rather than use a type-changing rule in this case, but I doubt it would make a huge difference.

    If you're doing realization in a limited domain, then you should be able to put together a reasonably small sample of good realizations from which you can train an n-gram model that can help guide realization in anytime mode. If you're just trying to test the grammar with ccg-test, it will use 4-gram precision to guide the realizer, which really helps the search. Try it with packing turned off and make sure you have a pruning value set, e.g. 5 or 10, as well as a time limit and a new best time limit (e.g. maybe 5000 and 1000 ms). You can increase Java's memory limit by editing your bin/ccg-env file and upping the memory limit specified in $JAVA_ARGS.
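
As a rough illustration of how n-gram precision and a pruning value can work together, here is a toy Python sketch. It is not OpenCCG's implementation: the function names are hypothetical, and averaging the precisions for orders 1 to 4 with equal weight is an assumption made for simplicity.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_precision(candidate, targets, max_n=4):
    """Average, over orders 1..max_n, the share of candidate n-grams found in any target."""
    cand_tokens = candidate.split()
    scores = []
    for n in range(1, max_n + 1):
        cand_grams = ngrams(cand_tokens, n)
        if not cand_grams:
            break  # candidate is shorter than n tokens
        target_grams = set()
        for target in targets:
            target_grams.update(ngrams(target.split(), n))
        matched = sum(1 for gram in cand_grams if gram in target_grams)
        scores.append(matched / len(cand_grams))
    return sum(scores) / len(scores) if scores else 0.0


def prune(candidates, targets, beam=5):
    """Keep only the `beam` best-scoring candidates (the 'pruning value')."""
    ranked = sorted(candidates, key=lambda c: ngram_precision(c, targets), reverse=True)
    return ranked[:beam]


targets = ["the box is red"]
print(ngram_precision("the box is red", targets))  # 1.0
print(ngram_precision("red is box the", targets))  # 0.25
print(prune(["red is box the", "the box is red"], targets, beam=1))
```

The scrambled candidate matches every unigram but no higher-order n-gram, so it scores 0.25 and is dropped when the beam is 1. A small pruning value keeps the search space down in the same way, at the risk of discarding a candidate that would have led to the best complete realization.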

     
