
regex expansions

Tom Aquent
2007-07-06
2013-04-08
    Tom Aquent - 2007-07-06

    Thank you very much to the developers of this software for your work. 

    The regsub expansion seems to collapse all whitespace, and I was wondering whether there is a way around it (this could be due to my inexperience with Python regular expressions). 
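
    As a point of comparison, here is a minimal sketch of the same three substitutions done with Python's standard `re.sub` on a plain string. It assumes nothing about how ccg2xml implements regsub internally; the variable names are made up for illustration. It only shows that ordinary Python regex substitution leaves the surrounding spaces alone, which suggests the collapsed whitespace in the error comes from somewhere other than the regular expressions themselves.

```python
import re

# The macro body from the example in this post, held as a plain string.
template = "word _prefix_root_suffix: pos;"

# Replace each placeholder in turn with the standard library's re.sub.
step1 = re.sub("_prefix", "a", template)
step2 = re.sub("_root", "d", step1)
step3 = re.sub("_suffix", "g", step2)

# The spaces after "word" and after ":" are still present in the result.
print(step3)
```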

    Basically, I'm trying to generate the word list by combining various morpheme classes instead of listing every word by hand. Let me give a simplified example:

    #### Words ####

    def mac(pos) {
      word _prefix_root_suffix: pos;
    }

    def prefixgen(P1,P2,P3) {
      regsub('_prefix', P1, mac(pos))
      regsub('_prefix', P2, mac(pos))
      regsub('_prefix', P3, mac(pos))
    }

    def rootgen(R1,R2,R3) {
      regsub('_root', R1, prefixgen(a, b, c))
      regsub('_root', R2, prefixgen(a, b, c))
      regsub('_root', R3, prefixgen(a, b, c))
    }

    def suffixgen(S1,S2,S3) {
      regsub('_suffix', S1, rootgen(d, e, f))
      regsub('_suffix', S2, prefixgen(d, e, f))
      regsub('_suffix', S3, prefixgen(d, e, f))
    }

    suffixgen(g,h,i)

    This is what happens:

    $ ccg2xml.py -p "" xml_seed.ccg
    ccg2xml: Processing xml_seed.ccg
    Error, line 1: Syntax error at 'wordadg:pos;wordbdg:pos;wordcdg:pos;wordaeg:pos;wordbeg:pos;wordceg:pos;wordafg:pos;wordbfg:pos;wordcfg:pos;'

    Can anyone help with this?  Thanks so much,
    Tom

     
      Jason Baldridge - 2007-09-28

      Hi Tom,

      You were using the macros in the reverse of the direction in which they are meant to be used. Rather than going into exactly what the problem is, here's some code that does what you want:

      def suffixgen(prefix, root, pos) {
        word prefix . root . g: pos;
        word prefix . root . h: pos;
        word prefix . root . i: pos;
      }

      def rootgen(prefix, pos) {
        suffixgen(prefix, d, pos);
        suffixgen(prefix, e, pos);
        suffixgen(prefix, f, pos);
      }

      def prefixgen(pos) {
        rootgen(a, pos);
        rootgen(b, pos);
        rootgen(c, pos);
      }

      prefixgen(N);

      It will construct all the combinations, with "pos" as "N".
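
      To make the expected expansion concrete, here is a rough Python sketch that enumerates the same combinations outside of ccg2xml. It assumes that `.` in the macros joins the pieces directly into one word form, and the list names (`prefixes`, `roots`, `suffixes`) are invented for this illustration. It is only a way to check what the nested macros should produce, not a replacement for them.

```python
from itertools import product

# Morpheme classes taken from the macros above.
prefixes = ["a", "b", "c"]
roots = ["d", "e", "f"]
suffixes = ["g", "h", "i"]
pos = "N"

# product() varies the last list fastest, which matches the nesting
# prefixgen -> rootgen -> suffixgen: adg, adh, adi, aeg, ...
entries = [
    f"word {p}{r}{s}: {pos};"
    for p, r, s in product(prefixes, roots, suffixes)
]

for entry in entries:
    print(entry)
```

      With three items in each class, this lists 3 x 3 x 3 = 27 word entries, starting with adg and ending with cfi.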

      You might find it helpful to read the paper that we just finished for the Grammar Engineering Across Frameworks workshop. It is available here:

      http://comp.ling.utexas.edu/jbaldrid/papers/baldridge_etal_geaf07.pdf

      Hope this helps!

      Jason, Ben, and Sudipta

       
