Pyparsing usage advice

2004-07-31
  • Nitin Madnani

    Nitin Madnani - 2004-07-31

    Hi Paul

    First off, let me compliment you on a fantastic piece of software. I have been looking for something like this for so long. I write a lot of natural language processing applications, mostly in Python, so I end up writing some kind of parser every time. This makes it so much easier. OK, now on to the real question...

    I am using pyparsing to parse transformation rules for my natural language processing application. The rules also have structure to them, i.e., a rule might look like:

    [ A [ B ] ] <--> [ C [ D ] ]

    where A, B, C, D are nodes in the rule (more complex than this)

    I have figured out how to parse the nodes out of the rules and return them like this:

    [ [ A, B ], [ C, D ] ]

    but I would like to return them in a list with the same hierarchical structure as in the original rule, i.e., I would like to return:

    [ [ A, [ B ] ] , [ C, [ D ] ] ]

    I hope that makes sense. Any advice is much appreciated.

    Thanks so much !!
    Nitin

     
    • Paul McGuire

      Paul McGuire - 2004-07-31

      Nitin -

      Thanks for your glowing compliments!  My self-esteem is renewed for another day!

      You probably have some kind of construct named 'node', which corresponds to the "A", "B", "C", and "D" elements.  I am guessing that 'node' is a Forward element, since it appears that nodes can contain nodes.  When enclosing a nested node, be sure to enclose it using a Group construct, something like:

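      from pyparsing import Forward, Group, OneOrMore, Suppress, Word, alphas

      ident = Word(alphas)
      # node is recursive, so declare it as a Forward and define it afterwards
      node = Forward()
      # Group wraps each bracketed node in its own sublist
      node << OneOrMore( ident | Group( Suppress('[') + node + Suppress(']') ) )
      xform = node + "<-->" + node

      print(xform.parseString("[ A [ B ] ] <--> [ C [ D ] ]"))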

      This will parse your expression and print [['A', ['B']], '<-->', ['C', ['D']]].  The Group element preserves the nesting within the expression.
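
      To see what Group contributes, here is a minimal sketch that parses one side of a rule twice, once without Group and once with it.  The names flat, nested, LBRACK, RBRACK and rule_side are made up for this illustration; the grammar is otherwise the same as above.

```python
from pyparsing import Forward, Group, OneOrMore, Suppress, Word, alphas

# Illustrative helper names (not from the original post)
LBRACK, RBRACK = Suppress("["), Suppress("]")
ident = Word(alphas)

# Without Group: the brackets are matched, but the nesting is flattened away.
flat = Forward()
flat << OneOrMore(ident | (LBRACK + flat + RBRACK))

# With Group: every bracketed node becomes its own sublist.
nested = Forward()
nested << OneOrMore(ident | Group(LBRACK + nested + RBRACK))

rule_side = "[ A [ B ] ]"
flat_result = flat.parseString(rule_side).asList()
nested_result = nested.parseString(rule_side).asList()

print(flat_result)    # ['A', 'B']
print(nested_result)  # [['A', ['B']]]
```

      The camelCase method names match the ones used in this thread; pyparsing 3 also offers snake_case equivalents such as parse_string and as_list.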

      I'm not sure how far along you are with pyparsing, but look into the setResultsName() method.  You'll be able to do more with the results, as in:

      xform = node.setResultsName("lhs") + "<-->" + node.setResultsName("rhs")

      xformToks = xform.parseString("[ A [ B ] ] <--> [ C [ D ] ]")

      print(xformToks.lhs)
      print(xformToks.rhs)

      If you access results by name rather than by position, your code won't need updating when you expand your grammar.
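
      As a rough sketch of that point, suppose the grammar later grows an optional numeric label in front of each rule.  The label syntax ("7:") and the names xform_v1, xform_v2, old and new are assumptions invented for this example, not part of your rule format.  Positional indexes shift, while the named results stay put:

```python
from pyparsing import (Forward, Group, OneOrMore, Optional, Suppress,
                       Word, alphas, nums)

ident = Word(alphas)
node = Forward()
node << OneOrMore(ident | Group(Suppress("[") + node + Suppress("]")))

# Original grammar, with named results for each side of the rule.
xform_v1 = node.setResultsName("lhs") + "<-->" + node.setResultsName("rhs")

# Hypothetical expansion: an optional numeric label such as "7:" up front.
label = Word(nums) + Suppress(":")
xform_v2 = (Optional(label) + node.setResultsName("lhs")
            + "<-->" + node.setResultsName("rhs"))

old = xform_v1.parseString("[ A [ B ] ] <--> [ C [ D ] ]")
new = xform_v2.parseString("7: [ A [ B ] ] <--> [ C [ D ] ]")

# Position 0 used to be the left-hand side; now it is the label.
print(old[0])
print(new[0])

# Access by name gives the same left- and right-hand sides in both versions.
print(old.lhs.asList() == new.lhs.asList())
print(old.rhs.asList() == new.rhs.asList())
```

      Code that reads toks.lhs and toks.rhs keeps working against the expanded grammar, whereas code that reads toks[0] would have to change.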

      Best of luck!
      -- Paul

       
