
How to access current line number

Peer
2006-03-31
2013-05-14
  • Peer

    Peer - 2006-03-31

    First of all, a very big thank you for this package.

    I'm using pyparsing to parse C++. (I already saw someone using it to parse C.) I use it to generate XML that represents the structure of the C++ program. I'm mainly using setResultsName() to name the XML tags.

    Now I would like to include the line number of each match in the generated XML file, either as a separate tag or as part of an existing tag.

    Could you please help me?
    If I wasn't precise enough, I can give more information.

    Thanks!

     
    • Paul McGuire

      Paul McGuire - 2006-03-31

      Peer -

      This is so funny! I've not really gotten this question before, and now you are the second person in the past 3 days to ask almost the exact same thing! The earlier person asked how to get the location of a match within the input string, and I replied with the e-mail attached below.

      To answer your question, you need one extra step: converting the location into a line number (and a column number too, if you like). Pyparsing provides three functions for getting location data from an input string and a location in that string:

      lineno(loc,strg) - gives the line number of loc within the given string (starting with 1)
      col(loc,strg) - gives the column number of loc within the given string (starting with 1)
      line(loc,strg) - gives the line of text of loc within the given string

      Here's a little test program showing how to use these methods:

      ----------
      import pyparsing

      inputString = """Now is the time
      for all good men
      to come to the
      aid of
      their country."""

      locn = 50  # pick an arbitrary place in the string

      # show where this location falls in the input string
      print(inputString[:locn] + ">" + inputString[locn:])
      print()

      print("lineno:", pyparsing.lineno(locn, inputString))
      print("col:", pyparsing.col(locn, inputString))
      print("line:", pyparsing.line(locn, inputString))

      ----------
      This program prints:

      Now is the time
      for all good men
      to come to the
      ai>d of
      their country.

      lineno: 4
      col: 3
      line: aid of
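To make these three helpers less of a black box, here is a minimal stdlib-only sketch of the arithmetic they perform, written for Python 3. The names my_lineno, my_col and my_line are hypothetical stand-ins for illustration, not pyparsing's own source; the sketch assumes loc is a 0-based index into the string and returns 1-based line and column numbers, as in the output above.

```python
def my_lineno(loc, strg):
    # 1-based line number: count the newlines that occur before loc
    return strg.count("\n", 0, loc) + 1

def my_col(loc, strg):
    # 1-based column: distance from the last newline before loc
    # (rfind returns -1 when loc is on the first line, giving loc + 1)
    return loc - strg.rfind("\n", 0, loc)

def my_line(loc, strg):
    # full text of the line containing loc, without its trailing newline
    start = strg.rfind("\n", 0, loc) + 1
    end = strg.find("\n", loc)
    return strg[start:] if end < 0 else strg[start:end]

inputString = """Now is the time
for all good men
to come to the
aid of
their country."""

locn = 50
print("lineno:", my_lineno(locn, inputString))
print("col:", my_col(locn, inputString))
print("line:", my_line(locn, inputString))
```

With locn = 50 this reports line 4, column 3, and the line "aid of", the same values shown above.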

      So, how do you get the location of a particular match in a string? Here is the e-mail I sent out earlier this week. I hope that between the two of these e-mails, you can get further on your way.

      -- Paul

      =========================

      There are several ways to obtain the position of an element within the parsed text.  Here is the start of a sample pyparsing program:

      from pyparsing import *

      data = "The quick brown fox jumps over the lazy dog."
      wd = Word(alphas)

      Here are 3 options for getting both the matched text and the matching locations.

      1. Use scanString. scanString returns a generator that yields the matching tokens, the start location, and the end location for every match found. Here's how this looks with the above data:

      for t, s, e in wd.scanString(data):
          print("Word: %s, Start: %d, End: %d" % (t[0], s, e))

      This prints out:
      Word: The, Start: 0, End: 3
      Word: quick, Start: 4, End: 9
      Word: brown, Start: 10, End: 15
      Word: fox, Start: 16, End: 19
      Word: jumps, Start: 20, End: 25
      Word: over, Start: 26, End: 30
      Word: the, Start: 31, End: 34
      Word: lazy, Start: 35, End: 39
      Word: dog, Start: 40, End: 43   
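Since the original question is about line numbers, option 1 can be combined with lineno and col directly. This sketch assumes Python 3 with pyparsing installed, and uses a hypothetical three-line version of the sample sentence so the line numbers vary; the camelCase scanString name used in this thread should still be accepted by current pyparsing releases.

```python
from pyparsing import Word, alphas, col, lineno

data = """The quick brown
fox jumps over
the lazy dog."""
wd = Word(alphas)

# scanString yields (tokens, start, end) for each match; lineno and col
# convert the start location into 1-based line and column numbers.
matches = []
for t, s, e in wd.scanString(data):
    matches.append((t[0], lineno(s, data), col(s, data)))
    print("Word: %s, Line: %d, Col: %d" % (t[0], lineno(s, data), col(s, data)))
```

Each match is reported with the line and column of its first character, e.g. "fox" on line 2, column 1.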

      2. Use a parse action.  Parse actions are called during the parsing process, when the expression they are attached to matches.  Parse actions are called with the input string, the starting location of the match, and the matching tokens.  Unlike scanString, parse actions do not get passed the ending location.  Here's how a parse action would look for the above data:

         
      def foundWord(strng, locn, tokens):
          print("Word: %s, Start: %d, End: ???" % (tokens[0], locn))
      wd.setParseAction(foundWord)
      sentence = OneOrMore(wd) + oneOf(". ? !")
      sentence.parseString(data)

      This prints out:
      Word: The, Start: 0, End: ???
      Word: quick, Start: 4, End: ???
      Word: brown, Start: 10, End: ???
      Word: fox, Start: 16, End: ???
      Word: jumps, Start: 20, End: ???
      Word: over, Start: 26, End: ???
      Word: the, Start: 31, End: ???
      Word: lazy, Start: 35, End: ???
      Word: dog, Start: 40, End: ???
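The same idea works inside a parse action: pass the locn argument to lineno to get the line of each match. In this sketch (Python 3, pyparsing assumed installed, hypothetical three-line input, hypothetical foundWordLine action) the parse action only records data and returns None, so the tokens are left unchanged.

```python
from pyparsing import OneOrMore, Word, alphas, lineno, oneOf

data = """The quick brown
fox jumps over
the lazy dog."""

found = []  # (word, line) pairs collected by the parse action

def foundWordLine(strng, locn, tokens):
    # locn is the start of the match; lineno converts it to a line number.
    # Returning None leaves the matched tokens as they are.
    found.append((tokens[0], lineno(locn, strng)))

wd = Word(alphas).setParseAction(foundWordLine)
sentence = OneOrMore(wd) + oneOf(". ? !")
sentence.parseString(data)

for word, line in found:
    print("Word: %s, Line: %d" % (word, line))
```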

      3. Use a parse action, returning an object that packages both the token values and the location.  Parse actions can return strings or objects, which are included in the ParseResults structured object that is returned by parseString.  Look at this possible ending to our program:

      class TokenData(object):
          def __init__(self,s,loc):
              self.strg = s
              self.loc = loc
          def __str__(self):
              return "%(strg)s:%(loc)d" % self.__dict__

      def foundWordReturnTokenData(strng, locn, tokens):
          return TokenData(tokens[0],locn)

      wd.setParseAction(foundWordReturnTokenData)
      sentence = OneOrMore(wd) + oneOf(". ? !")
      for d in sentence.parseString(data):
          print(d)

      Which prints:
      The:0
      quick:4
      brown:10
      fox:16
      jumps:20
      over:26
      the:31
      lazy:35
      dog:40
      .

      Parse actions can log data, return modified string data, or return objects. In this example, an object containing the matched string AND the match location is returned. For simplicity, I just added a __str__ method to the TokenData class to print out the values, but you could work with the .strg and .loc attributes on each element of the returned ParseResults to perform whatever task you prefer. (Again, though, the end location is not visible to the parse action.)
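For generating output such as XML, option 3 can carry the line and column along with each token. This variant of TokenData is a sketch, not part of pyparsing: it adds hypothetical line and col attributes computed with lineno and col, and its constructor takes the input string as an extra argument. It assumes Python 3 with pyparsing installed and a hypothetical three-line input.

```python
from pyparsing import OneOrMore, Word, alphas, col, lineno, oneOf

class TokenData:
    # Packages a matched string with its location; line and col are 1-based.
    def __init__(self, s, loc, strng):
        self.strg = s
        self.loc = loc
        self.line = lineno(loc, strng)
        self.col = col(loc, strng)

    def __str__(self):
        return "%(strg)s:%(line)d:%(col)d" % self.__dict__

def foundWordReturnTokenData(strng, locn, tokens):
    # replace the matched word with a TokenData object
    return TokenData(tokens[0], locn, strng)

data = """The quick brown
fox jumps over
the lazy dog."""

wd = Word(alphas).setParseAction(foundWordReturnTokenData)
sentence = OneOrMore(wd) + oneOf(". ? !")
results = sentence.parseString(data)
for d in results:
    print(d)
```

The final "." in the results comes from the oneOf expression, which has no parse action, so it stays a plain string.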

      Will one of these techniques work for you?

      -- Paul

       
    • Peer

      Peer - 2006-04-03

      Thank you. This works quite well. My method looks like this:

      def bodyEnd(strng, locn, tokens):
          global _bodylines
          startlno = _bodylines.pop()
          lno = lineno(locn, strng)
          return (tokens, "body length: %d" % (lno - startlno))

      But I have one more question. Currently I modify the tokens returned.  Is it possible to Group() my string so that a special tag is used for the XML output?  Currently the length is simply output as

      <ITEM>body length: 8</ITEM>

      I would like to receive a special tag like

      <BODY_LENGTH>body length: 8</BODY_LENGTH>

      Thank you!

      -- Peer

       
      • Paul McGuire

        Paul McGuire - 2006-04-03

        Peer -

        A couple of comments:
        1. Returning a tuple was formerly a method for returning modified tokens AND a modified parse location - I later disabled the ability to change the current parse location, and changed the pyparsing code to ignore the [0]'th element of any tuple returned, and just use the [1]'th element.  (This is why your current parse action works, but it is not going to last much longer.  I think the current version will start giving you deprecation warnings for this usage.)

        For your parse action, you should return just:

        return "body length: %d" % (lno - startlno)

        In a future release, I will undeprecate returning tuples from parse actions, but at that time, I will use the entire tuple as the return value.

        2. asXML takes its tagging hints from any results names you have defined.  Look at how this is done in the httpServerLogParser.py script included in the pyparsing examples.  In your program, you probably have code that looks something like:

        bodyEndExpr = ...whatever...
        bodyEndExpr.setParseAction(bodyEnd)

        To add a results name, call setResultsName AND assign the result back to the original variable, as in:

        bodyEndExpr = bodyEndExpr.setResultsName("BODY_LENGTH")

        The reason for this odd construction is that when I designed this method, I expected that a single expression (such as Word(nums) for an integer) might appear multiple times in a grammar, with different desired field names. So I made setResultsName return an implicit copy of the original expression. So something like this won't work:

        bodyEndExpr.setResultsName("BODY_LENGTH")  # WON'T WORK!

        But this does:
        bodyEndExpr = (... blah ...).setResultsName("BODY_LENGTH")

        as does the previous expression.  Just remember that setResultsName is not a mutator, it is a copy-and-mutate operator.

        Hope this gives you a suitable XML tag; asXML() can be finicky to get just right.
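The copy-and-mutate behavior can be checked directly. In this sketch (Python 3, pyparsing assumed installed), integer is a hypothetical expression, and the results name is inspected by indexing the ParseResults rather than through asXML(), so it only demonstrates the copy semantics.

```python
from pyparsing import Word, nums

integer = Word(nums)

# returns a renamed copy, which is discarded here
integer.setResultsName("BODY_LENGTH")
r1 = integer.parseString("8")
print("BODY_LENGTH" in r1)   # the original expression is still unnamed

# keep the returned copy instead
named = integer.setResultsName("BODY_LENGTH")
r2 = named.parseString("8")
print(r2["BODY_LENGTH"])
```

The first call returns a named copy that is thrown away, so r1 has no BODY_LENGTH entry; only the expression assigned to named carries the name.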

        -- Paul

         
    • Peer

      Peer - 2006-04-04

      It helps - mostly.

      As you describe it, the parse action replaces the current token with the new result.

      This is not exactly what I want to do. (I overlooked this last time.)

      My solution now is to define an Empty expression after my real expression and attach the parse action to it. Then I can give the result a name and get nice XML output. Is this the right thing to do, or is there a better way?

      The generated XML is not perfect, but it's good enough for me (I don't have to post-process it with an XML reader...).

      Once again thank you for your help!
      Peer

       
      • Paul McGuire

        Paul McGuire - 2006-04-04

        Peer -

        I am not completely clear as to your problem with my last post.  Ordinarily, there is no conflict in setting both a parse action and a results name.  It would help if you could post at least a fragment of your grammar, and the corresponding parse action.

        As another option, the latest version of pyparsing allows you to attach multiple parse actions to a single ParserElement, and they are called in succession with the results from one (if the tokens are modified) passed as the tokens to the next.
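A minimal sketch of chained parse actions, assuming Python 3 and a pyparsing version that provides addParseAction; the functions toInt and recordLine are hypothetical. The first action converts the token to an int, and the second receives that converted value along with the original input string and match location.

```python
from pyparsing import Word, lineno, nums

log = []  # (value, line) pairs recorded by the second action

def toInt(tokens):
    # first action: convert the matched digits to an int
    return int(tokens[0])

def recordLine(strng, locn, tokens):
    # second action: sees the int returned by toInt; returns None,
    # so the converted token is kept as is
    log.append((tokens[0], lineno(locn, strng)))

integer = Word(nums).setParseAction(toInt)
integer.addParseAction(recordLine)

text = "\n\n42"
result = integer.parseString(text)
print(result[0], log)
```

Because each action can return new tokens or None, an action that only records information (like recordLine) can be appended without disturbing what earlier actions produced.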

        Your workaround is also interesting - I've not yet seen anyone attach a parse action to an Empty before.  This sounds like a clever workaround to your problem.

        -- Paul

         
    • Peer

      Peer - 2006-04-04

      The code is as follows:

      body << Group(Literal('{').setParseAction(bodyStart)
               + ZeroOrMore(vardecl | body | Literal('return')
                            | Word(bodychars).setResultsName("bodychars")
                            | ";")
               + Literal('}')
               + Empty().setParseAction(bodyEnd).setResultsName("BodyLength")
             ).setResultsName("body")

      In my first attempt I attached the parse action to the closing brace (}), but then the brace was replaced by the body length information.

      My current code (shown above) uses an Empty expression, so the closing brace still shows up in the generated XML and the body length is listed as a separate tag after the brace.

      I hope I have explained my thinking clearly...

      Peer

       
