pyparsing-users Mailing List for Python parsing module (Page 9)


The countedText() pattern I posted on this thread did not work
when the literal text extended over multiple lines.  Below is an
updated version that fixes that.  I have also expanded the comments
and cleaned up some details.
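The updated version handles newlines by compiling its `.{N}` pattern with `re.DOTALL`; by default, `.` in a Python regular expression matches any character except a newline. The sketch below uses only the standard `re` module, not pyparsing. `exact_n()` is a hypothetical helper that builds the same `.{N}` string countedText() builds. It shows why the flag matters for the test line "6 [123\n56]", and why a count of zero still produces a match.

```python
import re


def exact_n(n, flags=0):
    """Hypothetical helper: compile '.{n}', the pattern countedText() builds."""
    return re.compile(".{{{0:d}}}".format(n), flags)


text = "123\n56"   # the bracketed part of the test line "6 [123\n56]"

# Default flags: '.' stops at the newline, so six characters cannot match.
print(exact_n(6).match(text))                            # None

# re.DOTALL: '.' matches any character, including '\n'.
print(repr(exact_n(6, re.DOTALL).match(text).group()))   # '123\n56'

# N = 0 gives a zero-length match, which is what lets "0 []" parse.
print(repr(exact_n(0, re.DOTALL).match("]").group()))    # ''
```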

Best regards,
John Shipman (jo...@nm...), Applications Specialist, NM Tech Computer Center,
Speare 146, Socorro, NM 87801, (575) 835-5735, http://www.nmt.edu/~john
   ``Let's go outside and commiserate with nature.''  --Dave Farber
================================================================
#!/usr/bin/env python3
#================================================================
# countedtext: Using pyparsing for a string preceded by a count
#----------------------------------------------------------------
# Author: John W. Shipman (jo...@nm...)
#         New Mexico Tech Computer Center
#         Socorro, NM 87801
#
# Problem: Sanjay Ghemawat's venerable 'ical' calendar utility
# (http://en.wikipedia.org/wiki/Ical_%28Unix%29) saves events
# in a .calendar file, in which the description of an event
# is saved in a line like this:
#
#   Text [6 [Easter]]
#
# The problem is to write a pyparsing pattern that parses the
# count and the bracketed string.  The shortcut method is to
# use QuotedString(quoteChar='[', endQuoteChar=']'), but this
# fails if the literal string contains a ']' character.
#
# Paul McGuire responded immediately to my post on the pyparsing
# mailing list, suggesting that I study the implementation of the
# countedArray() helper.  Based on this advice, I offer this
# implementation of a countedText() pattern that matches an
# integer followed by a literal string in brackets, complete
# with a test driver.
#   2012-01-03: Now allows newlines in the literal string.
#     Also expanded the comments and simplified some code.
#     To convert to Python 2.7:
#       - Remove the '3' from the end of the first line.
#       - Uncomment the __future__ import just below.
#   2012-01-01: Initial version.
#----------------------------------------------------------------

####from __future__ import print_function
import re
import pyparsing as pp

def countedText():
     '''Defines a pattern of the form:
          N "[" char* "]"
        where N is an integer that specifies the length of the
        following bracketed string literal.
        Example: "6 [Easter]"
     '''
     #--
     # The basic trick is to use Forward to create a dummy token
     # whose content can be filled in later.  The time sequence:
     #   A. When countedText() is called:
     #      - Define a pattern 'intExpr' for N, and attach a parse
     #        action to it that converts to type int.
     #      - Use Forward() to create a dummy token 'stringExpr'
     #        that will eventually match the (char*) of the pattern.
     #      - Create a closure named 'countedParseAction' and attach it
     #        as a parse action to intExpr.
     #      - Return a pattern that matches the whole construct, with
     #        the dummy token at the position of the (char*) part.
     #   B. When intExpr is matched:
     #      - Its first parse action converts the value to type int.
     #      - Its second parse action is the countedParseAction()
     #        closure, which extracts N from the token list t.
     #      - It creates a pattern that matches exactly N characters,
     #        including newlines.  The '<<' operator for the Forward
     #        class is overloaded to drop the real pattern in place
     #        of the dummy pattern.
     #--
     intExpr = pp.Word(pp.nums).setParseAction(lambda t: int(t[0]))
     stringExpr = pp.Forward()

     def countedParseAction(s, l, t):
         '''Parse action that sets up the count in stringExpr.

           To match the part between the brackets, we use a regex of
           the form ".{N}".  This works even for N=0.  The re.DOTALL
           flag makes "." match any character, even newline.
         '''
         n = int(t[0])
         stringExpr << pp.Combine(
             pp.Suppress("[") +
             pp.Regex(".{{{0:d}}}".format(n), re.DOTALL) +
             pp.Suppress("]"))
         return []

     #--
     # The second parse action uses the count to define the
     # stringExpr pattern using the actual value of the count.
     #--
     intExpr.addParseAction(countedParseAction)
     return (intExpr + stringExpr)

# - - - - -   m a i n

testLines = [                # Expected output
     "0 []",                  # ['']
     "11 [abcdefghijk]",      # ['abcdefghijk']
     "6 [Easter]",            # ['Easter']
     "4 []]]]]",              # [']]]]']
     "6 [123\n56]",           # ['123\n56']
     "6 [ abcdef]"            # Fails: the space is 1 of the 6 chars, so ']' is not found
     ]

LINE_PAT = countedText()

def main():
     """Main
     """
     for line in testLines:
         test(line)

def test(line):
     '''Test one line
     '''
     try:
         result = LINE_PAT.parseString(line, parseAll=True)
         print("/{0}/ -> {1}".format(line, result))
     except pp.ParseException as x:
         print("{0}\n{1}^ Fail".format(line, " "*(x.column-1)))

# - - - - -   E p i l o g u e

if __name__ == "__main__":
     main()
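Stripped of pyparsing, the idea is "read N, then take exactly N characters", which is why the count succeeds where scanning for the first ']' cannot. The plain-Python sketch below is for comparison only and is not the pyparsing implementation above. `naive_bracket()` and `parse_counted()` are hypothetical names. The naive version is an assumed stand-in for the QuotedString shortcut described in the header comments.

```python
import re


def naive_bracket(line):
    """Hypothetical stand-in for the shortcut: take text up to the first ']'."""
    m = re.match(r"\s*\d+\s*\[([^\]]*)\]", line)
    return m.group(1) if m else None


def parse_counted(line):
    """Hypothetical hand-written countedText(): read N, then slice N chars.

    Returns (text, end), where end is the index just past the closing ']'.
    """
    m = re.match(r"\s*(\d+)\s*\[", line)
    if m is None:
        raise ValueError("expected 'N [' at the start of {0!r}".format(line))
    n = int(m.group(1))
    start = m.end()
    body = line[start:start + n]          # may contain ']' or '\n'
    if len(body) != n or line[start + n:start + n + 1] != "]":
        raise ValueError("count {0} does not match {1!r}".format(n, line))
    return body, start + n + 1


for line in ["0 []", "6 [Easter]", "4 []]]]]", "6 [123\n56]", "6 [ abcdef]"]:
    try:
        print(repr(line), "naive:", repr(naive_bracket(line)),
              "counted:", repr(parse_counted(line)[0]))
    except ValueError as err:
        print(repr(line), "counted: fails ->", err)
```

The two agree on every line except "4 []]]]]", where the naive scan returns an empty string and the counted version returns ']]]]'. The counted version rejects "6 [ abcdef]" for the same reason the pyparsing pattern does: the leading space uses up one of the six characters.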

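countedArray(), the helper Paul McGuire pointed to, uses the same leading-integer-plus-Forward arrangement. The difference is that its N counts repetitions of an expression rather than characters. The short sketch below assumes a pyparsing release that still provides the camelCase names the script uses (`countedArray`, `parseString`). The matched words may or may not come back wrapped in an extra group depending on the release, so the hypothetical `flatten()` helper normalizes the result.

```python
import pyparsing as pp


def flatten(tokens):
    """Hypothetical helper: flatten nested ParseResults/lists into one list."""
    out = []
    for tok in tokens:
        if isinstance(tok, (list, pp.ParseResults)):
            out.extend(flatten(tok))
        else:
            out.append(tok)
    return out


# countedArray: N counts *expressions* (here, whitespace-separated words).
words = pp.countedArray(pp.Word(pp.alphas))
print(flatten(words.parseString("3 ab cd ef", parseAll=True)))   # ['ab', 'cd', 'ef']

# Too many words for the count: parseAll=True rejects the leftover text.
try:
    words.parseString("2 ab cd ef", parseAll=True)
except pp.ParseException as err:
    print("fails:", err)
```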
