Thread: [Pyparsing] fixed-length field preceded by length?
From: John W. S. <jo...@nm...> - 2011-12-31 20:40:00
I'm trying to parse the .calendar file used by the old Unix ical calendar program (the one written by Sanjay Ghemawat, not any of the modern iCal utilities). Here is a typical line from this file:

    Text [6 [Easter]]

As you can see, the length is given explicitly, followed by whitespace, and the text string in brackets.

How do I write the pyparsing syntax for lines of this form? The length may be any number, and I'd like to handle the case where '[' or ']' characters occur inside the text string.

It's quite straightforward to write the patterns, but how do I write the pattern for the text string in such a way that it gets the length from a previous token in the same pattern?

I just paid $10 for Paul McGuire's "Getting Started" eBook and was quite disappointed that there's no reference section, and no obvious solution to my problem.

Any suggestions?

Best regards (and happy new year!),
John Shipman (jo...@nm...), Applications Specialist,
NM Tech Computer Center, Speare 146, Socorro, NM 87801,
(575) 835-5735, http://www.nmt.edu/~john
``Let's go outside and commiserate with nature.'' --Dave Farber
From: Paul M. <pt...@au...> - 2011-12-31 23:01:03
John -

Sorry you were disappointed in GSWP; I had proposed a longer outline to O'Reilly which included a reference section, but this would have inflated the book beyond the 60-70 page length which was their ebook target at the time.

Please take a look at how countedArray is implemented in the pyparsing code. I think you could copy this method and write a countedLengthTextString method using the same principles. If you are still spinning your wheels, write back!

-- Paul

(There are docs, how-to's, UML diagrams, and many examples included in the source distribution; please see if that material helps flesh out what you didn't get from GSWP. Also, there is online htmldoc (generated by epydoc) at http://packages.python.org/pyparsing/, and a link to an HTML how-to at http://pyparsing.svn.sourceforge.net/viewvc/pyparsing/src/HowToUsePyparsing.html.)

_______________________________________________
Pyparsing-users mailing list
Pyp...@li...
https://lists.sourceforge.net/lists/listinfo/pyparsing-users
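The principle behind countedArray that Paul points to can be sketched roughly as follows. This is a simplified illustration written for this thread, not the library's actual source: the helper name counted_words and the choice of Word(alphas) as the repeated element are assumptions. A Forward placeholder stands in for the body, and a parse action attached to the count fills the placeholder in once N is known.

```python
import pyparsing as pp


def counted_words():
    """Hypothetical helper: an integer N followed by exactly N words."""
    body = pp.Forward()          # placeholder, defined once N is known

    def set_body(tokens):
        n = int(tokens[0])
        # Now that the count has been parsed, drop the real pattern
        # into the placeholder: N words, or nothing at all when N == 0.
        body << (pp.And([pp.Word(pp.alphas)] * n) if n else pp.Empty())
        return []                # remove the count from the results

    count = pp.Word(pp.nums).setParseAction(set_body)
    return count + body


if __name__ == "__main__":
    # Only the first two words are consumed; "ef" is left unparsed.
    print(counted_words().parseString("2 ab cd ef").asList())
```

The same shape carries over to a counted text string: keep the count expression and the parse action, and change what the parse action assigns to the placeholder.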
From: John W. S. <jo...@nm...> - 2012-01-01 19:31:34
#!/usr/bin/env python3
#================================================================
# countedtext:  Using pyparsing for a string preceded by a count
#----------------------------------------------------------------
# Author:  John W. Shipman (jo...@nm...)
#   New Mexico Tech Computer Center
#   Socorro, NM 87801
#
# Problem: Sanjay Ghemawat's venerable 'ical' calendar utility
# (http://en.wikipedia.org/wiki/Ical_%28Unix%29) saves events
# in a .calendar file, in which the description of an event
# is saved in a line like this:
#
#     Text [6 [Easter]]
#
# The problem is to write a pyparsing pattern that parses the
# count and the bracketed string.  The shortcut method is to
# use QuotedString(quoteChar='[', endQuoteChar=']'), but this
# fails if the literal string contains a ']' character.
#
# Paul McGuire responded immediately to my post on the pyparsing
# mailing list, suggesting that I study the implementation of the
# countedArray() helper.  Based on this advice, I offer this
# implementation of a countedText() pattern that matches an
# integer followed by a literal string in brackets, complete
# with a test driver.
#----------------------------------------------------------------

import sys
import pyparsing as pp

def countedText():
    '''Defines a pattern of the form:

        int "[" char* "]"

    where int is an integer that specifies the length of the
    following bracketed string literal.  Example: "6 [Easter]"
    '''
    stringExpr = pp.Forward()

    def countedParseAction(s, l, t):
        '''Parse action that sets up the count in stringExpr.
        '''
        n = int(t[0])

        #--
        # CharsNotIn does not like exact=0.  We use Combine so that
        # whichever pattern represents the contents, the result is
        # a single string.
        #--
        if n > 0:
            contents = pp.CharsNotIn('', exact=n)
        else:
            contents = pp.Empty()
        stringExpr << pp.Combine(pp.Literal("[").suppress() +
                                 contents +
                                 pp.Literal("]").suppress())
        return []

    #--
    # The first parse action converts the count to an int.
    #--
    intExpr = pp.Word(pp.nums).setParseAction(lambda t: int(t[0]))

    #--
    # The second parse action uses the count to define the
    # stringExpr pattern using the actual value of the count.
    #--
    intExpr.addParseAction(countedParseAction)

    return (intExpr + stringExpr)

linePat = countedText()

# - - - - -   m a i n

testLines = [           # Test output
    "0 []",             # ['']
    "11 [abcdefghijk]", # ['abcdefghijk']
    "6 [Easter]",       # ['Easter']
    "4 [[[]]]",         # ['[[]]']
    "6 []]]]]]]",       # [']]]]]]']
    "6 [ abcdef]"       # Fails (leading whitespace not skipped)
    ]

def main():
    """Main
    """
    for line in testLines:
        test(line)

def test(line):
    '''Test one line
    '''
    print("\n", line, sep='')
    try:
        result = linePat.parseString(line, parseAll=True)
        print(result)
    except pp.ParseException as x:
        print("{}^".format(" "*(x.column-1)))
        print("No")

# - - - - -   E p i l o g u e

if __name__ == "__main__":
    main()
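The script's header comment notes that a delimiter-based match fails when the text itself contains ']'. A minimal standard-library sketch of that difference follows. It is a plain-re analogue, not pyparsing's QuotedString, and the function names read_delimited and read_counted are made up for illustration.

```python
import re


def read_delimited(s):
    """Delimiter-based: take everything up to the first ']'."""
    m = re.match(r"\d+\s*\[([^\]]*)\]", s)
    return m.group(1) if m else None


def read_counted(s):
    """Count-based: read N, then take exactly N characters."""
    m = re.match(r"(\d+)\s*\[", s)
    if not m:
        return None
    n = int(m.group(1))
    start = m.end()
    text = s[start:start + n]
    # The slice must be full length and be followed by the closing bracket.
    if len(text) == n and s[start + n:start + n + 1] == "]":
        return text
    return None


if __name__ == "__main__":
    line = "4 [[[]]]"
    print(repr(read_delimited(line)))   # cut short at the first ']'
    print(repr(read_counted(line)))     # all four characters
```

Only the count-based reader returns the full text, because it never inspects the characters it is told to skip over.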
From: John W. S. <jo...@nm...> - 2012-01-04 02:45:55
The countedText() pattern I posted on this thread did not work when the literal text extended over multiple lines. Below is an updated version that fixes that. I also expanded the comments and cleaned up some details.

Best regards,
John Shipman (jo...@nm...), Applications Specialist,
NM Tech Computer Center, Speare 146, Socorro, NM 87801,
(575) 835-5735, http://www.nmt.edu/~john
``Let's go outside and commiserate with nature.'' --Dave Farber

================================================================
#!/usr/bin/env python3
#================================================================
# countedtext:  Using pyparsing for a string preceded by a count
#----------------------------------------------------------------
# Author:  John W. Shipman (jo...@nm...)
#   New Mexico Tech Computer Center
#   Socorro, NM 87801
#
# Problem: Sanjay Ghemawat's venerable 'ical' calendar utility
# (http://en.wikipedia.org/wiki/Ical_%28Unix%29) saves events
# in a .calendar file, in which the description of an event
# is saved in a line like this:
#
#     Text [6 [Easter]]
#
# The problem is to write a pyparsing pattern that parses the
# count and the bracketed string.  The shortcut method is to
# use QuotedString(quoteChar='[', endQuoteChar=']'), but this
# fails if the literal string contains a ']' character.
#
# Paul McGuire responded immediately to my post on the pyparsing
# mailing list, suggesting that I study the implementation of the
# countedArray() helper.  Based on this advice, I offer this
# implementation of a countedText() pattern that matches an
# integer followed by a literal string in brackets, complete
# with a test driver.
#
# 2012-01-03: Now allows newlines in the literal string.
#   Also expanded the comments and simplified some code.
#   To convert to Python 2.7:
#     - Remove '3' from the end of the first line.
#     - Uncomment the __future__ import just below.
# 2012-01-01: Initial version.
#----------------------------------------------------------------

####from __future__ import print_function
import sys
import re
import pyparsing as pp

def countedText():
    '''Defines a pattern of the form:

        N "[" char* "]"

    where N is an integer that specifies the length of the
    following bracketed string literal.  Example: "6 [Easter]"
    '''
    #--
    # The basic trick is to use Forward to create a dummy token
    # whose content can be filled in later.  The time sequence:
    #   A. When countedText() is called:
    #      - Define a pattern 'intExpr' for N, and attach a parse
    #        action to it that converts to type int.
    #      - Use Forward() to create a dummy token 'stringExpr'
    #        that will eventually match the (char*) of the pattern.
    #      - Create a closure named 'countedParseAction' and attach it
    #        as a parse action to intExpr.
    #      - Return a pattern that matches the whole construct, with
    #        the dummy token at the position of the (char*) part.
    #   B. When intExpr is matched:
    #      - Its first parse action converts the value to type int.
    #      - Its second parse action is the countedParseAction()
    #        closure, which extracts N from the token list t.
    #      - It creates a pattern that matches exactly N characters,
    #        including newlines.  The '<<' operator for the Forward
    #        class is overloaded to drop the real pattern in place
    #        of the dummy pattern.
    #--
    intExpr = pp.Word(pp.nums).setParseAction(lambda t: int(t[0]))
    stringExpr = pp.Forward()

    def countedParseAction(s, l, t):
        '''Parse action that sets up the count in stringExpr.

        To match the part between the brackets, we use a regex
        of the form ".{N}".  This works even for N=0.  The
        re.DOTALL flag makes "." match any character, even newline.
        '''
        n = int(t[0])
        stringExpr << pp.Combine(
            pp.Suppress("[") +
            pp.Regex(".{{{0:d}}}".format(n), re.DOTALL) +
            pp.Suppress("]"))
        return []

    #--
    # The second parse action uses the count to define the
    # stringExpr pattern using the actual value of the count.
    #--
    intExpr.addParseAction(countedParseAction)

    return (intExpr + stringExpr)

# - - - - -   m a i n

testLines = [           # Test output
    "0 []",             # ['']
    "11 [abcdefghijk]", # ['abcdefghijk']
    "6 [Easter]",       # ['Easter']
    "4 []]]]]",         # [']]]]']
    "6 [123\n56]",      # ['123\n56']
    "6 [ abcdef]"       # Fails (leading whitespace not skipped)
    ]
LINE_PAT = countedText()

def main():
    """Main
    """
    for line in testLines:
        test(line)

def test(line):
    '''Test one line
    '''
    try:
        result = LINE_PAT.parseString(line, parseAll=True)
        print("/{0}/ -> {1}".format(line, result))
    except pp.ParseException as x:
        print("{0}\n{1}^ Fail".format(line, " "*(x.column-1)))

# - - - - -   E p i l o g u e

if __name__ == "__main__":
    main()
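The newline fix in this version rests on two details: how the ".{N}" pattern string is built, and what re.DOTALL changes. Both can be checked with the standard library alone. A small sketch follows; the helper name counted_regex is an assumption for illustration and does not appear in the script above.

```python
import re


def counted_regex(n):
    """Build the ".{N}" pattern string used for the bracketed text."""
    # In str.format, "{{" and "}}" are literal braces, so this yields
    # ".{6}" for n == 6.
    return ".{{{0:d}}}".format(n)


if __name__ == "__main__":
    pattern = counted_regex(6)
    text = "123\n56"                     # six characters, one a newline
    print(pattern)
    # Without DOTALL, "." refuses to match the newline.
    print(re.fullmatch(pattern, text))
    # With DOTALL, "." matches any character, so all six are consumed.
    print(re.fullmatch(pattern, text, re.DOTALL) is not None)
```

The N=0 case also works: ".{0}" matches the empty string, which is why this version no longer needs a separate branch for an empty text field.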