Complicated parse rule. Is this possible at all?
Brought to you by:
ptmcg
A long time ago I played around a little with pyparsing, but I didn't have a real purpose for it. Now I am trying to parse ASP.NET's __VIEWSTATE field, and have hit some problems. The 'old' format, which was entirely ASCII-based, went quite well, but the new format uses binary field types, compressed variable-length integers, and other exotic things.
My problem, at least for now, is decoding variable-length strings. The format is something like:

    \x05 varInt characters

where varInt is a number of 7-bit values (little endian), all of them having bit 7 set to 1 except the last one, and characters is a run of characters whose length is given by varInt.
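For reference, the format described above can be decoded in a few lines of plain Python, without pyparsing. This is only a minimal sketch: it assumes Python 3 `bytes` input, and the function names `decode_varint` and `decode_string` are made up for illustration.

```python
def decode_varint(data, offset=0):
    """Return (value, next_offset) for the varInt starting at data[offset]."""
    value = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        # The low 7 bits carry the payload; the first byte is least significant.
        value |= (byte & 0x7F) << shift
        # Bit 7 clear marks the last byte of the varInt.
        if not byte & 0x80:
            return value, offset
        shift += 7


def decode_string(data, offset=0):
    """Return (characters, next_offset) for a \\x05-prefixed string field."""
    if data[offset] != 0x05:
        raise ValueError("expected string marker 0x05 at offset %d" % offset)
    length, offset = decode_varint(data, offset + 1)
    end = offset + length
    if end > len(data):
        raise ValueError("string field runs past the end of the data")
    return data[offset:end], end
```

With these definitions, `decode_varint(b"\xf3\x08")` yields the value 1139 and `decode_string(b"\x05\x03abc")` yields `b"abc"`.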
I defined varInt like this:

    # exact=1 makes each Word match a single byte instead of a whole run
    bit7off = Word(srange("[\x00-\x7f]"), exact=1)
    bit7on = Word(srange("[\x80-\xff]"), exact=1)
    stringStart = Literal("\x05")
    int7bits = Combine(stringStart + bit7off)
    int14bits = Combine(stringStart + bit7on + bit7off)
    int21bits = Combine(stringStart + bit7on + bit7on + bit7off)
    int28bits = Combine(stringStart + bit7on + bit7on + bit7on + bit7off)
    varInt = Combine(int7bits | int14bits | int21bits | int28bits)

But how can I make the value of varInt determine the length of characters?
Maybe an example of a varInt helps: f3 08 breaks down to 8*128 + 115 (115 = 0xf3 & 0x7f), or, finally, 1139. Bit 7 of 0xf3 indicates that extra bytes follow.

Yes, it is possible to define an adaptive parser that, in an expression "A B", modifies B based on what is parsed at A. Pyparsing includes the countedArray helper, which may work for you as-is, or give you an idea of how to do this in your own code. Here is the source for countedArray:
    def countedArray( expr, intExpr=None ):
        """
        Helper to define a counted list of expressions.
        This helper defines a pattern of the form::
            integer expr expr expr...
        where the leading integer tells how many expr expressions follow.
        The matched tokens returns the array of expr tokens as a list - the leading count token is suppressed.
        """
intExpr defaults to Word(nums) with a parse action that converts the numeric string to an int. You should be able to use countedArray as-is if you pass in an intExpr that parses varInt and, in a parse action, converts it to the correct integer value.
-- Paul
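One way the suggestion above might look in practice is sketched below. It is an illustration under several assumptions rather than a tested recipe for __VIEWSTATE: the binary data is assumed to have been decoded as latin-1 so that each byte becomes one character, and the names `varint_to_int`, `var_int`, `any_char` and `var_string` are made up for this example. Whitespace skipping and tab expansion are switched off because binary payloads can contain those byte values.

```python
from pyparsing import Combine, Literal, ParserElement, Regex, countedArray

# Binary payloads may contain bytes that look like whitespace, so disable
# whitespace skipping before building any expressions.
# Note: this changes a process-wide default for expressions created later.
ParserElement.setDefaultWhitespaceChars("")


def varint_to_int(tokens):
    """Parse action: convert the matched varInt characters to an int."""
    value = 0
    for position, ch in enumerate(tokens[0]):
        # Low 7 bits of each character, least significant group first.
        value |= (ord(ch) & 0x7F) << (7 * position)
    return value


# Zero or more characters with bit 7 set, then one with bit 7 clear.
var_int = Regex(r"[\x80-\xff]*[\x00-\x7f]").setParseAction(varint_to_int)

# Any single character in the latin-1 range, including newlines.
any_char = Regex(r"[\x00-\xff]")

# The \x05 marker is matched but left out of the results.
string_start = Literal("\x05").suppress()

# countedArray reads the count with var_int, then expects that many any_char
# matches; Combine joins the matched characters back into one string.
var_string = string_start + Combine(countedArray(any_char, intExpr=var_int))

# parseString expands tabs by default; keep them as they are.
var_string.parseWithTabs()

raw = b"\x05\x03abc"
result = var_string.parseString(raw.decode("latin-1"))
print(result[0])
```

Wrapping countedArray in Combine means the result is a single string token whether or not the installed pyparsing version nests the counted items in a sub-list.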