#17 Complicated parse rule. Is this possible at all?

v1.0 (example)
open
nobody
None
5
2017-01-02
2017-01-02
No

A long time ago I played around a little with pyparsing, but I didn't have a real purpose for it. Now I am trying to parse ASP.NET's __VIEWSTATE field, and I have hit some problems. The 'old' format, which was entirely ASCII based, went quite well, but the new format uses binary field types, compressed variable-length integers, and other exotic things.

My problem, at least for now, is decoding variable-length strings. The format is something like:

\x05 varInt characters

where varInt is a sequence of bytes, each carrying 7 bits of the value (little endian, so the least significant group comes first), with bit 7 set to 1 on every byte except the last one. characters is a run of exactly varInt characters.

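  1. I think I could probably parse the varInt like this:
        from pyparsing import Combine, Literal, Word, srange

        # exact=1 limits each Word to a single byte; a plain Word would
        # greedily swallow every following byte in the same range
        bit7off        = Word(srange("[\x00-\x7f]"), exact=1)
        bit7on         = Word(srange("[\x80-\xff]"), exact=1)
        stringStart    = Literal("\x05")

        # one alternative per encoded length, each led by the \x05 marker
        int7bits     = Combine(stringStart + bit7off)
        int14bits    = Combine(stringStart + bit7on + bit7off)
        int21bits    = Combine(stringStart + bit7on + bit7on + bit7off)
        int28bits    = Combine(stringStart + bit7on + bit7on + bit7on + bit7off)
        varInt       = Combine(int7bits | int14bits | int21bits | int28bits)
  2. But is there a way to take the result obtained above and force a read of exactly varInt characters?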

Discussion

  • John Coppens

    John Coppens - 2017-01-02

    Maybe an example of a varInt helps: f3 08 breaks down to 8*128 + 115 (115 = 0xf3 & 0x7f), which is 1139. Bit 7 of 0xf3 is set, which indicates that more bytes follow.
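The arithmetic above can be sketched in plain Python, without pyparsing. This is only an illustration of the encoding as described in this thread; the function name decode_var_int and its (value, next_offset) return shape are made up for the example.

```python
def decode_var_int(data, offset=0):
    """Decode one variable-length integer from `data`, starting at `offset`.

    Returns (value, next_offset). Each byte contributes its low 7 bits,
    least significant group first; a set bit 7 means another byte follows.
    """
    value = 0
    shift = 0
    while True:
        byte = data[offset]          # raises IndexError on truncated input
        offset += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:          # bit 7 clear: this was the last byte
            return value, offset
        shift += 7


value, next_offset = decode_var_int(bytes.fromhex("f308"))
print(value, next_offset)  # 1139 2
```

Returning the next offset along with the value makes it easy to slice out the characters that follow the length.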

     
  • Paul McGuire

    Paul McGuire - 2017-01-02

    Yes, it is possible to define an adaptive parser that, in an expression "A B", modifies B based on what was parsed at A. Pyparsing includes the countedArray helper, which may work for you as-is or give you an idea of how to do this in your own code. Here is the source for countedArray:

    def countedArray( expr, intExpr=None ):
        """
        Helper to define a counted list of expressions.
        This helper defines a pattern of the form::
            integer expr expr expr...
        where the leading integer tells how many expr expressions follow.
        The matched tokens returns the array of expr tokens as a list - the leading count token is suppressed.

        If C{intExpr} is specified, it should be a pyparsing expression that produces an integer value.

        Example::
            countedArray(Word(alphas)).parseString('2 ab cd ef')  # -> ['ab', 'cd']

            # in this parser, the leading integer value is given in binary,
            # '10' indicating that 2 values are in the array
            binaryConstant = Word('01').setParseAction(lambda t: int(t[0], 2))
            countedArray(Word(alphas), intExpr=binaryConstant).parseString('10 ab cd ef')  # -> ['ab', 'cd']
        """
        arrayExpr = Forward()
        def countFieldParseAction(s,l,t):
            n = t[0]
            arrayExpr << (n and Group(And([expr]*n)) or Group(empty))
            return []
        if intExpr is None:
            intExpr = Word(nums).setParseAction(lambda t:int(t[0]))
        else:
            intExpr = intExpr.copy()
        intExpr.setName("arrayLen")
        intExpr.addParseAction(countFieldParseAction, callDuringTry=True)
        return ( intExpr + arrayExpr ).setName('(len) ' + _ustr(expr) + '...')
    

    intExpr defaults to Word(nums) with a parse action that converts the numeric string to an int. You should be able to use countedArray as-is if you pass in an intExpr that parses a varInt and, in a parse action, evaluates it to the correct integer value.
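A minimal sketch of that suggestion, under some assumptions not stated in the thread: the binary data is decoded as latin-1 so that every byte becomes one character, whitespace skipping is switched off globally, and the varInt is matched with a Regex instead of the Word-based expressions from the question. The names decode_var_int, join_chars, string_field and parse_string_field are made up for the example.

```python
from pyparsing import Literal, ParserElement, Regex, countedArray

# In a binary format every byte is data, so turn off whitespace skipping
# before any expression is created.
ParserElement.setDefaultWhitespaceChars("")


def decode_var_int(tokens):
    """Turn the matched bytes (least significant 7-bit group first) into an int."""
    value = 0
    for position, char in enumerate(tokens[0]):
        value |= (ord(char) & 0x7F) << (7 * position)
    return value


def join_chars(tokens):
    """Join the counted characters into one string.

    Depending on the pyparsing version, the array arrives nested in a Group
    or as a flat list, so both shapes are handled here.
    """
    nested = len(tokens) == 1 and not isinstance(tokens[0], str)
    return "".join(tokens[0] if nested else tokens)


# Any number of bytes with bit 7 set, closed by one byte with bit 7 clear.
var_int = Regex(r"[\x80-\xff]*[\x00-\x7f]").setParseAction(decode_var_int)
any_char = Regex(r"[\x00-\xff]")  # exactly one character, whatever its value

counted_chars = countedArray(any_char, var_int).setParseAction(join_chars)
string_field = Literal("\x05").suppress() + counted_chars


def parse_string_field(raw):
    """Parse one '\\x05 varInt characters' field from a bytes object."""
    text = raw.decode("latin-1")  # assumption: one character per byte
    return string_field.parseString(text)[0]


print(parse_string_field(b"\x05\x03abc"))
```

countedArray builds one sub-expression per character, so for very long strings a hand-written parse action that slices the input directly may be a better fit.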

    -- Paul

     
