In our project we implement several parsers with pyparsing. For some of the more complex parsers, setting:
pyparsing.ParserElement.enablePackrat()
does result in a speedup, but for some simple parsers it results in significant slowdowns and, worse yet, significant memory usage. For example, this simple parser:
is used on a large file of C code (approximately 1.7 MB, 50 kLOC). Without packrat it completes in about 20 seconds, but with packrat it takes 1 minute 15 seconds (almost 4 times as long). It also uses several GB of memory. The packrat code implements a simple memoization scheme:
# this method gets repeatedly called during backtracking with the same arguments -
# we can cache these arguments and save ourselves the trouble of re-parsing the contained expression
def _parseCache( self, instring, loc, doActions=True, callPreParse=True ):
    lookup = (self,instring,loc,callPreParse,doActions)
    if lookup in ParserElement._exprArgCache:
        value = ParserElement._exprArgCache[ lookup ]
        if isinstance(value, Exception):
            raise value
        return (value[0],value[1].copy())
    else:
        try:
            value = self._parseNoCache( instring, loc, doActions, callPreParse )
            ParserElement._exprArgCache[ lookup ] = (value[0],value[1].copy())
            return value
        except ParseBaseException as pe:
            pe.__traceback__ = None
            ParserElement._exprArgCache[ lookup ] = pe
            raise
There are several issues with the _parseCache code above:
There is a way to enable the cache, but there is no official way to disable it. I propose adding a new method:
def disablePackrat():
    ParserElement._packratEnabled = False
    ParserElement._parse = ParserElement._parseNoCache
but this is a stopgap, because the real problem is that the cache is global. On reflection, the cache is only useful towards the top of the parse tree, where repeated identical tokens are shared by many sub-branches. Towards the leaves of the parse tree (which backtrack a lot more) the cache becomes more expensive, since there will rarely be a cache hit. It might make sense to let each node carry its own cache state (so, like setDebug(), maybe we need setCache()).
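A rough sketch of what a per-node cache could look like follows. This is only an assumption about the shape of the idea, not existing pyparsing API: Element, set_cache and the calls counter are hypothetical names, and the cache key drops the doActions/callPreParse flags for brevity.

```python
# Hypothetical per-node cache, loosely modelled on how setDebug() toggles
# a per-element flag. None of these names exist in pyparsing.

class Element:
    """Parser node with an opt-in, per-instance memo cache."""

    def __init__(self, text):
        self.text = text
        self._cache = None            # None means caching is off for this node
        self.calls = 0                # counts real (uncached) parse attempts

    def set_cache(self, flag=True):
        # Turning the cache off also drops its entries, freeing the memory.
        self._cache = {} if flag else None
        return self                   # allow chaining, like setDebug()

    def _parse_no_cache(self, instring, loc):
        self.calls += 1
        if instring.startswith(self.text, loc):
            return loc + len(self.text)
        return None

    def parse(self, instring, loc):
        if self._cache is None:
            return self._parse_no_cache(instring, loc)
        key = (instring, loc)
        if key not in self._cache:
            self._cache[key] = self._parse_no_cache(instring, loc)
        return self._cache[key]


shared = Element("struct").set_cache()   # retried by many alternatives
leaf = Element(";")                      # cheap leaf, caching left off

src = "struct foo;"
for _ in range(5):                       # five alternatives retry the same spots
    shared.parse(src, 0)
    leaf.parse(src, 10)

print(shared.calls, leaf.calls)
```

With this arrangement only the nodes that are likely to be re-parsed at the same location pay for a cache, and a node's entries go away as soon as its cache is switched off.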
Please look at the recent upgrades to packrat parsing's implementation.
Packrat parsing has undergone a major rewrite and improvements.