Re: [Pyparsing] Incremental parsing with no gaps between parsed ranges?
From: Dan L. <dl...@gm...> - 2014-10-27 19:09:06
Daniel Lenski <dlenski <at> gmail.com> writes:

> So I think I'm going to have to do incremental parsing in order to get
> reasonably fast feedback from the parser. Do you have any suggestions for
> how to do this? I'm trying to figure out if there's a good way to do greedy
> consumption of trailing (whitespace|comments) at the end of each valid
> top-level element.

I modified parseString very slightly and came up with parseConsumeString().
This version calls self._parse() followed by self.preParse() repeatedly, which
does what I want when self is the parser for a "top-level" item.

    from pyparsing import ParserElement, ParseBaseException

    def parseConsumeString(self, instring, parseAll=False, yieldLoc=True, loopResetCache=False):
        # Reset the packrat cache once up front, unless it is reset per item below.
        if not loopResetCache:
            ParserElement.resetCache()
        if not self.streamlined:
            self.streamline()
            #~ self.saveAsList = True
        for e in self.ignoreExprs:
            e.streamline()
        if not self.keepTabs:
            instring = instring.expandtabs()
        try:
            loc = 0
            while loc<len(instring):
                sloc = loc
                if loopResetCache:
                    ParserElement.resetCache()
                # Parse one top-level item starting at loc.
                loc, tokens = self._parse(instring, loc)
                if yieldLoc:
                    yield tokens, sloc, loc
                else:
                    yield tokens
                # Consume trailing whitespace and ignorable expressions (e.g. comments).
                loc = self.preParse(instring, loc)
        except ParseBaseException as exc:
            # Without parseAll, stop quietly at the first item that fails to parse.
            if not parseAll:
                return
            if ParserElement.verbose_stacktrace:
                raise
            else:
                # catch and re-raise exception from here, clears out pyparsing internal stack trace
                raise exc

By the way, with loopResetCache=True, ParserElement.resetCache() is called
inside the loop (once per top-level item) instead of once up front. This
drastically reduces memory consumption with packrat caching: peak usage drops
from around 6G to around 100M, and the parse runs about 15-20% faster. This is
on a Core i7 980X with 8G of RAM, Win7.
    In [1]: import my_parser_module as P

    In [2]: sample=open("large_file").read()

    In [3]: len(sample)
    Out[3]: 9153816

    In [4]: %timeit -n1 for r in P.parseConsumeString(P.TopLevel.ignore(P.COMMENT), sample, True, True, True): pass
    1 loops, best of 3: 1min 10s per loop

    In [6]: %timeit -n1 for r in P.parseConsumeString(P.TopLevel.ignore(P.COMMENT), sample, True, True, False): pass
    1 loops, best of 3: 1min 22s per loop

Thanks,
Dan
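
For anyone who wants to see the "parse one item, then greedily eat the trailing whitespace and comments" loop without touching pyparsing internals, here is a minimal standard-library sketch of the same idea. It is only an illustration, not the pyparsing code: the toy grammar (`name = number;` items with `#` comments) and the names ITEM, SKIP and parse_consume are made up for this example, and plain regular expressions stand in for self._parse() and self.preParse().

```python
import re

# Hypothetical stand-ins for the real grammar (assumptions for this sketch):
# a top-level item looks like `name = number;`, and the skippable material
# between items is whitespace or `# ...` comments running to end of line.
ITEM = re.compile(r"(\w+)\s*=\s*(\d+)\s*;")
SKIP = re.compile(r"(?:\s+|#[^\n]*)*")


def parse_consume(text, parse_all=False):
    """Yield (tokens, start, end) for consecutive items, leaving no gaps."""
    # Skip leading whitespace/comments before the first item.
    loc = SKIP.match(text, 0).end()
    while loc < len(text):
        m = ITEM.match(text, loc)
        if m is None:
            if parse_all:
                raise ValueError("no top-level item at offset %d" % loc)
            return  # stop quietly at the first position that does not parse
        yield m.groups(), loc, m.end()
        # Greedily consume trailing whitespace/comments, so the next item
        # starts exactly where the skippable text ends.
        loc = SKIP.match(text, m.end()).end()


sample = "a = 1;  # first\nb=2;\n\n# trailing comment\nc = 3 ;\n"
results = list(parse_consume(sample, parse_all=True))
for tokens, start, end in results:
    print(tokens, start, end, repr(sample[start:end]))
```

With parse_all=False the generator simply stops at the first position where no item matches, which corresponds to the `if not parseAll: return` branch in parseConsumeString(); with parse_all=True it raises instead.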