Re: [Pyparsing] Incremental parsing with no gaps between parsed ranges?

Daniel Lenski <dlenski <at> gmail.com> writes:

> So I think I'm going to have to do incremental parsing in order to get
> reasonably fast feedback from the parser. Do you have any suggestions for
> how to do this? I'm trying to figure out if there's a good way to do greedy
> consumption of trailing (whitespace|comments) at the end of each valid
> top-level element.

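I modified parseString very slightly and came up with parseConsumeString().

This version calls self._parse() followed by self.preParse() in a loop,
where self is the parser for a "top-level" item. After each item is parsed,
preParse() consumes the trailing whitespace and ignored expressions
(comments) before the next item starts, which is what I wanted.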

    def parseConsumeString(self, instring, parseAll=False,
                           yieldLoc=True, loopResetCache=False):
        # reset the packrat cache once up front, unless it is reset per item below
        if not loopResetCache:
            ParserElement.resetCache()
        if not self.streamlined:
            self.streamline()
            #~ self.saveAsList = True
        for e in self.ignoreExprs:
            e.streamline()
        if not self.keepTabs:
            instring = instring.expandtabs()
        try:
            loc = 0
            while loc < len(instring):
                sloc = loc
                if loopResetCache:
                    ParserElement.resetCache()
                # parse one top-level item starting at loc
                loc, tokens = self._parse(instring, loc)
                if yieldLoc:
                    yield tokens, sloc, loc
                else:
                    yield tokens
                # consume trailing whitespace and ignored expressions
                loc = self.preParse(instring, loc)
        except ParseBaseException as exc:
            if not parseAll:
                return
            if ParserElement.verbose_stacktrace:
                raise
            else:
                # catch and re-raise exception from here, clears out
                # pyparsing internal stack trace
                raise exc
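
For readers who want to see the loop shape in isolation, here is a minimal
stdlib-only sketch of the same idea: parse one top-level item, yield it with
its start and end offsets, then skip trailing whitespace and comments before
trying the next item. It does not use pyparsing at all. The toy grammar
(`name = number;` items, `#` line comments) and the names ITEM, SKIP and
parse_consume are assumptions made up for the illustration.

```python
import re

# Hypothetical toy grammar: a top-level item looks like `name = 123;`
ITEM = re.compile(r"[A-Za-z_]\w*\s*=\s*\d+\s*;")
# Material to swallow between items: whitespace and `# ...` line comments
SKIP = re.compile(r"(?:\s+|#[^\n]*)*")


def parse_consume(text):
    """Yield (item_text, start, end) for each top-level item in text.

    After each item, trailing whitespace and comments are skipped, so the
    next item is attempted exactly where the skipping stopped.
    """
    # skip anything ignorable before the first item
    loc = SKIP.match(text, 0).end()
    while loc < len(text):
        m = ITEM.match(text, loc)
        if m is None:
            raise ValueError("no top-level item at offset %d" % loc)
        yield m.group(), loc, m.end()
        # greedy consumption of trailing (whitespace|comments)
        loc = SKIP.match(text, m.end()).end()
```

Because it is a generator, a caller can act on each item as soon as it is
produced instead of waiting for the whole input to be parsed, e.g.
`for item, start, end in parse_consume(source): ...`.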

By the way, I moved ParserElement.resetCache() into the loop (enabled with
loopResetCache=True), in order to drastically reduce memory consumption with
packrat caching. Peak memory consumption goes down from around 6 GB to around
100 MB, while running about 15-20% faster. This is on a Core i7 980X with
8 GB of RAM, Win7.

    In [1]: import my_parser_module as P

    In [2]: sample=open("large_file").read()
    In [3]: len(sample)
    Out[3]: 9153816

    In [4]: %timeit -n1 for r in P.parseConsumeString(P.TopLevel.ignore(P.COMMENT), sample, True, True, True): pass
    1 loops, best of 3: 1min 10s per loop

    In [6]: %timeit -n1 for r in P.parseConsumeString(P.TopLevel.ignore(P.COMMENT), sample, True, True, False): pass
    1 loops, best of 3: 1min 22s per loop
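
A plausible reason the per-item reset helps: once a top-level item has been
consumed, the loop never goes back before the current location, so memoized
results for earlier positions are dead weight. The toy sketch below shows
that effect with a hand-rolled memo table keyed by (rule, location). It is
not pyparsing's cache implementation; MemoParser and its methods are
hypothetical names for the illustration.

```python
class MemoParser:
    """Toy memoizing parser for whitespace-separated integers."""

    def __init__(self):
        self.cache = {}   # (rule name, start offset) -> end offset or None
        self.peak = 0     # largest number of cache entries seen

    def parse_number(self, text, loc):
        key = ("number", loc)
        if key not in self.cache:
            end = loc
            while end < len(text) and text[end].isdigit():
                end += 1
            self.cache[key] = end if end > loc else None
            self.peak = max(self.peak, len(self.cache))
        return self.cache[key]

    def parse_all(self, text, reset_per_item):
        loc, items = 0, []
        while loc < len(text):
            if reset_per_item:
                # entries for earlier offsets are never looked up again
                self.cache.clear()
            end = self.parse_number(text, loc)
            if end is None:
                raise ValueError("expected a number at offset %d" % loc)
            items.append(text[loc:end])
            loc = end
            # skip the whitespace between items
            while loc < len(text) and text[loc].isspace():
                loc += 1
        return items
```

With the reset enabled, the table only ever holds entries for the item being
parsed, while without it the table grows with the number of items. Both modes
return the same items.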

Thanks,
Dan