Re: [Pyparsing] Efficency of Keyword (and a couple other bits)
From: Corrin L. <Cor...@da...> - 2007-03-20 22:53:21
Just a brief *wow*... I tried the old method but with the keywords sorted by frequency, so RD came first, then ST, etc. That made about a 10% improvement in speed. Then I tried the Regex method, and it made a 200% improvement in speed! Pretty amazing to see :)

Corrin

-----Original Message-----
From: Ralph Corderoy [mailto:ra...@in...]
Sent: Wednesday, March 21, 2007 1:23 AM
To: Corrin Lakeland
Cc: pyp...@li...
Subject: Re: [Pyparsing] Efficency of Keyword (and a couple other bits)

Paul wrote:
> We need the Regex to treat these as keywords, so we will surround the
> alternatives with the re "word break" indicator "\b". We don't want
> this to be used just for the first and last alternatives, so we'll
> enclose the alternatives in non-grouping parens, (?:...). This gives
> us a re string of:
>
> r"\b(?:%s)\b" % "|".join( x[0] for x in cursor.fetchall() )

I don't know if the re module optimises r'\b(?:apple|banana|cherry)\b' where each of the alternatives is a literal. If not, then there are gains to be made by concocting a better regexp, noting that the matcher tries each alternative in turn, failing on the first character that doesn't match the current alternative. If you have 500 alternatives, this will on average test 250 alternatives, assuming they occur with equal frequency.
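A sketch of the construction Paul describes, including the frequency-ordering trick Corrin tried. The keyword list and counts here are invented stand-ins (the original post pulled keywords from a database via cursor.fetchall()); since re tries alternatives left to right, listing the most frequent keywords first reduces the average number of alternatives scanned:

```python
import re

# Hypothetical keywords with occurrence counts -- the original post
# fetched these from a database cursor.
freq = {"RD": 5000, "ST": 3000, "AVE": 800, "BLVD": 200, "CT": 150}

# Most frequent first, so the common cases match after fewer attempts.
ordered = sorted(freq, key=freq.get, reverse=True)

# re.escape guards against regex metacharacters in a keyword;
# (?:...) is a non-capturing group, so \b applies to every alternative.
pattern = re.compile(r"\b(?:%s)\b" % "|".join(map(re.escape, ordered)))

print(pattern.findall("123 Main ST and 456 Oak RD"))  # -> ['ST', 'RD']
print(pattern.search("BOULEVARD"))  # -> None: \b blocks matches inside words
```

Note that \b makes these behave like pyparsing's Keyword rather than Literal: "RD" does not match inside "BOULEVARD" because the character before "RD" there is a word character.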