Re: [Pyparsing] Efficency of Keyword (and a couple other bits)
From: Corrin L. <Cor...@da...> - 2007-03-20 22:53:21
Just a brief *wow*... I tried the old method but with the keywords sorted by frequency, so RD came first, then ST, etc. That made about a 10% improvement in speed. Then I tried the Regex method, and it made a 200% improvement in speed! Pretty amazing to see :)

Corrin

-----Original Message-----
From: Ralph Corderoy [mailto:ra...@in...]
Sent: Wednesday, March 21, 2007 1:23 AM
To: Corrin Lakeland
Cc: pyp...@li...
Subject: Re: [Pyparsing] Efficency of Keyword (and a couple other bits)

Paul wrote:
> We need the Regex to treat these as keywords, so we will surround the
> alternatives with the re "word break" indicator "\b". We don't want
> this to be used just for the first and last alternatives, so we'll
> enclose the alternatives in non-grouping parens, (?:...). This gives
> us a re string of:
>
> r"\b(?:%s)\b" % "|".join( x[0] for x in cursor.fetchall() )

I don't know if the re module optimises r'\b(?:apple|banana|cherry)\b' where each of the alternatives is a literal. If not, then there are gains to be made by concocting a better regexp, noting that the matcher tries each alternative in turn, failing on the first character that doesn't match the current alternative. If you have 500 alternatives, this will on average test 250 alternatives, assuming they occur with equal frequency.
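A sketch of the construction Paul describes, including the frequency-ordering trick Corrin tried. The keyword list and counts here are invented stand-ins (the original post pulled keywords from a database via cursor.fetchall()); since re tries alternatives left to right, listing the most frequent keywords first reduces the average number of alternatives scanned:

```python
import re

# Hypothetical keywords with occurrence counts -- the original post
# fetched these from a database cursor.
freq = {"RD": 5000, "ST": 3000, "AVE": 800, "BLVD": 200, "CT": 150}

# Most frequent first, so the common cases match after fewer attempts.
ordered = sorted(freq, key=freq.get, reverse=True)

# re.escape guards against regex metacharacters in a keyword;
# (?:...) is a non-capturing group, so \b applies to every alternative.
pattern = re.compile(r"\b(?:%s)\b" % "|".join(map(re.escape, ordered)))

print(pattern.findall("123 Main ST and 456 Oak RD"))  # -> ['ST', 'RD']
print(pattern.search("BOULEVARD"))  # -> None: \b blocks matches inside words
```

Note that \b makes these behave like pyparsing's Keyword rather than Literal: "RD" does not match inside "BOULEVARD" because the character before "RD" there is a word character.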