Re: [Pyparsing] Efficency of Keyword (and a couple other bits)
From: Corrin L. <Cor...@da...> - 2007-03-22 19:58:01
Slightly non-scientific since I didn't adjust for varying loads on the machine or disk caches, but running with and without sort both gave 85.6 processing time. Looks like the key was choosing to use the re module, and sorting didn't help.

Interesting :)

Corrin

PS: matching the building name using ~(UNIT_TYPE) didn't work... Since that didn't give me the option to go .setResultsName. What I've got at the moment is:

    BUILDING = OneOrMore(Word(nameletters)).setParseAction(rejectBuildingName).setResultsName("BuildingName")

And the parse action is:

    def rejectBuildingName(string,loc,tokens):
        """ Prevent building name of LWR GROUND and similar """
        building_name = ""
        for token in tokens:
            if token == self.SEPCHAR:
                if debug: print "Rejected the building name " + building_name
                raise ParseException(string,loc,"found a field seperator in the building")
            if building_name <> "":
                building_name += " "
            building_name += token
        if self.debug: print "Trying to reject the building name %s"%(building_name)
        if spare_parser <> None:
            r = None
            try:
                r = spare_parser.NOT_A_BUILDING.parseString(building_name)
            except ParseException, pe:
                r = r
            if r == None:
                if debug: print "Looks like this building is not a floor or a unit"
            else:
                if debug: print "Rejected as this building looks like a floor or a unit"
                raise ParseException(string,loc,"Rejected %s as a building name - looks like a floor or a unit" % string)
        if debug: print "Looks like this building is okay"

    NOT_A_BUILDING = (UNIT_TYPE | FLOOR | BOX_LINE | BAG_LINE)

It feels like a very roundabout way of doing it to me, though it seems to work well enough.

-----Original Message-----
From: Ralph Corderoy [mailto:ra...@in...]
Sent: Thursday, March 22, 2007 12:11 AM
To: Corrin Lakeland
Cc: pyp...@li...
Subject: Re: [Pyparsing] Efficency of Keyword (and a couple other bits)

Hi Corrin,

I'm glad you've got the speed-up you were after.
Out of interest, how does the Regexp with the alternatives sorted by frequency compare with the Regexp with the alternatives sorted by reverse frequency? This would show if the re module is optimising without you needing to sort.

Cheers,

Ralph.
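
The comparison Ralph asks about can be tried with a small timing harness. What follows is only a sketch in present-day Python 3, using the standard `re`, `random` and `timeit` modules; it is not the code used in this thread. The keyword list, the frequencies in `FREQ`, and the helper names `build_regex` and `count_matches` are invented for illustration, since the real address data is not shown here.

```python
import random
import re
import timeit

# Hypothetical keywords and frequencies (assumption: the thread's real data is not shown).
FREQ = {"FLAT": 500, "UNIT": 300, "SUITE": 120, "SHOP": 50, "KIOSK": 20, "VILLA": 10}


def build_regex(words):
    """Join the words into a single alternation, keeping the order given."""
    return re.compile(r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b")


by_freq = sorted(FREQ, key=FREQ.get, reverse=True)  # most common keyword first
by_reverse = by_freq[::-1]                          # least common keyword first

# Sample text whose word distribution follows the assumed frequencies.
rng = random.Random(0)
text = " ".join(rng.choices(list(FREQ), weights=list(FREQ.values()), k=20000))


def count_matches(regex):
    """Scan the whole sample once and return the number of keyword matches."""
    return len(regex.findall(text))


if __name__ == "__main__":
    for label, words in (("frequency order", by_freq), ("reverse order", by_reverse)):
        regex = build_regex(words)
        # Take the best of several runs to reduce noise from machine load.
        seconds = min(timeit.repeat(lambda: count_matches(regex), number=5, repeat=3))
        print("%-16s %.4f s" % (label, seconds))
```

If the two timings come out close, that would be consistent with Corrin's observation that sorting made no measurable difference. The numbers depend on the machine, the Python version, and how skewed the real keyword distribution is, so a run against the actual address data would be more telling than this synthetic sample.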