[Pyparsing] Speeding up a parse
From: <dav...@l-...> - 2008-09-24 23:23:26
All,

We've got a data file that we use for parsing "stuff". The file is currently 80K lines long and takes about 3.3 minutes to parse, which is an awfully long time to wait for something like this.

There are 122 rules for parsing this file, and unfortunately the syntax of the data within is not very strict. This leads to constructs such as:

    Interaction = \
        Keyword("(Interaction") + \
        INT_ID + \
        INT_Name + \
        INT_ISRType + \
        OneOrMore( INT_MOMInteraction |
                   INT_Description |
                   INT_DeliveryCategory |
                   INT_MessageOrdering |
                   INT_RoutingSpace ) + \
        ZeroOrMore(InteractionComponent) + \
        ")"

The intent of the OneOrMore section is:

1.) All of the alternatives are optional
2.) They may appear in any order

I've also tried Each([Optional(...), Optional(...)]), without much of a speedup.

I'm pretty sure that these constructs cause a significant amount of backtracking, but I'm not sure of the best way to go about cleaning up the grammar.

I also tried using psyco to speed up the parse, but I use keepOriginalText in my setParseAction() call so that I can get a copy of the original text within my parse action. That seems to break psyco, apparently because of one of the imports it performs.

So, two things:

1.) Are there any grammar speedup rules for the construct above?
2.) Any ideas on how to get the original text and still make use of psyco?

Thanks,
--dw
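For anyone who wants to run the construct from the message, here is a minimal sketch. The real definitions of INT_ID, INT_Name and the other sub-expressions are not shown in the post, so the `attr` helper, the "(Keyword value)" attribute syntax and the sample data are all hypothetical stand-ins. The sketch also uses ZeroOrMore rather than OneOrMore for the any-order section, on the assumption that "all are optional" means the section may be empty. Like the original, it still allows an attribute to repeat.

```python
from pyparsing import (Group, Keyword, QuotedString, Suppress, Word,
                       ZeroOrMore, alphas, nums)


def attr(keyword, value):
    """Hypothetical attribute form, assumed for this sketch: (Keyword value)."""
    return Group(Suppress("(") + Keyword(keyword) + value + Suppress(")"))


# Stand-ins for the sub-expressions named in the post (assumed shapes).
INT_ID = attr("ID", Word(nums))
INT_Name = attr("Name", QuotedString('"'))
INT_ISRType = attr("ISRType", Word(alphas))
INT_MOMInteraction = attr("MOMInteraction", Word(alphas))
INT_Description = attr("Description", QuotedString('"'))
INT_DeliveryCategory = attr("DeliveryCategory", Word(alphas))
INT_MessageOrdering = attr("MessageOrdering", Word(alphas))
INT_RoutingSpace = attr("RoutingSpace", Word(alphas))
InteractionComponent = attr("Component", Word(alphas))

# Any-order, all-optional section: zero or more of the alternatives.
any_order_attrs = ZeroOrMore(
    INT_MOMInteraction
    | INT_Description
    | INT_DeliveryCategory
    | INT_MessageOrdering
    | INT_RoutingSpace
)

Interaction = (
    Keyword("(Interaction")
    + INT_ID
    + INT_Name
    + INT_ISRType
    + any_order_attrs
    + ZeroOrMore(InteractionComponent)
    + ")"
)

# Made-up sample input in the assumed syntax.
sample = '''(Interaction (ID 7) (Name "Fire") (ISRType IR)
    (Description "weapon fire") (MessageOrdering receive)
    (Component alpha) (Component beta)
)'''

result = Interaction.parseString(sample)
for item in result:
    print(item)
```

Each alternative here fails as soon as its keyword does not match, so the cost of the alternation depends on how early the real sub-expressions can reject a mismatch.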