I have several LaTeX tags that I'm transforming. This is working fine, but the speed is really suffering the more tags I add.
For example, I set up one production grammar for index tags and one for heading tags. Then, I open a file, join it together as a single string for the contents to parse, and do something like this:
s = indextag.transformString(contents)
s = headings.transformString(s)
and so on, once for each tag type I've written a production for.
For a file of 5,000 lines it's taking 1.25 minutes.
Am I crazy for doing it this way? Should I chain the productions together like
generictag = indextag | heading
and call transformString on generictag?
Thanks, hope this makes sense.
Tim -
Yes, definitely combine these tags into a single expression, and call transformString only once.
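As a rough sketch of what the single-pass version might look like: the question does not show the real grammars, so the \index and \section tags, the brace-delimited body, and the replacement markup below are all made-up stand-ins.

```python
from pyparsing import CharsNotIn, Suppress

# Hypothetical stand-ins for the real index and heading productions.
LBRACE, RBRACE = Suppress("{"), Suppress("}")
body = CharsNotIn("{}")  # simplification: tag argument with no nested braces

indextag = Suppress("\\index") + LBRACE + body + RBRACE
indextag.setParseAction(lambda t: "<index>%s</index>" % t[0])

heading = Suppress("\\section") + LBRACE + body + RBRACE
heading.setParseAction(lambda t: "<h1>%s</h1>" % t[0])

# One combined expression, so the input is scanned once
# instead of once per tag type.
generictag = indextag | heading

contents = "\\section{Intro} some text \\index{parsing} more text"
s = generictag.transformString(contents)
print(s)
```

Each alternative keeps its own parse action, so the combined expression still applies the right replacement to whichever tag matched.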
Next, look for ways to optimize the alternative matches. Do they all start with a backslash? If so, you can short-circuit the parser's checking with FollowedBy, so that the individual alternatives are only tried at positions where a backslash is actually present:
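from pyparsing import Literal, FollowedBy, oneOf

BSLASH = Literal("\\")
tagA = BSLASH + "a"
tagB = BSLASH + "b"
# Baseline: each alternative is tried in turn at every position
anyTag = tagA | tagB
# Better: one cheap lookahead rejects positions that don't start with a backslash
betterAnyTag = FollowedBy(BSLASH) + ( tagA | tagB )
# Better still: match the backslash once, then choose the tag name with oneOf
evenBetterAnyTag = BSLASH + oneOf("a b")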
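To check that the three variants are interchangeable before swapping one in, something like the following could be used. The sample text and the spans helper are invented for illustration, and the timings are only indicative; they will vary with the input and the pyparsing version.

```python
import timeit
from pyparsing import Literal, FollowedBy, oneOf

BSLASH = Literal("\\")
tagA = BSLASH + "a"
tagB = BSLASH + "b"

variants = {
    "anyTag": tagA | tagB,
    "betterAnyTag": FollowedBy(BSLASH) + (tagA | tagB),
    "evenBetterAnyTag": BSLASH + oneOf("a b"),
}

sample = "x \\a y \\b z \\c "  # made-up input; \c should not match


def spans(expr, text):
    """Return the (start, end) position of every match of expr in text."""
    return [(start, end) for _tokens, start, end in expr.scanString(text)]


# All variants should report the same match positions.
results = {name: spans(expr, sample) for name, expr in variants.items()}

# Rough timing on a larger input built by repeating the sample.
big = sample * 500
for name, expr in variants.items():
    elapsed = timeit.timeit(lambda: spans(expr, big), number=1)
    print("%-18s %d matches, %.3f s" % (name, len(results[name]), elapsed))
```

If the match positions agree, the choice between the variants comes down to whichever runs fastest on the real files.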
-- Paul