

chaining grammar and transformString

  • Tim Arnold

    I am transforming several LaTeX tags. This works fine, but it gets noticeably slower with every tag type I add.

    For example, I set up one production for index tags and one for heading tags. Then I open a file, join its lines into a single string to parse, and do something like this:

    s = indextag.transformString(contents)
    s = headings.transformString(s)
    and so on, for every tag type for which I've written a production.

    For a file of 5,000 lines it's taking 1.25 minutes.

    Am I crazy for doing it this way? Should I chain the productions together like
    generictag = indextag | headings
    and call transformString on generictag?

    Thanks, hope this makes sense.

    • Paul McGuire

      Tim -

      Yes, definitely combine these tags into a single expression, and call transformString only once.
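
A minimal sketch of the single-pass approach, for illustration only. The thread does not show the original grammars, so the two productions below and their HTML-like output are hypothetical stand-ins. Only transformString, setParseAction, and the | operator come from pyparsing itself.

```python
from pyparsing import CharsNotIn, Suppress

# Hypothetical productions: a braced argument shared by both tags.
arg = Suppress("{") + CharsNotIn("{}") + Suppress("}")

# \index{term} -> <index>term</index> (illustrative output format)
indextag = Suppress("\\index") + arg
indextag.setParseAction(lambda t: f"<index>{t[0]}</index>")

# \section{title} -> <h1>title</h1> (illustrative output format)
headings = Suppress("\\section") + arg
headings.setParseAction(lambda t: f"<h1>{t[0]}</h1>")

contents = "\\section{Intro} Some text about \\index{parsing} here."

# One pass per tag type: the whole input is rescanned for each production.
sequential = headings.transformString(indextag.transformString(contents))

# One pass in total: try every production at each position of a single scan.
generictag = indextag | headings
result = generictag.transformString(contents)

print(result)
```

On this sample both approaches produce the same text. The combined expression scans the input once rather than once per tag type.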

      Next, you should look for ways to optimize the alternative matches. Do they all start with a backslash? Then you can short-circuit the parser's checking with FollowedBy:

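      from pyparsing import FollowedBy, Literal, oneOf

      BSLASH = Literal("\\")
      tagA = BSLASH + "a"
      tagB = BSLASH + "b"

      # baseline: each alternative is tried in turn at every position
      anyTag = tagA | tagB
      # lookahead rejects positions that do not start with a backslash
      betterAnyTag = FollowedBy("\\") + ( tagA | tagB )

      # match the backslash once, then pick the tag name
      evenBetterAnyTag = BSLASH + oneOf("a b")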
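
As a quick sanity check, the sketch below runs the three variants over a made-up sample string (the sample is an assumption, not taken from the thread) and collects what each one matches. It checks equivalence only and does not measure speed.

```python
from pyparsing import FollowedBy, Literal, oneOf

BSLASH = Literal("\\")
tagA = BSLASH + "a"
tagB = BSLASH + "b"

anyTag = tagA | tagB
betterAnyTag = FollowedBy("\\") + (tagA | tagB)
evenBetterAnyTag = BSLASH + oneOf("a b")

# Hypothetical input: two known tags and one unknown tag (\c).
sample = "plain text \\a more text \\b and an unknown \\c tag"

# searchString returns every match found while scanning the input.
matches = [
    expr.searchString(sample).asList()
    for expr in (anyTag, betterAnyTag, evenBetterAnyTag)
]

for found in matches:
    print(found)
```

All three expressions should report the same two matches here, so the choice between them is about how much work the parser does at positions that cannot match.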

      -- Paul