Inject tokens at parse time

2010-11-03
2013-05-14

    Paul Sorenson - 2010-11-03

    I am parsing lines like so:

    getdevice = startOfLine + Literal('name') + Word('f', alphas).setResultsName('devicename')

    Here the type of the device is implicit in the name. What I would like is a way to add my own token to the grammar (not to the source text) that annotates the device type, based on which expression has successfully matched.

    Conceptually:

    getdevice = startOfLine + Literal('name') + Word('f', alphas).setResultsName('devicename') + Empty('devtype1').setResultsName('devicetype')

    Now Empty() does not accept an arg and I haven't tried subclassing it (yet).  I have tried to inject something using a parseAction but so far have not been able to come up with a way to inject my token.  I feel sure I have overlooked something very trivial and would appreciate pointers on how to do this.

    cheers

     

    Paul McGuire - 2010-11-03

    Let's say if the first letter after 'f' in the name is a vowel, then devicetype should be "BOX", and if not, then devicetype should be "TRIANGLE".  You can then set the devicetype using this parse action, attached to getdevice:

    def setDeviceType(tokens):
        if tokens.lower() in "aeiou":
            tokens["devicetype"] = "BOX"
        else:
            tokens["devicetype"] = "TRIANGLE"
    

    This will modify the tokens in place, adding your new devicetype results name. There's no need to inject Empty elements anywhere. (Note that this does not insert BOX or TRIANGLE into the list of returned tokens, but you can still access the value using the results name.)

    -- Paul
     

    Paul McGuire - 2010-11-03

    Oof!  That is supposed to be:

    if tokens.devicename.lower()[1] in 'aeiou':
    

    Sorry!
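    A minimal runnable sketch of the parse-action approach, assuming startOfLine is a pyparsing LineStart() (the thread never shows its definition) and that the vowel test reads the matched name through the devicename results name, since calling .lower() on the ParseResults object itself does not work. The input "name fax" is made up for illustration. The camelCase names used here (setParseAction, parseString, asList) are kept as aliases in pyparsing 3 alongside the snake_case API.

```python
from pyparsing import LineStart, Literal, Word, alphas

# Assumption: the thread never shows how startOfLine is defined.
startOfLine = LineStart()

getdevice = startOfLine + Literal('name') + Word('f', alphas).setResultsName('devicename')

def setDeviceType(tokens):
    # Second character of the device name, e.g. 'a' in 'fax'.
    if tokens.devicename.lower()[1] in "aeiou":
        tokens["devicetype"] = "BOX"
    else:
        tokens["devicetype"] = "TRIANGLE"
    # Returning None keeps the (now modified) tokens.

getdevice.setParseAction(setDeviceType)

result = getdevice.parseString("name fax")
print(result.asList())         # ['name', 'fax']
print(result["devicetype"])    # BOX
```

    The device type is reachable by name but is not part of the token list, which matches the behavior described in the reply.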

     

    Paul Sorenson - 2010-11-03

    Thanks for the reply.  I was trying stuff like that but I think I did something like Literal("BOX") and it was quoted.

    What I am currently using is:

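    from pyparsing import Empty

    class EmptyTok(Empty):
        '''
        Always matches, consumes no input, and inserts an arbitrary token.
        '''
        def __init__(self, ltoken):
            self.ltoken = ltoken
            Empty.__init__(self)
            # this element always yields a token, so it never returns empty
            self.mayReturnEmpty = False

        def parseImpl(self, instring, loc, doActions=True):
            # leave the location unchanged, but return the stored token
            return loc, self.ltoken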

    This is along the lines of what I had in mind earlier. Even though it may seem counterintuitive, I find it convenient to have the injection right there in the grammar rather than tucked away in a parse action.

    Maybe I could have thought of a more descriptive name for my subclass.

    cheers
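    A usage sketch of the EmptyTok idea in a complete grammar. It assumes parseImpl returns the stored token together with the unchanged location, since pyparsing expects parseImpl to return a (loc, tokens) pair. startOfLine is assumed to be LineStart(), and "name fax" is a made-up input.

```python
from pyparsing import Empty, LineStart, Literal, Word, alphas

class EmptyTok(Empty):
    """Always matches, consumes no input, and yields a fixed token."""
    def __init__(self, ltoken):
        self.ltoken = ltoken
        Empty.__init__(self)
        self.mayReturnEmpty = False

    def parseImpl(self, instring, loc, doActions=True):
        # Same location (nothing consumed), plus the injected token.
        return loc, self.ltoken

startOfLine = LineStart()  # assumption: not defined anywhere in the thread

getdevice = (startOfLine + Literal('name')
             + Word('f', alphas).setResultsName('devicename')
             + EmptyTok('devtype1').setResultsName('devicetype'))

result = getdevice.parseString("name fax")
print(result.asList())         # ['name', 'fax', 'devtype1']
print(result["devicetype"])    # devtype1
```

    Unlike the parse-action version, the injected value here also shows up in the returned token list.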

     

    Paul McGuire - 2011-01-29

    You could also try this little helper function, which takes care of creating the Empty(), attaching the parse action, and everything else all at once:

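    from pyparsing import Empty, replaceWith
    def insertToken(t):
        # Empty() matches without consuming input; replaceWith(t) makes it return t
        return Empty().setParseAction(replaceWith(t))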
    

    Then just add to your grammar as:

    getdevice = startOfLine + 'name' + Word('f', alphas)('devicename') + insertToken('BOX')('devtype')
    
    -- Paul
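    A self-contained sketch of the insertToken helper inside a complete grammar, built on pyparsing's replaceWith parse-action helper. startOfLine is assumed to be LineStart() because the thread does not define it, and "name fax" is a made-up input.

```python
from pyparsing import Empty, LineStart, Word, alphas, replaceWith

def insertToken(t):
    # Empty() always matches without consuming input; the parse action
    # replaces its (empty) result with the single token t.
    return Empty().setParseAction(replaceWith(t))

startOfLine = LineStart()  # assumption: not defined anywhere in the thread

getdevice = (startOfLine + 'name' + Word('f', alphas)('devicename')
             + insertToken('BOX')('devtype'))

result = getdevice.parseString("name fax")
print(result.asList())     # ['name', 'fax', 'BOX']
print(result["devtype"])   # BOX
```

    This keeps the injection visible in the grammar, as Paul Sorenson wanted, without subclassing Empty or overriding parseImpl.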
     

    Paul Sorenson - 2011-01-29

    Thanks - won't be at work for a week or two but will give that a try.

     
