## [Pyparsing] Lists and Groups

 [Pyparsing] Lists and Groups From: Michael Farnbach - 2010-11-12 18:26:25
```
Hello all,

First, thanks for the wonderful programming API. I'm having a lot of fun with it.

I have a question: I feel like I'm ruining the BNF expression when I overly massage the input. Should I stick to BNF or start tokenizing?

Allow me to explain. I've always been fascinated with interfaces which seem to understand human language, kind of like gmail's "quick-add" feature for its calendar. So, to learn how to do this better, I'm writing a little ditty to interpret simple commands into creating a shopping list.

Here is a list of generic input I expect it to understand:

#test grammar
tests= """\
buy chocolate milk, eggs and bread at store
get eggs at @Vons
need headache medicine and cough medicine from downtown drug store
buy nails, claw hammer, and studs at @homedepot
buy ragjoint at car parts store
buy laptop at electronics store
get candy @Sees
get a clue""".splitlines()

My evil mashup is as follows...

# define grammar
KW = Keyword
and_ = Literal(" and ")
comma = Literal(",")
at = KW("from") | KW("at")
label = Literal("@") + Word(alphanums)
buy = KW("buy") | KW("get") | KW("need")
item = Group( OneOrMore( ~ (at | and_) + (Word(alphas + "-"))) + Suppress(Optional(comma))) | Suppress(Optional(and_))
items = Group( OneOrMore( item ) )
store = KW("store") | KW("market") | KW("grocery")
storetype = Suppress(at) + Group(OneOrMore( Word(alphas)))
storelabel = label
storename = storetype | storelabel
shlisti = Suppress(buy) + items + Optional( storename )

What I wish to do is interpret a sentence that starts with some synonym of "buy", interpret everything after that as a list of things to buy (accepting "and" as a synonym for a comma) until it reaches some synonym of "from", and then interpret everything after that as a store name. All I want to keep is the list of items and the store name.

What I wind up with is as follows:

buy chocolate milk, eggs and bread at store -> [[['chocolate', 'milk']]]
get eggs at @Vons -> [[['eggs']]]
need headache medicine and cough medicine from downtown drug store -> [[['headache', 'medicine', 'and', 'cough', 'medicine']], ['downtown', 'drug', 'store']]
buy nails, claw hammer, and studs at @homedepot -> [[['nails']]]
buy ragjoint at car parts store -> [[['ragjoint']], ['car', 'parts', 'store']]
buy laptop at electronics store -> [[['laptop']], ['electronics', 'store']]
get candy @Sees -> [[['candy']], '@', 'Sees']
get a clue -> [[['a', 'clue']]]

Is there a way to express it in the nice BNF form, or did I already jump off that reservation with the grouping and suppressing?
```
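The 'and' tokens that end up inside the item group for the "need headache medicine and cough medicine ..." line are consistent with how pyparsing handles whitespace: by default each expression skips leading whitespace before it tries to match, so a literal that itself begins with a space, such as `Literal(" and ")`, finds nothing to match and the word "and" falls through to `Word(alphas + "-")`. A minimal sketch of that effect, assuming a current pyparsing release that still provides the camelCase names used in this thread; the helper `matches` and the sample strings are illustrative and not part of the original post:

```python
from pyparsing import Keyword, Literal, ParseException, Word, alphas

word = Word(alphas)


def matches(expr, text):
    """Hypothetical helper: True if expr parses the start of text."""
    try:
        expr.parseString(text)
        return True
    except ParseException:
        return False


# Literal with a leading space, as in the posted grammar. Whitespace is
# skipped before the match is attempted, so the comparison starts at "a".
spaced_and = Literal(" and ")

# Keyword without embedded spaces; it also refuses to match a prefix of
# a longer word such as "android".
keyword_and = Keyword("and")

print(matches(word + spaced_and + word, "eggs and bread"))   # False
print(matches(word + keyword_and + word, "eggs and bread"))  # True
print(matches(word + keyword_and, "eggs android"))           # False
```

With the keyword form, the negative lookahead `~(at | and_)` can actually see the "and" and stop the item there.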

 Re: [Pyparsing] Lists and Groups From: Paul McGuire - 2010-11-14 03:25:48
```
I don't think you've gone overboard here. Your BNF *will* be somewhat informal, but don't give up on it.

Your basic form is:

    (get) (stuff) (at) (location)

Given that you have some entries where you just get stuff, or specify a location just based on a leading '@' symbol, this gets a little more complex (using []'s for optional parts):

    (get) (stuff) [[(at)] (location)]

(stuff) and (location) are going to be pretty unstructured, but fortunately, you're defining some specific forms for (get) and (at).

Your (stuff) can contain a list of items separated by commas, (and), or (, and), so I think you can define it as pretty open for (item), and then define a delimited list for the list of items. You'll need to specify the lookahead (as you already picked up in your posted pyparsing code) to avoid parsing "at" or "and" or a store label with a leading '@' as an item word. And using Keywords for your delimiting words is a good choice, to guard against accidentally reading the leading 'at' in 'athletic socks' as 'at', and the remaining 'hletic socks' as some kind of store.

So I'd informally write this as:

    shopping-list ::= get stuff [[at] location]
    get ::= "get" | "buy" | "pick up"
    at ::= "at" | "from"
    item ::= (~(and | at | '@') Word(alphas+'-'))...
    stuff ::= item [ (', and' | ',' | 'and') item ]...
    location ::= ['@'] Word(alphas)...

Rendered into pyparsing, it ends up very similar to your posted code:

    COMMA,AT = map(Literal, ',@')
    KW = CaselessKeyword
    get = KW('get') | KW('buy') | KW('pick') + KW('up') | KW('need')
    at = KW('at') | KW('from')
    location = Combine(Optional(AT) + OneOrMore(Word(alphas)), ' ', adjacent=False)
    and_ = KW('and')
    itemdelim = COMMA + and_ | COMMA | and_
    item = Combine(OneOrMore(~(at | itemdelim | '@') + Word(alphas)), ' ', adjacent=False)
    stuff = delimitedList(item, itemdelim)
    shoppingList = get + stuff("items") + Optional(Optional(at) + location("location"))

And this seems to work ok for your posted tests.

Please try to avoid defining Literals with embedded spaces, and *especially* not with leading spaces (as you do with " and ") - pyparsing's default whitespace skipping will almost surely make your leading-whitespace literal unmatchable. Note how I defined "pick up" as an option for (get) as two joined keywords - this immunizes us against cases with extra whitespace between the two words, at very little cost.

Thanks for writing, and welcome to the World of Pyparsing! - :)

-- Paul
```
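For anyone who wants to run the grammar from this reply as a script, here is a self-contained sketch with the imports filled in. The grammar lines follow the reply; the helper `parse_request` and the way results are printed are additions for illustration only, and the camelCase names (`parseString`, `delimitedList`) are assumed to be available in the installed pyparsing release.

```python
from pyparsing import (CaselessKeyword, Combine, Literal, OneOrMore,
                       Optional, Word, alphas, delimitedList)

COMMA, AT = map(Literal, ",@")
KW = CaselessKeyword

get = KW("get") | KW("buy") | KW("pick") + KW("up") | KW("need")
at = KW("at") | KW("from")
and_ = KW("and")
itemdelim = COMMA + and_ | COMMA | and_

# An item is a run of words that stops before a delimiter, an "at"
# keyword, or an '@' store label; the words are joined with a space.
item = Combine(OneOrMore(~(at | itemdelim | "@") + Word(alphas)),
               " ", adjacent=False)
location = Combine(Optional(AT) + OneOrMore(Word(alphas)),
                   " ", adjacent=False)
stuff = delimitedList(item, itemdelim)
shoppingList = (get + stuff("items")
                + Optional(Optional(at) + location("location")))


def parse_request(line):
    """Hypothetical helper: return (list of items, location or None)."""
    result = shoppingList.parseString(line)
    return list(result["items"]), result.get("location")


tests = """\
buy chocolate milk, eggs and bread at store
get eggs at @Vons
need headache medicine and cough medicine from downtown drug store
buy nails, claw hammer, and studs at @homedepot
get candy @Sees
get a clue""".splitlines()

for line in tests:
    print(line, "->", parse_request(line))
```

For example, `parse_request("get a clue")` returns `(['a clue'], None)`, since the location part is optional and the two words are combined into one item.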
 Re: [Pyparsing] Lists and Groups From: Michael Farnbach - 2010-11-15 16:39:48
```
Thanks! I've confirmed it works, even after suppressing 'get' and 'AT', and grouping the shopping list.

--Michael

On Sat, Nov 13, 2010 at 8:25 PM, Paul McGuire wrote:

> I don't think you've gone overboard here. Your BNF *will* be somewhat
> informal, but don't give up on it.
>
> Your basic form is:
>
> (get) (stuff) (at) (location)
>
> Given that you have some entries where you just get stuff, or specify a
> location just based on a leading '@' symbol, this gets a little more
> complex (using []'s for optional parts):
>
> (get) (stuff) [[(at)] (location)]
>
> (stuff) and (location) are going to be pretty unstructured, but
> fortunately, you're defining some specific forms for (get) and (at).
>
> Your (stuff) can contain a list of items separated by commas, (and), or (,
> and), so I think you can define it as pretty open for (item), and then
> define a delimited list for the list of items. You'll need to specify the
> lookahead (as you already picked up in your posted pyparsing code) to avoid
> parsing "at" or "and" or a store label with a leading '@' as an item word.
> And using Keywords for your delimiting words is a good choice, to guard
> against accidentally reading the leading 'at' in 'athletic socks' as 'at',
> and the remaining 'hletic socks' as some kind of store.
>
> So I'd informally write this as:
>
> shopping-list ::= get stuff [[at] location]
> get ::= "get" | "buy" | "pick up"
> at ::= "at" | "from"
> item ::= (~(and | at | '@') Word(alphas+'-'))...
> stuff ::= item [ (', and' | ',' | 'and') item ]...
> location ::= ['@'] Word(alphas)...
>
> Rendered into pyparsing, it ends up very similar to your posted code:
>
> COMMA,AT = map(Literal, ',@')
> KW = CaselessKeyword
> get = KW('get') | KW('buy') | KW('pick') + KW('up') | KW('need')
> at = KW('at') | KW('from')
> location = Combine(Optional(AT) + OneOrMore(Word(alphas)), ' ', adjacent=False)
> and_ = KW('and')
> itemdelim = COMMA + and_ | COMMA | and_
> item = Combine(OneOrMore(~(at | itemdelim | '@') + Word(alphas)), ' ', adjacent=False)
> stuff = delimitedList(item, itemdelim)
> shoppingList = get + stuff("items") + Optional(Optional(at) + location("location"))
>
> And this seems to work ok for your posted tests.
>
> Please try to avoid defining Literals with embedded spaces, and
> *especially* not with leading spaces (as you do with " and ") - pyparsing's
> default whitespace skipping will almost surely make your leading-whitespace
> literal unmatchable. Note how I defined "pick up" as an option for (get) as
> two joined keywords - this immunizes us against cases with extra whitespace
> between the two words, at very little cost.
>
> Thanks for writing, and welcome to the World of Pyparsing! - :)
>
> -- Paul
```
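The thread does not show the final code, so the following is only a guess at what "suppressing 'get' and 'AT', and grouping the shopping list" might look like. It assumes that both the "at"/"from" keyword and the '@' marker are suppressed and that the items are wrapped in a `Group`; the helper name `parse_line` is hypothetical.

```python
from pyparsing import (CaselessKeyword, Combine, Group, Literal, OneOrMore,
                       Optional, Suppress, Word, alphas, delimitedList)

KW = CaselessKeyword
COMMA = Literal(",")
AT = Suppress(Literal("@"))  # assumption: drop the '@' marker from the output

# Assumption: the leading verb is matched but not kept in the results.
get = Suppress(KW("get") | KW("buy") | KW("pick") + KW("up") | KW("need"))
at = KW("at") | KW("from")
and_ = KW("and")
itemdelim = COMMA + and_ | COMMA | and_

item = Combine(OneOrMore(~(at | itemdelim | "@") + Word(alphas)),
               " ", adjacent=False)
location = Combine(Optional(AT) + OneOrMore(Word(alphas)),
                   " ", adjacent=False)

# Group keeps the items together as one nested list in the results.
items = Group(delimitedList(item, itemdelim))
shoppingList = (get + items("items")
                + Optional(Suppress(Optional(at)) + location("location")))


def parse_line(line):
    """Hypothetical helper: parse one request into a plain Python list."""
    return shoppingList.parseString(line).asList()


print(parse_line("buy chocolate milk, eggs and bread at store"))
# [['chocolate milk', 'eggs', 'bread'], 'store']
print(parse_line("get eggs at @Vons"))
# [['eggs'], 'Vons']
```

This leaves only the two pieces the original question asked to keep: the list of items and the store name.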