Thread: [Pyparsing] Chopping up search terms
From: Donn I. <don...@gm...> - 2009-06-07 08:46:09
Hello pyparsers.

As much as pyparsing astounds my small mind, I have yet to write my own 'grammar' without having to ask for help! And here I am again. I hope someone will have mercy on me.

I am adding a search function to Fonty Python, and this starts with chopping up a string into tokens. I have a simple set of rules:

    phrase1 phrase2 field: value1 value2 "value three" field2: value1 etc

Any word on its own, if it's NOT after a fieldname, is a phrase alone. Anything after a fieldname is a sub-phrase of that field, unless it's another fieldname.

I am just not getting anything like what I am looking for.

\d

My hacking and surfing and re-hacking have ended here:

    import pyparsing as PP  # testing with vers 1.4.8 and 1.5.0

    simple = PP.Word(PP.alphas)
    quoted = PP.dblQuotedString.setParseAction(PP.removeQuotes)
    single = PP.OneOrMore(simple | quoted)
    special = PP.Combine(PP.Word(PP.alphas) + ":") + single

    #term = special | single
    query = PP.OneOrMore( special | single )# + PP.StringEnd() <- buggy

    tests=[
        u"WTF Huh aField: aValue",
        # want: [u'WTF', u'Huh', {u'aField:':[u'aValue']}]
        u'aField : "aValue blah" bloop AnotherField: two',
        # want: [{u'aField:':[u'aValue blah',u'bloop']},{u'AnotherField:':u'two'}]
        u"aField: Someval AnotherField: two"
        # want: [{u'aField:':[u'Someval']},{u'AnotherField:':u'two'}]
    ]

    for test in tests:
        print test
        try:
            tokens=query.parseString(test)
            print tokens
        except:
            print "BUG"
From: Paul M. <pt...@au...> - 2009-06-07 15:35:25
Donn -

The thing you are lacking is pyparsing's lookahead class, FollowedBy. You need to tell a single term apart from a field identifier, and for that you want a negative lookahead, ~FollowedBy(":"). I have redefined single as just "simple | quoted" and let the query expression take care of the repetition.

    single = simple | quoted
    COLON = PP.Suppress(':')
    fieldValue = single + ~PP.FollowedBy(COLON)
    field = single + COLON + PP.Group(PP.OneOrMore(fieldValue))
    phrase = fieldValue
    phrases = PP.ZeroOrMore(phrase)
    query = PP.Optional(phrases) + PP.ZeroOrMore(field)

To add dict-style access to the fields, add some results names, and use the Dict class to auto-define a results name for each field name.

    query = PP.Optional(phrases)("phrases") + PP.Dict(PP.ZeroOrMore(PP.Group(field)))("fields")

Keep plugging!

-- Paul
From: Donn I. <don...@gm...> - 2009-06-07 16:19:41
Paul McGuire wrote:
> Keep plugging!

I am not sure what's more magical: pyparsing or your regular ability to nail a problem first time. Your code is dead-on, but I will have to spend some time to savvy it.

Thanks once again.

Regards,
\d

BTW - my little project that uses pyparsing (with your solutions, of course) has finally been released and is on https://savannah.nongnu.org/projects/things/
You helped me a few times in 2007/2008. Thanks for the code and the help.