pyparsing-users Mailing List for Python parsing module (Page 23)
From: Joshua J. K. <jk...@sk...> - 2008-03-06 20:47:20
|
On Wed, 2008-03-05 at 21:48 -0600, Paul McGuire wrote:
> Ok, here are the changes I made to your code:

Wow...just...wow. Really, I wasn't asking anyone to write my parser for me. I just wanted to know how to do it. Wow. Thank you!!

> I changed column_def to:
>
>     col_width_def = (p.Suppress('(') +
>                      p.delimitedList(p.Word(p.nums)).setParseAction(
>                          lambda toks: [int(tok) for tok in toks]) +
>                      p.Suppress(')')
>                      )

That casting to int didn't matter much since we'll be just doing string output, but it does validate the input. :)

> paren_quoted was just not the expression, and the "(p.nums + p.Optional(','
> + p.nums)" argument *actually* created a results name for the column_width
> field.

Gotcha.

> I changed create_table to:
>
>     create_table = (p.CaselessKeyword('create').suppress()
>                     + p.CaselessKeyword('table').suppress()
>                     + bracket_quoted.setResultsName('schema')
>                     + "." + bracket_quoted.setResultsName('table_name')
>                     #~ + p.nestedExpr(content=p.delimitedList(p.Or(
>                     #~     [p.Group(column_def.setResultsName('columns')),
>                     #~      p.Group(primary_key),
>                     #~      p.Group(constraint)])))
>                     + p.Group(p.Suppress('(') +
>                               p.delimitedList(p.Or(
>                                   [p.Group(column_def).setResultsName('columns', listAllMatches=True),
>                                    p.Group(primary_key).setResultsName('pkeys', listAllMatches=True),
>                                    p.Group(constraint).setResultsName('constraints', listAllMatches=True)]
>                               )) +
>                               p.Suppress(')')
>                               )("defs")
>                     + p.CaselessKeyword('on').suppress()
>                     + bracket_quoted.suppress()
>                     )
>
> There was no need to use nestedExpr for the list of column/key/constraint
> items. Note the use of listAllMatches in the setResultsName calls.

Saw that. That makes sense. So, it doesn't change the asList() output any, but I have attributes I can access by name now. Cool.

One question: ("defs") is a call to what Group() returns, but I'm not following. Can you point me to something in the docs that explains what is done when you call a pyparsing expression?

Again, thanks so much! I'm much further along to where I need to be.

j
--
Joshua Kugler
VOC/SigNet Provider (aka Web App Programmer)
S&K Aerospace Alaska
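A minimal sketch of the shorthand asked about here, assuming a pyparsing version (1.4.8 or later) in which calling an expression with a string is equivalent to setResultsName(); the names are illustrative only:

    from pyparsing import Word, Group, alphas, nums

    ident = Word(alphas)
    num = Word(nums)

    # these two should be equivalent: calling an expression with a string
    # returns a copy of the expression with that results name attached
    pair_a = Group(ident + num).setResultsName("defs")
    pair_b = Group(ident + num)("defs")

    result = pair_b.parseString("abc 123")
    print result.defs.asList()     # ['abc', '123']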
From: Paul M. <pa...@al...> - 2008-03-06 03:48:20
|
Ok, here are the changes I made to your code:

I changed column_def to:

    col_width_def = (p.Suppress('(') +
                     p.delimitedList(p.Word(p.nums)).setParseAction(
                         lambda toks: [int(tok) for tok in toks]) +
                     p.Suppress(')')
                     )

    column_def = (bracket_quoted.setResultsName('column_name') +
                  bracket_quoted.setResultsName('column_type') +
                  p.Optional(p.CaselessKeyword('IDENTITY')).setResultsName('identity') +
                  #~ p.Optional(paren_quoted(p.nums + p.Optional(',' + p.nums))).setResultsName('column_width') +
                  p.Optional(col_width_def).setResultsName('column_width') +
                  nullable
                  )

paren_quoted was just not the expression, and the "(p.nums + p.Optional(',' + p.nums)" argument *actually* created a results name for the column_width field.

I changed create_table to:

    create_table = (p.CaselessKeyword('create').suppress()
                    + p.CaselessKeyword('table').suppress()
                    + bracket_quoted.setResultsName('schema')
                    + "." + bracket_quoted.setResultsName('table_name')
                    #~ + p.nestedExpr(content=p.delimitedList(p.Or(
                    #~     [p.Group(column_def.setResultsName('columns')),
                    #~      p.Group(primary_key),
                    #~      p.Group(constraint)])))
                    + p.Group(p.Suppress('(') +
                              p.delimitedList(p.Or(
                                  [p.Group(column_def).setResultsName('columns', listAllMatches=True),
                                   p.Group(primary_key).setResultsName('pkeys', listAllMatches=True),
                                   p.Group(constraint).setResultsName('constraints', listAllMatches=True)]
                              )) +
                              p.Suppress(')')
                              )("defs")
                    + p.CaselessKeyword('on').suppress()
                    + bracket_quoted.suppress()
                    )

There was no need to use nestedExpr for the list of column/key/constraint items. Note the use of listAllMatches in the setResultsName calls.

Here is the code to print out the parsed results:

    dbData = create_table.parseString(sql)

    pprint.pprint(dbData.asList())
    print
    print dbData.dump()
    print
    print "Columns"
    for c in dbData.defs.columns:
        print " Column: " + c.column_name
        print c.dump(indent=" ")
        print
    print "Primary Keys"
    for pk in dbData.defs.pkeys:
        print pk.asList()
    print
    print "Constraints"
    for c in dbData.defs.constraints:
        print "Constraint: " + c.constraint_name
        print c.asList()

Note the use of dump() for quick listing of tokens, and any named fields.
The output from this code is: ['dbo', '.', 'auth_login', [['login_id', 'int', 'IDENTITY', 1, 1, 'not null'], ['login', 'varchar', 255, 'null'], ['password_hash', 'varchar', 255, 'null'], ['login_name', 'varchar', 255, 'null'], ['paid_us_good', 'money', 'null'], ['asdfsdafsadf', 'nchar', 10, 'null'], ['primary key', 'clustered', ['login_id']], ['constraint', 'IX_auth_login', 'unique', 'nonclustered', ['login']]]] ['dbo', '.', 'auth_login', [['login_id', 'int', 'IDENTITY', 1, 1, 'not null'], ['login', 'varchar', 255, 'null'], ['password_hash', 'varchar', 255, 'null'], ['login_name', 'varchar', 255, 'null'], ['paid_us_good', 'money', 'null'], ['asdfsdafsadf', 'nchar', 10, 'null'], ['primary key', 'clustered', ['login_id']], ['constraint', 'IX_auth_login', 'unique', 'nonclustered', ['login']]]] - defs: [['login_id', 'int', 'IDENTITY', 1, 1, 'not null'], ['login', 'varchar', 255, 'null'], ['password_hash', 'varchar', 255, 'null'], ['login_name', 'varchar', 255, 'null'], ['paid_us_good', 'money', 'null'], ['asdfsdafsadf', 'nchar', 10, 'null'], ['primary key', 'clustered', ['login_id']], ['constraint', 'IX_auth_login', 'unique', 'nonclustered', ['login']]] - columns: [['login_id', 'int', 'IDENTITY', 1, 1, 'not null'], ['login', 'varchar', 255, 'null'], ['password_hash', 'varchar', 255, 'null'], ['login_name', 'varchar', 255, 'null'], ['paid_us_good', 'money', 'null'], ['asdfsdafsadf', 'nchar', 10, 'null']] - constraints: [['constraint', 'IX_auth_login', 'unique', 'nonclustered', ['login']]] - pkeys: [['primary key', 'clustered', ['login_id']]] - schema: dbo - table_name: auth_login Columns Column: login_id ['login_id', 'int', 'IDENTITY', 1, 1, 'not null'] - column_name: login_id - column_type: int - column_width: [1, 1] - identity: IDENTITY Column: login ['login', 'varchar', 255, 'null'] - column_name: login - column_type: varchar - column_width: [255] Column: password_hash ['password_hash', 'varchar', 255, 'null'] - column_name: password_hash - column_type: varchar - column_width: [255] Column: login_name ['login_name', 'varchar', 255, 'null'] - column_name: login_name - column_type: varchar - column_width: [255] Column: paid_us_good ['paid_us_good', 'money', 'null'] - column_name: paid_us_good - column_type: money Column: asdfsdafsadf ['asdfsdafsadf', 'nchar', 10, 'null'] - column_name: asdfsdafsadf - column_type: nchar - column_width: [10] Primary Keys ['primary key', 'clustered', ['login_id']] Constraints Constraint: IX_auth_login ['constraint', 'IX_auth_login', 'unique', 'nonclustered', ['login']] -- Paul |
From: Paul M. <pt...@au...> - 2008-03-06 03:26:46
|
Joshua -

<lightbulb>Ah, now I see why you are confused!</lightbulb>

The Group class does *not* do grouping like you see in SQL GROUP BY or the itertools groupby. Group is a way for a grammar developer to impart structure to the parsed tokens. The default behavior is for all the tokens to be returned in one flat list. Here is an example:

    testData = "123 abc 234 def 789 456 xyz"
    entry = Word(nums) + Optional(Word(alphas))
    grammar = OneOrMore(entry)
    print grammar.parseString(testData).dump()

Prints:

    ['123', 'abc', '234', 'def', '789', '456', 'xyz']

Now we will use Group to "group" the tokens for each entry:

    entry = Group( Word(nums) + Optional(Word(alphas)) )

Prints:

    [['123', 'abc'], ['234', 'def'], ['789'], ['456', 'xyz']]

You are looking for something a little different. Pyparsing has a feature to implement it, but I've not seen it get much use. It is a qualifier on setResultsName. Before I describe the qualifier itself, let me again show you the default behavior.

    entry = Word(nums).setResultsName("int") + Optional(Word(alphas).setResultsName("word"))

prints:

    ['123', 'abc', '234', 'def', '789', '456', 'xyz']
    - int: 456
    - word: xyz

The last matching string is the one that is saved for the given results name. But sometimes, you want to keep *all* the matching strings. This is done using the listAllMatches qualifier:

    entry = Word(nums).setResultsName("int",listAllMatches=True) + Optional(Word(alphas).setResultsName("word",listAllMatches=True))

prints:

    ['123', 'abc', '234', 'def', '789', '456', 'xyz']
    - int: ['123', '234', '789', '456']
    - word: ['abc', 'def', 'xyz']

That's it for now, *this* is a long e-mail! :) I'll post the mods to your code in the next e-mail.

-- Paul
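A short sketch (not from the original thread) combining the two ideas above, Group() plus listAllMatches, so each entry keeps its structure and every entry is reachable under one results name:

    from pyparsing import Word, Group, OneOrMore, Optional, alphas, nums

    entry = Group(Word(nums) + Optional(Word(alphas))).setResultsName("entry", listAllMatches=True)
    grammar = OneOrMore(entry)

    result = grammar.parseString("123 abc 234 def 789 456 xyz")
    print result.asList()       # [['123', 'abc'], ['234', 'def'], ['789'], ['456', 'xyz']]
    for e in result.entry:      # every grouped entry, not just the last one
        print e.asList()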
From: Joshua J. K. <jk...@sk...> - 2008-03-06 02:13:19
|
On Wed, 2008-03-05 at 17:08 -0900, Joshua J. Kugler wrote: > I was expecting instant response. Sigh...I *wasn't* expecting. j -- Joshua Kugler VOC/SigNet Provider (aka Web App Programmer) S&K Aerospace Alaska |
From: Joshua J. K. <jk...@sk...> - 2008-03-06 02:13:19
|
On Wed, 2008-03-05 at 20:05 -0600, Paul McGuire wrote: > Hah! You call that a long e-mail? You should see some of my responses on > the pyparsing wiki (see the Discussion tab on the Home page)! :) I'll take a look at those. > I tried to run this code, but you left out some important elements, so I > couldn't recreate your problem. On the face of it, your Group calls look > ok. Could you paste your code to http://pyparsing.pastebin.com, and I'll > give it a look? Sure...I didn't think about pastebin, and didn't want to flood the group with my code. See my code at: http://pyparsing.pastebin.com/m47c772c9 You'll notice that column defs, primary keys, and constraints are all in element [3] of the nested list returned from the asList() call. > Welcome to pyparsing! Thanks! j -- Joshua Kugler VOC/SigNet Provider (aka Web App Programmer) S&K Aerospace Alaska |
From: Joshua J. K. <jk...@sk...> - 2008-03-06 02:07:37
|
On Wed, 2008-03-05 at 20:02 -0600, Paul McGuire wrote:
> Sorry not to respond sooner, I'm glad you were able to work this out on your
> own. I assume you modified the Combine constructor using adjacent=False, to
> accept whitespace between tokens.

No worries about the delay. I was expecting instant response. Yes, I used adjacent=False and it worked nicely. Thanks.

j
--
Joshua Kugler
VOC/SigNet Provider (aka Web App Programmer)
S&K Aerospace Alaska
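For reference, a minimal sketch of the fix being discussed, assuming the Combine keyword arguments carry their usual meaning (joinString, and adjacent=False to allow whitespace between the combined tokens):

    import pyparsing as p

    # Combine normally requires its pieces to be adjacent; relaxing that lets
    # "PRIMARY KEY" come back as a single token
    primary_key_kw = p.Combine(p.CaselessKeyword('primary') + p.CaselessKeyword('key'),
                               joinString=' ', adjacent=False)

    print primary_key_kw.parseString("PRIMARY KEY CLUSTERED").asList()
    # expected: ['primary key'] (CaselessKeyword returns the keyword as defined)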
From: Paul M. <pt...@au...> - 2008-03-06 02:05:17
|
Hah! You call that a long e-mail? You should see some of my responses on the pyparsing wiki (see the Discussion tab on the Home page)! :)

I tried to run this code, but you left out some important elements, so I couldn't recreate your problem. On the face of it, your Group calls look ok. Could you paste your code to http://pyparsing.pastebin.com, and I'll give it a look?

Welcome to pyparsing!

-- Paul

-----Original Message-----
From: pyp...@li... [mailto:pyp...@li...] On Behalf Of Joshua J. Kugler
Sent: Wednesday, March 05, 2008 3:31 PM
To: pyp...@li...
Subject: [Pyparsing] Grouping when using asList()

<snip>
From: Paul M. <pa...@al...> - 2008-03-06 02:02:24
|
Joshua -

Sorry not to respond sooner, I'm glad you were able to work this out on your own. I assume you modified the Combine constructor using adjacent=False, to accept whitespace between tokens.

-- Paul

-----Original Message-----
From: pyp...@li... [mailto:pyp...@li...] On Behalf Of Joshua J. Kugler
Sent: Wednesday, March 05, 2008 7:52 PM
To: PyParsing User List
Subject: Re: [Pyparsing] Help using Combine()

<snip>
From: Joshua J. K. <jk...@sk...> - 2008-03-06 01:51:24
|
Sigh...I went back and re-read the docs and saw the part about adjacent. Problem solved. Sorry about the noise.

j

On Wed, 2008-03-05 at 12:39 -0900, Joshua J. Kugler wrote:
> I have this construct:
> <snip>

--
Joshua Kugler
VOC/SigNet Provider (aka Web App Programmer)
S&K Aerospace Alaska
From: Joshua J. K. <jk...@sk...> - 2008-03-05 21:39:12
|
I have this construct:

    primary_key = (p.CaselessKeyword('primary') +
                   p.CaselessKeyword('key') +
                   p.Or([p.CaselessKeyword('clustered'), p.CaselessKeyword('nonclustered')]) +
                   p.nestedExpr(content=p.delimitedList(bracket_quoted.setResultsName('key_column'))) +
                   p.CaselessKeyword('on').suppress() +
                   bracket_quoted.suppress()
                   )

which I'm using to parse:

    PRIMARY KEY CLUSTERED ([login_id]) ON [PRIMARY]

All is well, and that gives:

    ['primary', 'key', 'clustered', ['login_id']]

I'd like that first element to be 'primary key' so I do:

    primary_key = (p.Combine(p.CaselessKeyword('primary') +
                             p.CaselessKeyword('key'), joinString=' ') +
                   p.Or([p.CaselessKeyword('clustered'), p.CaselessKeyword('nonclustered')]) +
                   p.nestedExpr(content=p.delimitedList(bracket_quoted.setResultsName('key_column'))) +
                   p.CaselessKeyword('on').suppress() +
                   bracket_quoted.suppress()
                   )

But that gives me:

    ParseException: Expected "key" (at char 7), (line:1, col:8)

So apparently I'm not using Combine as intended, and the docs don't provide an example of use. Might someone point me to the docs that explain how to accomplish what I'm trying to do?

Thanks!

j
--
Joshua Kugler
VOC/SigNet Provider (aka Web App Programmer)
S&K Aerospace Alaska
From: Joshua J. K. <jk...@sk...> - 2008-03-05 21:30:34
|
Warning: long

First: thanks for the great package! It greatly simplifies my life. :)

I'm working on parsing some SQL create statements from MS SQL (sigh, I know, but it's what we're stuck with at the moment). Here is the create:

    CREATE TABLE [dbo].[auth_login] (
        [login_id] [int] IDENTITY (1, 1) NOT NULL ,
        [login] [varchar] (255) NULL ,
        [password_hash] [varchar] (255) NULL ,
        [login_name] [varchar] (255) NULL ,
        PRIMARY KEY CLUSTERED ( [login_id] ) ON [PRIMARY] ,
        CONSTRAINT [IX_auth_login] UNIQUE NONCLUSTERED ( [login] ) ON [PRIMARY]
    ) ON [PRIMARY]

You'll notice that the column definitions as well as the primary key and constraints are all within one enclosing pair of parentheses. My code is as such:

    create_table = (p.CaselessKeyword('create').suppress() +
                    p.CaselessKeyword('table').suppress() +
                    bracket_quoted.setResultsName('schema') +
                    "." + bracket_quoted.setResultsName('table_name') +
                    p.nestedExpr(content=p.delimitedList(p.Or(
                        [p.Group(column_def.setResultsName('columns')),
                         p.Group(primary_key),
                         p.Group(constraint)]))) +
                    p.CaselessKeyword('on').suppress() +
                    bracket_quoted.suppress()
                    )

p denotes the pyparsing module (import pyparsing as p)
bracket_quoted is QuotedString using '[' and ']'
column_def, primary_key, and constraint are all pyparsing expressions.

So, at any rate, running that code and outputting via asList() gives me:

    ['dbo', '.', 'auth_login', [['login_id', 'int', 'IDENTITY', '1, 1', 'not', 'null'],
     ['login', 'varchar', '255', 'null'], ['password_hash', 'varchar', '255', 'null'],
     ['login_name', 'varchar', '255', 'null'], ['paid_us_good', 'money', 'null'],
     ['asdfsdafsadf', 'nchar', '10', 'null'], ['primary', 'key', 'clustered', ['login_id']],
     ['constraint', 'IX_auth_login', 'unique', 'nonclustered', ['login']]]]

The column defs are in the same list element as the primary key and constraint. Apparently I'm not understanding the Group class. What can I do to put each of those three things (column defs, primary key defs, and constraint defs) in their own list elements? I want to know that the elements of l[3] are columns, l[4] are primary keys, or similar.

Thanks!

j
--
Joshua Kugler
VOC/SigNet Provider (aka Web App Programmer)
S&K Aerospace Alaska
From: Paul M. <pt...@au...> - 2008-02-16 17:24:52
|
Dict is not meant as "here is a dict entry with this particular keyword and this value." It is more meant as "here is a list of grouped entries and values, to be returned as a dict; take the first item of each group as the key, and the remaining items in each group as that key's value."

In your case, a more likely definition would be:

    keylabel = oneOf("hello world")
    p = Dict(OneOrMore(Group(keylabel + (Word(nums) | Word(alphas, alphanums)))))

    results = p.parseString("hello abc world 2134")
    print results.keys()
    print results.dump()
    print results.hello

The entries *must* be explicitly grouped, else the tokens will just run together and Dict won't know where values stop and the next key starts. In a larger grammar, the Dict expression is usually given a results name (say "dictVals") and then the entries in the dict can be referenced as "dictVals.hello" or "dictVals['world']" (using the keys from your example).

I tried to simplify the use of Dict by providing the dictOf helper method. It would change the above to:

    keylabel = oneOf("hello world")
    p = dictOf( keylabel, (Word(nums) | Word(alphas, alphanums)) )

Where dictOf gets called with two expressions - the first is the expression for matching keys in the dict, and the second expression is for matching the values.

It is atypical (but not impossible) to have a list of known keywords that would be keys. In the dictExample.py script, which ships in the pyparsing examples directory, the keys are labels in a table of data statistics: min, max, etc. These could have been hardcoded as oneOf("min max ave sdev"), but I could just reference them as Word(alphas), since their placement in the table was unambiguous. The configParse.py example uses nested Dicts to permit the values in an INI file to be referenced as "config.section.subsection.subsubsection.etc"

-- Paul

Here is the text of dictExample.py - please download either the source or docs distributions from SourceForge, to get the complete documentation and examples directories (not included when using easy_install or the Windows installer):

    #
    # dictExample.py
    #
    # Illustration of using pyparsing's Dict class to process tabular data
    #
    # Copyright (c) 2003, Paul McGuire
    #
    from pyparsing import Literal, Word, Group, Dict, ZeroOrMore, alphas, nums, delimitedList
    import pprint

    testData = """
    +-------+------+------+------+------+------+------+------+------+
    |       |  A1  |  B1  |  C1  |  D1  |  A2  |  B2  |  C2  |  D2  |
    +=======+======+======+======+======+======+======+======+======+
    | min   |   7  |  43  |   7  |  15  |  82  |  98  |   1  |  37  |
    | max   |  11  |  52  |  10  |  17  |  85  | 112  |   4  |  39  |
    | ave   |   9  |  47  |   8  |  16  |  84  | 106  |   3  |  38  |
    | sdev  |   1  |   3  |   1  |   1  |   1  |   3  |   1  |   1  |
    +-------+------+------+------+------+------+------+------+------+
    """

    # define grammar for datatable
    heading = (Literal(
        "+-------+------+------+------+------+------+------+------+------+") +
        "|       |  A1  |  B1  |  C1  |  D1  |  A2  |  B2  |  C2  |  D2  |" +
        "+=======+======+======+======+======+======+======+======+======+").suppress()
    vert = Literal("|").suppress()
    number = Word(nums)
    rowData = Group( vert + Word(alphas) + vert + delimitedList(number,"|") + vert )
    trailing = Literal(
        "+-------+------+------+------+------+------+------+------+------+").suppress()

    datatable = heading + Dict( ZeroOrMore(rowData) ) + trailing

    # now parse data and print results
    data = datatable.parseString(testData)
    print data
    pprint.pprint(data.asList())
    print "data keys=", data.keys()
    print "data['min']=", data['min']
    print "data.max", data.max
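A small, runnable sketch of the dictOf() helper described above, reusing the key/value expressions from the earlier example; the result access shown is the usual ParseResults attribute/key lookup:

    from pyparsing import Word, dictOf, oneOf, alphas, alphanums, nums

    keylabel = oneOf("hello world")
    value = Word(nums) | Word(alphas, alphanums)
    p = dictOf(keylabel, value)

    results = p.parseString("hello abc world 2134")
    print results.keys()        # ['hello', 'world'] (order may vary)
    print results.hello         # abc
    print results['world']      # 2134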
From: June K. <jun...@gm...> - 2008-02-15 16:47:08
|
Hi. The following code doesn't work:

    >>> p=Dict(Keyword("hello")+'abc')+Dict(Keyword("world")+Word(nums))
    >>> p.parseString("hello abc world 2134")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "c:\python25\lib\site-packages\pyparsing-1.4.11-py2.5-win32.egg\pyparsing.py", line 980, in parseString
        loc, tokens = self._parse( instring.expandtabs(), 0 )
      File "c:\python25\lib\site-packages\pyparsing-1.4.11-py2.5-win32.egg\pyparsing.py", line 860, in _parseNoCache
        loc,tokens = self.parseImpl( instring, preloc, doActions )
      File "c:\python25\lib\site-packages\pyparsing-1.4.11-py2.5-win32.egg\pyparsing.py", line 2174, in parseImpl
        loc, resultlist = self.exprs[0]._parse( instring, loc, doActions, callPreParse=False )
      File "c:\python25\lib\site-packages\pyparsing-1.4.11-py2.5-win32.egg\pyparsing.py", line 866, in _parseNoCache
        tokens = self.postParse( instring, loc, tokens )
      File "c:\python25\lib\site-packages\pyparsing-1.4.11-py2.5-win32.egg\pyparsing.py", line 2831, in postParse
        dictvalue = tok.copy() #ParseResults(i)
    AttributeError: 'str' object has no attribute 'copy'

If I enclose the expression inside Dict with something like Group, it works okay. That is,

    >>> p=Dict(Group(Keyword("hello")+'abc'))+Dict(Group(Keyword("world")+Word(nums)))
    >>> p.parseString("hello abc world 2134")
    ([(['hello', 'abc'], {}), (['world', '2134'], {})], {'world': [('2134', 1)], 'hello': [('abc', 0)]})

Is this intended? I thought Dict should always implicitly mean Group.
From: June K. <jun...@gm...> - 2008-02-15 03:26:49
|
Well, a much simpler solution just occurred to me:

    class EmptySuppress(Suppress): #suppress only empty tokens
        def postParse( self, instring, loc, tokenlist ):
            if len(tokenlist[0]): return tokenlist
            return []

    cmpdstmt=EmptySuppress(Group(Literal('{').suppress()+
                                 ZeroOrMore(stmt)+
                                 Literal('}').suppress()))

2008/2/15, June Kim <jun...@gm...>:
> Hello,
>
> It's been a long while since I last used pyparsing (Hi Paul). I have
> good memories of pyparsing. Recently, I'm using it again, for parsing
> a subset of C and PL/I syntax.
>
> <snip>
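An alternative sketch (not from the thread) that gets the same effect with a parse action instead of a Suppress subclass, shown on a self-contained toy grammar: a group that ends up empty is returned as no tokens at all.

    from pyparsing import Group, Literal, Word, ZeroOrMore, alphas

    def dropIfEmpty(tokens):
        # tokens[0] is the Group's contents; return nothing if it is empty
        return tokens if len(tokens[0]) else []

    interesting = Word(alphas)
    block = Group(Literal('{').suppress() + ZeroOrMore(interesting) + Literal('}').suppress())
    block.setParseAction(dropIfEmpty)

    grammar = ZeroOrMore(block)
    print grammar.parseString("{ abc def } { } { xyz }").asList()
    # [['abc', 'def'], ['xyz']]  -- the empty block disappears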
From: June K. <jun...@gm...> - 2008-02-15 02:36:21
|
Hello,

It's been a long while since I last used pyparsing (Hi Paul). I have good memories of pyparsing. Recently, I'm using it again, for parsing a subset of C and PL/I syntax.

First, look at the following code please:

    from pyparsing import *
    from pprint import pprint

    t="""
    a=a+1;
    if (a>b) { //if (1==2) gogogo();
        if (c>d) a=a+1;
        else b=b+1;
    } else {
        c=c+1;
        if (3==4)
            printf("abc");
        else if (4==5)
            dothis();
        else
            dothat();
        if (a>b) {go();go();come();}
        if (d>e) if (c>g) if(a>q) a=a+1;
    }
    """

    ifst=Forward()
    stmt=Forward()

    cmpdstmt=Group(Literal('{').suppress()+
                   ZeroOrMore(stmt)+
                   Literal('}').suppress())

    stmt << (Literal(';').suppress() |
             cmpdstmt |
             ifst |
             (NotAny('}')+SkipTo(';',include=True)).suppress())

    ifst << Group(Keyword("if")+nestedExpr("(",")").suppress()+stmt+\
                  Optional(Keyword("else")+stmt))

    p=ifst.ignore(cStyleComment).setDebug(False)
    pprint (list(p.scanString(t))[0][0].asList(),width=2,indent=2)

What I am trying to do is get the minimal tree-structure of if statements. I am not interested in anything other than if statements.

The result is:

    [ [ 'if',
        [ [ 'if',
            'else']],
        'else',
        [ [ 'if',
            'else',
            [ 'if',
              'else']],
          [ 'if',
            [ ]],
          [ 'if',
            [ 'if',
              [ 'if']]]]]]

I am fairly satisfied with the result, but would like the blank list removed. That is, I want the {go();go();come();} part not present (not even as an empty list) in the parseResult. However I can't totally suppress the cmpdstmt since some of them might include 'if'.

I played with setParseAction but couldn't get what I wanted. I also branched a few new grammars for treating non-if-cmpdstmt as following:

    ifst=Forward()
    stmt=Forward()
    cmpdstmt=Forward()

    stmt << (Literal(';').suppress() |
             cmpdstmt |
             ifst |
             (NotAny('}')+SkipTo(';',include=True)).suppress())

    restcmpdstmt=Group(Literal('{').suppress()+
                       ZeroOrMore(stmt)+
                       Literal('}').suppress())

    nonifstmt=Forward()
    nonifcmpdstmt=Group(Literal('{').suppress()+
                        ZeroOrMore(nonifstmt)+
                        Literal('}').suppress())

    # a statement that doesn't include if-statement
    nonifstmt << (Literal(';').suppress() |
                  nonifcmpdstmt |
                  (~oneOf('} if')+SkipTo(';',include=True)).suppress())

    cmpdstmt << (nonifcmpdstmt.suppress() | restcmpdstmt)

    ifst << Group(Keyword("if")+nestedExpr("(",")").suppress()+stmt+\
                  Optional(Keyword("else")+stmt))

It returns the expected result.

    [ [ 'if',
        [ [ 'if',
            'else']],
        'else',
        [ [ 'if',
            'else',
            [ 'if',
              'else']],
          [ 'if'],
          [ 'if',
            [ 'if',
              [ 'if']]]]]]

Does the code look right?

Yet, the problem is that the branched grammar version is too complex to write, read, and maintain.

What alternatives or improvements do you recommend? (maybe post-processing the parseResult after the parsing's finished?)

June Kim
From: Paul M. <pt...@au...> - 2008-01-23 15:41:28
|
David -

The naming schemes go like this (cf. http://www.python.org/dev/peps/pep-0008/, under "Naming Conventions"):

__xxx__ : "magic" methods useful by convention by Python internals. Examples include __init__, __call__, __add__, __del__, __dict__ (pyparsing uses methods like these __add__, __or__, __xor__, etc. to do the operator overloading)

_xxx : quasi-private names, these do not get imported when using "from module import *"

xxx_ : convention for naming variables that conflict with Python keywords (class_, for_, etc.)

__xxx : class attributes with leading double-underscore are name-mangled by the Python interpreter to "hide" them externally, a form of private but can be worked around if you really, really, really need to (reference __xxx in class Y as _Y__xxx, but to my mind, this is even a worse red flag than using an attribute with a leading underscore).

As you've probably noticed, pyparsing doesn't fully comply with PEP8, mostly with respect to using camel case names instead of names_with_underscores. I think it is just my own personal history - I used to use names with underscores back in my C and PL/I days, and then "graduated" to mixed case when I moved to Smalltalk, C++ and Java.

And I'm glad you were able to make some sense of my ramblings. :)

-- Paul
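A tiny illustration of the name-mangling rule mentioned above (the class and attribute names here are made up purely for demonstration):

    class Y(object):
        def __init__(self):
            self._quasi_private = 1   # single underscore: convention only, still accessible
            self.__mangled = 2        # double underscore: stored as _Y__mangled

    y = Y()
    print y._quasi_private            # 1
    print y._Y__mangled               # 2 -- possible, but a red flag as noted above
    # print y.__mangled               # would raise AttributeError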
From: <dav...@l-...> - 2008-01-23 14:55:00
|
I think you answered my main concerns with using the framework. Some of the more "strategic" questions don't seem to be answered well in the documentation, and getting a third party perspective is certainly useful.

I've heard of several things with the __somefunc__ naming. One main thing that I've heard is it's more like the equivalent to "private" functions than "magic" functions. It's a way to make it glaringly obvious what a class user should keep their sticky little hands out of :). While not shown in my example, the Parser class does have "public" functions for subscription to the resultant objects.

The whole framework comes about from the fact that this is really meant to be a generic parser. There can be quite a few different utility "things" that can be done w/ the parsed data, and it is really quite useful to have generic callbacks that different classes can use. Like I mentioned previously, the Grammar and file-size make the whole parsing of the file quite an ordeal, but pyparsing is much easier to use than the corresponding (ugly as hell) perl framework that we used to use.

P.S. I would like to personally thank you for one of the most well structured, thoughtful (as in you put a lot of thought into it :-P ), and useful responses to anything I've ever posted on *any* mailing list. Thanks!

> -----Original Message-----
> From: Paul McGuire [mailto:pt...@au...]
> Sent: Tuesday, January 22, 2008 10:52 PM
> To: Weber, David C @ Link; pyp...@li...
> Subject: RE: [Pyparsing] Strategies for use with ParseFile
>
> <snip>
From: Paul M. <pt...@au...> - 2008-01-23 04:51:47
|
David -

This does seem fairly complicated, but I think your approach in using parse actions as parse-time callbacks to build a data structure is actually pretty typical.

To answer your specific questions:

1. There is a parse action keepOriginalText which may do the trick for you. Maybe this example would help:

    from pyparsing import *

    a_s = Word("a")
    b_s = Word("b")
    c_s = Word("c")

    allwords = a_s + b_s + c_s
    def showTokens(tokens):
        print "Showing tokens:", tokens.asList()

    allwords.setParseAction(showTokens, keepOriginalText, showTokens)
    allwords.parseString("aaaaa bbbb cccc")

Prints:

    Showing tokens: ['aaaaa', 'bbbb', 'cccc']
    Showing tokens: ['aaaaa bbbb cccc']

When allwords is parsed, the 3 parse actions are called in turn. First showTokens is called with the individual tokens returned from matching a_s, b_s, and c_s. Then keepOriginalText is called that changes the matched tokens back to the original text. Then showTokens is called again to show the effect of calling keepOriginalText. Does this help?

2. I don't really have much to go on to answer your second question. It *is* possible that you don't need multiple callbacks to create Python objects and return them. Instead, you can just have the related class define __init__ to accept the tokens that are passed to a parse action, and just name the class as the parse action. This will cause the __init__ method to be called with the matched tokens, and the constructed object will be returned to the parser. There are examples of this in the Pycon presentation that ships with pyparsing, describing the interactive adventure game; there is an example in the pyparsing O'Reilly short cut, in which a query string gets converted to a sequence of classes. For example:

    class XClass(object):
        def __init__(self,tokens):
            self.matchedText = tokens[0]
        def __repr__(self):
            return "%s:(%s)" % (self.__class__.__name__,self.matchedText)
    class AClass(XClass): pass
    class BClass(XClass): pass
    class CClass(XClass): pass
    a_s.setParseAction(AClass)
    b_s.setParseAction(BClass)
    c_s.setParseAction(CClass)

    allwords = a_s + b_s + c_s

    print allwords.parseString("aaaaa bbbb cccc").asList()

Prints:

    [AClass:(aaaaa), BClass:(bbbb), CClass:(cccc)]

Also, your naming convention is a little distracting, leading and trailing double-underscores are usually reserved for "magic" functions, such as __str__, __call__, etc. So when you use them on your own class and method names, it looks confusing to me.

Also, I don't know if you are gaining anything by burying different pyparsing expressions/rules inside class variables. This sounds vaguely Java-esque to me. In Python, things *can* exist outside of a class...

I don't feel that I've really addressed all of your question/concern, can you distill this architecture down to some small examples, and repost? Otherwise, I'd say this is pretty much in line with how you would parse this data and use it to construct an overall data structure with it.

-- Paul
From: <dav...@l-...> - 2008-01-22 17:43:43
|
All,

Been using pyparsing for a long time, and I feel like I'm using it in a poor fashion, as it seems to be quite cumbersome to use.

Some background: We need to parse text files that are routinely hundreds of thousands of lines long. The grammar is rather complicated (guesstimate of 300 rules). The grammar is stored in a class, with each rule a static class variable. I have another class (a parser) that subscribes to rule subsets through the usage of "setParseAction" for the interesting rules. When an interesting rule is encountered, my parser class is called. It then pulls out the interesting tokens, constructs a python object, and then it fires a callback function, where an interested user of this data can act upon it. Now, an "interesting" rule may be composed of say, 10 subrules. I don't need their info individually, but I can get it through the composite object.

So, two questions:
1.) Any easy way to retrieve original text for an entire EDT below
2.) Any suggestions for better organization of the data. I've thought about some inheritance usage because the file has header data, and oneOrMore() of 6 different "things" (one of which is a EDT illustrated below), but seems like a bit of a shoehorn.

Thanks

---------------------------------------------------------------------------------------------------------------------

    class Grammar:
        <snip>
        EnumeratedDataType = \
            Keyword("(EnumeratedDataType") + \
            EDT_Name + \
            Optional(EDT_Description) + \
            Optional(EDT_MomEnumeratedDataType) + \
            EDT_AutoSequence + \
            Optional(EDT_Description) + \
            EDT_StartValue + \
            OneOrMore(EDT_Enumeration) + \
            ")";

---------------------------------------------------------------------------------------------------------------------

    class Parser:
        <snip>
        def __EDT_setParseActions__(self):
            """Set the parse actions for the EDT elements"""
            Grammar.EnumeratedDataType.setParseAction(self.__EDT__);

            # These can all be handled identically. One of each only.
            Grammar.EDT_Name.setParseAction(self.__EDT_Element__);
            Grammar.EDT_MomEnumeratedDataType.setParseAction(self.__EDT_Element__);
            Grammar.EDT_AutoSequence.setParseAction(self.__EDT_Element__);
            Grammar.EDT_Description.setParseAction(self.__EDT_Element__);
            Grammar.EDT_StartValue.setParseAction(self.__EDT_Element__);

            # You can have one or more of these
            Grammar.EDT_Enumeration.setParseAction(self.__EDT_Enumeration__);
            Grammar.EDT_Enumerator.setParseAction(self.__EDT_Enum_Element__);
            Grammar.EDT_Representation.setParseAction(self.__EDT_Enum_Element__);

        def __EDT__(self, s, l, toks):
            # Fire the EDT callback and reset the parent. We've already stored the
            # data we care about
            self.__fireCallback__(OMDParser.EDT_TOKEN, self.__ParentElement__);
            self.__ResetParent__();

        def __EDT_Element__(self, s, l, toks):
            """
            This method is called whenever we encounter an EDT element.
            We add the element to the __ParentElement__ dictionary
            """
            # Init the parent, and add the parsed item
            self.__InitParent__(self.EDT_TOKEN);
            self.__ParentElement__.addKey(toks[0], toks[1]);

        def __EDT_Enumeration__(self, s, l, toks):
            """
            This method is called whenever an enumeration is fully parsed.
            We must now add it to the parent element and reset the child
            """
            self.__ParentElement__.appendKey("Enumerations", self.__ChildElement__);
            self.__ResetChild__();

        def __EDT_Enum_Element__(self, s, l, toks):
            """
            This method is called whenever we encounter an Enumeration element
            We add the element to the CurrEnumeration dictionary
            """
            # Initialize the child element, and set the current element
            self.__InitChild__("Enumeration");
            self.__ChildElement__.addKey(toks[0], toks[1]);

---------------------------------------------------------------------------------------------------------------------

USAGE!!!:

    def gotEDT(EDT):
        print EDT;

    # Start of "Main" function
    if __name__ == "__main__":
        op = Parser(<fileName>);
        op.registerCallback(OMDParser.EDT_TOKEN, gotEDT);
From: Andrew S. <agt...@ya...> - 2008-01-17 02:32:25
|
--- Paul McGuire <pt...@au...> wrote:
>
> > Thanks for the quick reply. So far using pe.loc < len(input) works for me.
> > I'll reply to the list if I can find a counter example. Can I count on loc
> > sticking around in the ParseException class?
>
> <PM> Great! Yes, loc is an important part of ParseException, and has been
> in pyparsing since version 0.5. What would give you the idea that it might
> not stick around?
>
(Grammar discussion snipped)

Since the algorithm you suggested for maybeParseable() works well but is not a formal part of the API, I kind of feel like I'm going in through the "back door" to get this done. It's probably just perception on my part, but relying on direct attribute access for the parse error location also feels like it could change. This is most likely a holdover from habits gained doing too much Java beans attribute access. Even though I'd like getters and setters to die a horrible death they did sort of provide a feeling of permanence.

-a.
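A sketch of the check Andrew describes (comparing pe.loc against the input length), on a made-up two-token grammar; the idea is that a failure at or past the end of the input may just mean the input is incomplete:

    from pyparsing import Word, ParseException, alphas, nums

    grammar = Word(alphas) + Word(nums)

    def maybe_parseable(text):
        try:
            grammar.parseString(text)
            return True
        except ParseException as pe:
            # failure at or past the end of the input: more text might still fix it
            return pe.loc >= len(text)

    print maybe_parseable("abc 123")   # True  (parses)
    print maybe_parseable("abc")       # True  (ran out of input)
    print maybe_parseable("123")       # False (wrong from the start)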
From: Ralph C. <ra...@in...> - 2008-01-16 13:46:04
|
Hi Samuel,

> from pyparsing import Literal, Combine
> grammar = Combine(Literal("A") + Literal("B") + Literal("C"))
> grammar.ignore(",")
> grammar.parseString("A,,,,,B,,C")
>
> ...but I suddenly got a ParseException:
> pyparsing.ParseException: Expected "B" (at char 1), (line:1, col:2)
>
> It seems that the commas are no longer ignored in the Combine
> statement! Can anyone tell me why, or how to have the parser apply
> ignore rules also in the Combine command?

You need to get the commas ignored before attempting to combine.

    >>> from pyparsing import Literal, Combine
    >>> grammar = Literal("A") + Literal("B") + Literal("C")
    >>> grammar.ignore(",")
    >>> grammar = Combine(grammar)
    >>> grammar.parseString("A,,,,,B,,C")
    (['ABC'], {})
    >>>

Cheers, Ralph.
From: Samuel L. <sa...@ho...> - 2008-01-16 13:35:07
|
Hi,

I have problems with setting ignores for a grammar. The following is a very simplified program to explain my problem. First I had a simple program like a)

a)

    from pyparsing import Literal
    grammar = Literal("A") + Literal("B") + Literal("C")
    grammar.ignore(",")
    grammar.parseString("A,,,,,B,,C")

It just worked perfectly. Commas were simply ignored (as defined).

Later on I adapted my program to use the Combine command, which in my understanding should just have an effect on how the parser results are represented...

b)

    from pyparsing import Literal, Combine
    grammar = Combine(Literal("A") + Literal("B") + Literal("C"))
    grammar.ignore(",")
    grammar.parseString("A,,,,,B,,C")

...but I suddenly got a ParseException:

    pyparsing.ParseException: Expected "B" (at char 1), (line:1, col:2)

It seems that the commas are no longer ignored in the Combine statement! Can anyone tell me why, or how to have the parser apply ignore rules also in the Combine command?

Sam
From: Paul M. <pt...@au...> - 2008-01-15 15:08:31
|
Here is the complete listing of the SQL expression parser I described in the last message.

-- Paul

    from pyparsing import *

    sql1 = "select where foo = 1 and bar = 2 into result"
    sql2 = "select where foo = 1 and bar = 2"

    into_clause = Keyword('into') + restOfLine
    selectStmt = Keyword('select') + SkipTo(into_clause|StringEnd()).setResultsName('where_condition')

    identifier = Word(alphas)
    relationalOperator = oneOf("< = > >= <= != <>")
    integer = Word(nums)
    value = integer | sglQuotedString
    logicalComparison = identifier + relationalOperator + value

    AND_cl = CaselessLiteral("AND")
    OR_cl = CaselessLiteral("OR")
    NOT_cl = CaselessLiteral("NOT")

    complexComparison = operatorPrecedence( logicalComparison,
        [
        (NOT_cl, 1, opAssoc.RIGHT),
        (OR_cl, 2, opAssoc.LEFT),
        (AND_cl, 2, opAssoc.LEFT),
        ])

    where_clause = CaselessLiteral("where") + complexComparison

    selectStmt = Keyword('select') + where_clause('where_condition') + \
        Optional(into_clause)('into_clause')

    print selectStmt.parseString(sql1).dump()
    print selectStmt.parseString(sql2).dump()

Prints:

    ['select', 'where', [['foo', '=', '1'], 'AND', ['bar', '=', '2']], 'into', ' result']
    - into_clause: ['into', ' result']
    - where_condition: ['where', [['foo', '=', '1'], 'AND', ['bar', '=', '2']]]
    ['select', 'where', [['foo', '=', '1'], 'AND', ['bar', '=', '2']]]
    - where_condition: ['where', [['foo', '=', '1'], 'AND', ['bar', '=', '2']]]
From: Paul M. <pt...@au...> - 2008-01-15 14:55:33
|
Thanks for the quick reply. So far using pe.loc < len(input) works for me. I'll reply to the list if I can find a counter example. Can I count on loc sticking around in the ParseException class?

<PM> Great! Yes, loc is an important part of ParseException, and has been in pyparsing since version 0.5. What would give you the idea that it might not stick around?

Not to push my luck, but I've got a grammar question too. I'm trying to define a grammar that uses a skipTo(Optional(xxx)) and not having much success. Is there a better way to go about this?

    select where foo = 1 and bar = 2 into result

I was hoping to end up with something like the following:

    into_clause = Keyword('into') + restOfLine
    Keyword('select') + skipTo(Optional(into_clause)).setResultsName('where_condition')

<PM> Hmm, SkipTo (leading "S" is capitalized) really wants to have a predictable target expression. SkipTo can be embedded inside an Optional, but you can't skip to an optional thing. You *can* skip to one thing or another, as in SkipTo(A | B), and the SkipTo will stop at whichever comes first. If A may or may not be present, then make B a StringEnd().

    where_clause = SkipTo(into_clause | StringEnd())

This is actually almost readable English - "skip to either the into_clause or the end of the input string".

But if all of your where clauses are this simple, you might take the time to define a where_clause expression, probably using operatorPrecedence to take care of things like nested parentheses. Here is how to use operatorPrecedence:

1. Identify the basic operand of your expression. In this case, each where clause is a boolean expression of logical comparisons. A logical comparison is of the form:

    identifier = Word(alphas)
    relationalOperator = oneOf("< = > >= <= != <>")
    integer = Word(nums)
    value = integer | sglQuotedString
    logicalComparison = identifier + relationalOperator + value

2. Identify the operators. Logical expressions usually allow AND, OR, and NOT. We'll define caseless versions of each:

    AND_cl = CaselessLiteral("AND")
    OR_cl = CaselessLiteral("OR")
    NOT_cl = CaselessLiteral("NOT")

3. Call operatorPrecedence with these operators, to compose a grammar.

    complexComparison = operatorPrecedence( logicalComparison,
        [
        (NOT_cl, 1, opAssoc.RIGHT),
        (OR_cl, 2, opAssoc.LEFT),
        (AND_cl, 2, opAssoc.LEFT),
        ])

operatorPrecedence is called using the base operand, followed by a list of tuples describing each operator or group of operators. Each tuple contains the operator, the value 1 or 2 indicating whether it is a unary or binary operator, and the opAssoc.LEFT or opAssoc.RIGHT value indicating whether the operator is right or left associative.

With the example you provided, this should be enough to define a where_clause expression:

    where_clause = CaselessLiteral("where") + complexComparison

You may have to expand the value expression to support real numbers or identifiers, I hope this is clear how you would do so - what I've provided will match integers or single-quoted strings. Probably more than you asked for, if it is too much to deal with now, just go with the SkipTo alternative, and come back to the rest of this later.

-- Paul