Thread: [Pyparsing] ParseResults as AST?
Brought to you by:
ptmcg
From: Aubrey B. <ba...@cs...> - 2011-09-29 01:35:10
|
Pyparsing users, I am having trouble figuring out how to make an abstract syntax tree (AST) from a ParseResults object. I do not understand the data model that is the ParseResults object as it relates to holding the information of an AST. (I have spent a whole day on this so far, so it is time to ask for help.) I am writing a Prolog parser, so I really do need an AST as opposed to some flattened representation like ParseResults. Order matters. Type matters. Depth matters. Sample Input ------------ This is only Datalog with atoms, integers, and variables. ''' % Facts r1(a). r1(b). r1(c). r2(a, 1). r2(a, 2). r2(d, 3). r2(e, 4). % Clauses r3(A, B) :- r1(A), r2(A, B). ''' Desired Output -------------- I would like to get something back from parsing like the following: ('datalog', [ ('comment', '% Facts'), ('fact', [('name', 'r1'), ('body', [('atom', 'a')])]), ('fact', <result>), ... ('clause', [('head', <result>), ('body', [<result>...])]), ] ) where <result> stands for the parse result from deeper recursion. The structure is just lists of pairs. Each pair is the name (type) of the parsed object and its content, the result. Parsing ------- I believe I have my grammar implemented correctly in pyparsing as I get the expected AST back as XML by calling ParseResults.asXML(). (Except that I have an extra top-level element, "<datalog><datalog>...</datalog></datalog>", but that is easy to workaround.) Tried ----- I have tried to understand the ParseResults object. Clearly the AST information exists as ParseResults.asXML() gives me an AST in XML. However, some of the information used in asXML() is not part of the public API. I have tried to use parse actions to construct my own AST but I couldn't figure this out either (in part because the product of parsing is still a ParseResults object). Options ------- * Use asXML() and reparse the XML. Not performant. Too much code. Misses the point. * Write asAST() into pyparsing.py as a near clone of asXML(). This is my current favorite option. (I'm on a deadline.) * Use parse actions effectively. * Use a different parsing library (I have also tried pyPEG.) * Other? Any suggestions and/or help would be very much appreciated. Sincerely, Aubrey |
From: Paul M. <pt...@au...> - 2011-09-29 06:10:59
|
Aubrey - If all you want is an AST, then you should be able to construct this using Group's and parseActions. For instance, let's say you want a parser for an assignment statement, where the lvalue can be a variable and the rvalue can be simple expression, with terms that can be integers, variables, or function calls. """ Simple BNF: assignment :: lvalue '=' rvalue lvalue :: variableRef rvalue :: term [ op term ] term :: fnCall | variable | integer fnCall :: fnName '(' [rvalue[,rvalue]*] ') variable :: [a-zA-Z_][a-zA-Z0-9_]* integer :: \d+ op :: '+' | '-' """ from pyparsing import * LPAR,RPAR,EQ = map(Suppress, "()=") op = oneOf("+ -") integer = Word(nums) identifier = Word(alphas+'_', alphanums+'_') variableRef = identifier.copy() fnName = identifier.copy() rvalue = Forward() arglist = delimitedList(rvalue) fnCall = fnName + Group(LPAR + Optional(arglist, default=[]) + RPAR) rvalue << (fnCall | variableRef | integer) assignment = variableRef + EQ + Group(rvalue) print assignment.parseString("y = sin(60)-180").asList() Will print: ['y', ['sin', ['60']]] Knowing that this was an assignment statement, you could then use assignment unpacking to pull out the left and right hand sides of the assignment: lhs, rhs = assignment.parseString("...").asList() With some simple parse actions, you can decorate these structures similar to what you show in your AST example: assignment.setParseAction(lambda t : ['assign'] + t.asList()) fnCall.setParseAction(lambda t : ['function-call'] + t.asList()) print assignment.parseString("y = sin(60)-180").asList() will print: ['assign', 'y', ['function-call', 'sin', ['60']]] But now I have to ask, what do you plan to do with this AST? I suspect that the next step is to walk the structure, and perform actions depending on the decorating labels. Instead, I would suggest you use the parse actions to construct representational objects. At parse time, you already know what the substructures are going to be ("a function call will have a name and a potentially empty argument list" for instance), why not use a parse action to construct an object that gives a representation for this? class ASTNode(object): def __init__(self, tokens): self.tokens = tokens self.assignFields() def __str__(self): return self.__class__.__name__ + ':' + str(self.__dict__) __repr__ = __str__ class Assignment(ASTNode): def assignFields(self): self.lhs, self.rhs = self.tokens del self.tokens class FunctionCall(ASTNode): def assignFields(self): self.fnName, self.args = self.tokens del self.tokens fnCall.setParseAction(FunctionCall) assignment = (variableRef + EQ + rvalue).setParseAction(Assignment) print assignment.parseString("y = sin(60)-180")[0] Will print: Assignment:{'rhs': FunctionCall:{'fnName': 'sin', 'args': (['60'], {})}, 'lhs': 'y'} Now we have bypassed the intermediate AST step, and gone straight to an object representation. These classes can be extended to include an eval or exec method, which could be directly called, instead of retracing steps already taken during parse time - why test args with isdigit, for instance? At parse time, your grammar already defines when an integer is being encountered - just add a parse action that does the string-to-integer conversion. I think the SimpleBool.py example on the pyparsing wiki gives a more complete treatment of this concept. In the article I wrote for Python magazine, I showed how you could even pickle these representational objects, and then unpickle and execute them in varying VM's (GUI, console, server) without reparsing the original code. If you just want an AST, use Groups to add structure, and parse actions to decorate the groups with your AST structure labels. I think though that the representational objects are more powerful. -- Paul -----Original Message----- From: Aubrey Barnard [mailto:ba...@cs...] Sent: Wednesday, September 28, 2011 7:35 PM To: pyp...@li... Subject: [Pyparsing] ParseResults as AST? Pyparsing users, I am having trouble figuring out how to make an abstract syntax tree (AST) from a ParseResults object. I do not understand the data model that is the ParseResults object as it relates to holding the information of an AST. (I have spent a whole day on this so far, so it is time to ask for help.) I am writing a Prolog parser, so I really do need an AST as opposed to some flattened representation like ParseResults. Order matters. Type matters. Depth matters. Sample Input ------------ This is only Datalog with atoms, integers, and variables. ''' % Facts r1(a). r1(b). r1(c). r2(a, 1). r2(a, 2). r2(d, 3). r2(e, 4). % Clauses r3(A, B) :- r1(A), r2(A, B). ''' Desired Output -------------- I would like to get something back from parsing like the following: ('datalog', [ ('comment', '% Facts'), ('fact', [('name', 'r1'), ('body', [('atom', 'a')])]), ('fact', <result>), ... ('clause', [('head', <result>), ('body', [<result>...])]), ] ) where <result> stands for the parse result from deeper recursion. The structure is just lists of pairs. Each pair is the name (type) of the parsed object and its content, the result. Parsing ------- I believe I have my grammar implemented correctly in pyparsing as I get the expected AST back as XML by calling ParseResults.asXML(). (Except that I have an extra top-level element, "<datalog><datalog>...</datalog></datalog>", but that is easy to workaround.) Tried ----- I have tried to understand the ParseResults object. Clearly the AST information exists as ParseResults.asXML() gives me an AST in XML. However, some of the information used in asXML() is not part of the public API. I have tried to use parse actions to construct my own AST but I couldn't figure this out either (in part because the product of parsing is still a ParseResults object). Options ------- * Use asXML() and reparse the XML. Not performant. Too much code. Misses the point. * Write asAST() into pyparsing.py as a near clone of asXML(). This is my current favorite option. (I'm on a deadline.) * Use parse actions effectively. * Use a different parsing library (I have also tried pyPEG.) * Other? Any suggestions and/or help would be very much appreciated. Sincerely, Aubrey ---------------------------------------------------------------------------- -- All the data continuously generated in your IT infrastructure contains a definitive record of customers, application performance, security threats, fraudulent activity and more. Splunk takes this data and makes sense of it. Business sense. IT sense. Common sense. http://p.sf.net/sfu/splunk-d2dcopy1 _______________________________________________ Pyparsing-users mailing list Pyp...@li... https://lists.sourceforge.net/lists/listinfo/pyparsing-users |
From: Aubrey B. <ba...@cs...> - 2011-10-04 16:33:47
|
Paul, I want to thank you for providing such thorough help on this issue. It turns out I was 90% of the way there and just needed to connect a few more dots. What I Learned -------------- My main misunderstanding was related to how parse actions and ParseResults work together. I now understand that a parse action is passed a ParseResult that functions like a list of tokens. Actually, it is a list of whatever is returned from deeper calls of parsing, so it can be tokens or AST objects or whatever. This list of children is accessed via ParseResult.asList(). So all a parse action has to do to create an AST is package up its children appropriately into some AST object and return that object. If you take this approach groups are unnecessary. I only used a group once when I needed to handle multiplicity in one of the children of a pyparsing element and didn't want to create another pyparsing element specifically for that "sub-expression". (I effectively combined multiple levels/elements in my grammar into one pyparsing element and thus needed a bit of extra structure.) For example (sufficient excerpts to illustrate): def getAstFromParseResult(parseResult): return parseResult.asList()[0] # Packages a single token as an atomic AST object def astVariableParseAction(parseResult): return ('variable', getAstFromParseResult(parseResult)) # Packages named "sub-expressions" into an AST object # "head" is an AST object from a deeper call # "body" can have multiple children in the AST, so the whole ParseResult list is incorporated def astClauseParseAction(parseResult): return ('clause', ('head', parseResult.head), ('body', parseResult.body.asList())) clause.setParseAction(astClauseParseAction) Further response ---------------- In my case I really do need an AST and strings are my "object representation." My task is to parse the Prolog, gather some information about it, compile some related information from a different source, and spit out the original with annotations as comments. The annotations include both specific and aggregate information so I need to process the whole AST before starting output. In the process I do end up converting a few parts of the AST to objects (e.g. certain integers that are keys for other information), but the majority of the AST elements need no conversion and all have to be output as strings, so a "full" object representation would actually be more work. Prior to asking for help, I looked at some of the examples but most of them seemed to concentrate on implementing the grammar, not dealing with the results of parsing. No example description includes "ast" or "tree". Unfortunately, SimpleBool.py would not have helped clarify things for me as there is no mention of parse actions or parse results which were my main confusions. Feature Suggestion ------------------ Some form of incremental parsing would be a nice addition. I am thinking of situations (like Prolog) where you have many top-level elements to parse, but each is relatively simple. It would be nice to be able to parse them one-by-one in an iterative fashion: # lang :: ( comment | fact | clause )* for langElement in lang.parseIncremental(input): if langElement.name == 'fact': ... elif langElement.name == 'clause': ... ... This would allow parsing to scale well to large inputs as an entire AST would never have to be constructed. The only complication is handling when the repetition is a few levels down in the grammar (like class members in Java or body elements in HTML for example). (I could probably retrofit my task to the incremental model and see some fair efficiency gains.) Aubrey On 09/29/2011 01:10 AM, Paul McGuire wrote: > Aubrey - > > If all you want is an AST, then you should be able to construct this using > Group's and parseActions. For instance, let's say you want a parser for an > assignment statement, where the lvalue can be a variable and the rvalue can > be simple expression, with terms that can be integers, variables, or > function calls. > > """ > Simple BNF: > > assignment :: lvalue '=' rvalue > lvalue :: variableRef > rvalue :: term [ op term ] > term :: fnCall | variable | integer > fnCall :: fnName '(' [rvalue[,rvalue]*] ') > variable :: [a-zA-Z_][a-zA-Z0-9_]* > integer :: \d+ > op :: '+' | '-' > """ > > from pyparsing import * > > LPAR,RPAR,EQ = map(Suppress, "()=") > op = oneOf("+ -") > integer = Word(nums) > identifier = Word(alphas+'_', alphanums+'_') > variableRef = identifier.copy() > fnName = identifier.copy() > rvalue = Forward() > arglist = delimitedList(rvalue) > fnCall = fnName + Group(LPAR + Optional(arglist, default=[]) + RPAR) > rvalue<< (fnCall | variableRef | integer) > assignment = variableRef + EQ + Group(rvalue) > > print assignment.parseString("y = sin(60)-180").asList() > > Will print: > > ['y', ['sin', ['60']]] > > Knowing that this was an assignment statement, you could then use assignment > unpacking to pull out the left and right hand sides of the assignment: > > lhs, rhs = assignment.parseString("...").asList() > > > With some simple parse actions, you can decorate these structures similar to > what you show in your AST example: > > assignment.setParseAction(lambda t : ['assign'] + t.asList()) > fnCall.setParseAction(lambda t : ['function-call'] + t.asList()) > print assignment.parseString("y = sin(60)-180").asList() > > will print: > > ['assign', 'y', ['function-call', 'sin', ['60']]] > > > But now I have to ask, what do you plan to do with this AST? I suspect that > the next step is to walk the structure, and perform actions depending on the > decorating labels. Instead, I would suggest you use the parse actions to > construct representational objects. At parse time, you already know what > the substructures are going to be ("a function call will have a name and a > potentially empty argument list" for instance), why not use a parse action > to construct an object that gives a representation for this? > > class ASTNode(object): > def __init__(self, tokens): > self.tokens = tokens > self.assignFields() > def __str__(self): > return self.__class__.__name__ + ':' + str(self.__dict__) > __repr__ = __str__ > > class Assignment(ASTNode): > def assignFields(self): > self.lhs, self.rhs = self.tokens > del self.tokens > > class FunctionCall(ASTNode): > def assignFields(self): > self.fnName, self.args = self.tokens > del self.tokens > fnCall.setParseAction(FunctionCall) > assignment = (variableRef + EQ + rvalue).setParseAction(Assignment) > > print assignment.parseString("y = sin(60)-180")[0] > > Will print: > > Assignment:{'rhs': FunctionCall:{'fnName': 'sin', 'args': (['60'], {})}, > 'lhs': 'y'} > > > Now we have bypassed the intermediate AST step, and gone straight to an > object representation. These classes can be extended to include an eval or > exec method, which could be directly called, instead of retracing steps > already taken during parse time - why test args with isdigit, for instance? > At parse time, your grammar already defines when an integer is being > encountered - just add a parse action that does the string-to-integer > conversion. > > I think the SimpleBool.py example on the pyparsing wiki gives a more > complete treatment of this concept. In the article I wrote for Python > magazine, I showed how you could even pickle these representational objects, > and then unpickle and execute them in varying VM's (GUI, console, server) > without reparsing the original code. > > If you just want an AST, use Groups to add structure, and parse actions to > decorate the groups with your AST structure labels. I think though that the > representational objects are more powerful. > > -- Paul > > > > > -----Original Message----- > From: Aubrey Barnard [mailto:ba...@cs...] > Sent: Wednesday, September 28, 2011 7:35 PM > To: pyp...@li... > Subject: [Pyparsing] ParseResults as AST? > > Pyparsing users, > > I am having trouble figuring out how to make an abstract syntax tree > (AST) from a ParseResults object. I do not understand the data model > that is the ParseResults object as it relates to holding the information > of an AST. (I have spent a whole day on this so far, so it is time to > ask for help.) > > I am writing a Prolog parser, so I really do need an AST as opposed to > some flattened representation like ParseResults. Order matters. Type > matters. Depth matters. > > Sample Input > ------------ > > This is only Datalog with atoms, integers, and variables. > > ''' > % Facts > r1(a). r1(b). r1(c). > r2(a, 1). r2(a, 2). r2(d, 3). r2(e, 4). > % Clauses > r3(A, B) :- > r1(A), > r2(A, B). > ''' > > Desired Output > -------------- > > I would like to get something back from parsing like the following: > > ('datalog', > [ > ('comment', '% Facts'), > ('fact', [('name', 'r1'), ('body', [('atom', 'a')])]), > ('fact',<result>), > ... > ('clause', [('head',<result>), ('body', [<result>...])]), > ] > ) > > where<result> stands for the parse result from deeper recursion. The > structure is just lists of pairs. Each pair is the name (type) of the > parsed object and its content, the result. > > Parsing > ------- > > I believe I have my grammar implemented correctly in pyparsing as I get > the expected AST back as XML by calling ParseResults.asXML(). (Except > that I have an extra top-level element, > "<datalog><datalog>...</datalog></datalog>", but that is easy to > workaround.) > > Tried > ----- > > I have tried to understand the ParseResults object. Clearly the AST > information exists as ParseResults.asXML() gives me an AST in XML. > However, some of the information used in asXML() is not part of the > public API. > > I have tried to use parse actions to construct my own AST but I couldn't > figure this out either (in part because the product of parsing is still > a ParseResults object). > > Options > ------- > > * Use asXML() and reparse the XML. Not performant. Too much code. Misses > the point. > * Write asAST() into pyparsing.py as a near clone of asXML(). This is my > current favorite option. (I'm on a deadline.) > * Use parse actions effectively. > * Use a different parsing library (I have also tried pyPEG.) > * Other? > > Any suggestions and/or help would be very much appreciated. > > Sincerely, > > Aubrey > > ---------------------------------------------------------------------------- > -- > All the data continuously generated in your IT infrastructure contains a > definitive record of customers, application performance, security > threats, fraudulent activity and more. Splunk takes this data and makes > sense of it. Business sense. IT sense. Common sense. > http://p.sf.net/sfu/splunk-d2dcopy1 > _______________________________________________ > Pyparsing-users mailing list > Pyp...@li... > https://lists.sourceforge.net/lists/listinfo/pyparsing-users > |