Thread: [Pyparsing] How can I determine if a fragment is valid?
From: Andrew S. <agt...@ya...> - 2008-01-14 06:32:23
I'd like to have my application accept incremental input until a statement conforming to my grammar has been entered. My current approach is simplistic but works well enough. I create a small "Interpreter" to buffer input until it's ready. Something like this:

    class Interpreter(object):
        def __init__(self, grammar):
            self.buffer = []
            self.grammar = grammar

        def push(self, line):
            result = None
            self.buffer.append(line)
            try:
                result = self.grammar.parse(self.buffer)
            except ParseException, pe:
                pass
            if result:
                self.buffer = []
            return result

Now I'd like to add the following feature to my Interpreter. If the buffer has enough input to determine that the contents will _never_ be valid I'd like to re-raise the ParseException so the clients of the Interpreter can stop accepting input.

Is there a way to determine if a "fragment" has the potential to be parsed by my grammar? I thought about comparing positions of the parse element in the exception with the elements in the grammar but wanted to check with the list to find out if there is an accepted way of doing this.

-a.
From: Paul M. <pt...@au...> - 2008-01-14 08:07:21
Andrew -

You may be able to infer something like this based on the location of the raised exception. For instance, consider this grammar:

    grmr = Literal("A") + ( Literal("B") + Literal("C") | Literal("D") + Literal("E") )

A full match would require one of these input strings (with whitespace allowed, of course):

    ABC
    ADE

So any of these partial strings would indicate that there could still be a match:

    A
    AB
    AD

but they would raise a ParseException since they are not complete. The thing to note is, that the loc field of the raised exception is equal to the length of the input string, telling you that the missing piece would be found at the end of what was given, but that the input so far did match the grammar.

By contrast, these strings are not partially valid:

    B
    AC
    ABX

In these cases, the loc field of the raised exception is less than the length of the input string, telling you that the provided string is not a partial match.

This is a very simplistic example, you should test this idea a bit more thoroughly with your own specific grammar. I haven't thought it through myself for more than about 10 minutes, so please let me know how it works out.

-- Paul
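The loc comparison described above can be tried out with a small script. This is only a sketch: the classify helper and its three labels are made up for illustration, and StringEnd() is appended here (an assumption, not part of the original grammar) so that trailing junk after a full match counts as a failure.

```python
from pyparsing import Literal, ParseException, StringEnd

# The example grammar from the message above.
grmr = Literal("A") + (Literal("B") + Literal("C") | Literal("D") + Literal("E"))
# Assumption: require end of input so that "ABCX" is rejected.
complete = grmr + StringEnd()


def classify(text):
    """Return 'complete', 'partial' or 'invalid' (hypothetical helper)."""
    try:
        complete.parseString(text)
    except ParseException as pe:
        # A failure located at the end of the input means everything
        # entered so far matched, and only the continuation is missing.
        return "partial" if pe.loc >= len(text) else "invalid"
    return "complete"


for sample in ["ABC", "ADE", "A", "AB", "AD", "B", "AC", "ABX"]:
    print(sample, classify(sample))
```

As the message says, this is worth testing against your own grammar before relying on it.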
From: Andrew S. <agt...@ya...> - 2008-01-15 09:47:12
Paul,

Thanks for the quick reply. So far using pe.loc < len(input) works for me. I'll reply to the list if I can find a counter example. Can I count on loc sticking around in the ParseException class?

Not to push my luck, but I've got a grammar question too. I'm trying to define a grammar that uses a skipTo(Optional(xxx)) and not having much success. Is there a better way to go about this? An example:

    select where foo = 1 and bar = 2 into result

I was hoping to end up with something like the following:

    into_clause = Keyword('into') + restOfLine
    Keyword('select') + skipTo(Optional(into_clause)).setResultsName('where_condition')

Is there a way to do this without forcing some delimiters onto the where clause?

Thanks again for such a useful library.

-a.
From: Paul M. <pt...@au...> - 2008-01-15 14:55:33
> Thanks for the quick reply. So far using pe.loc < len(input) works for me. I'll reply to the list if I can find a counter example. Can I count on loc sticking around in the ParseException class?

<PM> Great! Yes, loc is an important part of ParseException, and has been in pyparsing since version 0.5. What would give you the idea that it might not stick around?

> Not to push my luck, but I've got a grammar question too. I'm trying to define a grammar that uses a skipTo(Optional(xxx)) and not having much success. Is there a better way to go about this?
>
>     select where foo = 1 and bar = 2 into result
>
> I was hoping to end up with something like the following:
>
>     into_clause = Keyword('into') + restOfLine
>     Keyword('select') + skipTo(Optional(into_clause)).setResultsName('where_condition')

<PM> Hmm, SkipTo (leading "S" is capitalized) really wants to have a predictable target expression. SkipTo can be embedded inside an Optional, but you can't skip to an optional thing. You *can* skip to one thing or another, as in SkipTo(A | B), and the SkipTo will stop at whichever comes first. If A may or may not be present, then make B a StringEnd().

    where_clause = SkipTo(into_clause | StringEnd())

This is actually almost readable English - "skip to either the into_clause or the end of the input string".

But if all of your where clauses are this simple, you might take the time to define a where_clause expression, probably using operatorPrecedence to take care of things like nested parentheses. Here is how to use operatorPrecedence:

1. Identify the basic operand of your expression. In this case, each where clause is a boolean expression of logical comparisons. A logical comparison is of the form:

    identifier = Word(alphas)
    relationalOperator = oneOf("< = > >= <= != <>")
    integer = Word(nums)
    value = integer | sglQuotedString
    logicalComparison = identifier + relationalOperator + value

2. Identify the operators. Logical expressions usually allow AND, OR, and NOT. We'll define caseless versions of each:

    AND_cl = CaselessLiteral("AND")
    OR_cl = CaselessLiteral("OR")
    NOT_cl = CaselessLiteral("NOT")

3. Call operatorPrecedence with these operators, to compose a grammar.

    complexComparison = operatorPrecedence( logicalComparison,
        [
        (NOT_cl, 1, opAssoc.RIGHT),
        (OR_cl, 2, opAssoc.LEFT),
        (AND_cl, 2, opAssoc.LEFT),
        ])

operatorPrecedence is called using the base operand, followed by a list of tuples describing each operator or group of operators. Each tuple contains the operator, the value 1 or 2 indicating whether it is a unary or binary operator, and the opAssoc.LEFT or opAssoc.RIGHT value indicating whether the operator is left or right associative.

With the example you provided, this should be enough to define a where_clause expression:

    where_clause = CaselessLiteral("where") + complexComparison

You may have to expand the value expression to support real numbers or identifiers. I hope it is clear how you would do so - what I've provided will match integers or single-quoted strings.

This is probably more than you asked for; if it is too much to deal with now, just go with the SkipTo alternative, and come back to the rest of this later.

-- Paul
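For reference, the SkipTo alternative can be exercised on its own, with and without the into clause. This is a rough sketch rather than code from the thread: select_stmt and split_select are made-up names, and the helper strips whitespace because the skipped text and restOfLine are returned as raw slices of the input.

```python
from pyparsing import Keyword, Optional, SkipTo, StringEnd, restOfLine

into_clause = Keyword("into") + restOfLine
# Skip to whichever comes first: the into clause or the end of the input.
where_text = SkipTo(into_clause | StringEnd())
select_stmt = Keyword("select") + where_text + Optional(into_clause)


def split_select(sql):
    """Return (where text, into target or None); hypothetical helper."""
    tokens = select_stmt.parseString(sql).asList()
    # tokens are ['select', skipped text] plus ['into', rest of line] if present
    where = tokens[1].strip()
    target = tokens[3].strip() if len(tokens) > 3 else None
    return where, target


print(split_select("select where foo = 1 and bar = 2 into result"))
print(split_select("select where foo = 1 and bar = 2"))
```

The where text comes back as one unparsed string, which is the trade-off against the operatorPrecedence approach.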
From: Andrew S. <agt...@ya...> - 2008-01-17 02:32:25
--- Paul McGuire <pt...@au...> wrote:
> Thanks for the quick reply. So far using pe.loc < len(input) works for me.
> I'll reply to the list if I can find a counter example. Can I count on loc
> sticking around in the ParseException class?
>
> <PM> Great! Yes, loc is an important part of ParseException, and has been
> in pyparsing since version 0.5. What would give you the idea that it might
> not stick around?

(Grammar discussion snipped)

Since the algorithm you suggested for maybeParseable() works well but is not a formal part of the API, I kind of feel like I'm going in through the "back door" to get this done. It's probably just perception on my part, but relying on direct attribute access for the parse error location also feels like it could change. This is most likely a holdover from habits gained doing too much Java beans attribute access. Even though I'd like getters and setters to die a horrible death, they did sort of provide a feeling of permanence.

-a.
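One way to ease the "back door" feeling is to keep the attribute access in a single helper, so only one place changes if the exception ever does. The sketch below is an assumption about how the Interpreter from the first message might be extended, not code posted in the thread: maybe_parseable is a hypothetical name, the buffer is joined with newlines, and StringEnd() is appended so that extra input after a full statement is rejected.

```python
from pyparsing import Literal, ParseException, StringEnd


def maybe_parseable(pe, text):
    """True if the failure sits at the end of the input (hypothetical helper)."""
    return pe.loc >= len(text)


class Interpreter:
    def __init__(self, grammar):
        self.buffer = []
        # Assumption: a statement must consume the whole buffer.
        self.grammar = grammar + StringEnd()

    def push(self, line):
        """Return results when complete, None while the buffer is still a
        valid fragment, and re-raise ParseException when it never can be."""
        self.buffer.append(line)
        text = "\n".join(self.buffer)
        try:
            result = self.grammar.parseString(text)
        except ParseException as pe:
            if maybe_parseable(pe, text):
                return None  # keep accepting input
            self.buffer = []
            raise
        self.buffer = []
        return result


interp = Interpreter(
    Literal("A") + (Literal("B") + Literal("C") | Literal("D") + Literal("E"))
)
print(interp.push("A"))
print(interp.push("B"))
print(interp.push("C"))
```

Clients can then treat None as "keep going" and a ParseException as "stop accepting input".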
From: Paul M. <pt...@au...> - 2008-01-15 15:08:31
Here is the complete listing of the SQL expression parser I described in the last message.

-- Paul

    from pyparsing import *

    sql1 = "select where foo = 1 and bar = 2 into result"
    sql2 = "select where foo = 1 and bar = 2"

    into_clause = Keyword('into') + restOfLine
    selectStmt = Keyword('select') + SkipTo(into_clause|StringEnd()).setResultsName('where_condition')

    identifier = Word(alphas)
    relationalOperator = oneOf("< = > >= <= != <>")
    integer = Word(nums)
    value = integer | sglQuotedString
    logicalComparison = identifier + relationalOperator + value

    AND_cl = CaselessLiteral("AND")
    OR_cl = CaselessLiteral("OR")
    NOT_cl = CaselessLiteral("NOT")

    complexComparison = operatorPrecedence( logicalComparison,
        [
        (NOT_cl, 1, opAssoc.RIGHT),
        (OR_cl, 2, opAssoc.LEFT),
        (AND_cl, 2, opAssoc.LEFT),
        ])

    where_clause = CaselessLiteral("where") + complexComparison

    selectStmt = Keyword('select') + where_clause('where_condition') + \
                    Optional(into_clause)('into_clause')

    print selectStmt.parseString(sql1).dump()
    print selectStmt.parseString(sql2).dump()

Prints:

    ['select', 'where', [['foo', '=', '1'], 'AND', ['bar', '=', '2']], 'into', ' result']
    - into_clause: ['into', ' result']
    - where_condition: ['where', [['foo', '=', '1'], 'AND', ['bar', '=', '2']]]
    ['select', 'where', [['foo', '=', '1'], 'AND', ['bar', '=', '2']]]
    - where_condition: ['where', [['foo', '=', '1'], 'AND', ['bar', '=', '2']]]
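The value expression in the listing accepts only integers and single-quoted strings. A possible expansion to real numbers and identifiers is sketched below, under several assumptions that go beyond the thread: it uses infixNotation, the name later pyparsing releases gave to operatorPrecedence; it wraps each comparison in Group and uses CaselessKeyword for the operators; and it lists AND before OR, since the operator list runs from highest to lowest precedence and SQL conventionally binds AND tighter than OR. The number pattern and the name where_expr are illustrative only.

```python
from pyparsing import (CaselessKeyword, Group, Regex, Word, alphanums, alphas,
                       infixNotation, oneOf, opAssoc, sglQuotedString)

identifier = Word(alphas, alphanums + "_")
relationalOperator = oneOf("< = > >= <= != <>")
# Assumption: optional sign, digits, optional fractional part.
number = Regex(r"[+-]?\d+(\.\d+)?")
value = number | sglQuotedString | identifier
logicalComparison = Group(identifier + relationalOperator + value)

AND_cl = CaselessKeyword("AND")
OR_cl = CaselessKeyword("OR")
NOT_cl = CaselessKeyword("NOT")

# Operators are listed from highest precedence to lowest.
where_expr = infixNotation(
    logicalComparison,
    [
        (NOT_cl, 1, opAssoc.RIGHT),
        (AND_cl, 2, opAssoc.LEFT),
        (OR_cl, 2, opAssoc.LEFT),
    ],
)

print(where_expr.parseString("a = 1 or b = 2.5 and c = 'x'").asList())
```

With this ordering the AND comparison is grouped first and the OR joins that group to the comparison on its left.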