Thread: [Pyparsing] Using pyparsing to implement a matching engine for a regexp-like matching language?
From: Andrew W. <and...@gm...> - 2008-04-09 12:44:49
I am trying to write a matching engine for a matching language for a filtering proxy compatible with that of The Proxomitron. The matching language is basically an extended superset of shell-style globs, with functionality comparable to regexps (see http://www.proxomitron.info/45/help/Matching%20Rules.html).

I think that I can implement the parser for the match and replace patterns themselves using pyparsing, but I am not exactly sure of how to implement the "core" of the matching engine - the part that takes an input string, finds a match, and extracts any parts specified in the pattern.

I could almost use pyparsing as a matching engine as well, by dynamically generating a parser from the pattern and parsing the data to be filtered with it, but pyparsing appears to be a pull parser (one that takes all its input at once), as opposed to a push parser (which can be fed multiple chunks of input). Also, non-matching input might cause parse errors. A matching engine for a filtering proxy has to be able to handle partial input and "hold" data until enough is received to determine whether there is a match (or else the entire document would have to be held until the end is reached, filtered, and then sent all at once to the remote client, and that would make it appear much less responsive and possibly break some applications).

Is there a way to use pyparsing as a push parser, or can anyone recommend any other Python libraries that would be suitable for my purposes? I don't want to reinvent the wheel unless no suitable library exists.
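The "hold" behaviour described above can be sketched with the standard library alone. This is only a rough illustration, not a Proxomitron-compatible engine: the class name `StreamingFilter`, its `feed`/`close` methods, and the `max_match_len` parameter are all hypothetical, and Python's `re` stands in for the real matching language. It assumes every match is non-empty, at most `max_match_len` characters long, and does not depend on text outside the match itself (anchors, `\b`, lookaround), so only the last `max_match_len - 1` characters ever need to be held back.

```python
import re


class StreamingFilter:
    """Hypothetical push-style filter: feed() chunks in, get safe-to-send text out."""

    def __init__(self, pattern, replacement, max_match_len):
        self._regex = re.compile(pattern)
        self._replacement = replacement
        # A match starting in the last (max_match_len - 1) characters might
        # still be completed by the next chunk, so that tail is held back.
        self._hold = max_match_len - 1
        self._buffer = ""

    def feed(self, chunk):
        """Add a chunk; return filtered text that later input cannot change."""
        self._buffer += chunk
        safe_end = max(0, len(self._buffer) - self._hold)
        out = []
        pos = 0
        for match in self._regex.finditer(self._buffer):
            if match.start() >= safe_end:
                break  # too close to the end; decide on a later feed()
            out.append(self._buffer[pos:match.start()])
            out.append(match.expand(self._replacement))
            pos = match.end()
        # Release everything up to the hold window (or the last match end).
        cut = max(pos, safe_end)
        out.append(self._buffer[pos:cut])
        self._buffer = self._buffer[cut:]
        return "".join(out)

    def close(self):
        """End of input: filter and release whatever is still held."""
        tail = self._regex.sub(self._replacement, self._buffer)
        self._buffer = ""
        return tail


# The tag "<blink>" arrives split across two chunks but is still replaced.
f = StreamingFilter(r"<blink>", "<b>", max_match_len=7)
chunks = ["Hello <bl", "ink>world", " and <bli"]
sent = [f.feed(c) for c in chunks] + [f.close()]
print(sent[0])        # Hel
print("".join(sent))  # Hello <b>world and <bli
```

The first `feed` call releases only "Hel" because the remaining six characters could still turn out to be the start of a match. A pattern language with unbounded matches would need a real notion of "partial match" instead of a fixed-size hold window.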
From: Paul M. <pt...@au...> - 2008-04-09 14:17:50
Andrew -

Welcome to Pyparsing! This is a very interesting application. There are two examples in the pyparsing wiki that may help you in your work:

- an EBNF parser and generator - parses EBNF and generates a pyparsing grammar that implements that BNF (http://pyparsing.wikispaces.com/space/showimage/ebnf.py and http://pyparsing.wikispaces.com/space/showimage/ebnftest.py)
- a regex inverter - parses a regex (restricted forms only) and returns a generator that generates all input strings that would match that regex (http://pyparsing-public.wikispaces.com/space/showimage/invRegex.py)

Unfortunately, I'd say these are both fairly advanced examples, and maybe a bit overwhelming to a new pyparsing user. Please work through some of the simpler examples (marked with a "Start" icon) and then take a look at these others.

The basic "engine" or structure of a pyparsing program is:

- compose the parsing expressions
- call parseString
- process the returned ParseResults object (can be thought of like a list, but advanced usages will return nested lists, and also dict-style results, with matched tokens retrievable by name)

Here is the basic "Hello, World!" (or any greeting of the form "<word> , <word> <punctuation>") parser:

    from pyparsing import Word, alphas, oneOf

    # compose the parsing expression, using pyparsing classes and builtins
    greet = Word(alphas) + "," + Word(alphas) + oneOf("! . ?")

    # parse the input string
    greetingTokens = greet.parseString("Howdy, Pardner!")

    # process the returned results
    for i, token in enumerate(greetingTokens):
        print(i, token)

Prints:

    0 Howdy
    1 ,
    2 Pardner
    3 !

I'd also recommend following some of the documentation links for more detailed examples and discussions.

I have considered converting pyparsing to a streaming-type (or push) parser, but as it turns out, this would be a radical redesign; maybe in version 2 someday. I don't know of any other modules to refer you to that might support push parsing.

-- Paul
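The "retrievable by name" part of the ParseResults description, and the earlier concern that non-matching input might cause parse errors, can both be illustrated with a small variation on the greeting parser. This is a sketch rather than anything from the original reply: the results names `salutation`, `addressee`, and `mark` and the sample text are made up for illustration. It relies on `scanString`, which searches a string for matches and skips over text that does not match instead of raising an exception. The whole input still has to be in memory, so this does not make pyparsing a push parser.

```python
from pyparsing import Word, alphas, oneOf

# Same greeting grammar, with results names attached via expr("name").
# The names themselves are illustrative choices.
greet = (
    Word(alphas)("salutation")
    + ","
    + Word(alphas)("addressee")
    + oneOf("! . ?")("mark")
)

# Named tokens can be read dict-style or attribute-style.
result = greet.parseString("Howdy, Pardner!")
print(result["salutation"], result.addressee)  # Howdy Pardner

# scanString yields (tokens, start, end) for each match it finds and
# moves past everything else without raising ParseException.
text = "noise... Hello, World! more noise; Hi, Mom? end"
found = [tokens.addressee for tokens, start, end in greet.scanString(text)]
print(found)  # ['World', 'Mom']
```

Newer pyparsing releases also offer snake_case spellings such as `parse_string` and `scan_string`; the camelCase names are used here to match the rest of the thread.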