Thread: [Pyparsing] results typing
From: spir <den...@fr...> - 2008-11-16 18:27:59
Hello, pyParsing world!

[rough version -- I have not read over this message]

-0- intro ========

Now it is time for serious things! Below is a kind of study on a subject I have brought up and approached several times already: result typing. Here 'type' is primarily used in its non-technical sense.

There are probably whole fields of parsing where result types are not so important. Whenever the types of the results are predictable, they need not be explicitly defined. For instance, a file may contain, or one may extract, only data of a single type. Or a file format may define types of data in a constant repeated order, such as x y color x y color...

Still, I guess the general situation is that we cannot predict in which order the types of valuable data will occur in source texts, so having them in the results would be highly helpful. This especially applies when parsing texts written in any kind of /language/.

The type of a result is similar to that of any kind of data item: it carries the sense of the result. Without it, we cannot do anything with the result, just as, without a type, neither we nor the language "decoder" itself would know what kind of operation may apply to a bit of data. If the results do not hold their type, we are obliged to re-parse them only to determine what kind of thing they are.

pyParsing provides two methods to give patterns names -- I will talk about them later.

-1- pattern idS, result typeS ========

What is the name of a pattern? What is the type of a result? What, actually, is the link between patterns and results?

Patterns define or generate results. They are /classes/ of results, in a similar manner as (programming) types are classes of instances. Actually, patterns could be (programming) types -- but this wouldn't fit in pyParsing. Results are like pattern samples: they share characteristics which are specified in/by patterns. A pattern identifier (name, id) thus defines the type of its potential results.

    pattern object     <-->  result type
    pattern object id  <-->  result type id

Now, there are actually several kinds of pattern IDs, matching several kinds of result types:

    integer = Word(nums).setName("int")
    decimal = integer("int_part") + '.' + integer("dec_part")

Basically, a pattern usually defines the /nature/ of results, as in the first line above. But a single pattern may have several use cases, as in the second line, which define several /roles/ for its results. I intentionally used setName to define pattern names and setResultsName (abbreviated as a call) to define use cases -- but obviously nothing forces us to do that. The example can be extended to show the difference between result nature and role more acutely:

    integer = Word(nums).setName("integer")
    decimal = Combine(integer("int_part") + '.' + integer("dec_part")).setName("decimal")
    num = decimal | integer
    mult = num("left-num") + '*' + num("right-num")

Both integers and decimals (nature) may be left-nums or right-nums (role).

    pattern id   <-->  result nature
    pattern use  <-->  result role

Depending on the application, the results' nature, role, or both may be relevant information.

-2- pyParsing =======================

As I have used pyParsing for only a few weeks, I may say stupid things. But I have tried hard to find friendly ways to get such info from parse results -- I could not find any. Actually, I ended up with:

* additional data on patterns
* a custom result type
* changes in pyParsing code

First, patterns basically do not know anything about themselves. In particular, they do not know what they are, not even their (variable) name. If patterns were types, they would know it; but instances of a custom type do not have a __name__ attribute to receive their (variable) name. Pity. We can nevertheless give a pattern a name with setName or setResultsName.

The main problem anyway is that there is no interconnection between patterns and results. A result has no access to the pattern that yielded it, not even a simple reference.
A pattern only passes its ResultsName at result init time.

                 ResultsName only
    pattern  --o-->  results
    pattern  <--x--  results
                 nothing

An additional obstacle comes from the protection of results access by __slots__, for performance reasons, which prevents setting or reading custom attributes. Fortunately, patterns are not protected.

-3- letting patterns know ===============

We can use a simple trick to let patterns know a bit about themselves. If they are put in a scope (e.g. a separate module or class), we have access to a dict that holds names and objects together. With that information, we have all we need to tweak the patterns' guts. Assuming the grammar is in a class, we could even have a class method to do the job. [Note: the attribute can't be called '.name', as this name (!) is used by pyParsing to format pattern repr output, esp. for error display.] It may look like this:

    from pyparsing import Word, nums, Combine, Group, OneOrMore

    class Grammar(object):
        ''' pyParsing grammar '''
        integer = Word(nums)
        decimal = Combine(integer("int_part") + '.' + integer("dec_part"))
        num = (decimal | integer).setName("num")
        mult = Group(num("left-num") + '*' + num("right-num"))
        calc = OneOrMore(mult)

        @classmethod
        def _setNames(cls):
            ''' give patterns their name '''
            # exclude '_*' names (private and special attributes)
            namedPatterns = [(name, pattern)
                             for (name, pattern) in cls.__dict__.items()
                             if name[0] != '_']
            # set .id attributes
            for (name, pattern) in namedPatterns:
                pattern.id = name
            cls.patterns = [pattern for (name, pattern) in namedPatterns]

    Grammar._setNames()
    for pattern in Grammar.patterns:
        print("%s: %s" % (pattern.id, pattern))

    ===>>
    num: num
    integer: W:(0123...)
    calc: {Group:({{num "*"} num})}...
    decimal: Combine:({{W:(0123...) "."} W:(0123...)})
    mult: Group:({{num "*"} num})

Now we have a proper tool to automatically name patterns. Manual setName is no longer necessary; it can serve more specific needs, such as delivering clearer info to users. We are ready to transmit to results information about their use. ResultsName can set info about use cases.
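
For comparison, on Python 3.6 and later an object bound in a class body can learn its own attribute name through the __set_name__ hook, so no separate pass like _setNames is needed. What follows is only a sketch of that alternative, not something available when this message was written: Pattern is a hypothetical stand-in class, not a pyparsing element, and the attribute name .id is simply reused from the text above.

```python
class Pattern:
    """Hypothetical stand-in for a parser element; NOT a pyparsing class."""

    def __init__(self, description):
        self.description = description
        self.id = None  # filled in when the object is bound in a class body

    def __set_name__(self, owner, name):
        # Python (3.6+) calls this once for each class attribute
        # that is bound to this object when the class is created.
        self.id = name


class Grammar:
    integer = Pattern("one or more digits")
    decimal = Pattern("digits '.' digits")
    _helper = Pattern("private helper, filtered out below")


# Collect the public pattern names, mirroring the '_*' exclusion above.
names = sorted(name for name, obj in vars(Grammar).items()
               if isinstance(obj, Pattern) and not name.startswith("_"))
```

If the same object is bound under two attribute names, the hook runs once per binding and the last name wins, so this sketch assumes one name per pattern.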

-4- results structure ====================

I have posted a message presenting a type called 'Data'. [Still not quite right: I have discovered a bug.] If results could magically receive the information that patterns now hold about their nature and role, we could use Data objects to properly hold and display typed results. Output might then look like this:

    calc:[mult:[dec:1.1 <str>:* int:2] mult:[int:1 <str>:* dec:2.2]]

The types shown as prefixes would be taken from the most accurate information available:

* role = pattern use (e.g. left_num)
* nature = pattern id (e.g. integer)
* pattern format (as presently held in .name, e.g. W:(0123...))
* result type (e.g. <str> or <int>)

Now we have to find a way to let the results know about that.

-5- passing info to results =====================

As results have no access to patterns, we are presently blocked. If we just gave them a reference to the patterns, we would be unblocked. I did some explorations and trials, and it seems all right. Things to do:

* Add specific fields to patterns: id (nature) and use (role).

* Add a reference to the pattern at result instantiation. This happens 3 times in the method _parseNoCache of the class ParserElement. 'self' can be added there as a new argument for result initialisation. For instance:

      retTokens = ParseResults(tokens, self.resultsName,
                               asList=self.saveAsList,
                               modal=self.modalResults,
                               pattern=self)

  This argument becomes a 'pattern' parameter in ParseResults' __new__ and __init__ methods.

* Add private attributes to ParseResults. In __init__:

      self.__pattern = pattern
      self.__nature = pattern.id
      self.__role = pattern.use

  And matching accessors (because access is protected). E.g.:

      def pattern(self):
          return self.__pattern

denis
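
The proposal in sections 4 and 5 can be illustrated without touching pyparsing at all. The sketch below is a minimal model under stated assumptions: Pattern and Result are hypothetical stand-ins (not ParserElement or ParseResults), the field names id and use follow the message, and type_label is an invented helper that picks the most specific label available in the order role, nature, pattern format, result type.

```python
class Pattern:
    """Hypothetical stand-in for a pyparsing expression."""

    def __init__(self, fmt, id=None, use=None):
        self.fmt = fmt   # display format, e.g. "W:(0123...)"
        self.id = id     # nature, e.g. "integer"
        self.use = use   # role, e.g. "left_num"


class Result:
    """Hypothetical stand-in for a parse result that remembers its pattern."""

    # __slots__ blocks ad hoc attributes, so the back-reference
    # has to be declared here explicitly.
    __slots__ = ("value", "_pattern")

    def __init__(self, value, pattern=None):
        self.value = value
        self._pattern = pattern

    @property
    def pattern(self):
        # Read-only accessor for the pattern that produced this result.
        return self._pattern

    def type_label(self):
        # Prefer role, then nature, then the pattern's display format.
        p = self._pattern
        if p is not None:
            for candidate in (p.use, p.id, p.fmt):
                if candidate:
                    return candidate
        # No pattern information: fall back to the Python type of the value.
        return "<%s>" % type(self.value).__name__


left = Result("2", Pattern("W:(0123...)", id="integer", use="left_num"))
star = Result("*")
labels = [left.type_label(), star.type_label()]
```

Here labels holds the role for the first result and the bare value type for the second, which matches the fallback order listed in section 4.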
From: Paul M. <pt...@au...> - 2008-11-16 18:52:29
Denis -

Thanks for your contributions on this list - please don't be discouraged if you don't get many replies to your messages. The readers here are mostly lurkers (not that there's anything wrong with that!), or folks who post with particular questions they need answered. I too look at the list as something of an archive of past pyparsing discussions.

I've had a chance to look over your notes in brief, but I really want to give them more thought and consideration than I can spare just now. In general, I would say that the best way to start prototyping the integration of your ideas with pyparsing as it exists is through parse actions (for embellishing parse results) and helper methods (to simplify the construction of expressions, or linking them to parse actions).

Whatever you do, I *strongly* suggest that you make these enhancements under the control of the developer, and not automatically applied across the board. Pyparsing creates many intermediate parse expressions when building the grammar, and parse results when parsing the source text, many of which get either discarded or absorbed into a larger expression. Also, if your enhancements stay in the realm of extensions that users may add or not of their own choosing, then forward compatibility of existing code will be preserved, and it will be easier to add your ideas to the core pyparsing code.

Here is one idea that might address your question about linking results to the original pyparsing expression (untested):

    from pyparsing import *

    def linkToResults(expr):
        # attach a parse action that stores the expression in its own results
        def parseAction(tokens):
            tokens["expr"] = expr
        expr.addParseAction(parseAction)

    integer = Word(nums).setName("integer")
    decimal = Combine(integer("int_part") + '.' + integer("dec_part"))
    linkToResults(decimal)

    print(decimal.parseString("3.14159").dump())

Which prints:

    ['3.14159']
    - dec_part: 14159
    - expr: Combine:({integer "." integer})
    - int_part: 3

Now the parse results that get created by expr will contain an added field named "expr" that points back to the original expression (as shown by the dump() output). If this works well as a prototype, then it may be just as easy to add it as a member function of ParserElement, so that any expression can get linked back to from its results.

-- Paul