[Pyparsing] the typed Data type
From: spir <den...@fr...> - 2008-11-16 10:14:49
Hello,

[As I don't know if there is anybody else on this list, well... I use it as a log for ideas and trials with pyparsing, and as an opportunity to express them clearly (?). denis]

Here is an implementation of a custom type used to give parse results an alternative structure, followed by an illustration of what it is intended for.

Data (sic!) is primarily used to give (nested) parse results a /type/ natively. I will come back in a later message to this point of view, that results should be typed. So Data allows results to have a type -- like ordinary data, hence the name -- and to be displayed in a type:content format.

Actually, the implementation is needlessly complicated, because at present it can receive content from several kinds of sources: final parse results, parse results created during the parsing process, ordinary data, and objects that are already of type Data. This makes its typing and content reading overly complex. (ToDo: implement __new__ for the case where the content is a Data object.) It also holds some list-like operator overloading that is currently useless. For a more specific use, it could be written in a dozen lines, as it was before. I also added a Seq type to work around a problem with built-in lists. Additionally, Data can receive content from any kind of simple or sequential object.

The type property may be defined from several sources, listed here from the most specific to the least specific:
* arg passed at init [ParseResults object only]
* results name retrieved from getName() [ditto]
* pattern's .use or .resultsName
* pattern's .id or .name
* pattern's type name (e.g. Literal, MatchFirst, Group)
* result's own type name

Some of the typing sources listed above actually belong to a further exploration of pattern naming, which I will present in another post.

Here is the Data thing:

from pyparsing import ParseResults

def typ(obj):
    return obj.__class__.__name__

class Seq(list):
    ''' specialized sequence type with improved str
        Override list's behaviour that str(list) calls repr instead of str on items.
    '''
    def __str__(self):
        # wrap nested plain lists in Seq too, so that they also use str on their items
        items = [Seq(item) if isinstance(item, list) else item for item in self]
        return "[%s]" % " ,".join(str(item) for item in items)

class Data(object):
    ''' nestable type:content object with built-in toolset '''

    def __init__(self, content, type=None, pattern=None):
        ''' store startup data '''
        self.type = type
        # case content is ParseResults: extract proper info
        # (done first, so that the results name outranks pattern info)
        if isinstance(content, ParseResults):
            content, self.type = self.from_result(content, self.type)
        # read info from pattern, if available (sets type only if still unset)
        self.read_pattern(pattern)
        # case (new) content is Data object: copy
        if isinstance(content, Data):
            self.type, self.pattern = content.type, content.pattern
            self.content, self.isSimple = content.content, content.isSimple
        # case content is ordinary data: record it
        else:
            self.content = self.recursive_record(content)
        # define type if neither given by user nor read from result or pattern
        if not self.type:
            self.type = "<%s>" % typ(self.content)
        #print "* new Data - %s" %self

    def read_pattern(self, pattern):
        ''' if available, read info from pattern about source of result '''
        self.pattern = pattern
        self.nature = self.role = self.pattern_type_name = None
        # get info about source of result
        if pattern is not None:
            # pattern_type_name (e.g. Literal, MatchFirst, Group...)
            self.pattern_type_name = typ(pattern)
            # role <-- pattern.use, else pyparsing's pattern.resultsName: pattern use case
            self.role = getattr(pattern, 'use', None) or getattr(pattern, 'resultsName', None)
            # nature <-- pattern.id, else pattern.name: pattern naming
            self.nature = getattr(pattern, 'id', None) or getattr(pattern, 'name', None)
            # if not yet set, try and define type from this info
            if not self.type:
                if self.role:
                    self.type = self.role
                elif self.nature:
                    self.type = self.nature
                elif self.pattern_type_name:
                    self.type = "<%s>" % self.pattern_type_name

    def from_result(self, content, type):
        ''' define properties from result data '''
        # try & set type from user-defined info
        if (not type) and content.getName():
            type = content.getName()
        # jump inside Group if not sequence
        if len(content) == 1:
            content = content[0]
        # take result as list
        if isinstance(content, ParseResults):
            content = content.asList()
        return content, type

    def recursive_record(self, content):
        ''' record content according to its structure '''
        # case simple: record as is
        if not isinstance(content, list):
            self.isSimple = True
            return content
        # case complex / nested: mutate each nested item into a Data object
        # (an item may already be a Data object -- or not)
        self.isSimple = False
        seq = Seq()
        for item in content:
            if isinstance(item, Data):
                seq.append(item)
            else:
                seq.append(Data(item))
        return seq

    def treeView(self, noType=False, showGroup=False, level=0):
        ''' return full & legible tree view of object's data '''
        # this level's line
        tree = level * '\t'
        if not noType:
            tree += "%s: " % self.type
        if self.isSimple or showGroup:
            tree += self.content_text()
        tree += "\n"
        # recursion for nested results
        if not self.isSimple:
            for item in self.content:
                tree += item.treeView(noType, showGroup, level+1)
        return tree

    def leaves(self, noType=False):
        ''' return a flat list of 'terminal', low-level object items -- the 'leaves' '''
        seq = Seq()
        # case simple result: add content to seq
        if self.isSimple:
            if noType:
                seq.append(self.content)
            else:
                seq.append(self)
        # case compound result: recursively explore nested result
        else:
            for item in self.content:
                seq.extend(item.leaves(noType))
        return seq

    def allFlat(self, noType=False):
        ''' return full flat list of object's items -- either compound or simple '''
        seq = Seq()
        # in all cases: add content to seq
        if noType:
            seq.append(self.content)
        else:
            seq.append(self)
        # case compound result: recursively explore nested result
        if not self.isSimple:
            for item in self.content:
                seq.extend(item.allFlat(noType))
        return seq

    # list-like operator overloading (currently unused)
    def __len__(self):
        try:
            return len(self.content)
        except TypeError:
            return 0

    def __getitem__(self, index):
        return self.content[index]

    def __getslice__(self, i1, i2):
        # Python 2 slicing hook
        return self.content[i1:i2]

    def __repr__(self):
        ''' type:content format '''
        return "%s:%s" % (self.type, self.content_text())

    def content_text(self):
        ''' content expression for either simple or sequential content '''
        # case simple content: just output as is
        if self.isSimple:
            return str(self.content)
        # case compound content: recursive text seq in []
        return "[%s]" % " ".join(str(item) for item in self.content)

Below are illustrations of two use cases:

-1- Parsing is done normally, and the results feed a Data object. Both the standard and the Data results are printed, so that the difference is clear. The tree view, the 'leaves', and the flat list of nested results at all levels are also shown -- see my previous message for more about these. Contained results are recursively converted into Data objects, so that they all end up typed.

-2- The parser is tricked into 'natively' returning Data objects instead of ParseResults ones. For the sake of illustration, only the important (named) results are converted. Converting all of them would make no difference, since nested results are recursively converted into Data objects anyway.
# === Data retrieved from final parse results ==================
from pyparsing import Word, nums, Literal, Combine, Group, OneOrMore

class Grammar(object):
    # tokens
    integer = Word(nums)
    integer.setParseAction(lambda i: int(i[0]))
    point = Literal('.')
    # raw Word(nums) inside Combine: reusing integer would convert the
    # fractional digits to int first and drop leading zeros ("1.05" -> 1.5)
    decimal = Combine(Word(nums) + point + Word(nums))
    decimal.setParseAction(lambda x: float(x[0]))
    #decimal = Group(decimal)("dec")
    add = Literal('+')
    mult = Literal('*')
    # symbols
    num = decimal | integer
    mult_op = Group(num + mult + num)("mult_op")
    add_op = Group((mult_op|num) + add + (mult_op|num))("add_op")
    #group = Group(l_paren + in_op + r_paren)("group")
    operation = (add_op|mult_op)
    calcs = OneOrMore(operation)("calcs")

calcs = Grammar.calcs
# source text
text = "1+2.2*3 4.4*5+6.6"
print text
# standard result
results = calcs.parseString(text)
print "=== standard results:", results
# custom use & output
data = Data(results)
print "\n=== data:\n", data
print "\n=== default treeview :\n", data.treeView()
print "\n=== treeview with group w/o lead type:\n", data.treeView(showGroup=True, noType=True)
print "\n=== show lowest-level flat sequence:\n", data.leaves()
print "\n=== show lowest-level flat sequence w/o type:\n", data.leaves(noType=True)
print "\n=== show flat sequence of items on all levels /lines :"
for item in data.allFlat():
    print item

# === Data 'natively' returned by parsing process ===============
class Grammar(object):
    # tokens
    integer = Word(nums)
    integer.setParseAction(lambda i: int(i[0]))
    point = Literal('.')
    decimal = Combine(Word(nums) + point + Word(nums))
    decimal.setParseAction(lambda x: float(x[0]))
    #decimal = Group(decimal)("dec")
    add = Literal('+')
    mult = Literal('*')
    # symbols
    num = (decimal | integer)("num")
    mult_op = Group(num + mult + num)("mult_op")
    add_op = Group((mult_op|num) + add + (mult_op|num))("add_op")
    #group = Group(l_paren + in_op + r_paren)("group")
    operation = (add_op|mult_op)
    calcs = OneOrMore(operation)("calcs")
    #integer.addParseAction(toData)
    #decimal.addParseAction(toData)
    #mult_op.setParseAction(toData)
    #add_op.setParseAction(toData)
    #calcs.setParseAction(toData)

    @classmethod
    def _setToData(cls):
        ''' make every named pattern return a Data object instead of ParseResults '''
        patterns = [(n, p) for (n, p) in cls.__dict__.items() if n[0] != '_']
        print "patterns: %s" % [name for (name, pattern) in patterns]
        named_patterns = [(n, p) for (n, p) in patterns if p.resultsName]
        print "named patterns: %s" % [name for (name, pattern) in named_patterns]
        for name, pattern in named_patterns:
            pattern.setParseAction(lambda result: Data(result))
        print

print "\n========================================\n"
Grammar._setToData()
calcs = Grammar.calcs
# standard result
results = calcs.parseString(text)
print "=== standard results holding data: %s:\n%s" % (results.__class__, results)
data = Data(results)
print "\n=== data: %s:\n%s" % (data.__class__, data)
print "\n=== data treeview :\n", data.treeView()
print "\n=== data leaves:\n", data.leaves()
print "\n=== show flat sequence of items on all levels /lines :"
for item in data.allFlat():
    print item

======================================================
O U T P U T
======================================================
C:/prog/ACTIVE~1/pythonw.exe -u "D:/prog/parsing/Data.pyw"
1+2.2*3 4.4*5+6.6
=== standard results: [[1, '+', [2.2000000000000002, '*', 3]], [[4.4000000000000004, '*', 5], '+', 6.5999999999999996]]

=== data:
calcs:[<Seq>:[<int>:1 <str>:+ <Seq>:[<float>:2.2 <str>:* <int>:3]] <Seq>:[<Seq>:[<float>:4.4 <str>:* <int>:5] <str>:+ <float>:6.6]]

=== default treeview :
calcs:
    <Seq>:
        <int>: 1
        <str>: +
        <Seq>:
            <float>: 2.2
            <str>: *
            <int>: 3
    <Seq>:
        <Seq>:
            <float>: 4.4
            <str>: *
            <int>: 5
        <str>: +
        <float>: 6.6

=== treeview with group w/o lead type:
[<Seq>:[<int>:1 <str>:+ <Seq>:[<float>:2.2 <str>:* <int>:3]] <Seq>:[<Seq>:[<float>:4.4 <str>:* <int>:5] <str>:+ <float>:6.6]]
    [<int>:1 <str>:+ <Seq>:[<float>:2.2 <str>:* <int>:3]]
        1
        +
        [<float>:2.2 <str>:* <int>:3]
            2.2
            *
            3
    [<Seq>:[<float>:4.4 <str>:* <int>:5] <str>:+ <float>:6.6]
        [<float>:4.4 <str>:* <int>:5]
            4.4
            *
            5
        +
        6.6

=== show lowest-level flat sequence:
[<int>:1 ,<str>:+ ,<float>:2.2 ,<str>:* ,<int>:3 ,<float>:4.4 ,<str>:* ,<int>:5 ,<str>:+ ,<float>:6.6]

=== show lowest-level flat sequence w/o type:
[1 ,+ ,2.2 ,* ,3 ,4.4 ,* ,5 ,+ ,6.6]

=== show flat sequence of items on all levels /lines :
calcs:[<Seq>:[<int>:1 <str>:+ <Seq>:[<float>:2.2 <str>:* <int>:3]] <Seq>:[<Seq>:[<float>:4.4 <str>:* <int>:5] <str>:+ <float>:6.6]]
<Seq>:[<int>:1 <str>:+ <Seq>:[<float>:2.2 <str>:* <int>:3]]
<int>:1
<str>:+
<Seq>:[<float>:2.2 <str>:* <int>:3]
<float>:2.2
<str>:*
<int>:3
<Seq>:[<Seq>:[<float>:4.4 <str>:* <int>:5] <str>:+ <float>:6.6]
<Seq>:[<float>:4.4 <str>:* <int>:5]
<float>:4.4
<str>:*
<int>:5
<str>:+
<float>:6.6

========================================

patterns: ['mult_op', 'point', 'decimal', 'calcs', 'add', 'num', 'add_op', 'integer', 'operation', 'mult']
named patterns: ['mult_op', 'calcs', 'num', 'add_op']

=== standard results holding data: <class 'pyparsing.ParseResults'>:
[calcs:[add_op:[num:1 <str>:+ mult_op:[num:2.2 <str>:* num:3]] add_op:[mult_op:[num:4.4 <str>:* num:5] <str>:+ num:6.6]]]

=== data: <class '__main__.Data'>:
calcs:[add_op:[num:1 <str>:+ mult_op:[num:2.2 <str>:* num:3]] add_op:[mult_op:[num:4.4 <str>:* num:5] <str>:+ num:6.6]]

=== data treeview :
calcs:
    add_op:
        num: 1
        <str>: +
        mult_op:
            num: 2.2
            <str>: *
            num: 3
    add_op:
        mult_op:
            num: 4.4
            <str>: *
            num: 5
        <str>: +
        num: 6.6

=== data leaves:
[num:1 ,<str>:+ ,num:2.2 ,<str>:* ,num:3 ,num:4.4 ,<str>:* ,num:5 ,<str>:+ ,num:6.6]

=== show flat sequence of items on all levels /lines :
calcs:[add_op:[num:1 <str>:+ mult_op:[num:2.2 <str>:* num:3]] add_op:[mult_op:[num:4.4 <str>:* num:5] <str>:+ num:6.6]]
add_op:[num:1 <str>:+ mult_op:[num:2.2 <str>:* num:3]]
num:1
<str>:+
mult_op:[num:2.2 <str>:* num:3]
num:2.2
<str>:*
num:3
add_op:[mult_op:[num:4.4 <str>:* num:5] <str>:+ num:6.6]
mult_op:[num:4.4 <str>:* num:5]
num:4.4
<str>:*
num:5
<str>:+
num:6.6