[Pyparsing] the typed Data type

Hello,

[As I don't know whether there is anybody else on this list, well... I use it
as a log for ideas and trials using pyParsing, and as an opportunity to express
them clearly (?). denis]

Here is an implementation of a custom type used to give parse results an
alternative structure, and an illustration of what it is intended for.
Data (sic!) is primarily used to natively give (nested) parse results a /type/.
I will come back to this point of view, that results should be typed, in a
further message. So, Data allows results to have a type -- like ordinary data,
hence the name -- and to show in a type:content format.
Actually, the implementation is uselessly complicated because presently it is
able to receive content from several kinds of sources: final parse results,
parse results created during the parsing process, ordinary data (any kind of
simple or sequential object), and objects that already are of type Data. This
makes its typing and content reading overly complex. (ToDo: implement __new__
for the case when content is a Data object.) Additionally, it currently holds
useless list-like operator overloading. For more specific use, it could be
written in a dozen lines, as it was before. I also added a Seq type to avoid a
problem with built-in lists.
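
To give an idea of what a dozen-line version might look like, here is a minimal sketch. It is not the earlier implementation mentioned above: the name MiniData is made up for illustration, it only accepts ordinary Python data (no ParseResults), and it is written to run on Python 3.

```python
class MiniData(object):
    """Hypothetical cut-down Data: a type label plus simple or nested content."""

    def __init__(self, content, type=None):
        self.isSimple = not isinstance(content, list)
        if self.isSimple:
            self.content = content
        else:
            # wrap every nested item so that the whole tree ends up typed
            self.content = [item if isinstance(item, MiniData) else MiniData(item)
                            for item in content]
        # fall back to the Python class name, shown in angle brackets
        self.type = type or "<%s>" % content.__class__.__name__

    def __repr__(self):
        if self.isSimple:
            return "%s:%s" % (self.type, self.content)
        return "%s:[%s]" % (self.type, "  ".join(repr(i) for i in self.content))


sample = MiniData([1, "+", [2.2, "*", 3]], type="add_op")
print(sample)
# add_op:[<int>:1  <str>:+  <list>:[<float>:2.2  <str>:*  <int>:3]]
```

Untyped nested lists show as <list> here because this sketch has no Seq type.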
The type property may be defined from several sources, here listed from the
most specific to the least specific:
	* arg passed at init [ParseResults object only]
	* ResultsType retrieved from getName() [ditto]
	* pattern's .use or .ResultsType
	* pattern's .id or .name
	* pattern's type_name
	* result's own type_name
Some sources of info listed above for typing an object belong in fact to 
further exploration about pattern naming that I will present in another post.
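
The precedence list above can be sketched as a plain function that returns the first source that is set. The function name and its parameter names are hypothetical, chosen only for this illustration (Python 3); Data itself spreads this logic over __init__, read_pattern and from_result.

```python
def resolve_type(content, explicit=None, results_name=None,
                 role=None, nature=None, pattern_type_name=None):
    """Return the most specific type label available (illustrative only)."""
    # user-given names are used as they are, most specific first
    for candidate in (explicit, results_name, role, nature):
        if candidate:
            return candidate
    # class names are shown in angle brackets, as Data does
    if pattern_type_name:
        return "<%s>" % pattern_type_name
    return "<%s>" % content.__class__.__name__


print(resolve_type(3, results_name="num", nature="integer"))  # num
print(resolve_type(3, pattern_type_name="Word"))              # <Word>
print(resolve_type(3))                                        # <int>
```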
Here is the Data thing:

from pyparsing import ParseResults, Word, nums, Literal, Combine, Group, OneOrMore
def typ(obj): return obj.__class__.__name__
class Seq(list):
	''' specialized sequence type with improved str
		Override list's behaviour that str(list) calls repr instead of str on items.
		'''
	def __str__(self):
		if len(self) == 0:
			return '[]'
		text = str(self[0])
		for item in self[1:]:
			if isinstance(item, list):
				item = Seq(item)
			text += " ,%s" %item
		return "[%s]" %text
class Data(object):
	''' nestable type:content object
		with built_in toolset
		'''
	def __init__(self, content, type=None, pattern=None):
		''' store startup data '''
		self.type = type
		# read info from pattern, if available
		self.read_pattern(pattern)
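		# case content is ParseResults: extract proper info
		if isinstance(content,ParseResults):
			content, result_type = self.from_result(content, type)
			# keep a type read from the pattern when the result gives none
			if result_type:
				self.type = result_type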
		# case (new) content is Data object: copy
		if isinstance(content,Data):
			self.type, self.pattern = content.type, content.pattern
			self.content, self.isSimple = content.content, content.isSimple
		# case content is ordinary data: record it
		else:
			self.content = self.recursive_record(content)
		# define type if not given by user, nor read from pattern
		if not self.type:
			self.type = "<%s>" %typ(self.content)
		#print "* new Data  - %s" %self
	def read_pattern(self,pattern):
		''' if available, read info from pattern
			about source of result '''
		self.pattern = pattern
		self.nature = self.role = self.pattern_type_name = None
		# get info about source of result
		if pattern:
			# pattern_type_name (e.g. Literal, MatchFirst, Group...)
			self.pattern_type_name = typ(pattern)
			# role <-- pattern.use: pattern use case
			try:
				self.role = pattern.use
			except AttributeError:
				# pyparsing stores the results name as .resultsName
				self.role = getattr(pattern, 'resultsName', None)
			# nature <-- pattern name/id : pattern naming
			try:
				self.nature = pattern.id
			except AttributeError:
				try:
					self.nature = pattern.name
				except AttributeError:
					pass
		# if not yet set, try and define type from this info
		if not self.type:
			if self.role:
				self.type = self.role
			elif self.nature:
				self.type = self.nature
			elif self.pattern_type_name:
				self.type = "<%s>" %self.pattern_type_name
	def from_result(self,content,type):
		''' define properties from result data '''
		# try & set type from user-defined info
		if (not type) and content.getName():
			type = content.getName()
		# jump inside Group if not sequence
		if len(content)==1:
			content = content[0]
		# take result as list
		if isinstance(content,ParseResults):
			content = content.asList()
		return content, type
	def recursive_record(self,content):
		''' record content according to its structure '''
		# return if isSimple
		if not isinstance(content,list):
			self.isSimple = True
			return content
		# === case complex / nested
		# mutate each nested item to Data object
		# may already be a Data object -- or not
		content = Seq(content)
		self.isSimple = False
		seq = Seq()
		for item in content:
			if isinstance(item,Data):
				seq.append(item)
			else:
				seq.append(Data(item))
		return seq
	def treeView(self, noType=False, showGroup=False, level=0):
		''' return full & legible tree view of object's data '''
		tree = ''
		# this level's line
		tree += level * '\t'
		if not noType:
			tree+= "%s: " %self.type
		if self.isSimple or showGroup:
			tree+= "%s" %self.content_text()
		tree += "\n"
		# recursion for nested results
		if not self.isSimple:
			for item in self.content:
				tree += item.treeView(noType, showGroup, level+1)
		# final result
		return tree
	def leaves(self, noType=False):
		''' return a flat list of 'terminal', low-level,
			object items -- actually called 'leaves' '''
		seq = Seq()
		# case simple result : add content to seq
		if self.isSimple:
			if noType:
				seq.append(self.content)
			else:
				seq.append(self)
		# case compound result: recursively explore nested result
		else:
			for item in self.content:
				seq.extend(item.leaves(noType))
		return seq
	def allFlat(self, noType=False):
		''' return full flat list of object's items
			-- either compound or simple '''
		seq = Seq()
		# in all cases : add content to seq
		if noType:
			seq.append(self.content)
		else:
			seq.append(self)
		# case compound result: recursively explore nested result
		if not self.isSimple:
			for item in self.content:
				seq.extend(item.allFlat(noType))
		return seq
	def __len__(self):
		try:
			return len(self.content)
		except TypeError:
			return 0
	def __getitem__(self,index):
		return self.content[index]
	def __getslice__(self,i1,i2):
		return self.content[i1:i2]
	def __repr__(self):
		''' type:content format '''
		return "%s:%s" %(self.type, self.content_text())
	def content_text(self):
		'''	content expression
			for either simple or sequential content '''
		# case simple content: just output as is
		if self.isSimple:
			return str(self.content)
		# case compound content: recursive text seq in []
		# (join also copes with an empty sequence)
		else:
			return "[%s]" %"  ".join([str(item) for item in self.content])
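
The list problem that Seq works around is that str() on a built-in list shows the repr() of each item. Here is a small sketch of the effect, written for Python 3; the classes Tagged and StrList are made up for the illustration and are not part of the code above.

```python
class Tagged(object):
    """Hypothetical item whose str() and repr() differ."""
    def __str__(self):
        return "friendly"
    def __repr__(self):
        return "Tagged()"


class StrList(list):
    """List whose str() calls str() on its items instead of repr()."""
    def __str__(self):
        return "[%s]" % ", ".join(str(item) for item in self)


plain = [Tagged(), Tagged()]
custom = StrList(plain)
print(str(plain))   # [Tagged(), Tagged()]
print(str(custom))  # [friendly, friendly]
```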

Below are illustrations for two use cases:

-1- Parsing is done normally. The results feed a Data object. Both normal and
Data results are printed, so that the difference is made clear. Additionally,
a tree view, the 'leaves' & a flat list of all-level nested results are also
shown -- see my previous message for more info about these latter things.
Contained results are recursively converted into Data objects, so that all end
up typed.
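
The difference between the 'leaves' and the flat list of all levels can be sketched on plain nested lists, without Data or pyparsing. The function names below are illustrative stand-ins (Python 3) for the leaves and allFlat methods, not the methods themselves.

```python
def leaves(node):
    """Collect only terminal items, depth first."""
    if not isinstance(node, list):
        return [node]
    flat = []
    for child in node:
        flat.extend(leaves(child))
    return flat


def all_flat(node):
    """Collect every node, compound or simple, parent before children."""
    flat = [node]
    if isinstance(node, list):
        for child in node:
            flat.extend(all_flat(child))
    return flat


tree = [[1, "+", [2.2, "*", 3]]]
print(leaves(tree))         # [1, '+', 2.2, '*', 3]
print(len(all_flat(tree)))  # 8
```

The count of 8 is the 5 leaves plus the 3 enclosing lists.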

-2- The parser is tricked into 'natively' returning Data objects instead of
ParseResults ones. Actually, for the sake of illustration, only important
(named) results are converted. But it makes no difference to convert all, for
anyway nested results will be recursively converted into Data objects.

# === Data retrieved from final parse results ==================
class Grammar(object):
	# tokens
	integer 	= Word(nums)
	integer.setParseAction(lambda i: int(i[0]))
	point		= Literal('.')
	decimal		= Combine(integer + point + integer)
	decimal.setParseAction(lambda x: float(x[0]))
	#decimal		= Group(decimal)("dec")
	add			= Literal('+')
	mult		= Literal('*')
	# symbols
	num			= decimal | integer
	mult_op		= Group(num + mult + num)("mult_op")
	add_op		= Group((mult_op|num) + add + (mult_op|num))("add_op")
	#group		= Group(l_paren + in_op + r_paren)("group")
	operation	= (add_op|mult_op)
	calcs		= OneOrMore(operation)("calcs")
calcs = Grammar.calcs
# source text
text = "1+2.2*3 4.4*5+6.6"
print text
# standard result
results = calcs.parseString(text)
print "=== standard results:", results
# custom use & output
data = Data(results)
print "\n=== data:\n", data
print "\n=== default treeview :\n", data.treeView()
print "\n=== treeview with group w/o lead type:\n", data.treeView(showGroup=True, noType=True)
print "\n=== show lowest-level flat sequence:\n", data.leaves()
print "\n=== show lowest-level flat sequence w/o type:\n", data.leaves(noType=True)
print "\n=== show flat sequence of items on all levels /lines :"
for item in data.allFlat():
	print item

# === Data 'natively' returned by parsing process ===============
class Grammar(object):
	# tokens
	integer 	= Word(nums)
	integer.setParseAction(lambda i: int(i[0]))
	point		= Literal('.')
	decimal		= Combine(integer + point + integer)
	decimal.setParseAction(lambda x: float(x[0]))
	#decimal		= Group(decimal)("dec")
	add			= Literal('+')
	mult		= Literal('*')
	# symbols
	num			= (decimal | integer)("num")
	mult_op		= Group(num + mult + num)("mult_op")
	add_op		= Group((mult_op|num) + add + (mult_op|num))("add_op")
	#group		= Group(l_paren + in_op + r_paren)("group")
	operation	= (add_op|mult_op)
	calcs		= OneOrMore(operation)("calcs")
	#integer.addParseAction(toData)
	#decimal.addParseAction(toData)
	#mult_op.setParseAction(toData)
	#add_op.setParseAction(toData)
	#calcs.setParseAction(toData)
	@classmethod
	def _setToData(Grammar):
		patterns = filter(lambda(n,p): n[0]!='_', Grammar.__dict__.items())
		print "patterns: %s" %([name for (name,pattern) in patterns])
		named_patterns = filter(lambda(n,p): p.resultsName, patterns)
		print "named patterns: %s" %([name for (name,pattern) in named_patterns])
		for name, pattern in named_patterns:
			pattern.setParseAction(lambda result: Data(result))
		print
print "\n========================================\n"
Grammar._setToData()
calcs = Grammar.calcs
# standard result
results = calcs.parseString(text)
print "=== standard results holding data: %s:\n%s" %(results.__class__, results)
data = Data(results)
print "\n=== data: %s:\n%s" %(data.__class__,data)
print "\n=== data treeview :\n", data.treeView()
print "\n=== data leaves:\n", data.leaves()
print "\n=== show flat sequence of items on all levels /lines :"
for item in data.allFlat():
	print item
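
The `lambda(n,p):` form used in _setToData above is Python 2 only syntax. A sketch of the same selection in a form that also runs on Python 3 follows. StubPattern and StubGrammar are hypothetical stand-ins so that the example runs without pyparsing; they only mimic the resultsName attribute and the setParseAction method.

```python
class StubPattern(object):
    """Hypothetical stand-in for a pyparsing element (not the real class)."""
    def __init__(self, resultsName=None):
        self.resultsName = resultsName
        self.action = None

    def setParseAction(self, fn):
        self.action = fn


class StubGrammar(object):
    add = StubPattern()
    num = StubPattern("num")
    mult_op = StubPattern("mult_op")
    _private = StubPattern("hidden")


def named_patterns(grammar):
    """Return (name, pattern) pairs for public attributes having a results name."""
    pairs = [(name, pattern) for name, pattern in vars(grammar).items()
             if not name.startswith("_")
             and getattr(pattern, "resultsName", None)]
    return sorted(pairs, key=lambda pair: pair[0])


for name, pattern in named_patterns(StubGrammar):
    # in the real grammar the action would be: lambda result: Data(result)
    pattern.setParseAction(lambda result: ("typed", result))

print([name for name, pattern in named_patterns(StubGrammar)])
# ['mult_op', 'num']
```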

======================================================
					O U T P U T
======================================================
C:/prog/ACTIVE~1/pythonw.exe -u  "D:/prog/parsing/Data.pyw"
1+2.2*3 4.4*5+6.6
=== standard results: [[1, '+', [2.2000000000000002, '*', 3]], [[4.4000000000000004, '*', 5], '+', 6.5999999999999996]]

=== data:
calcs:[<Seq>:[<int>:1  <str>:+  <Seq>:[<float>:2.2  <str>:*  <int>:3]]  <Seq>:[<Seq>:[<float>:4.4  <str>:*  <int>:5]  <str>:+  <float>:6.6]]

=== default treeview :
calcs:
	<Seq>:
		<int>: 1
		<str>: +
		<Seq>:
			<float>: 2.2
			<str>: *
			<int>: 3
	<Seq>:
		<Seq>:
			<float>: 4.4
			<str>: *
			<int>: 5
		<str>: +
		<float>: 6.6

=== treeview with group w/o lead type:
[<Seq>:[<int>:1  <str>:+  <Seq>:[<float>:2.2  <str>:*  <int>:3]]  <Seq>:[<Seq>:[<float>:4.4  <str>:*  <int>:5]  <str>:+  <float>:6.6]]
	[<int>:1  <str>:+  <Seq>:[<float>:2.2  <str>:*  <int>:3]]
		1
		+
		[<float>:2.2  <str>:*  <int>:3]
			2.2
			*
			3
	[<Seq>:[<float>:4.4  <str>:*  <int>:5]  <str>:+  <float>:6.6]
		[<float>:4.4  <str>:*  <int>:5]
			4.4
			*
			5
		+
		6.6

=== show lowest-level flat sequence:
[<int>:1 ,<str>:+ ,<float>:2.2 ,<str>:* ,<int>:3 ,<float>:4.4 ,<str>:* ,<int>:5 ,<str>:+ ,<float>:6.6]

=== show lowest-level flat sequence w/o type:
[1 ,+ ,2.2 ,* ,3 ,4.4 ,* ,5 ,+ ,6.6]

=== show flat sequence of items on all levels /lines :
calcs:[<Seq>:[<int>:1  <str>:+  <Seq>:[<float>:2.2  <str>:*  <int>:3]]  <Seq>:[<Seq>:[<float>:4.4  <str>:*  <int>:5]  <str>:+  <float>:6.6]]
<Seq>:[<int>:1  <str>:+  <Seq>:[<float>:2.2  <str>:*  <int>:3]]
<int>:1
<str>:+
<Seq>:[<float>:2.2  <str>:*  <int>:3]
<float>:2.2
<str>:*
<int>:3
<Seq>:[<Seq>:[<float>:4.4  <str>:*  <int>:5]  <str>:+  <float>:6.6]
<Seq>:[<float>:4.4  <str>:*  <int>:5]
<float>:4.4
<str>:*
<int>:5
<str>:+
<float>:6.6

========================================

patterns: ['mult_op', 'point', 'decimal', 'calcs', 'add', 'num', 'add_op', 'integer', 'operation', 'mult']
named patterns: ['mult_op', 'calcs', 'num', 'add_op']

=== standard results holding data: <class 'pyparsing.ParseResults'>:
[calcs:[add_op:[num:1  <str>:+  mult_op:[num:2.2  <str>:*  num:3]]  add_op:[mult_op:[num:4.4  <str>:*  num:5]  <str>:+  num:6.6]]]

=== data: <class '__main__.Data'>:
calcs:[add_op:[num:1  <str>:+  mult_op:[num:2.2  <str>:*  num:3]]  add_op:[mult_op:[num:4.4  <str>:*  num:5]  <str>:+  num:6.6]]

=== data treeview :
calcs:
	add_op:
		num: 1
		<str>: +
		mult_op:
			num: 2.2
			<str>: *
			num: 3
	add_op:
		mult_op:
			num: 4.4
			<str>: *
			num: 5
		<str>: +
		num: 6.6

=== data leaves:
[num:1 ,<str>:+ ,num:2.2 ,<str>:* ,num:3 ,num:4.4 ,<str>:* ,num:5 ,<str>:+ ,num:6.6]

=== show flat sequence of items on all levels /lines :
calcs:[add_op:[num:1  <str>:+  mult_op:[num:2.2  <str>:*  num:3]]  add_op:[mult_op:[num:4.4  <str>:*  num:5]  <str>:+  num:6.6]]
add_op:[num:1  <str>:+  mult_op:[num:2.2  <str>:*  num:3]]
num:1
<str>:+
mult_op:[num:2.2  <str>:*  num:3]
num:2.2
<str>:*
num:3
add_op:[mult_op:[num:4.4  <str>:*  num:5]  <str>:+  num:6.6]
mult_op:[num:4.4  <str>:*  num:5]
num:4.4
<str>:*
num:5
<str>:+
num:6.6