Thread: [Pyparsing] results typing


Hello, pyParsing world!

[rough version -- I have not read over this message]

-0- intro ========

Now it is time for serious things! Below is a kind of study of a subject I have 
brought up & approached several times already: result typing.

Here 'type' is used primarily in its non-technical sense. There are probably 
whole fields of parsing where result types are not that important. Whenever 
the types of the results are predictable, they need not be explicitly 
defined. For instance, a file may contain, or one may extract, only data of a 
single type. Or a file format may define types of data in a constant repeated 
order, such as x y color x y color... Still, the general situation, I guess, is 
that we cannot predict in which order the types of valuable data will occur in 
source texts, so having them in the results would be highly helpful. This 
especially applies when parsing texts written in any kind of /language/.

The type of a result is similar to the type of any kind of data item: it carries 
the sense of the result. Without it, we cannot do anything with the result, 
just as, without a type, neither we nor the language "decoder" itself would even 
know what kind of operation may apply to a piece of data. If the results do not 
hold their type, we are obliged to re-parse them only to determine what kind of 
thing they are. pyParsing provides two methods to give patterns names -- I 
will talk about them later.

-1- pattern idS, result typeS ========

What is the name of a pattern? What is the type of a result? What is, 
actually, the link between patterns & results? Patterns define or generate 
results. They are /classes/ of results, in the same way that (programming) 
types are classes of instances. Actually, patterns could be (programming) types 
-- but this wouldn't fit in pyParsing. Results are like samples of a pattern: they 
share characteristics which are specified in/by the pattern. A pattern identifier 
(name, id) thus defines the type of its potential results.

pattern object		<-->		result type
pattern object id	<-->		result type id

Now, there are actually several kinds of pattern IDs, matching several kinds of 
result types:

integer		= Word(nums).setName("int")
decimal		= integer("int_part") + '.' + integer("dec_part")

Basically, a pattern usually defines the /nature/ of results, as in the first 
line above. Now, a single pattern may have several use cases, as in the 
second line, which define several /roles/ for its results. I intentionally used 
setName to define pattern names and setResultsName (abbreviated as a call) to 
define use cases -- but obviously nothing forces us to do that. The example can 
be extended to show the difference between result nature and role more acutely:

integer		= Word(nums).setName("integer")
decimal		= Combine(integer("int_part") + '.' +
			  integer("dec_part")).setName("decimal")
num			= decimal | integer
mult		= num("left-num") + '*' + num("right-num")

Both integers & decimals (nature) may be left-nums or right-nums (role).

pattern id			<-->		result nature
pattern use			<-->		result role

Depending on the application, a result's nature, its role, or both may be 
relevant information.

-2- pyParsing	=======================

As I have used pyParsing for a few weeks only, I may say stupid things. But I 
have tried hard to find friendly ways to get such info from parse results -- I 
could not find any. Actually, I ended up with:
* additional data to patterns
* a custom result type
* changes in pyParsing code

First, patterns basically do not know anything about themselves. In particular, 
they do not know what they are, not even their (variable) name. If patterns were 
types, they would know it; but instances of a custom type do not have a __name__ 
attribute to receive their (variable) name. Pity. We can nevertheless give a 
pattern a name with setName or setResultsName.

The main problem anyway is that there is no interconnection between patterns and 
results. A result has no access to the pattern that yielded it, not even a 
simple reference. A pattern only passes ResultsName at result init time.
			ResultsName only
pattern			--o-->			results
pattern			<--x--			results
				nothing
An additional obstacle is that results are protected by __slots__, for 
performance reasons, which prevents setting/reading custom attributes on 
them. Fortunately, patterns are not protected.
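
To make the __slots__ obstacle concrete, here is a minimal sketch in plain Python. SlottedResult and OpenPattern are hypothetical stand-ins written for this illustration, not pyParsing classes; they only reproduce the "slotted results, open patterns" situation described above.

```python
class SlottedResult(object):
    """Hypothetical stand-in for a results class that declares __slots__."""
    __slots__ = ("tokens",)

    def __init__(self, tokens):
        self.tokens = tokens


class OpenPattern(object):
    """Hypothetical stand-in for a pattern class without __slots__."""

    def __init__(self, expr):
        self.expr = expr


result = SlottedResult(["1", "*", "2"])
try:
    result.nature = "mult"      # no slot named 'nature', so this raises
    result_accepts_attribute = True
except AttributeError:
    result_accepts_attribute = False

pattern = OpenPattern("integer * integer")
pattern.id = "mult"             # a plain instance has a __dict__, so this works
```

This is why the extra information is attached to patterns first: they accept new attributes without any change to their class.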

-3- letting patterns know	===============

We can use a simple trick to let patterns know a bit about themselves. If they 
are put in a scope (e.g. a separate module or class), we have access to a dict 
that holds names and objects together. With that information, we have all we 
need to tweak the patterns' guts. Assuming the grammar is in a class, we could 
even have a class method to do the job. [Note: the attribute can't be called 
'.name', as this name (!) is used by pyParsing to format pattern repr output, 
esp. for error display.] It may look like this:

from pyparsing import Word, nums, Combine, Group, OneOrMore

class Grammar(object):
	''' pyParsing grammar '''
	integer	= Word(nums)
	decimal	= Combine(integer("int_part") + '.' + integer("dec_part"))
	num		= (decimal | integer).setName("num")
	mult		= Group(num("left-num") + '*' + num("right-num"))
	calc		= OneOrMore(mult)
	@classmethod
	def _setNames(cls):
		''' give patterns their name '''
		# exclude '_*' names (private and special attributes)
		namedPatterns = [(name,pattern)
						for (name,pattern) in cls.__dict__.items()
						if name[0]!='_']
		# set .id attributes
		for (name,pattern) in namedPatterns:
			pattern.id=name
		# keep the list of named patterns on the class itself
		cls.patterns = [pattern for (name,pattern) in namedPatterns]
Grammar._setNames()
for pattern in Grammar.patterns:
	print("%s: %s" % (pattern.id,pattern))
===>>
num: num
integer: W:(0123...)
calc: {Group:({{num "*"} num})}...
decimal: Combine:({{W:(0123...) "."} W:(0123...)})
mult: Group:({{num "*"} num})

Now we have a proper tool to name patterns automatically. Manual setName is no 
longer necessary; it can serve more specific needs, such as delivering clearer 
info to users. We are ready to transmit to the results information about their 
use. ResultsName can carry the info about use cases.
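
The same trick should work for any scope that exposes a name-to-object dict, for instance a separate module. Here is a minimal sketch of that variant; Pattern and set_ids are hypothetical names made up for this illustration, and the filter tests the object's type in addition to the leading underscore, which is an assumption of mine rather than part of the class method above.

```python
class Pattern(object):
    """Hypothetical stand-in for a pyParsing pattern object."""

    def __init__(self, expr):
        self.expr = expr
        self.id = None


def set_ids(namespace):
    """Give every public Pattern in a name -> object mapping its name as .id."""
    named = [(name, obj) for name, obj in namespace.items()
             if isinstance(obj, Pattern) and not name.startswith("_")]
    for name, pattern in named:
        pattern.id = name
    return sorted(name for name, _ in named)


# The mapping could be vars(some_module) or SomeClass.__dict__;
# a plain dict is used here to keep the sketch self-contained.
grammar = {
    "integer": Pattern("digits"),
    "decimal": Pattern("digits . digits"),
    "_helper": Pattern("private"),
    "version": "0.1",
}
names = set_ids(grammar)
```

Testing the type keeps unrelated entries of the scope (strings, functions, imported modules) from receiving an .id attribute.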

-4- results structure	====================

I have posted a message presenting a type called 'Data'. [Still not quite 
all right, I have discovered a bug.] If results could magically receive the 
information that patterns now hold about their nature and role, we could use 
Data objects to properly hold and display typed results. Output may then look 
like this:

calc:[mult:[dec:1.1  <str>:*  int:2]  mult:[int:1  <str>:*  dec:2.2]]

The types shown as prefixes would be taken from the most accurate information 
available, in this order:
* role		= pattern use (e.g. left-num)
* nature	= pattern id (e.g. integer)
* pattern format (as presently held in .name, e.g. W:(0123...))
* result type (e.g. <str> or <int>)
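
The fallback order in this list can be sketched as a small helper. This is only an illustration under my own assumptions: type_prefix and show are hypothetical names, not part of the 'Data' type or of pyParsing, and the labels are passed in by hand since results cannot yet supply them.

```python
def type_prefix(value, role=None, nature=None, pattern_format=None):
    """Return the most specific type label available for a result value."""
    for label in (role, nature, pattern_format):
        if label:
            return label
    # Last resort: the Python type of the value itself, e.g. <str>
    return "<%s>" % type(value).__name__


def show(value, **info):
    """Format one result as 'prefix:value'."""
    return "%s:%s" % (type_prefix(value, **info), value)


line = "  ".join([
    show("1.1", role="left-num", nature="decimal"),
    show("*"),
    show("2", nature="integer", pattern_format="W:(0123...)"),
])
```

Here line comes out as 'left-num:1.1  <str>:*  integer:2': the role wins when it is known, then the nature, and a bare literal falls back to its Python type.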

Now, we have to find a way to let the results know about that.

-5- passing info to results 	=====================

As results have no access to patterns, we are presently blocked. If we just 
gave them a reference to the patterns, we would be unblocked. I did some 
explorations & trials, and it seems all right. Things to do:
* Add specific fields to patterns: id, use
* Add a reference to the pattern at result instantiation. This happens 3 times in 
the method _parseNoCache of the class ParserElement. 'self' can be added there 
as a new argument for result initialisation. For instance:
retTokens = ParseResults(tokens, self.resultsName, asList=self.saveAsList,
                         modal=self.modalResults, pattern=self)
This arg becomes a 'pattern' param in ParseResults' __new__ & __init__ methods.
* Add private attribs to ParseResults (as the class uses __slots__, the new 
names have to be declared there too). In __init__:
self.__pattern	= pattern
self.__nature	= pattern.id
self.__role		= pattern.use
And matching accessors (because access is protected). E.g.:
def pattern(self): return self.__pattern
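
The steps above can be sketched outside pyParsing with two stand-in classes. Pattern and Result are hypothetical names for this illustration only; the sketch stores just the pattern reference and derives nature and role from it through read-only properties, which is a simplification of the three private attributes listed above.

```python
class Pattern(object):
    """Hypothetical stand-in for a pattern carrying .id (nature) and .use (role)."""

    def __init__(self, id, use=None):
        self.id = id
        self.use = use


class Result(object):
    """Hypothetical stand-in for a slotted results class."""
    # Double-underscore names are mangled inside the class, in __slots__ too.
    __slots__ = ("tokens", "__pattern")

    def __init__(self, tokens, pattern=None):
        self.tokens = tokens
        self.__pattern = pattern

    @property
    def pattern(self):
        return self.__pattern

    @property
    def nature(self):
        return self.__pattern.id if self.__pattern is not None else None

    @property
    def role(self):
        return self.__pattern.use if self.__pattern is not None else None


left = Result(["1.1"], pattern=Pattern("decimal", use="left-num"))
anonymous = Result(["*"])
```

With this, left.nature gives 'decimal' and left.role gives 'left-num', while a result built without a pattern reports None for both.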

denis
