scanString start and end offsets incorrect.
Hi,
I'm using pyparsing to parse some PHP code, and I'm trying to use scanString so that I can reference parsed components from the input text. For some reason I'm getting start and stop offsets that are beyond the length of the string. Here is a simple example:
def test_foreach_bug_standalong():
    '''
    test to diagnose a scanStringOffsetBug
    '''
    import pyparsing as pp

    # A "{ ... }" block, possibly nested.
    nested_block = pp.nestedExpr(opener="{", closer="}").setResultsName("block_code")
    # "foreach()" followed by a block.
    foreach = pp.Group(pp.Literal("foreach") + "(" + ")" +
                       nested_block
                       ).setResultsName("foreach")
    # A PHP variable such as $item.
    variable = pp.Regex(r'\$[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*').setResultsName('variable')
    grammar = foreach | variable

    SAMPLE_PHP = r'''
foreach(){
eval($item)
}
$test;
}'''

    # Print each match with its reported start and stop offsets.
    for token, start, stop in grammar.scanString(SAMPLE_PHP):
        print("{} [{}:{}] from a string length of {}".format(token, start, stop, len(SAMPLE_PHP)))
The output generated from this is as follows:
[['foreach', '(', ')', ['eval($item)']]] [6:72] from a string length of 88
['$test'] [112:117] from a string length of 88
As you can see, the second match reports a start of 112 and an end of 117, both of which are beyond the length of the string (88).
Am I doing something wrong here?
Thanks
Gary
Here is another example with a simpler grammar:
import pyparsing as pp
variable = pp.Regex(r'\$[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*').setResultsName('variable')
grammar = variable
SAMPLE_PHP = r'''
$test;
$test1;
$test2;
$test3;
}'''
for token, start, stop in grammar.scanString(SAMPLE_PHP):
    print("{} [{}:{}] from a string length of {}".format(token, start, stop, len(SAMPLE_PHP)))
I would have expected each match to report its offset in the original string, but it's not working that way for me.
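parseWithTabs() was the solution. By default pyparsing expands tabs in the input before parsing, so the offsets reported by scanString refer to the tab-expanded string rather than the original one.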
Thanks
Gary
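For anyone hitting the same thing, here is a minimal sketch of the difference. It assumes the input is indented with tab characters; the post does not show the raw whitespace of the original sample, so the `sample` string below is hypothetical, and `make_variable` and `scan` are helper names made up for this illustration. Only `Regex`, `scanString` and `parseWithTabs` come from pyparsing.

```python
import pyparsing as pp

def make_variable():
    # Same PHP-variable pattern as in the post.
    return pp.Regex(r'\$[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*')

def scan(expr, text):
    # Collect (matched text, start, end) for every match in text.
    return [(tokens[0], start, end) for tokens, start, end in expr.scanString(text)]

# Hypothetical tab-indented input; the original sample's whitespace is unknown.
sample = "\t\t$test;\n\t\t$test1;\n"

# Default behaviour: tabs are expanded first, so offsets index the expanded string.
default_hits = scan(make_variable(), sample)

# With parseWithTabs(): tabs are kept, so offsets index the original string.
tab_hits = scan(make_variable().parseWithTabs(), sample)

expanded = sample.expandtabs()
for text, start, end in default_hits:
    print("default:  ", text, (start, end), repr(expanded[start:end]))
for text, start, end in tab_hits:
    print("with tabs:", text, (start, end), repr(sample[start:end]))
```

Note that parseWithTabs() is called on the top-level expression, the same one scanString is called on. If changing the grammar is not an option, slicing `sample.expandtabs()` instead of `sample` with the default offsets should also recover the matched text.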