#11 scanString start and end offsets incorrect.

v1.0 (example)
open
None
5
2015-10-23
2015-10-23
No

Hi,

I'm using pyparsing to parse some PHP code, and I'm trying to use scanString so that I can map parsed components back to their positions in the input text. For some reason I'm getting start and stop offsets that are beyond the length of the string. Here is a simple example:

def test_foreach_bug_standalong():
    '''
    test to diagnose a scanStringOffsetBug
    '''
    import pyparsing as pp

    nested_block = pp.nestedExpr(opener="{", closer="}").setResultsName("block_code")
    foreach = pp.Group(pp.Literal("foreach") + "(" + ")" +
                       nested_block
                       ).setResultsName("foreach")
    variable = pp.Regex(r'\$[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*').setResultsName('variable')

    grammar = foreach | variable

    SAMPLE_PHP = r'''
        foreach(){          
                eval($item)
            }                   
        $test;

}'''
    # scanString yields (tokens, start, stop) for every match it finds
    for token, start, stop in grammar.scanString(SAMPLE_PHP):
        print("{} [{}:{}] from a string length of {}".format(token, start, stop, len(SAMPLE_PHP)))

The output generated from this is as follows:

[['foreach', '(', ')', ['eval($item)']]] [6:72] from a string length of 88
['$test'] [112:117] from a string length of 88

As you can see, the second match reports a start of 112 and an end of 117, both of which are beyond the end of the 88-character string.

Am I doing something wrong here?

Thanks
Gary

Discussion

  • Gary O'Leary-Steele

    Here is another example with a simpler grammar:

    import pyparsing as pp
    variable = pp.Regex(r'\$[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*').setResultsName('variable')

    grammar = variable

    SAMPLE_PHP = r'''
    $test;
    $test1;
    $test2;
    $test3;
    }'''

    for token, start, stop in grammar.scanString(SAMPLE_PHP):
        print("{} [{}:{}] from a string length of {}".format(token, start, stop, len(SAMPLE_PHP)))

    I would have expected each match to report the offsets of the matched text within the input string, but it's not working that way for me.

     
  • Gary O'Leary-Steele

    parseWithTabs() was the solution...

    Thanks
    Gary
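
For anyone who hits the same symptom, here is a minimal sketch of why parseWithTabs() helps. It assumes the original input was indented with tabs (the thread does not show the raw file), and the `text` string below is a made-up stand-in for it. By default pyparsing expands tabs to spaces before scanning, so the offsets from scanString refer to the expanded copy rather than to the string that was passed in.

```python
import pyparsing as pp

# Same variable pattern as in the report above.
variable = pp.Regex(r'\$[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*')

# Hypothetical input: one tab-indented line, as an editor might save PHP code.
text = "\t\t$test;\n"

# Default behaviour: tabs are expanded before scanning, so these offsets
# index into the expanded copy and can exceed len(text).
default_spans = [(start, stop) for _, start, stop in variable.scanString(text)]

# parseWithTabs() switches tab expansion off; it is applied to a copy here
# so the original expression keeps its default behaviour.
tab_aware = variable.copy().parseWithTabs()
raw_spans = [(start, stop) for _, start, stop in tab_aware.scanString(text)]

print("length of input:", len(text))
print("default offsets:", default_spans)
print("with parseWithTabs():", raw_spans)
for start, stop in raw_spans:
    # These offsets can be used to slice the original string directly.
    print(repr(text[start:stop]))
```

If the grammar has to keep expanding tabs for some other reason, another option is to call text.expandtabs() yourself and slice that expanded copy with the reported offsets.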

     
