Hi All,
I have started a little PyParsing project that needs a little sophistication added to the grammar. I need to be able to parse quoted and unquoted strings in a line of input.
Here is my Python code so far:
import pyparsing
# valid commands (list assumed from the test cases below)
commandWords = ['copyfile', 'movefile', 'createdir', 'createfolder']
alphaWord = pyparsing.Word(pyparsing.alphanums + '_$:\\.')
args = pyparsing.OneOrMore(alphaWord) | pyparsing.quotedString.setParseAction(pyparsing.removeQuotes)
knownWord = pyparsing.oneOf( commandWords, caseless=True )
sentence = pyparsing.OneOrMore( knownWord + args )
# test input list
cmd_list = ['copyfile c:\\temp\\file1.txt c:\\temp\\file2.txt',
            'copyfile "c:\\temp\\file1.txt" to "c:\\temp\\file2.txt"',
            'createdir source in "test folder"',
            'movefile c:\\temp\\file1.txt c:\\temp\\file2.txt',
            'createfolder source'
            ]
# run tests through grammar
for current_cmd in cmd_list:
    print("\nTest Command = %s" % current_cmd)
    for the_cmd, start, end in sentence.scanString(current_cmd):
        print("%s %s" % (the_cmd, the_cmd))
The quoted tokens are not fully parsed. This produces the output:
Test Command = copyfile c:\temp\file1.txt c:\temp\file2.txt
CopyFile
Test Command = copyfile "c:\temp\file1.txt" to "c:\temp\file2.txt"
CopyFile <<---- didn't get final 2 tokens
Test Command = createdir source in "test folder"
CreateDir <<--- didn't get last quoted token
Test Command = movefile c:\temp\file1.txt c:\temp\file2.txt
MoveFile
Test Command = createfolder source
CreateFolder
As an added bonus, can the quotes be removed from the tokens?
Thanks!
Steve
Try changing args from:
args = pyparsing.OneOrMore(alphaWord) | pyparsing.quotedString.setParseAction(pyparsing.removeQuotes)
to
args = pyparsing.OneOrMore(alphaWord | pyparsing.quotedString.setParseAction(pyparsing.removeQuotes))
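Roughly, the original expression matches either a run of plain words or a single quoted string, so a mixed argument list stops at the first quoted token; moving the alternation inside OneOrMore lets plain and quoted arguments intermix. A minimal, self-contained sketch of the difference, reusing the expressions from your post:

import pyparsing

alphaWord = pyparsing.Word(pyparsing.alphanums + '_$:\\.')
quoted = pyparsing.quotedString.setParseAction(pyparsing.removeQuotes)

# original: EITHER a run of plain words OR one quoted string
args_old = pyparsing.OneOrMore(alphaWord) | quoted

# corrected: plain and quoted arguments can alternate freely
args_new = pyparsing.OneOrMore(alphaWord | quoted)

line = 'source in "test folder"'
print(args_old.parseString(line))   # -> ['source', 'in']
print(args_new.parseString(line))   # -> ['source', 'in', 'test folder']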
And there are no quotes in your tokens, you are simply seeing them because that is the string representation of the parsed tokens.
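A quick way to see this, assuming args has been changed as above and the sentence grammar rebuilt from it:

result = sentence.parseString('createdir source in "test folder"')
print(result)       # ['createdir', 'source', 'in', 'test folder'] - quotes come from the repr
print(result[-1])   # test folder - the token itself has no quotes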
But what you have written is just one step above tokenizing - you are going to have to reprocess each parsed set of tokens depending on the leading command. Instead, define a separate parsing expression for each command, using results names for the expected arguments. This will give you a richer set of named results, which is much easier to process when performing your various commands.

Take it still further and attach a class as the parse action for each defined command, add a __call__ method to perform the command's function, and then you can just do the_cmd() with the parsed results. The parser will construct the correct command object type, and the __call__ method will extract the necessary args from the tokens passed into __init__. See the SimpleBool.py example on the wiki for a more detailed example.
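For example, a copyfile command could be wired up roughly like this; the CopyFileCommand class, the results names, the optional "to" keyword, and the use of shutil.copyfile are illustrative assumptions, not part of the original post:

import pyparsing
import shutil

quoted_or_plain = (pyparsing.quotedString.setParseAction(pyparsing.removeQuotes)
                   | pyparsing.Word(pyparsing.alphanums + '_$:\\.'))

class CopyFileCommand:
    def __init__(self, tokens):
        # the results names defined below make the args available by name
        self.source = tokens.source
        self.dest = tokens.dest

    def __call__(self):
        # illustrative action - copy source to dest
        shutil.copyfile(self.source, self.dest)

copyfile_cmd = (pyparsing.CaselessKeyword('copyfile')
                + quoted_or_plain('source')
                + pyparsing.Optional(pyparsing.CaselessKeyword('to').suppress())
                + quoted_or_plain('dest'))
copyfile_cmd.setParseAction(CopyFileCommand)

# parsing returns the constructed command object; calling it performs the copy
the_cmd = copyfile_cmd.parseString('copyfile "c:\\temp\\file1.txt" to "c:\\temp\\file2.txt"')[0]
# the_cmd()   # would invoke shutil.copyfile(the_cmd.source, the_cmd.dest)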
Hi Paul!
Thanks for the code correction! I can't believe that I was that close on my own… Minor tweak - major difference!
Will look into making my grammar / commands even stronger with your suggestions.
Thanks Again!
Steve