Hi All,
I have started a little PyParsing project that needs a little sophistication added to the grammar. I need to be able to parse quoted and unquoted strings in a line of input.
Here is my Python code so far:
import pyparsing
# valid commands (list assumed from the test cases below)
commandWords = ['copyfile', 'movefile', 'createdir', 'createfolder']
alphaWord = pyparsing.Word(pyparsing.alphanums + '_$:\\.')
args = pyparsing.OneOrMore(alphaWord) | pyparsing.quotedString.setParseAction(pyparsing.removeQuotes)
knownWord = pyparsing.oneOf( commandWords, caseless=True )
sentence = pyparsing.OneOrMore( knownWord + args )
# test input list
cmd_list = ['copyfile c:\\temp\\file1.txt c:\\temp\\file2.txt',
            'copyfile "c:\\temp\\file1.txt" to "c:\\temp\\file2.txt"',
            'createdir source in "test folder"',
            'movefile c:\\temp\\file1.txt c:\\temp\\file2.txt',
            'createfolder source'
            ]
# run tests through grammar
for current_cmd in cmd_list:
    print("\nTest Command = %s" % current_cmd)
    for the_cmd, start, end in sentence.scanString(current_cmd):
        print("%s %s" % (the_cmd, the_cmd))
The quoted tokens are not fully parsed. This produces the output:
Test Command = copyfile c:\temp\file1.txt c:\temp\file2.txt
CopyFile
Test Command = copyfile "c:\temp\file1.txt" to "c:\temp\file2.txt"
CopyFile <<---- didn't get final 2 tokens
Test Command = createdir source in "test folder"
CreateDir <<--- didn't get last quoted token
Test Command = movefile c:\temp\file1.txt c:\temp\file2.txt
MoveFile
Test Command = createfolder source
CreateFolder
As an added bonus, can the quotes be removed from the tokens?
Thanks!
Steve
Try changing args from:
args = pyparsing.OneOrMore(alphaWord) | pyparsing.quotedString.setParseAction(pyparsing.removeQuotes)
to
args = pyparsing.OneOrMore(alphaWord | pyparsing.quotedString.setParseAction(pyparsing.removeQuotes))
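Roughly, the original expression matches either a run of plain words or a single quoted string, so a mixed argument list stops at the first quoted token; moving the alternation inside OneOrMore lets plain and quoted arguments intermix. A minimal, self-contained sketch of the difference, reusing the expressions from your post:

import pyparsing

alphaWord = pyparsing.Word(pyparsing.alphanums + '_$:\\.')
quoted = pyparsing.quotedString.setParseAction(pyparsing.removeQuotes)

# original: EITHER a run of plain words OR one quoted string
args_old = pyparsing.OneOrMore(alphaWord) | quoted

# corrected: plain and quoted arguments can alternate freely
args_new = pyparsing.OneOrMore(alphaWord | quoted)

line = 'source in "test folder"'
print(args_old.parseString(line))   # -> ['source', 'in']
print(args_new.parseString(line))   # -> ['source', 'in', 'test folder']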
And there are no quotes in your tokens, you are simply seeing them because that is the string representation of the parsed tokens.
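A quick way to see this, assuming args has been changed as above and the sentence grammar rebuilt from it:

result = sentence.parseString('createdir source in "test folder"')
print(result)       # ['createdir', 'source', 'in', 'test folder'] - quotes come from the repr
print(result[-1])   # test folder - the token itself has no quotes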
But what you have written is just one step above tokenizing - you are going to have to reprocess each parsed set of tokens depending on the leading command. Instead, define a separate parsing expression for each command, using results names for the expected arguments. This will give you a richer set of named results, which is much easier to process when performing your various commands.

Take it still further and attach a class as the parse action for each defined command, add a __call__ method to perform the command's function, and then you can just do the_cmd() with the parsed results. The parser will construct the correct command object type, and the __call__ method will extract the necessary args from the tokens passed into __init__. See the SimpleBool.py example on the wiki for a more detailed example.
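For example, a copyfile command could be wired up roughly like this; the CopyFileCommand class, the results names, the optional "to" keyword, and the use of shutil.copyfile are illustrative assumptions, not part of the original post:

import pyparsing
import shutil

quoted_or_plain = (pyparsing.quotedString.setParseAction(pyparsing.removeQuotes)
                   | pyparsing.Word(pyparsing.alphanums + '_$:\\.'))

class CopyFileCommand:
    def __init__(self, tokens):
        # the results names defined below make the args available by name
        self.source = tokens.source
        self.dest = tokens.dest

    def __call__(self):
        # illustrative action - copy source to dest
        shutil.copyfile(self.source, self.dest)

copyfile_cmd = (pyparsing.CaselessKeyword('copyfile')
                + quoted_or_plain('source')
                + pyparsing.Optional(pyparsing.CaselessKeyword('to').suppress())
                + quoted_or_plain('dest'))
copyfile_cmd.setParseAction(CopyFileCommand)

# parsing returns the constructed command object; calling it performs the copy
the_cmd = copyfile_cmd.parseString('copyfile "c:\\temp\\file1.txt" to "c:\\temp\\file2.txt"')[0]
# the_cmd()   # would invoke shutil.copyfile(the_cmd.source, the_cmd.dest)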
Hi Paul!
Thanks for the code correction! I can't believe that I was that close on my own… Minor tweak - major difference!
Will look into making my grammar / commands even stronger with your suggestions.
Thanks Again!
Steve