Help Needed Parsing Quoted and Unquoted Input

  • Steve Reiss

    Steve Reiss - 2012-01-31

    Hi All,

    I have started a small pyparsing project whose grammar needs to be a little more sophisticated. I need to parse both quoted and unquoted strings in a line of input.

    Here is my Python code so far:

    import pyparsing

    #valid commands

    commandWords =

    alphaWord = pyparsing.Word(pyparsing.alphanums + '_$:\\.')
    args = pyparsing.OneOrMore(alphaWord) | pyparsing.quotedString.setParseAction(pyparsing.removeQuotes)
    knownWord = pyparsing.oneOf( commandWords, caseless=True )
    sentence = pyparsing.OneOrMore( knownWord + args )

    # test input list

    cmd_list = ['copyfile c:\\temp\\file1.txt  c:\\temp\\file2.txt',
                'copyfile "c:\\temp\\file1.txt" to "c:\\temp\\file2.txt"',
                'createdir source in "test folder"',
                'movefile c:\\temp\\file1.txt  c:\\temp\\file2.txt',
                'createfolder source']

    # run tests through grammar

    for current_cmd in cmd_list:
      print("\nTest Command = %s" % current_cmd)
      for the_cmd,start,end in sentence.scanString(current_cmd):
        print("%s  %s" % (the_cmd, the_cmd))

    This produces the output below. The quoted tokens are not fully parsed:

    Test Command = copyfile c:\temp\file1.txt  c:\temp\file2.txt

    Test Command = copyfile "c:\temp\file1.txt" to "c:\temp\file2.txt"
    CopyFile      <<---- didn't get final 2 tokens

    Test Command = createdir source in "test folder"
    CreateDir    <<---  didn't get last quoted token

    Test Command = movefile c:\temp\file1.txt  c:\temp\file2.txt

    Test Command = createfolder source

    As an added bonus, can the quotes be removed from the tokens?



  • Paul McGuire

    Paul McGuire - 2012-02-01

    Try changing args from:

    args = pyparsing.OneOrMore(alphaWord) | pyparsing.quotedString.setParseAction(pyparsing.removeQuotes)

    to:

    args = pyparsing.OneOrMore(alphaWord | pyparsing.quotedString.setParseAction(pyparsing.removeQuotes))

    Also, there are no quotes in your tokens. You are simply seeing them because that is the string representation of the parsed tokens when they are printed.
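
    As a minimal sketch of the change above: with the alternation outside OneOrMore, args matches either a run of unquoted words or a single quoted string, never a mix. Moving it inside lets each argument be either form. The command list was not shown in the original post, so the names in command_words are an assumption, and the helper parse_command is hypothetical.

```python
import pyparsing as pp

# ASSUMPTION: the original command list was not shown in the post;
# these names are placeholders inferred from the test input.
command_words = "CopyFile CreateDir MoveFile CreateFolder"

alpha_word = pp.Word(pp.alphanums + "_$:\\.")
# Work on a copy so the parse action is not attached to the shared pp.quotedString.
quoted = pp.quotedString.copy().setParseAction(pp.removeQuotes)

# The alternation sits inside OneOrMore, so every argument may be
# either an unquoted word or a quoted string.
args = pp.OneOrMore(alpha_word | quoted)
known_word = pp.oneOf(command_words, caseless=True)
sentence = known_word + args


def parse_command(line):
    """Hypothetical helper: return the tokens of one command line as a plain list."""
    return sentence.parseString(line).asList()


for line in ['copyfile "c:\\temp\\file1.txt" to "c:\\temp\\file2.txt"',
             'createdir source in "test folder"']:
    print(parse_command(line))
```

    Printing the list shows each token in quotes because that is how Python displays strings inside a list. The token itself, for example parse_command(...)[3] for the second line, is just test folder.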

    But what you have written is just one step above tokenizing - you are going to have to reprocess each parsed set of tokens depending on the leading command.  Instead, define a separate parsing expression for each command, using results names for the expected arguments. This will give you a richer set of named results, much easier to process and perform your various commands.  Take it still further and attach a class as the parse action for each defined command, add a __call__ method to perform the command's function, and then you can just do the_cmd() with the parsed results. The parser will construct the correct command object type, and the __call__ method will extract the necessary args from the tokens passed into __init__.  See the example on the wiki for a more detailed example.
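
    The per-command approach described above might look roughly like the following sketch. It is not the wiki example. The class names, the results names (source, dest, name, parent), the optional "to" keyword, and the strings returned by __call__ are all assumptions made for illustration; a real command would do its work in __call__.

```python
import pyparsing as pp

alpha_word = pp.Word(pp.alphanums + "_$:\\.")
quoted = pp.quotedString.copy().setParseAction(pp.removeQuotes)
arg = alpha_word | quoted


class CopyFile:
    """Hypothetical command object built from the parsed tokens."""

    def __init__(self, tokens):
        # Named results are available as attributes of the tokens.
        self.source = tokens.source
        self.dest = tokens.dest

    def __call__(self):
        # Placeholder: a real command would copy the file here.
        return "copy %s -> %s" % (self.source, self.dest)


class CreateDir:
    """Hypothetical command object for 'createdir NAME in PARENT'."""

    def __init__(self, tokens):
        self.name = tokens.name
        self.parent = tokens.parent

    def __call__(self):
        # Placeholder: a real command would create the directory here.
        return "create %s in %s" % (self.name, self.parent)


# Filler keywords are matched but dropped from the results.
TO = pp.Optional(pp.Suppress(pp.CaselessKeyword("to")))
IN = pp.Suppress(pp.CaselessKeyword("in"))

# One expression per command, with results names for the expected arguments.
copy_cmd = pp.CaselessKeyword("copyfile") + arg("source") + TO + arg("dest")
copy_cmd.setParseAction(CopyFile)

create_cmd = pp.CaselessKeyword("createdir") + arg("name") + IN + arg("parent")
create_cmd.setParseAction(CreateDir)

command = copy_cmd | create_cmd


def run(line):
    """Parse one line, then call the command object the parser constructed."""
    the_cmd = command.parseString(line, parseAll=True)[0]
    return the_cmd()


print(run('copyfile "c:\\temp\\file1.txt" to "c:\\temp\\file2.txt"'))
print(run('createdir source in "test folder"'))
```

    Because each class is attached as a parse action, the parser builds the matching command object itself, and the calling code never has to branch on the command name.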

    -- Paul
  • Steve Reiss

    Steve Reiss - 2012-02-01

    Hi Paul!

    Thanks for the code correction!  I can't believe that I was that close on my own… Minor tweak - major difference!

    Will look into making my grammar / commands even stronger with your suggestions.

    Thanks Again!


