Help Needed Parsing Quoted and UnQuoted input

2012-01-31
2013-05-14
  • Steve Reiss
    2012-01-31

    Hi All,

    I have started a small pyparsing project whose grammar needs to be a little more sophisticated: I need to be able to parse both quoted and unquoted strings in a line of input.

    Here is my Python code so far:

    import pyparsing

    #valid commands

    commandWords =

    alphaWord = pyparsing.Word(pyparsing.alphanums + '_$:\\.')
    args = pyparsing.OneOrMore(alphaWord) | pyparsing.quotedString.setParseAction(pyparsing.removeQuotes)
    knownWord = pyparsing.oneOf( commandWords, caseless=True )
    sentence = pyparsing.OneOrMore( knownWord + args )

    # test input list

    cmd_list = ['copyfile c:\\temp\\file1.txt  c:\\temp\\file2.txt',
                'copyfile "c:\\temp\\file1.txt" to "c:\\temp\\file2.txt"',
                'createdir source in "test folder"',
                'movefile c:\\temp\\file1.txt  c:\\temp\\file2.txt',
                'createfolder source'
                ]

    # run tests through grammar

    for current_cmd in cmd_list:
      print("\nTest Command = %s" % current_cmd)

      for the_cmd, start, end in sentence.scanString(current_cmd):
        print("%s  %s" % (the_cmd, the_cmd))

    This produces the following output, in which the quoted tokens are not fully parsed:

    Test Command = copyfile c:\temp\file1.txt  c:\temp\file2.txt
    CopyFile 

    Test Command = copyfile "c:\temp\file1.txt" to "c:\temp\file2.txt"
    CopyFile      <<---- didn't get final 2 tokens

    Test Command = createdir source in "test folder"
    CreateDir    <<---  didn't get last quoted token

    Test Command = movefile c:\temp\file1.txt  c:\temp\file2.txt
    MoveFile 

    Test Command = createfolder source
    CreateFolder 

    As an added bonus, can the quotes be removed from the tokens?

    Thanks!

    Steve

     
  • Paul McGuire
    2012-02-01

    Try changing args from:

    args = pyparsing.OneOrMore(alphaWord) | pyparsing.quotedString.setParseAction(pyparsing.removeQuotes)

    to

    args = pyparsing.OneOrMore(alphaWord | pyparsing.quotedString.setParseAction(pyparsing.removeQuotes))

    Also, there are no quotes in your tokens; you are only seeing them because the string representation of the parsed tokens (a list of strings) shows each string in quotes.
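
    The change above can be checked with a short sketch. The original version of args matches either a run of unquoted words or a single quoted string, so a line that mixes the two stops at the first switch; moving the alternation inside OneOrMore lets each argument be either kind. The command list below is an assumption inferred from the output in the question (the original list is not shown), and the snake_case names are illustrative only.

```python
import pyparsing as pp

# Assumed command list, inferred from the output shown in the question.
command_words = ["CopyFile", "CreateDir", "MoveFile", "CreateFolder"]

alpha_word = pp.Word(pp.alphanums + "_$:\\.")
# copy() keeps the parse action off the shared pp.quotedString object.
quoted = pp.quotedString.copy().setParseAction(pp.removeQuotes)

# Each argument may be quoted or unquoted, in any order.
args = pp.OneOrMore(alpha_word | quoted)
known_word = pp.oneOf(command_words, caseless=True)
sentence = pp.OneOrMore(known_word + args)

tokens = sentence.parseString('createdir source in "test folder"').asList()
print(tokens)      # ['CreateDir', 'source', 'in', 'test folder']
print(tokens[-1])  # test folder
```

    Printing the whole list shows quote characters only because that is how Python displays strings inside a list; printing a single token shows that the token itself has none.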

    But what you have written is just one step above tokenizing - you are going to have to reprocess each parsed set of tokens depending on the leading command.  Instead, define a separate parsing expression for each command, using results names for the expected arguments. This will give you a richer set of named results, much easier to process and perform your various commands.  Take it still further and attach a class as the parse action for each defined command, add a __call__ method to perform the command's function, and then you can just do the_cmd() with the parsed results. The parser will construct the correct command object type, and the __call__ method will extract the necessary args from the tokens passed into __init__.  See the SimpleBool.py example on the wiki for a more detailed example.
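
    A minimal sketch of that suggestion follows, under stated assumptions: the command syntax (an optional "to" between the two paths, a single name after createdir), the results names "source", "dest" and "name", and the CopyFile and CreateDir classes are all hypothetical and not taken from the original project. The __call__ methods only return a description of what they would do rather than touching the file system.

```python
import pyparsing as pp

alpha_word = pp.Word(pp.alphanums + "_$:\\.")
quoted = pp.quotedString.copy().setParseAction(pp.removeQuotes)
path = quoted | alpha_word  # an argument may be quoted or unquoted


class CopyFile:
    """Hypothetical command object; built by the parser from the tokens."""

    def __init__(self, tokens):
        self.source = tokens["source"]
        self.dest = tokens["dest"]

    def __call__(self):
        # A real command would copy the file here; this sketch only reports it.
        return "copy %s -> %s" % (self.source, self.dest)


class CreateDir:
    """Hypothetical command object for a one-argument command."""

    def __init__(self, tokens):
        self.name = tokens["name"]

    def __call__(self):
        return "create directory %s" % self.name


# One expression per command, with results names for the expected arguments.
copy_cmd = (
    pp.CaselessKeyword("copyfile")
    + path("source")
    + pp.Optional(pp.Suppress(pp.CaselessKeyword("to")))
    + path("dest")
).setParseAction(CopyFile)

create_dir_cmd = (
    pp.CaselessKeyword("createdir") + path("name")
).setParseAction(CreateDir)

command = copy_cmd | create_dir_cmd

the_cmd = command.parseString('copyfile "c:\\temp\\file1.txt" to c:\\temp\\file2.txt')[0]
print(the_cmd())
```

    Because each class is attached as a parse action, the parser returns an already-constructed command object, so the calling code does not need to inspect the leading command word itself.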

    -- Paul
     
  • Steve Reiss
    2012-02-01

    Hi Paul!

    Thanks for the code correction!  I can't believe that I was that close on my own… Minor tweak - major difference!

    I will look into making my grammar and commands even stronger with your suggestions.

    Thanks Again!

    Steve