Anonymous - 2006-02-28
Hi,
I'm hoping to implement a little scripting language like sh or csh with a python slant in python. I had already written a parser to do it, but it always seemed a little brittle.
I thought I'd give pyparsing a try... It is very easy to get the easy stuff going, but I'm totally lost when it comes to more complicated things. None of the examples se
Here's where I've gotten so far. I'm trying to get the "\" escape char to negate the quotes, but I don't know how to fit it into the statement.
Also, how can I give an error if I have an odd number of quotes before a newline?
Thanks in advance if anyone can help,
Mike
----------------
from pyparsing import *
import string
import sys

if len(sys.argv) > 1:
    cmdstring = string.join(sys.argv[1:])
else:
    cmdstring = '''alias dude=holmes; echo \"one two" 'three ' && ver # nuthin more\n'''

# define grammar
keywords = ('alias', 'echo', 'setenv', 'ver') # to be expanded later
keyword = oneOf( string.join(keywords) )
argument = Word(alphanums + '_-=/')
quoted_arg = ( Suppress("'") + CharsNotIn("'") + Suppress("'") ^
               Suppress('"') + CharsNotIn('"') + Suppress('"') )
contmode = oneOf( '; | || & &&' ).setResultsName('contmode')
escapes = Literal('\\') + Word(printables,exact=1)
statement = Group( keyword + ZeroOrMore(quoted_arg) + ZeroOrMore(argument) +
                   Optional(contmode, default=';') )
compound_statement = OneOrMore(statement)
compound_statement.ignore(pythonStyleComment)

# parse
print compound_statement.parseString(cmdstring)
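On the question of reporting an odd number of quotes before a newline: one possible approach is to pre-check each line with a small scanner before handing it to the grammar. The sketch below is plain Python rather than pyparsing, the names find_unterminated_quote and check_lines are made up for illustration, and it assumes a backslash always escapes the following character (real sh treats backslashes inside single quotes literally).

```python
def find_unterminated_quote(line):
    """Return the index of a quote that is never closed, or None if all quotes close."""
    open_quote = None  # quote character we are currently inside, if any
    open_at = None     # index where that quote was opened
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\":
            i += 2  # assumption: a backslash always escapes the next character
            continue
        if open_quote is None:
            if ch in "'\"":
                open_quote, open_at = ch, i
        elif ch == open_quote:
            open_quote, open_at = None, None
        i += 1
    return open_at


def check_lines(text):
    """Raise ValueError naming the first line that has an unclosed quote."""
    for lineno, line in enumerate(text.splitlines(), 1):
        col = find_unterminated_quote(line)
        if col is not None:
            raise ValueError("line %d, column %d: unterminated quote" % (lineno, col + 1))
```

With the sample command string from the post, the escaped \" is skipped and the later unescaped " is reported as the quote that never closes.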
Mike -
I ran your code and it appears to work, but here are some comments:
1. oneOf takes a list of words, but they must be whitespace-separated. You should change:
keyword = oneOf( string.join(keywords) )
To
keyword = oneOf( " ".join(keywords) )
2. Is there a problem using the quotedString built-in in pyparsing? I think this will handle the '\' character escaping you are looking for.
Lastly, your approach looks more like you are tokenizing - not that there's anything wrong with that! - when in fact, you can define separate sub-grammars by keyword, and have pyparsing do more semantic processing for you. For example:
stringExpr = quotedString #expand this to handle complex string expressions
echoCmd = Literal("echo") + stringExpr.setResultsName("echoText")
Now you can make echoCmd part of a larger grammar of your shell commands, and dispatch directly from the parsed results, instead of having pyparsing just break up your string into tokens and have some other batch of code retrace many of pyparsing's steps in traversing through the list of tokens to interpret them semantically.
My presentation at PyCon implemented a pyparsing->Command pattern, using a text adventure game as an example. I'll post that code as soon as I get access back to my web-page (grrrrr!), and post a notice on the pyparsing SF news page.
-- Paul
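To make the "dispatch directly from the parsed results" idea concrete, here is a minimal sketch. It is not the PyCon code mentioned above. The handler functions, the HANDLERS table, the "command" results name and the version string are all invented for illustration, and it assumes pyparsing is installed with the camelCase API used in this thread. quotedString is copied first so the parse action is not attached to the shared built-in.

```python
from pyparsing import Keyword, quotedString, removeQuotes

# Work on a copy so the module-level quotedString is left untouched.
quoted = quotedString.copy().setParseAction(removeQuotes)

# Each command is its own small sub-grammar; the keyword is tagged "command".
echoCmd = Keyword("echo")("command") + quoted("echoText")
verCmd = Keyword("ver")("command")
shellCmd = echoCmd | verCmd


def run_echo(tokens):
    # The named result already holds the text without its quotes.
    return tokens["echoText"]


def run_ver(tokens):
    return "toy shell 0.1"  # hypothetical version string


# Hypothetical dispatch table: keyword -> handler.
HANDLERS = {"echo": run_echo, "ver": run_ver}


def execute(line):
    tokens = shellCmd.parseString(line)
    return HANDLERS[tokens["command"]](tokens)


print(execute('echo "one two"'))
```

The handlers never walk a flat token list; they only look up the named results that the grammar has already attached.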
Thanks for the reply, Paul.
Regarding #1, both do the same thing, I don't use the second because I find it unintuitive, although string.join isn't much better ... oh well, not important, it's a python issue.
2. The quotedString returned the quotes too, which I didn't want really. It ignored and deleted the backslash escapes.
Actually I'm not sure what direction I'm going with this. The reason I am trying to tokenize it all is that I am just trying to get the syntax to work first before I'm going to try to execute anything. Another reason is that the command line may be multiple statements joined by semi-colons, maybe on multiple lines.
How can I start validating a statement before even knowing if it is single or multiple? Also, I want to accept regular python statements too, so I need to look at the line and identify it first.
There is still tons of stuff to figure out, like backslash escapes, redirection, brace, tilde, and command `` expansions, etc. Even the simple example above doesn't always work if I change something little here or there.
So I'm not sure this is the correct way to handle the issues or not, I'm asking for advice not just on the module but even on how to approach the problem in general.
Sorry this is a bit of a drag, I don't mean to burden anyone with solving my problem. ;)
On the bright side I've already got a working parser I wrote myself, but I'd like to use something more general, robust, and not have to reinvent the wheel.
Here's a newer version, but it doesn't catch the newline as an ending at the end of the first line. :(
===========================================
#!/bin/env python
from pyparsing import *
import string
import sys

if len(sys.argv) > 1:
    cmdstring = string.join(sys.argv[1:])
else:
    cmdstring = '''alias dude=holmes; echo \"one two\" 'three ' && ver # nuthin more
alias dir = ls -l; echo "one two" 'three ' && ver -h >>/dev/null
'''

# define grammar
ParserElement.setDefaultWhitespaceChars(' \t')
keywords = ('alias', 'echo', 'setenv', 'ver') # to be expanded later
keyword = oneOf( string.join(keywords) )
argument = Word(alphanums + '_-=/')
redirector = oneOf('>e> >e>> < << > >> >3> >3>>')
path = Word(printables)
redirection = redirector + path
quoted_arg = ( Suppress("'") + CharsNotIn("'") + Suppress("'") |
               Suppress('"') + CharsNotIn('"') + Suppress('"') )
#quoted_arg = quotedString
contmode = oneOf( '; | || & &&' ).setResultsName('contmode')
escapes = Literal('\\') + Word(printables,exact=1)
statement = Group(
    keyword +
    ZeroOrMore(escapes) +
    ZeroOrMore(quoted_arg) +
    ZeroOrMore(argument) +
    ZeroOrMore( Group(redirection) ) +
    Optional(contmode, default=';')
    )
# ZeroOrMore(escapes) +
compound_statement = OneOrMore(statement) + LineEnd().suppress()
compound_statement.ignore(pythonStyleComment)
multi_line_stm = OneOrMore(compound_statement)

# parse
if '\n' in cmdstring: print multi_line_stm.parseString(cmdstring)
else: print compound_statement.parseString(cmdstring)
Mike -
Hunh! I never used string.join that way. I guess I just stay away from using the string module, since it is supposed to go away at some point.
As you say, when you parse a quoted string, you are not often very interested in the quotes. Pyparsing includes a built-in parse action for removing them. Try this:
quoted_arg = quotedString.setParseAction( removeQuotes )
What I would do in your case would be to build up my scripting language a command at a time. So with your language, start with dir and echo. echo will require a definition for a string expression, but start with something very simple, just one or more quoted strings which our parser will concatenate together.
stringExpr = OneOrMore(quotedString.setParseAction(removeQuotes))
stringExpr.setParseAction( lambda s,l,t: "".join(t) )
echoCmd = Keyword("echo") + stringExpr.setResultsName("echoString")
dirCmd = Keyword("dir") + filespec   # filespec is not defined in this snippet
cmds = echoCmd | dirCmd
Of course, this is your project, so you are better off with whatever approach makes most sense to you.
I think you can take it from there.
-- Paul
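The echo/dir snippet above can be fleshed out into something runnable. In this sketch the filespec definition is a guess (a single word of filename-like characters), since the thread never defines it, and the "filespec" results name is likewise added for illustration. It assumes pyparsing is installed and again copies quotedString before attaching removeQuotes.

```python
from pyparsing import Keyword, OneOrMore, Word, alphanums, quotedString, removeQuotes

# Copy the built-in so the parse action stays local to this grammar.
quoted = quotedString.copy().setParseAction(removeQuotes)

# One or more quoted strings, concatenated into a single token.
stringExpr = OneOrMore(quoted)
stringExpr.setParseAction(lambda s, l, t: "".join(t))

# Assumption: filespec is not defined in the thread; this is a stand-in.
filespec = Word(alphanums + "_-./*?")

echoCmd = Keyword("echo") + stringExpr.setResultsName("echoString")
dirCmd = Keyword("dir") + filespec.setResultsName("filespec")
cmds = echoCmd | dirCmd

result = cmds.parseString("echo \"one two\" 'three '")
print(result["echoString"])
print(cmds.parseString("dir *.txt")["filespec"])
```

Keyword is used instead of Literal so that "echo" does not match the front of a longer word such as "echoes".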
Thank you very much for the advice. It seems to be working.
I'm still not clear on how to handle backslash escape chars ... like "foo \"bar\" ". Does anyone know?
Mike -
What do you mean by "handle"? Do you mean "get rid of the backslashes and translate the escaped char"? You might want to do this with a parse action attached to quotedString, something like:
def unescapeBackslashes(s,l,t):
    #expand this list as necessary - last item in list escapes \\ -> \
    escapes = ((r"\t","\t"), (r"\b","\b"), (r"\f","\f"), (r"\n","\n"), ("\\\\","\\"))
    tmp = t[0]
    for lit,rep in escapes:
        tmp = tmp.replace(lit,rep)
    return tmp

sampleData = r"""'This is some sample code containing\tbackslashes that\nshould be converted.'"""
from pyparsing import *
qtString = quotedString.setParseAction(unescapeBackslashes)
print qtString.parseString(sampleData)[0]
-- Paul
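One caveat with a chain of replace calls is ordering: an escaped backslash followed by "n" (written \\n) is rewritten by the \n rule before the \\ rule ever sees it. A possible variation, sketched here in plain Python with an assumed escape table and a made-up function name, does the translation in a single left-to-right pass and also covers the \" case asked about earlier.

```python
import re

# Assumed escape table; extend as needed.
ESCAPES = {"t": "\t", "b": "\b", "f": "\f", "n": "\n"}


def unescape(text):
    """Translate backslash escapes in one left-to-right pass."""
    def replace(match):
        ch = match.group(1)
        # Escapes not in the table, such as \" or \\, become the character itself.
        return ESCAPES.get(ch, ch)
    return re.sub(r"\\(.)", replace, text, flags=re.DOTALL)


print(unescape(r'foo \"bar\" '))
```

Since setParseAction accepts more than one function, something like lambda s, l, t: unescape(t[0]) could presumably be chained after removeQuotes on a copy of quotedString.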