I'm in the process of building a SQL parser, based on the example that you provide with the pyparsing distribution. I'm now able to handle all of the common query types, table and column aliases, joins, etc. pyparsing makes this all quite simple, and for that I am thankful! But I digress...
I'm having trouble getting past the following:
"SELECT f.*, b.col1, b.col2 FROM foo f, bar b WHERE . . . "
The problem? According to our grammar, a select statement is either "SELECT * FROM foo" or "SELECT f.col1, f.col2 FROM foo f". Because it finds "*" in "SELECT f.* FROM foo f", it expects fromToken to follow the "*".
I've tried changing columnName such that the elements within the delimited list can be either ident OR * as follows:
This will allow the remained of the statement to be properly parsed, but will ignore column names after the "*". Thus, in "SELECT f.* b.col1 FROM foo f, bar b", tokens.tables will only contain [f.*].
Apologies for the long-winded description of the problem, but I thought it more convenient than having to (potentially) post several clarifications.
Many thanks in advance for all of your thoughts and/or suggestions.
Keith
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Apologies for the representation of the double-quotes in my post ("). First time posting to an SF board. Wasn't aware that it would mangle something like this, as it's clearly not part of any tag or code (of which I'm aware).
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I'm in the process of building a SQL parser, based on the example that you provide with the pyparsing distribution. I'm now able to handle all of the common query types, table and column aliases, joins, etc. pyparsing makes this all quite simple, and for that I am thankful! But I digress...
I'm having trouble getting past the following:
"SELECT f.*, b.col1, b.col2 FROM foo f, bar b WHERE . . . "
A columnName is defined as:
columnName = delimitedList( ident, ".", combine=True )
And a selectStmt looks like this:
selectToken + ( "*" | columnNameList ).setResultsName("columns") + fromToken + . . .
The problem? According to our grammar, a select statement is either "SELECT * FROM foo" or "SELECT f.col1, f.col2 FROM foo f". Because it finds "*" in "SELECT f.* FROM foo f", it expects fromToken to follow the "*".
I've tried changing columnName such that the elements within the delimited list can be either ident OR * as follows:
columnName = delimitedList( ("*" | ident ), ".", combine=True )
This will allow the remained of the statement to be properly parsed, but will ignore column names after the "*". Thus, in "SELECT f.* b.col1 FROM foo f, bar b", tokens.tables will only contain [f.*].
Apologies for the long-winded description of the problem, but I thought it more convenient than having to (potentially) post several clarifications.
Many thanks in advance for all of your thoughts and/or suggestions.
Keith
Apologies for the representation of the double-quotes in my post ("). First time posting to an SF board. Wasn't aware that it would mangle something like this, as it's clearly not part of any tag or code (of which I'm aware).
OK. Was able to fix this after taking a night off :)
Removed the * from the selectStmt def, and left it in the columnName def only:
columnName = delimitedList( "*" | ident ), ".", combine=True )
Keith -
(Sorry you had to fix this yourself. I thought I replied to this, but SF must have eaten my reply. Here it is below.)
I think the cleanest approach would be to get away from the delimitedList trickery, and just say what you mean:
columnRef = "*" | ident
tableRef = ident
columnAlias = quotedString | ident
columnName = Combine(Optional(tableRef + ".") + columnRef).setResultsName("colName") + Optional(columnAlias).setResultsName("colAlias")
Does this make things better or worse?
-- Paul