From: Kevin K. <kk...@ny...> - 2011-01-16 22:42:10
|
Gentlebeings, I've managed thoroughly to embarrass myself by letting TDBC escape with its stored procedure support thought out only very incompletely. The following document details what's wrong, and my best guesses about how to fix it. I welcome comments; I welcome programming assistance even more. -- 73 de ke9tv/2, Kevin -- TDBC, Stored Procedures and Multiple Results ===== ====== ========== === ======== ======= Introduction. Recent user reports that stored procedures with output parameters don't work in tdbc::odbc have led me to investigate the reasons - and discover, much to my chagrin, that the implementation of the 'preparecall' method and associated handling simply was never completed, and the specification is quite loose - too loose to be usable. For this reason, I'm circulating this email to explore the ideas of how it ought to work, and try to move toward specifying the missing pieces. I've half a mind just to leave them out, on the grounds that they are not part of the "95% solution" that tdbc was intended to be, but I've had more than my share of experience with the "jackbooted thug" style of database administrator who supports access to his database only through SP calls, and so I'm fairly convinced that we need to have *some* sort of support for them. I. Multiple result sets: the easy part. The first, and easiest, missing piece to full stored-procedure support is to recognize that several databases (at least SQL Server, MySQL and Oracle) allow stored procedures to return multiple result sets. (Note: I am not referring to 'refcursors' in this discussion, just to stored procedures that, for instance, return in order the result of each statement that they execute.) Fortunately, it appears that a fairly simple change to the TDBC API can cover multiple returns from stored procedures: to the result set object, we add the method: $resultSet nextResults which does the following: - discards any unfetched rows in the current result set. - checks if there are further results and returns 0 if there are not. - advances to the next result set - adjusts the internal state so that the 'columns' and 'rowcount' methods act on the next result set - returns 1 The 'foreach' and 'allrows' methods on result sets, statements and connections will be adjusted to have an outer loop that calls 'nextResults', updates any '-columnsvar' with the columns of the next result, and continues to iterate. Only when all results are retrieved will 'foreach' and 'allrows' return. (This behaviour is entirely analogous to what the sqlite3 Tcl bindings do in the case of multiple semicolon-separated statements.) Compatibility of multiple result sets: ODBC and SQL/CLI: The SQLMoreResults call is precisely analogous to the '$resultSet nextResults' method. Some internal changes to the tdbc::odbc driver will be needed to rebind the result set after '$resultSet nextResults' is called. The 'prepare' method on connections will also be changed to allow semicolons in the statement; given that all the result sets can be handled, there is no reason to forbid them. MySQL: The MySQL prepared statement interface does not support multiple result sets, so MySQL stored procedures that return them must be invoked through the unprepared 'mysql_query'. Given the risks of SQL injection attacks, 'mysql_query' shall be used only for stored procedure calls; these will be discussed at greater length below. The 'prepare' method on connections will continue to forbid semicolons in statements. PostgreSQL: All PostgreSQL statements return single results. For PostgreSQL, therefore, the 'nextResults' method will always return 0. The 'prepare' method on connections will continue to forbid semicolons in statements. SQLite: SQLite does not have stored procedures in the ISO sense; it has no CALL statement that potentially expects OUT or INOUT parameters. (It does have SELECT statements that invoke user-defined functions, but these present no problem to the current API.) The proposal here, since SQLite can do it readily, is to allow semicolons in the SQL string presented to the 'prepare' method, and to allow for multiple results if the given statement contains semicolons. Other databases: At present, SQL Server and Oracle are accessed only through the ODBC layer. ODBC provides full stored-procedure support for both of these, and one must imagine that what an ODBC driver can do, a native driver can do as well. Similarly, SQL/CLI (essentially, ODBC) is the primary way to access DB/2. Compatibility with hypothetical drivers for other databases is not considered. II. The 'preparecall' method. Recall that the 'preparecall' method accepts only a simpleminded syntax: :result = functionname(:arg1, :arg2, ...) and procedurename(:arg1, :arg2, ...) A. ODBC and SQL/CLI. The two calls map precisely onto the ODBC 'procedure call escape sequence': {?=call functionname(?, ?, ...)} {call procedurename(?, ?, ...)} Once this mapping is done, the only further requirement is correct handling of multiple results and of OUT and INOUT parameters, which is discussed below. B. MySQL MySQL is rather in disarray with respect to the support of stored procedures. Stored functions are easy: the call :result = functionname(:arg1, :arg2, ...) can be replaced with SELECT functionname(:arg1, :arg2, ...) followed by storing the singleton result of the call into :result. But procedures face the problem that parameter types cannot be determined readily, and this determination is necessary to figure out whether :arg1 needs to be replaced with a quoted string (suitably escaped) or a numeric value. (Remember that stored procedure calls cannot be prepared because of the possibility of multiple result sets.) In MySQL 5.5 and beyond, the parameter data are readily available by querying INFORMATION_SCHEMA.PARAMETERS, and once MySQL 5.5 is more widely deployed, this approach will probably be the best. Nevertheless, MySQL 5.1 is still in widespread use (current Debian and Red Hat Enterprise systems still have it), so this approach does not appear entirely feasible at present. The workaround in MySQL 5.1 is for the client to retrieve the procedure definition and parse it for the parameter types. The definition can be retrieved in one of two ways: either a query of the 'mysql.proc' table or executing a 'SHOW CREATE PROCEDURE' statement. The former requires a DBA to give the user SELECT permission on 'mysql.proc' - something that database hosting providers are likely to be unwilling to do. The latter requires either SELECT permission on 'mysql.proc' or else that the user be the owner of the procedure - again, something that cannot be presumed with a hosted database. The ultimate fallback is to require the Tcl program to do 'paramtype' calls on the statement to supply parameter type information itself. This method is at least guaranteed to work, but is likely to be cumbersome and poorly received. My inclination for MySQL is to depend on INFORMATION_SCHEMA.PARAMETERS, and if a query to this table fails, then require the user to specify the parameters explicitly. Once the parameters are known, the translation of a stored procedure call will work with session variables. A call like procedurename(:param1, :param2, :param3) where :param1 is an IN VARCHAR, :param2 is an INOUT INTEGER, and :param3 is an OUT VARCHAR, would translate to a sequence like: @param1 = 'escaped string' @param2 = numeric-value CALL procedurename(@param1, @param2, @param3) followed by: SELECT @param2, @param3 Support for MySQL stored procedures, because of its complexity (including the fact that prepared statements don't work with stored procedure calls) is likely to be the lowest-priority of anything discussed in this document. As an interim step, I propose to modify the tdbc::mysql driver so that @variable gets passed through into the queries, so that client code can use session variables. I also propose to implement an 'execdirect' method that provides a simple escape to mysql_real_query() so that a client application can invoke SQL code that cannot appear in a prepared statement. (This method will do no parameter substitution, and will return a result set where every column type is 'string': there will be no corresponding 'statement' object.) This will at least provide a workaround so that TDBC client code can access stored procedures, albeit with an inconvenient API (and the need to escape strings - for which an 'escape' method on the 'connection' object will also be provided). C. PostgreSQL Just as with MySQL, stored function calls can be handled readily by translating :result = function(:param1, :param2, ...); into a SELECT statememt. PostgreSQL's view of stored procedures is also slightly eccentric. While procedures can be declared as having OUT and INOUT parameters, they actually have only IN parameters expressed in the CALL statement: the OUT and INOUT parameters appear as columns in the procedure's final result set. A simple CALL procedure(:param1, :param2, ...) where only IN and INOUT parameters are listed, is all that is needed. Once again, parameter type determination is problematic, but in this case, a third party has solved most of the problem for us by providing code at http://www.alberton.info/postgresql_meta_info.html that retrieves a procedure or function's parameter definitions from the somewhat awkward form that they appear in the 'pg_catalog.pg_proc' table. D. SQLite SQLite lacks stored procedures in the SQL/CLI sense. As with the other two databases, function calls :result = functionname(:param1, :param2, ...) can be mapped to SELECT statements. Procedure calls passed to 'preparecall' will return an error with a SQLstate of 42000 (syntax error or access denied). III. Output parameters PostgreSQL and SQLite have no need for specific support of output parameters, since procedure and function outputs appear in the result set. MySQL could also be implemented that way, since we use session variables to stage the output parameters, and then execute a SELECT statement to retrieve them. But ODBC or SQL/CLI requires output parameters to be retrieved using a different API. For ODBC, the parameters will be stored in client-provided buffers after the final result is retrieved (SQLMoreResults returns SQL_NO_DATA or, equivalently, the 'nextresults' method on the result set returns 0.) It is tempting to put output parameters directly into Tcl variables, but it is also quite wrong for several reasons: (1) If the 'execute' call that created a result set supplied a dictionary, then the program does not wish to have parameters in variables at all. (2) If the 'execute' call that created the result set is in a different context from the 'nextresults' call that finally made the output parameters available for inspection, there is no obvious right place to store the parameters. (3) In any case, storing them directly in variables pollutes the namespace. Rather, the 'resultset' object will support an 'outputparams' method that will return a dictionary whose keys are the names of OUT and INOUT parameters, and whose values are the returned parameter values. This method may be called at any time, but values are promised to be present only after 'nextresults' has returned 0. If output parameters in variables are desired, converting them is a simple matter: dict for {name value} [$resultset outputparams] {set $name $value} IV. Timing of the changes. Obviously, all of this is a fair amount of work, and I have little time to spare at the moment. I propose that initially the 'preparecall' method be modified in all drivers to return an error with SQLstate = IM001 (driver does not support the requested function). I next propose to do multiple result sets and output parameters in ODBC (since that is used for the "big iron" databases where stored procedures are most likely to be unavoidable). Next step will be implementing 'escape' and 'execdirect' in MySQL, allowing for session variables in MySQL and PostgreSQL, and perhaps doing stored function calls. Full-up 'preparecall' functionality for PostgreSQL and MySQL will have to come last (with MySQL trailing the pack). Obviously, I welcome comments about whether this is headed in the right direction. And I beg for implementation assistance from any C programmers who are interested in speeding up the work! V. Summary of API changes. connection: preparecall Immediately, return an error with SQLstate=IM001 on all platforms. Lift this restriction as correct implementations of stored procedure calls become available. They will be implemented as sketched above. escape (MySQL only) New method that accepts a string and returns the string with MySQL metacharacters replaced with escape sequences. (Used to protect 'evaldirect' against SQL injection attacks.) evaldirect (MySQL only) New method that accepts a single parameter - a string of SQL code to execute, and evaluates it without preparing. The method will yield a result set, and all columns of the result set will have the 'string' data type. foreach allrows These two convenience methods will be modified to concatenate multple results if multiple results are present. statement: foreach allrows These two convenience methods will be modified to concatenate multple results if multiple results are present. resultset: nextresults This method will advance to the next group of rows in the result set, returning 0 if the current group was the last and 1 if more rows remain. It will also adjust the internal state of the object so that 'columns' and 'rowcount' will reflect the data pertaining to the next set of rows. outputparams After 'nextresults' returns 0, this method may be called to retrieve a dictionary of the OUT and INOUT parameters (keys are parameter names, and values are parameter values). foreach allrows These two convenience methods will be modified to concatenate multple results if multiple results are present. |
From: Kevin K. <kk...@ny...> - 2011-02-21 04:35:05
|
Dirk, I've finally managed to make a version of TDBC that I think will allow for you to work around stored procedure limitations. (I'm still working on making OUT and INOUT parameters work; that turns out to be seriously tricky, since ODBC isn't really willing to tell a caller in advance whether a parameter in an unknown statement is IN, OUT or INOUT.) But you should be able to do what you suggested and set up a call like: set name fred set data [$connection allrows { DECLARE @x INTEGER; EXECUTE lookup_person :name, @x; SELECT @x AS result; }] and get a row like {result 1-518-555-1212} Essentially, the changes are: - @ is no longer a special character in the SQL accepted by tdbc::odbc (or tdbc::postgres or tdbc::mysql). This allows for user variables on those three drivers. - tdbc::odbc and tdbc::sqlite3 now allow semicolons in statements. - All drivers now have a 'nextresults' method on result sets so that a statement can return multiple groups of results (either because it has multiple semicolon-separated pieces, or because a stored procedure returns multiple cursors). 'nextresults' is called after processing a set of rows ('nextrow', 'nextlist' or 'nextdict' has returned 0) and returns 1 if there's another set of rows to process, or 0 if the statement is completely finished. - columns and rowcount on result sets are updated at 'nextresults' - 'allrows' and 'foreach' methods continue as long as 'nextresults' returns 1. The '-columnsvariable' gets updated every time 'nextresults' is called. If this gives you enough to make progress, let me know and I'll try to roll up another beta release. And I'll try to keep plugging on real OUTPUT parameters! -- 73 de ke9tv/2, Kevin |