Thread: [Tcl-tdbc] TDBC, stored procedures, output parameters, and multiple results.

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 454-5900

Gentlebeings,

I've managed thoroughly to embarrass myself by letting TDBC escape with
its stored procedure support thought out only very incompletely.

The following document details what's wrong, and my best guesses about
how to fix it.

I welcome comments; I welcome programming assistance even more.

-- 
73 de ke9tv/2, Kevin
-- 

	     TDBC, Stored Procedures and Multiple Results
	     ===== ====== ========== === ======== =======

Introduction.

Recent user reports that stored procedures with output parameters
don't work in tdbc::odbc have led me to investigate the reasons - and
discover, much to my chagrin, that the implementation of the
'preparecall' method and associated handling simply was never
completed, and the specification is quite loose - too loose to be
usable. For this reason, I'm circulating this email to explore the
ideas of how it ought to work, and try to move toward specifying the
missing pieces. I've half a mind just to leave them out, on the
grounds that they are not part of the "95% solution" that tdbc was
intended to be, but I've had more than my share of experience with the
"jackbooted thug" style of database administrator who supports access
to his database only through SP calls, and so I'm fairly convinced
that we need to have *some* sort of support for them.

I. Multiple result sets: the easy part.

The first, and easiest, missing piece to full stored-procedure
support is to recognize that several databases (at least SQL Server,
MySQL and Oracle) allow stored procedures to return multiple
result sets. (Note: I am not referring to 'refcursors' in this
discussion, just to stored procedures that, for instance, return
in order the result of each statement that they execute.)
Fortunately, it appears that a fairly simple change to the TDBC API
can cover multiple returns from stored procedures: to the
result set object, we add the method:

     $resultSet nextResults

which does the following:

   - discards any unfetched rows in the current result set.
   - checks if there are further results and returns 0 if there are
     not.
   - advances to the next result set
   - adjusts the internal state so that the 'columns' and 'rowcount'
     methods act on the next result set
   - returns 1

The 'foreach' and 'allrows' methods on result sets, statements and
connections will be adjusted to have an outer loop that calls
'nextResults', updates any '-columnsvar' with the columns of the next
result, and continues to iterate. Only when all results are retrieved
will 'foreach' and 'allrows' return. (This behaviour is entirely
analogous to what the sqlite3 Tcl bindings do in the case of multiple
semicolon-separated statements.)

Compatibility of multiple result sets:

ODBC and SQL/CLI: The SQLMoreResults call is precisely analogous
to the  '$resultSet nextResults' method. Some internal changes to the
tdbc::odbc driver will be needed to rebind the result set after
'$resultSet nextResults' is called. The 'prepare' method on
connections will also be changed to allow semicolons in the statement;
given that all the result sets can be handled, there is no reason to
forbid them.

MySQL: The MySQL prepared statement interface does not support
multiple result sets, so MySQL stored procedures that return them must
be invoked through the unprepared 'mysql_query'. Given the risks of
SQL injection attacks, 'mysql_query' shall be used only for stored
procedure calls; these will be discussed at greater length below.  The
'prepare' method on connections will continue to forbid semicolons in
statements.

PostgreSQL: All PostgreSQL statements return single results. For
PostgreSQL, therefore, the 'nextResults' method will always return 0.
The 'prepare' method on connections will continue to forbid semicolons
in statements.

SQLite: SQLite does not have stored procedures in the ISO sense; it
has no CALL statement that potentially expects OUT or INOUT
parameters. (It does have SELECT statements that invoke user-defined
functions, but these present no problem to the current API.) The
proposal here, since SQLite can do it readily, is to allow semicolons
in the SQL string presented to the 'prepare' method, and to allow for
multiple results if the given statement contains semicolons.

Other databases: At present, SQL Server and Oracle are accessed
only through the ODBC layer. ODBC provides full stored-procedure
support for both of these, and one must imagine that what an ODBC
driver can do, a native driver can do as well. Similarly, SQL/CLI
(essentially, ODBC) is the primary way to access DB/2.  Compatibility
with hypothetical drivers for other databases is not considered.

II. The 'preparecall' method.

Recall that the 'preparecall' method accepts only a simpleminded
syntax:
     :result = functionname(:arg1, :arg2, ...)
and
     procedurename(:arg1, :arg2, ...)

A.  ODBC and SQL/CLI.

     The two calls map precisely onto the ODBC 'procedure call escape
     sequence':

         {?=call functionname(?, ?, ...)}
         {call procedurename(?, ?, ...)}

     Once this mapping is done, the only further requirement is correct
     handling of multiple results and of OUT and INOUT parameters,
     which is discussed below.

B.  MySQL

     MySQL is rather in disarray with respect to the support of stored
     procedures.  Stored functions are easy: the call
        :result = functionname(:arg1, :arg2, ...)
     can be replaced with
        SELECT functionname(:arg1, :arg2, ...)
     followed by storing the singleton result of the call into :result.
     But procedures face the problem that parameter types cannot
     be determined readily, and this determination is necessary to
     figure out whether :arg1 needs to be replaced with a quoted
     string (suitably escaped) or a numeric value. (Remember that
     stored procedure calls cannot be prepared because of the
     possibility of multiple result sets.)

     In MySQL 5.5 and beyond, the parameter data are readily available
     by querying INFORMATION_SCHEMA.PARAMETERS, and once MySQL 5.5 is
     more widely deployed, this approach will probably be the best.
     Nevertheless, MySQL 5.1 is still in widespread use (current Debian
     and Red Hat Enterprise systems still have it), so this approach
     does not appear entirely feasible at present.

     The workaround in MySQL 5.1 is for the client to retrieve the
     procedure definition and parse it for the parameter types. The
     definition can be retrieved in one of two ways: either a query of
     the 'mysql.proc' table or executing a 'SHOW CREATE PROCEDURE'
     statement. The former requires a DBA to give the user SELECT
     permission on 'mysql.proc' - something that database hosting
     providers are likely to be unwilling to do. The latter requires
     either SELECT permission on 'mysql.proc' or else that the user be
     the owner of the procedure - again, something that cannot be
     presumed with a hosted database.

     The ultimate fallback is to require the Tcl program to do
     'paramtype' calls on the statement to supply parameter type
     information itself. This method is at least guaranteed to work,
     but is likely to be cumbersome and poorly received.

     My inclination for MySQL is to depend on
     INFORMATION_SCHEMA.PARAMETERS, and if a query to this table fails,
     then require the user to specify the parameters explicitly.

     Once the parameters are known, the translation of a stored
     procedure call will work with session variables. A call like
         procedurename(:param1, :param2, :param3)
     where :param1 is an IN VARCHAR, :param2 is an INOUT INTEGER,
     and :param3 is an OUT VARCHAR, would translate to a sequence
     like:

         @param1 = 'escaped string'
         @param2 = numeric-value
         CALL procedurename(@param1, @param2, @param3)

     followed by:

         SELECT @param2, @param3

     Support for MySQL stored procedures, because of its complexity
     (including the fact that prepared statements don't work with
     stored procedure calls) is likely to be the lowest-priority of
     anything discussed in this document.

     As an interim step, I propose to modify the tdbc::mysql driver
     so that @variable gets passed through into the queries, so that
     client code can use session variables.  I also propose to
     implement an 'execdirect' method that provides a simple escape
     to mysql_real_query() so that a client application can invoke
     SQL code that cannot appear in a prepared statement. (This method
     will do no parameter substitution, and will return a result set
     where every column type is 'string': there will be no
     corresponding 'statement' object.) This will at least provide
     a workaround so that TDBC client code can access stored
     procedures, albeit with an inconvenient API (and the need to
     escape strings - for which an 'escape' method on the 'connection'
     object will also be provided).

C.  PostgreSQL

     Just as with MySQL, stored function calls can be handled
     readily by translating

         :result = function(:param1, :param2, ...);

     into a SELECT statememt.  PostgreSQL's view of stored procedures
     is also slightly eccentric. While procedures can be declared as
     having OUT and INOUT parameters, they actually have only IN
     parameters expressed in the CALL statement: the OUT and INOUT
     parameters appear as columns in the procedure's final result set.
     A simple
         CALL procedure(:param1, :param2, ...)
     where only IN and INOUT parameters are listed, is all that is
     needed.

     Once again, parameter type determination is problematic, but
     in this case, a third party has solved most of the problem for
     us by providing code at
     http://www.alberton.info/postgresql_meta_info.html
     that retrieves a procedure or function's parameter definitions
     from the somewhat awkward form that they appear in the
     'pg_catalog.pg_proc' table.

D.  SQLite

     SQLite lacks stored procedures in the SQL/CLI sense.
     As with the other two databases, function calls
         :result = functionname(:param1, :param2, ...)
     can be mapped to SELECT statements. Procedure calls passed to
     'preparecall' will return an error with a SQLstate of 42000
     (syntax error or access denied).

III. Output parameters

PostgreSQL and SQLite have no need for specific support of output
parameters, since procedure and function outputs appear in the result
set. MySQL could also be implemented that way, since we use session
variables to stage the output parameters, and then execute a SELECT
statement to retrieve them. But ODBC or SQL/CLI requires output
parameters to be retrieved using a different API.  For ODBC, the
parameters will be stored in client-provided buffers after the final
result is retrieved (SQLMoreResults returns SQL_NO_DATA or,
equivalently, the 'nextresults' method on the result set returns 0.)

It is tempting to put output parameters directly into Tcl variables,
but it is also quite wrong for several reasons:

     (1) If the 'execute' call that created a result set supplied
         a dictionary, then the program does not wish to have
	parameters in variables at all.
     (2) If the 'execute' call that created the result set is in
         a different context from the 'nextresults' call that finally
         made the output parameters available for inspection, there
         is no obvious right place to store the parameters.
     (3) In any case, storing them directly in variables pollutes the
         namespace.

Rather, the 'resultset' object will support an 'outputparams' method
that will return a dictionary whose keys are the names of OUT and
INOUT parameters, and whose values are the returned parameter values.
This method may be called at any time, but values are promised
to be present only after 'nextresults' has returned 0. If output
parameters in variables are desired, converting them is a simple
matter:

     dict for {name value} [$resultset outputparams] {set $name $value}

IV. Timing of the changes.

Obviously, all of this is a fair amount of work, and I have little
time to spare at the moment. I propose that initially the
'preparecall' method be modified in all drivers to return an error
with SQLstate = IM001 (driver does not support the requested
function).  I next propose to do multiple result sets and output
parameters in ODBC (since that is used for the "big iron" databases
where stored procedures are most likely to be unavoidable).
Next step will be implementing 'escape' and 'execdirect' in MySQL,
allowing for session variables in MySQL and PostgreSQL, and perhaps
doing stored function calls.  Full-up 'preparecall' functionality for
PostgreSQL and MySQL will have to come last (with MySQL trailing the
pack).

Obviously, I welcome comments about whether this is headed in the
right direction.  And I beg for implementation assistance from any
C programmers who are interested in speeding up the work!

V. Summary of API changes.

     connection:
         preparecall
             Immediately, return an error with SQLstate=IM001 on all
             platforms. Lift this restriction as correct
             implementations of stored procedure calls become
             available. They will be implemented as sketched above.
         escape (MySQL only)
             New method that accepts a string and returns the
             string with MySQL metacharacters replaced with
             escape sequences. (Used to protect 'evaldirect'
             against SQL injection attacks.)
         evaldirect (MySQL only)
             New method that accepts a single parameter - a
             string of SQL code to execute, and evaluates it without
             preparing. The method will yield a result set, and all
             columns of the result set will have the 'string' data
             type.
         foreach
         allrows
             These two convenience methods will be modified to
             concatenate multple results if multiple results
             are present.

     statement:
         foreach
         allrows
             These two convenience methods will be modified to
             concatenate multple results if multiple results
             are present.

     resultset:
         nextresults
             This method will advance to the next group of rows
             in the result set, returning 0 if the current group
             was the last and 1 if more rows remain. It will also
             adjust the internal state of the object so that
             'columns' and 'rowcount' will reflect the data pertaining
             to the next set of rows.
         outputparams
             After 'nextresults' returns 0, this method may be called
             to retrieve a dictionary of the OUT and INOUT parameters
             (keys are parameter names, and values are parameter
             values).
         foreach
         allrows
             These two convenience methods will be modified to
             concatenate multple results if multiple results
             are present.

Thread: [Tcl-tdbc] TDBC, stored procedures, output parameters, and multiple results.

The Tool Command Language implementation

tcl-tdbc