Re: [Tcl-tdbc] Last-minute comments on TIP #308

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

Twylite wrote:
> Sorry to get these comments in so late ... never enough time :(
> 
> SECTION 1: REQUEST FOR CLARIFICATIONS
> 
> (1) Substitutions in SQL statements: "The database interface MUST 
> support substitutions in SQL-code. Each substitution request has the 
> form :variableName. That is, each substitution request begins with a 
> literal colon (:), followed by a letter or underscore, followed by zero 
> or more letters, digits, or underscores."
> I take it that:
> - The variable name ends with the first non-(letter|digit|underscore) 
> character, so that if X is 10 then ":X:" becomes "10:"?
> - There is no support for scoped variables - the variableName MUST be in 
> the params dict or the caller's scope and nowhere else?

That is correct.  In particular, I didn't want to have to specify
what a scoped variable means when a parameter dictionary is supplied.
Moreover, there's an opening there for scope creep; someone else
will want array elements, or ${...} notation, or $env(LC_DEITY) help
us, command substitution.  I don't want to try to address how
all the details of Tcl's notation might interact with SQL's.

> (2) In the "$dbhandle transaction <script>" construct, can someone 
> please explain:
> - the reason that TCL_BREAK and TCL_CONTINUE should cause a commit 
> rather than a rollback?

It was "principle of least astonishment."  It seems reasonable to
consider exiting a loop to be a "normal" action:

     foreach value $list {
         db transaction {
             ... do something complicated with $value
             if {$done} break
             ... do some more stuff
         }
     }

seems like something that a user would expect to commit rather than
rollback.

The downside of that is that if the code is not in a loop,
the "invoked [break] but not in a loop" error gets thrown.
But that error isn't thrown until the stack is already unwound;
within a transaction, there's no way to detect it.

> - With regard to the note "(Note: Scripts inside a transaction command 
> SHOULD avoid use of the return -code or return -level operations. If a 
> script returns from a transaction, with any combination of return 
> options, the transaction SHALL be committed.)" does this refer only to 
> the use of "return -code" directly within the script, or also within 
> procs called by the script?

It refers to the use of a [return] that will exit the transaction.
Returns that land on code within the transaction are fine.

> (3) Why are the functions "$dbhandle tables" and "$dbhandle columns" a 
> MUST?  I was under the impression that not all databases supported this 
> functionality, and that others may require less-than-straightforward 
> implementations?
> For example, my understanding of PostgreSQL (please feel free to correct 
> me/it) is that one can only get this information by querying the system 
> catalog (tables), which may change depending on the DBMS version.

I'm willing once again to downgrade to SHOULD, but the fact that you
have to query a system catalog for the information is nothing unusual.
Oracle and Sybase also require you to go after the system catalog
to get the stuff.  Certainly the Postgres ODBC and JDBC drivers manage
to provide comparable functionality.  I'd prefer not to relax this
one until someone demonstrates a database where it's inordinately
difficult to get the lists.

> (4) Is it possible to determine if a transaction is in progress for a 
> given database connection?

No.  It would be easy enough to add, and a half-page TIP can add it,
so I don't think this particular issue is worth holding up the vote.

> (5) With regards to autocommit support the following statement is made: 
> "Statements executed against the database when no transaction is in 
> progress (before the first starttransaction or after all started 
> transactions have been either committed or rolled back) SHOULD be 
> auto-committed; that is, each such statement SHOULD be executed as if a 
> starttransaction command preceded the statement and a commit command 
> followed it."
> If the statement given to "$dbhandle prepare" is in fact multiple 
> statements (separated by ;) how does autocommit behave?
> For that matter, is prepare expected to support multiple statements?  
> Should this be clarified?

Prepare is not expected to support multiple statements; in particular,
no provision is made for multiple result sets.  If a given database
implementation decides to allow for multiple statements (presumably
containing at most one SELECT, I would presume that the statements
would be executed as a group.  (Although the tokenizer recogizes ';',
both the database interfaces that I've code it reject it.)

> (6) The support for SQL tokenisation says that "'$', ':' and '@' are all 
> variable substitutions; the remainder of the token string is a variable 
> name."  This doesn't match completely with behaviour specified for 
> "$dbhandle prepare" which only specified ':'.  Further, allowing '$' for 
> substitutions complicates (rather than simplifies) SQL code in Tcl as 
> additional escaping will be required.
> Which behaviour is correct?

The tokenizer was lifted wholesale from SQLite and contributed by
Richard Hipp.  I don't see the need for any substitution character
other than ':', but haven't quite got around to removing the others.

> SECTION 2: REQUEST FOR TIP CHANGES/FEATURES
> 
> (1) In the section "Basic mechanics of database interfaces" the 
> following statements are made:
> - "Any of the ensembles MAY support abbreviation of its subcommands 
> according to the rules defined by Tcl_GetIndexFromObj"
> - "In all of the places where this syntax is expected, a database module 
> MAY support abbreviation of options according to the rules of 
> Tcl_GetIndexFromObj()"
> Given that code that uses the DBI "SHOULD spell out subcommands/options 
> in full", and that (mis)use of abbreviations can affect applications if 
> the DBI is extended, and that database implementations are not 
> restricted to providing only the subcommands specified by the TIP 
> (potentially requiring longer abbraviations), I would like to request 
> that ensembles MUST NOT (or at least SHOULD NOT) support abbreviation.

In an interactive shell (where the code is not, after all, expected
to be preserved), the ability to abbreviate is a bit too convenient
to dispense with it altogether.  For well or ill, the "unique prefix"
rule seems to be with us in enough other places without having
caused too many headaches.

> (2) I am also requesting a configuration option "-nullvalue val" (where 
> val defaults to {}).  In the result set interface the commands 
> "nextlist" and "nextrow -as lists" , if a cell in the row is NULL, the 
> configured nullvalue MUST be stored as its value.
> In discussions on c.l.t and tcl-core (and possibly the wiki) numerous 
> users have mentioned that they find this feature useful, and that it is 
> supported by a number of existing Tcl database extensions (including 
> oratcl, fbsql, SQLite and proprietary/in-house DBIs).

Several others have requested this, as well.  I'm not against
it in principle.  My guess is that it would be most flexible
to configure it at the level of individual statements, rather
than connection-wide.

Nevertheless, it can be added compatibly later; can we maybe
get some alpha experience with -as dicts before resorting to
it?  I suspect that the latter will prove to be the preferred
way to deal with NULLs.  (Also, let's stay away from further
discussion of NULL in this message.)

> (3) In the section "Connecting to a database" I would like to suggest a 
> recommendation that implementations MAY support generic URI syntax to 
> specify the connection parameters.  This approach is common e.g. in web 
> applications and frameworks, and spares the user from the need to learn 
> a different syntax for each supported database.
> We have implemented such a scheme in our proprietary DBI in response to 
> the pain of pgintcl's conninfo structure ;(

Uhm.  They MAY support URI syntax, or, indeed, anything else.
SQLite uses file names.  ODBC connection strings are well defined
if Byzantine.  These things are nothing if not varied, and
will in any case NOT be portable from one database to another.

> The generic syntax for URIs is defined by RFC 3986 (see 
> http://en.wikipedia.org/wiki/URI_scheme#Generic_syntax for a quick 
> reference).
> One could describe most remote database connections using:
>   scheme://username:password@host:port/dbname
> ... where scheme indicates the database type (pgsql, mysql, oracle, etc.).
> Local file-based databases could likewise be defined using:
>   scheme://username:password@/path/filename.extension
> For a constructor (something that takes database connections parameters 
> and returns a database handle) to support generic URI syntax it should:
> - Accept a uri parameter and a dict of options (or args to be treated as 
> a dict)
> - Set defaults for scheme, username, host, port, dbname, etc. as appropriate
> - Parse the URI and merge onto the dict of defaults to get a uri_dict
> - Merge the dict of options (constructor parameter) onto the uri_dict.  
> This allows things like passwords to be handled separately (one could 
> have the URI in a configuration file and solicit the password).
> 
> If this suggestion is accepted I am willing to contribute Tcl code for 
> generic URI parsing that will support the functionality described above.

It's not the parsing, it's the mapping of the URI to the semantics
of the underlying engine that's going to be the hard part. Since the
names are going to be inherently unportable in any case, where is the
advantage in unifying them?

> SECTION 3: CONCERNS & COMMENTS
> 
> (1) When retrieving the columns of a table using "$dbhandle columns 
> tablename" the results are returned as a dict.  The order of the columns 
> should be preserved (for the same reasons that the order of columns in a 
> resultset row should be preserved).  I suggest that EITHER:
> - The TIP states the assumption that dict preserves order and requires 
> that implementations MUST construct the dict in order such that [dict 
> keys] returns the ordered list of columns; OR
> - The "columns" command is changed to return information in some other 
> form, such as a resultSetHandle or list-of-tuples.

[dict] does indeed preserve order. I'll make the appropriate editorial
change in the TIP; it was indeed my intention that the keys
be returned in sequence.

> (2) I am concerned about the API of the "allrows" and "foreach" 
> iterators - the manner in which these commands accept and handle 
> parameters is both confusing and (somewhat) limiting.
> I find the kludgy "-as lists|dicts" and ordering of parameters 
> particularly distasteful.
> I would also like to see some clarification of the interaction of these 
> commands with transactions.
> 
> Code example:
>   $mydb foreach -as lists -columnsvariable c \
>     {SELECT * FROM t_table WHERE type = :type} \
>     [dict create type "book"] r {
>     puts "([join $c ","]) -> ([join $r ","])"
>   }
> Umm ... to me that doesn't read like a sentence, and I am going to be 
> hesitant to guess the meaning of some of those parameters.
> 
> On c.l.t. I presented a sample tcl::db::execute that offers the combined 
> functionality of these two commands with (IMHO) a clearer interface.  
> See http://groups.google.com/group/comp.lang.tcl/msg/bf16218a45d280e3 
> (the code will not work against the current revision of the TIP).
> 
> My suggestion is roughly:
>     $dbhandle execute ?-option value? ?...?
> Where the options are:
>   -sql stmt (required) SQL statement to execute
>   -params dict (optional) parameters/substituents for the SQL statement; 
> if not present the substituents will be obtained from variables in the 
> caller's scope
>   -columns varname (optional) if present the variable will be populated 
> with an ordered list of column names before result rows are processed.
>   -listvar varname (optional) variable to be populated with an ordered 
> list of fields in the result row
>   -dictvar varname (optional) variable to be populated with a dict of 
> fields in the result row
>   -script script (optional) if present the results will be retrieved 
> (into -listvar/-dictvar) one row at a time and the script executed.  If 
> absent then all rows are retrieved into -listvar as a list-of-lists 
> and/or into -dictvar as a list-of-dicts
> 
> The reasons that I feel this interface is cleaner are basically:
> - There are a lot of parameters, and no obvious way to order them to 
> "read like a sentence", so clarity can be enhanced by tagging each 
> parameter.
> - The interface is more extensible.  It would be easy to add a 
> -transaction or -isolation option, for example.

Let me review this one.  I certainly want to keep the API
involving discrete result sets and 'nextrow' - it's needed for
client-side merges, if nothing else - but I agree that the
'foreach' and 'allrows' are perhaps not the best notation that
can be achieved.

> (3) The "-as lists|dicts" thing.  I'm still unhappy about this one, not 
> only because I think it looks like a kludge, but because it means you 
> can get you resultset as a list OR a dict but NOT both.
> As is well noted there are deficiencies in retrieving rows as lists, and 
> other deficiencies in retrieving rows as dicts.  And there are no doubt 
> times when you have to deal with both at once, and the current interface 
> makes that difficult (if not impossible).  I have encountered such a 
> situation but I forget the specifics (which are all-important) and it's 
> getting a little late (or is that early) to go digging.  No doubt a 
> better database design and SQL statement could have solved the problem, 
> but the code in question was a report renderer that accepted a SQL 
> statement as input, so I wasn't at liberty to design around the problem.

To me, it's not clear that you'd ever need both at once.
In fact, '-as lists' was intended simply to make report renderers
a little easier to write. Certainly, '-as dicts' contains exactly
the same information, only packaged differently. In any case,
the whole discussion flows directly into your item (4), which
at your sensible request, I'm addressing in a separate message.

-- 
73 de ke9tv/2, Kevin

Re: [Tcl-tdbc] Last-minute comments on TIP #308

The Tool Command Language implementation

Re: [Tcl-tdbc] Last-minute comments on TIP #308