From: Twylite <tw...@cr...> - 2008-06-19 08:30:33
Thanks Kevin, please see replies below.

>> - There is no support for scoped variables - the variableName MUST be
>> in the params dict or the caller's scope and nowhere else?

> That is correct. In particular, I didn't want to have to specify
> what a scoped variable means when a parameter dictionary is supplied.

>> (6) The support for SQL tokenisation says that "'$', ':' and '@' are
>> all variable substitutions; the remainder of the token string is a
>> variable name." This doesn't match completely with behaviour
>> specified for "$dbhandle prepare", which only specified ':'. Further,
>> allowing '$' for substitutions complicates (rather than simplifies)
>> SQL code in Tcl as additional escaping will be required.
>> Which behaviour is correct?

> The tokenizer was lifted wholesale from SQLite and contributed by
> Richard Hipp. I don't see the need for any substitution character
> other than ':', but haven't quite got around to removing the others.

Great - can the TIP be updated to clarify that only ':' is used for
substitution, and that scoping is explicitly not permitted (i.e. the
variableName MUST be in the params dict or the caller's scope and
nowhere else)?

>> - the reason that TCL_BREAK and TCL_CONTINUE should cause a commit
>> rather than a rollback?

> It was "principle of least astonishment." It seems reasonable to
> consider exiting a loop to be a "normal" action:
>
>    foreach value $list {
>        db transaction {
>            ... do something complicated with $value
>            if {$done} break
>            ... do some more stuff
>        }
>    }
>
> seems like something that a user would expect to commit rather than
> rollback.

Okay, sounds logical to me.

>> (3) Why are the functions "$dbhandle tables" and "$dbhandle columns"
>> a MUST? I was under the impression that not all databases supported
>> this functionality, and that others may require
>> less-than-straightforward implementations?
>> For example, my understanding of PostgreSQL (please feel free to
>> correct me/it) is that one can only get this information by querying
>> the system catalog (tables), which may change depending on the DBMS
>> version.

> I'm willing once again to downgrade to SHOULD, but the fact that you
> have to query a system catalog for the information is nothing unusual.
> Oracle and Sybase also require you to go after the system catalog
> to get the stuff. Certainly the Postgres ODBC and JDBC drivers manage
> to provide comparable functionality. I'd prefer not to relax this
> one until someone demonstrates a database where it's inordinately
> difficult to get the lists.

I'm more concerned about the implications for database extensions when
DBMS versions are upgraded. This may result in an incompatibility that
affects _only these two functions_, but if you rely on their presence,
or have no way to determine whether they are supported, this is the
difference between "works" and "doesn't work". I'd be interested to
hear the feelings of DB extension developers in this regard?

Also, is it worth considering a "compatibility" indicator in the DBI,
so that code can query what "level" of support is available before
trying to use it?

>> (4) Is it possible to determine if a transaction is in progress for
>> a given database connection?

> No. It would be easy enough to add, and a half-page TIP can add it,
> so I don't think this particular issue is worth holding up the vote.

Any chance of just adding it to the TIP now?

> Prepare is not expected to support multiple statements; in particular,
> no provision is made for multiple result sets. If a given database
> implementation decides to allow for multiple statements (presumably
> containing at most one SELECT), I would presume that the statements
> would be executed as a group. (Although the tokenizer recognizes ';',
> both the database interfaces that I've coded reject it.)

Hmm, this one is of some concern. I've often written and encountered
multi-part SQL statements. As a trivial example, consider a database
upgrade script which may need to create/alter roles and tables, insert
information about the upgrade into a history table, etc. It is also
common to have database install scripts as a single .sql file that is
executed "as a single statement" (by which I mean the code outside the
DBI doesn't have to parse it, but rather hands the file-as-a-string to
the DBI to execute).

How do you see code that uses the DBI handling these multi-part
statements?
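To make the concern concrete, here is the sort of workaround I would
expect every application to end up writing. It is only a sketch
against my reading of the draft ("$db transaction" and "$db allrows"
as per the TIP; "run_script" is my own hypothetical helper), and the
naive split on ';' is precisely the problem: it breaks on semicolons
inside string literals, comments and procedure bodies, which is why
I'd rather this logic lived inside the DBI:

    # Hypothetical helper: execute a multi-statement .sql script one
    # statement at a time, atomically. Splitting on ';' is naive and
    # mis-handles semicolons embedded in literals or comments.
    proc run_script {db script} {
        $db transaction {
            foreach stmt [split $script ";"] {
                set stmt [string trim $stmt]
                if {$stmt ne ""} {
                    $db allrows $stmt
                }
            }
        }
    }

    # Usage: hand the install/upgrade script over as one string.
    set chan [open upgrade.sql]
    set script [read $chan]
    close $chan
    run_script db $script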
>> (2) I am also requesting a configuration option "-nullvalue val"
>> (where val defaults to {}). In the result set interface, for the
>> commands "nextlist" and "nextrow -as lists", if a cell in the row
>> is NULL, the configured nullvalue MUST be stored as its value.

> Several others have requested this, as well. I'm not against
> it in principle. My guess is that it would be most flexible
> to configure it at the level of individual statements, rather
> than connection-wide.
>
> Nevertheless, it can be added compatibly later; can we maybe
> get some alpha experience with -as dicts before resorting to
> it? I suspect that the latter will prove to be the preferred
> way to deal with NULLs. (Also, let's stay away from further
> discussion of NULL in this message.)

It would be more flexible at statement level, but also less usable, as
it would have to be set every time. Both would of course be the most
flexible/usable, and the most effort to implement ;)

The purpose of nullvalue in my mind is not to offer another way of
identifying NULLs, but to allow users of the "-as lists" functionality
more flexibility in how they want to represent null data. For example,
if I want to show "(null)" in a report instead of "" I need either a
nullvalue, or I must use the "-as dicts" interface in conjunction with
"$resultset columns" and a foreach loop. Besides the horrible
performance of this approach (I know, because I have to use something
similar at the moment), it is ugly and noisy. It also means that
"allrows -as lists" becomes pretty much unusable if you don't want an
empty string as the nullvalue.

I really think this needs to be added now. There was also another
independent request to tcl-tdbc for this functionality within the last
24 hours.
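For illustration, this is roughly what the "(null)" report example
above looks like with only "-as dicts" - a sketch against my reading
of the draft, in which a NULL column is simply absent from the row
dict - versus the one-liner that a "-nullvalue" option would allow:

    # Render NULLs as "(null)": every cell of every row has to be
    # checked by hand against the column list.
    set rs [$stmt execute]
    set cols [$rs columns]
    set rows {}
    $rs foreach -as dicts row {
        set outrow {}
        foreach col $cols {
            if {[dict exists $row $col]} {
                lappend outrow [dict get $row $col]
            } else {
                lappend outrow "(null)"   ;# this cell was NULL
            }
        }
        lappend rows $outrow
    }

    # With the requested option this would collapse to something like:
    #   set rows [$stmt allrows -as lists -nullvalue "(null)"]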
>> (3) In the section "Connecting to a database" I would like to
>> suggest a recommendation that implementations MAY support generic
>> URI syntax to specify the connection parameters. This approach is
>> common e.g. in web applications and frameworks, and spares the user
>> from the need to learn a different syntax for each supported
>> database.
>> We have implemented such a scheme in our proprietary DBI in response
>> to the pain of pgintcl's conninfo structure ;(

> Uhm. They MAY support URI syntax, or, indeed, anything else.
> SQLite uses file names. ODBC connection strings are well defined
> if Byzantine. These things are nothing if not varied, and
> will in any case NOT be portable from one database to another.
>
> It's not the parsing, it's the mapping of the URI to the semantics
> of the underlying engine that's going to be the hard part. Since the
> names are going to be inherently unportable in any case, where is the
> advantage in unifying them?

The point of TDBC - unless I'm missing something - is to have an
abstraction layer for database access. This confers two advantages:
(a) a single API that developers need to learn, rather than one API
    per DBMS; and
(b) portability, allowing multiple DBMS support and/or easy targeting
    of a different DBMS.
A recommendation that database extensions _attempt_ to handle
connections in a common way _where possible and sensible_ increases
these advantages.

The ODBC connection string is a good example of this: despite the many
ways to specify a database connection, ODBC has defined a single
syntax that does the job. There are core elements that all databases
handle (Driver, Uid, Pwd), core elements for remote databases (Server,
Port, Database), core elements for local databases (Dbq/Database), and
additional database-specific elements. But there is one syntax, one
set of rules that developers need to understand, and one set of core
elements that get you 90% of the way there.

I suggested URIs because I can represent typical/default connections
to every database I have used as a URI, be it PostgreSQL, MySQL,
MSSQL, SQLite, whatever. Let me instead make a more generic but still
compatible suggestion: a recommendation in the TIP that dbhandle offer
a constructor that takes a dict of options, and that implementations
support _where possible and relevant_ certain well-known keys in that
dict: driver/provider, username, password, host, port, database, path,
filename, extension.

Using a dict (or other common Tcl representation) distances the DBI
user from the syntax of the DB-specific connection string. This means
that the DBI user doesn't have to know how to correctly quote & escape
values - the DBI handles these DB-specific details.

If you're still not convinced, then take some of the common Tcl
database extensions out there that use their own connection strings,
and build the connection string using a user-provided password (i.e.
from a UI). And then, for fun, make the password something like
a{b$c'd"e--f;g\h[i:j . Why is code in _your_ application figuring out
how to quote this correctly for a particular database, rather than
letting the DBI do it?

Using well-known keys in that dict makes the information _somewhat_
portable. For example, the same URI "username:password@host:port/dbname"
could easily be used to create a pgsql, mysql, mssql or oracle
database connection to a local or remote server, if they all use the
same well-known keys and provide sensible defaults. That means I'm
spending less time writing a translation layer between my UI or config
file and the DBI to convert from {ui_key value} to {dbi_key value}.

A _recommendation_, focused on having the DBI do magic quoting and
following the principles of the ODBC connection string (name=value +
common keys), can't hurt, and it can provide advantages in
portability, understandability and reduced bugs in connection string
handling.
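To sketch the suggestion concretely (the driver package and
constructor signature below are invented for illustration - the TIP
does not define them): the password is escaped once for Tcl and never
again, and the driver, not the application, performs whatever quoting
its native connection syntax requires.

    package require tdbc::somedb   ;# hypothetical driver package

    # The awkward password from above, escaped once for Tcl only:
    set pwd "a{b\$c'd\"e--f;g\\h\[i:j"

    # Well-known keys; the same dict could configure a pgsql, mysql
    # or mssql driver, with sensible defaults filling in the gaps.
    set opts [dict create \
        host     db.example.com \
        port     5432 \
        database payroll \
        username trevor \
        password $pwd]

    tdbc::somedb::connection create db $opts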
>> (2) I am concerned about the API of the "allrows" and "foreach"
>> iterators - the manner in which these commands accept and handle
>> parameters is both confusing and (somewhat) limiting.
>> I find the kludgy "-as lists|dicts" and ordering of parameters
>> particularly distasteful.
>> I would also like to see some clarification of the interaction of
>> these commands with transactions.

> Let me review this one. I certainly want to keep the API
> involving discrete result sets and 'nextrow' - it's needed for
> client-side merges, if nothing else.

Absolutely! I am in no way suggesting that these should go. I
understand 'foreach' / 'allrows' / 'tdbc::execute' to be helpers that
drive the underlying dbhandle/statement/resultset to make it easier to
retrieve and process results.

Regards,
Trevor