From: Twylite <tw...@cr...> - 2008-06-19 08:30:33
Thanks Kevin, please see replies below.

>> - There is no support for scoped variables - the variableName MUST be
>> in the params dict or the caller's scope and nowhere else?

> That is correct. In particular, I didn't want to have to specify
> what a scoped variable means when a parameter dictionary is supplied.

>> (6) The support for SQL tokenisation says that "'$', ':' and '@' are
>> all variable substitutions; the remainder of the token string is a
>> variable name." This doesn't match completely with behaviour
>> specified for "$dbhandle prepare", which only specified ':'. Further,
>> allowing '$' for substitutions complicates (rather than simplifies)
>> SQL code in Tcl as additional escaping will be required.
>> Which behaviour is correct?

> The tokenizer was lifted wholesale from SQLite and contributed by
> Richard Hipp. I don't see the need for any substitution character
> other than ':', but haven't quite got around to removing the others.

Great - can the TIP be updated to clarify that only ':' is used for
substitution, and that scoping is explicitly not permitted (i.e. the
variableName MUST be in the params dict or the caller's scope and
nowhere else)?

>> - the reason that TCL_BREAK and TCL_CONTINUE should cause a commit
>> rather than a rollback?

> It was "principle of least astonishment." It seems reasonable to
> consider exiting a loop to be a "normal" action:
>
>    foreach value $list {
>        db transaction {
>            ... do something complicated with $value
>            if {$done} break
>            ... do some more stuff
>        }
>    }
>
> seems like something that a user would expect to commit rather than
> rollback.

Okay, sounds logical to me.

>> (3) Why are the functions "$dbhandle tables" and "$dbhandle columns"
>> a MUST? I was under the impression that not all databases supported
>> this functionality, and that others may require
>> less-than-straightforward implementations?
>> For example, my understanding of PostgreSQL (please feel free to
>> correct me/it) is that one can only get this information by querying
>> the system catalog (tables), which may change depending on the DBMS
>> version.

> I'm willing once again to downgrade to SHOULD, but the fact that you
> have to query a system catalog for the information is nothing unusual.
> Oracle and Sybase also require you to go after the system catalog
> to get the stuff. Certainly the Postgres ODBC and JDBC drivers manage
> to provide comparable functionality. I'd prefer not to relax this
> one until someone demonstrates a database where it's inordinately
> difficult to get the lists.

I'm more concerned about the implications for database extensions when
DBMS versions are upgraded. This may result in an incompatibility that
affects _only these two functions_, but if you rely on their presence,
or have no way to determine whether they are supported, this is the
difference between "works" and "doesn't work". I'd be interested to
hear the feelings of DB extension developers in this regard?

Also, is it worth considering a "compatibility" indicator in the DBI,
so that code can query what "level" of support is available before
trying to use it?

>> (4) Is it possible to determine if a transaction is in progress for
>> a given database connection?

> No. It would be easy enough to add, and a half-page TIP can add it,
> so I don't think this particular issue is worth holding up the vote.

Any chance of just adding it to the TIP now?

> Prepare is not expected to support multiple statements; in particular,
> no provision is made for multiple result sets. If a given database
> implementation decides to allow for multiple statements (presumably
> containing at most one SELECT), I would presume that the statements
> would be executed as a group. (Although the tokenizer recognizes ';',
> both the database interfaces that I've coded reject it.)

Hmm, this one is of some concern. I've often written and encountered
multi-part SQL statements. As a trivial example, consider a database
upgrade script which may need to create/alter roles and tables, insert
information about the upgrade into a history table, etc. It is also
common to have database install scripts as a single .sql file that is
executed "as a single statement" (by which I mean the code outside the
DBI doesn't have to parse it, but rather hands the file-as-a-string to
the DBI to execute).

How do you see code that uses the DBI handling these multi-part
statements?
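To make the concern concrete, here is the sort of workaround I would
expect every application to end up writing. It is only a sketch
against my reading of the draft ("$db transaction" and "$db allrows"
as per the TIP; "run_script" is my own hypothetical helper), and the
naive split on ';' is precisely the problem: it breaks on semicolons
inside string literals, comments and procedure bodies, which is why
I'd rather this logic lived inside the DBI:

    # Hypothetical helper: execute a multi-statement .sql script one
    # statement at a time, atomically. Splitting on ';' is naive and
    # mis-handles semicolons embedded in literals or comments.
    proc run_script {db script} {
        $db transaction {
            foreach stmt [split $script ";"] {
                set stmt [string trim $stmt]
                if {$stmt ne ""} {
                    $db allrows $stmt
                }
            }
        }
    }

    # Usage: hand the install/upgrade script over as one string.
    set chan [open upgrade.sql]
    set script [read $chan]
    close $chan
    run_script db $script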
>> (2) I am also requesting a configuration option "-nullvalue val"
>> (where val defaults to {}). In the result set interface, for the
>> commands "nextlist" and "nextrow -as lists", if a cell in the row
>> is NULL, the configured nullvalue MUST be stored as its value.

> Several others have requested this, as well. I'm not against
> it in principle. My guess is that it would be most flexible
> to configure it at the level of individual statements, rather
> than connection-wide.
>
> Nevertheless, it can be added compatibly later; can we maybe
> get some alpha experience with -as dicts before resorting to
> it? I suspect that the latter will prove to be the preferred
> way to deal with NULLs. (Also, let's stay away from further
> discussion of NULL in this message.)

It would be more flexible at statement level, but also less usable, as
it would have to be set every time. Both would of course be the most
flexible/usable, and the most effort to implement ;)

The purpose of nullvalue in my mind is not to offer another way of
identifying NULLs, but to allow users of the "-as lists" functionality
more flexibility in how they want to represent null data. For example,
if I want to show "(null)" in a report instead of "" I need either a
nullvalue, or I must use the "-as dicts" interface in conjunction with
"$resultset columns" and a foreach loop. Besides the horrible
performance of this approach (I know, because I have to use something
similar at the moment), it is ugly and noisy. It also means that
"allrows -as lists" becomes pretty much unusable if you don't want an
empty string as the nullvalue.

I really think this needs to be added now. There was also another
independent request to tcl-tdbc for this functionality within the last
24 hours.
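For illustration, this is roughly what the "(null)" report example
above looks like with only "-as dicts" - a sketch against my reading
of the draft, in which a NULL column is simply absent from the row
dict - versus the one-liner that a "-nullvalue" option would allow:

    # Render NULLs as "(null)": every cell of every row has to be
    # checked by hand against the column list.
    set rs [$stmt execute]
    set cols [$rs columns]
    set rows {}
    $rs foreach -as dicts row {
        set outrow {}
        foreach col $cols {
            if {[dict exists $row $col]} {
                lappend outrow [dict get $row $col]
            } else {
                lappend outrow "(null)"   ;# this cell was NULL
            }
        }
        lappend rows $outrow
    }

    # With the requested option this would collapse to something like:
    #   set rows [$stmt allrows -as lists -nullvalue "(null)"]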
>> (3) In the section "Connecting to a database" I would like to
>> suggest a recommendation that implementations MAY support generic
>> URI syntax to specify the connection parameters. This approach is
>> common e.g. in web applications and frameworks, and spares the user
>> from the need to learn a different syntax for each supported
>> database.
>> We have implemented such a scheme in our proprietary DBI in response
>> to the pain of pgintcl's conninfo structure ;(

> Uhm. They MAY support URI syntax, or, indeed, anything else.
> SQLite uses file names. ODBC connection strings are well defined
> if Byzantine. These things are nothing if not varied, and
> will in any case NOT be portable from one database to another.
>
> It's not the parsing, it's the mapping of the URI to the semantics
> of the underlying engine that's going to be the hard part. Since the
> names are going to be inherently unportable in any case, where is the
> advantage in unifying them?

The point of TDBC - unless I'm missing something - is to have an
abstraction layer for database access. This confers two advantages:
(a) a single API that developers need to learn, rather than one API
    per DBMS; and
(b) portability, allowing multiple DBMS support and/or easy targeting
    of a different DBMS.
A recommendation that database extensions _attempt_ to handle
connections in a common way _where possible and sensible_ increases
these advantages.

The ODBC connection string is a good example of this: despite the many
ways to specify a database connection, ODBC has defined a single
syntax that does the job. There are core elements that all databases
handle (Driver, Uid, Pwd), core elements for remote databases (Server,
Port, Database), core elements for local databases (Dbq/Database), and
additional database-specific elements. But there is one syntax, one
set of rules that developers need to understand, and one set of core
elements that get you 90% of the way there.

I suggested URIs because I can represent typical/default connections
to every database I have used as a URI, be it PostgreSQL, MySQL,
MSSQL, SQLite, whatever. Let me instead make a more generic but still
compatible suggestion: a recommendation in the TIP that dbhandle offer
a constructor that takes a dict of options, and that implementations
support _where possible and relevant_ certain well-known keys in that
dict: driver/provider, username, password, host, port, database, path,
filename, extension.

Using a dict (or other common Tcl representation) distances the DBI
user from the syntax of the DB-specific connection string. This means
that the DBI user doesn't have to know how to correctly quote & escape
values - the DBI handles these DB-specific details.

If you're still not convinced, then take some of the common Tcl
database extensions out there that use their own connection strings,
and build the connection string using a user-provided password (i.e.
from a UI). And then, for fun, make the password something like
a{b$c'd"e--f;g\h[i:j . Why is code in _your_ application figuring out
how to quote this correctly for a particular database, rather than
letting the DBI do it?

Using well-known keys in that dict makes the information _somewhat_
portable. For example, the same URI "username:password@host:port/dbname"
could easily be used to create a pgsql, mysql, mssql or oracle
database connection to a local or remote server, if they all use the
same well-known keys and provide sensible defaults. That means I'm
spending less time writing a translation layer between my UI or config
file and the DBI to convert from {ui_key value} to {dbi_key value}.

A _recommendation_, focused on having the DBI do magic quoting and
following the principles of the ODBC connection string (name=value +
common keys), can't hurt, and it can provide advantages in
portability, understandability and reduced bugs in connection string
handling.
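To sketch the suggestion concretely (the driver package and
constructor signature below are invented for illustration - the TIP
does not define them): the password is escaped once for Tcl and never
again, and the driver, not the application, performs whatever quoting
its native connection syntax requires.

    package require tdbc::somedb   ;# hypothetical driver package

    # The awkward password from above, escaped once for Tcl only:
    set pwd "a{b\$c'd\"e--f;g\\h\[i:j"

    # Well-known keys; the same dict could configure a pgsql, mysql
    # or mssql driver, with sensible defaults filling in the gaps.
    set opts [dict create \
        host     db.example.com \
        port     5432 \
        database payroll \
        username trevor \
        password $pwd]

    tdbc::somedb::connection create db $opts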
>> (2) I am concerned about the API of the "allrows" and "foreach"
>> iterators - the manner in which these commands accept and handle
>> parameters is both confusing and (somewhat) limiting.
>> I find the kludgy "-as lists|dicts" and ordering of parameters
>> particularly distasteful.
>> I would also like to see some clarification of the interaction of
>> these commands with transactions.

> Let me review this one. I certainly want to keep the API
> involving discrete result sets and 'nextrow' - it's needed for
> client-side merges, if nothing else.

Absolutely! I am in no way suggesting that these should go. I
understand 'foreach' / 'allrows' / 'tdbc::execute' to be helpers that
drive the underlying dbhandle/statement/resultset to make it easier to
retrieve and process results.

Regards,
Trevor