From: Kenny, K. B (G. Research) <ke...@ge...> - 2010-05-17 15:38:53
Attachments:
smime.p7s
|
CALL FOR DISCUSSION: TDBC introspection of referential integrity constraints, and of database indices. Purpose and scope: ================== Donal Fellows has been experimenting with an extremely simple object-relational mapping engine for TclOO and TDBC, which he's begun to discuss at http://wiki.tcl.tk/26254. In a conversation on the Tcl'ers Chat, he and I have discussed characteristics of TDBC's introspection of the database - or rather, its lack thereof - that get in the way. The purpose of this message is to open up discussion of how best to add the needed functionality to TDBC. Donal's need, as I understand it, consists of: (a) Identifying the primary key of a table; the primary key becomes an object's identity. (b) Identifying foreign keys: for a table, identifying what other tables have foreign key constraints that refer to it. Conversely, for a table, identifying its foreign keys and the tables to which they refer. The columns of a table that are foreign keys can be replaced with object identities in the object-relational mapping. (c) Identifying the indices that apply to table columns. This identification has the side effect of identifying unique key constraints (which again are candidates for replacement with object IDs). Even nonunique indices can assist developers of general query software to suggest appropriate query designs. Tentative proposal: =================== This proposal is very much partly-baked and is going out primarily to get comment. In particular, the names of methods and dictionary keys are subject to negotiation, as are the parameters of methods. (1) Determining primary keys. Each TDBC connection shall provide a method of the form: $connection primarykeys $table where $table is the name of a table in the underlying database. The return type of the 'primarykeys' call is a list of dictionaries. The list will be empty if the table lacks a primary key; a singleton is the table has a simple primary key, and of length two or more if the table has a compound primary key. The list elements shall have at least the keys columnName - Name of the column participating in the key and may have columnSequence - Ordinal position of the column in the key. constraintName - Name of the constraint defining the primary key, if known. and other columns as defined by the underlying database engine. For example, if a table has the schema CREATE TABLE mytable ( id1 INT NOT NULL, id2 INT NOT NULL, moredata VARCHAR(40), CONSTRAINT pk_mytable PRIMARY KEY (id1, id2) ) then '$connection primarykeys mytable' would be expected to return a list of two dictionaries: columnName id1 columnSequence 1 constraintName pk_mytable columnName id2 columnSequence 2 constraintName pk_mytable (Some databases might omit the 'columnSequence' and 'constraintName' columns; in any case, the key components are expected to be returned in left-to-right order.) (2) Determining foreign keys. Each TDBC connection shall provide a method of the form: $connection foreignkeys \ ?-primary tableName? ?-foreign tableName? If neither the '-primary' nor '-foreign' option is supplied, the results are unspecified. If '-primary' is supplied, the set of results is restricted to foreign keys that reference columns of the given table. If '-foreign' is supplied, the set or results is specified to foreign keys that appear in the given table. It is meaningful to specify both, to restrict results to a (usually unique) foreign key relationship between a given pair of tables. The result of the method is a list of dictionaries giving the columns participating in the foreign key relationship. The dictionaries will have at least the following keys: primaryTable - Table referenced by the foreign key primaryColumn - Column referenced by the foreign key foreignTable - Table in which the foreign key appears foreignColumn - Column name of the foreign key keySequence - Ordinal position of the column in the foreign constraint. It may contain the following keys, if they are meaningful to the underlying database and specified: primaryConstraintName - Name of the constraint as applied to the primary table foreignConstraintName - Name of the constraint as applied to the foreign table updateAction - Action to take if an UPDATE operation violates the constraint. May be one of 'cascade', 'setNull' or 'setDefault', or 'omitted (in which case an UPDATE that violates the constraint is an error). deleteAction - Action to take if a DELETE operation violates the constraint. May be one of 'cascade', 'setNull', or 'setDefault', or omitted defer - Time at which the referential integrity constraint is validated. May be one of 'deferred', 'immediate', or 'nondeferrable', or omitted. Example: If a schema contains two tables, 'department' and 'employee', with the following definitions: CREATE TABLE department( id INT PRIMARY KEY NOT NULL, name VARCHAR(40) ) CREATE TABLE employee( id INT PRIMARY KEY NOT NULL, departmentId INT, surname VARCHAR(40), givenname VARCHAR(40), CONSTRAINT fk_employee FOREIGN KEY departmentId REFERENCES department(id) ON DELETE SET NULL ) then any of the three commands $db foreignkeys -primary department $db foreignkeys -foreign employee $db foreignkeys -primary department -foreign employee will return a list of dicts that includes the value: primaryTable department primaryColumn id foreignTable employee foreignColumn departmentId foreignConstraintName fk_employee keySequence 1 updateAction setNull (3) Characterizing indices Each TDBC connection shall support a method $db indices tableName where tableName is the name of a table in the database. (The 'connection' base class will also provide a synonym, 'indexes', that forwards to 'indices'.) If the given table is found, the method returns a list of dictionaries describing the table columns that participate in index definitions. Each dictionary shall include the keys, tableName - Name of the table (which should be the same name that was passed to the 'indices' method). indexName - Name of the index (suitable for passing to a DROP INDEX statement) unique - 1 if duplicate values are forbidden, 0 if duplicates are allowed. columnName - Name of a column that participates in the index keySequence - Ordinal position of the column in the index key. direction - 'asc' or 'desc' specifying the collation order. The dictionaries may also contain the following keys if they are meaningful in context: type - The type of index (one of 'btree', 'clustered', 'content', 'hashed', or another implementation-dependent type) cardinality - The number of rows in the index if known. pages - The number of pages in the index if known. filterCondition - If the index is a filtered index, gives the filter condition (e.g., SALARY>100000) as a SQL expression. Omitted if the index is not a filtered index or the filter condition cannot be determined. As an example, if a table has the following definitions: CREATE TABLE "value"( id INT NOT NULL PRIMARY KEY, entityID INT NOT NULL, attributeID INT NOT NULL, stringValue VARCHAR(40) ) CREATE UNIQUE INDEX idx_value_entity ON "value"(entityID ASC, attributeID ASC) CREATE INDEX idx_value_attribute ON "value"(attributeID ASC, entityID ASC) then the call: $db indices value should return the following dictionaries: tableName value indexName idx_value_entity unique 1 columnName entityID keySequence 1 direction asc tableName value indexName idx_value_entity unique 1 columnName attributeID keySequence 2 direction asc tableName value indexName idx_value_attribute unique 0 columnName attributeID keySequence 1 direction asc tableName value indexName idx_value_attribute unique 0 columnName entityID keySequence 2 direction asc A few notes: ============ If any names in the returned data resolve in a catalog or schema other than the default for the connection, they should be qualified in the returned data: 'catalogName.schemaName.objectName', or whatever is appropriate to the database in question. An open question for discussion is whether the returned data should be more deeply nested; specifically, should the components of a compound key be grouped, or should the caller perform the grouping? OK, time to fire potshots at the idea. -- 73 de ke9tv/2, Kevin |
From: Larry M. <lm...@bi...> - 2010-05-18 02:53:19
|
I am _so_ not an expert in all this stuff. The only question I have is has anyone looked at what perl/python/etc do in this area? -- --- Larry McVoy lm at bitmover.com http://www.bitkeeper.com |
From: Tom J. <tom...@gm...> - 2010-05-18 22:24:38
|
On Mon, May 17, 2010 at 8:34 PM, Kevin Kenny <ke...@ac...> wrote: > Larry McVoy wrote: >> On Mon, May 17, 2010 at 11:09:50PM -0400, Kevin Kenny wrote: >>> Larry McVoy wrote: >>>> I am _so_ not an expert in all this stuff. >>>> >>>> The only question I have is has anyone looked at what perl/python/etc >>>> do in this area? >>> Yes. >> >> And? What was learned? > > ODBC implements this stuff with the SQLGetPrimaryKeys, > SQLGetForeignKeys, and SQLGetStatistics calls. > Perl's DBI essentially copies what ODBC does directly: it has > 'primary_key_info', 'foreign_key_info' and the (misleadingly named) > 'statistics_info' to query primary keys, foreign keys, and indices > respectively. JDBC also does the same thing -- in a typically > Byzantine fashion. Its 'DatabaseMetaData' interface - which must > be constructed/retrieved by a separate call upon the connection > object - supports 'getPrimaryKeys', 'getImportedKeys', > 'getExportedKeys', 'getCrossReference', and 'getIndexInfo'. > > All three of these define a very similar API to what I propose, with > perhaps slightly more awkward names of things. Perl's documentation > also warns that 'statistics_info' is 'experimental and subject to > change. > > Caveat - All three seem to use the ODBC names for things. > I dislike them - all uppercase, and not always as mnemonic as > they should be, and think them un-Tclish. I'm willing to be convinced > that they are the True Names, if enough people feel strongly > about it. > > The JDBC, ODBC and Perl API's require negotiating a result set > rather than returning lists-of-dicts directly. That interface > seemed rather like overkill to me. > > Python's DB-API has no support for schema introspection. It > appears to be under consideration for dbApi3. > http://wiki.python.org/moin/DbApi3#SchemaInformation > > PHP, predictably enough, is a hodgepodge. I'm not even going to > discuss it. SQL has an information_schema, a set of views which describe the user databases. Nice things about the information_schema is that all the values are strings and the names of columns are not as cryptic as say the pg_* tables. I have started an introspection system based upon the information_schema. Right now I only use it for browsing a database and getting table information, but this information can be used to write Tcl code using any object system, or just pure Tcl code. I've done this before using the pg_* tables, but these tables change too often and my code is outdated now. Using info from the information_schema I plan on generating mapping data which will allow table and object names to differ as well as column and attribute names to differ. Implementing a mapping system doesn't mean that you can't use defaults and it also protects against future changes on either side of the map (you don't change code, just the mapping). Mapping also allow you to define multiple views of the same base table. |
From: Kevin K. <kev...@gm...> - 2010-05-19 00:38:48
|
Tom Jackson wrote: > SQL has an information_schema, a set of views which describe the user > databases. Nice things about the information_schema is that all the > values are strings and the names of columns are not as cryptic as say > the pg_* tables. Indeed it does. But the vendors are quite lax in actually implementing it consistently. (As they are about a lot of the SQL standard, actually!) My plan is to have the tdbc::connection base class provide versions of [$db primarykeys] and [$db foreignkeys] that query the INFORMATION_SCHEMA.TABLE_CONSTRAINTS and INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS views, respectively. Individual database drivers must have the ability to override these, since as I mentioned above, database are pretty haphazard about the completeness of their implementations of INFORMATION_SCHEMA. While I'm on it, I might also provide default implementations of [db tables] and [db columns] that get the information from the appropriate INFORMATION_SCHEMA views. Alas, the ISO committee left out an INFORMATION_SCHEMA.STATISTICS view. (MySQL has one, but it's MySQL-specific.) For this reason, even though SQL/CLI has SQLStatistics, there isn't a comparable "portable" way to ask the database for index column usage at the SQL level. So the current plan is to do nothing in the base class and require that drivers implement [db indices]. Drivers facing databases with incomplete or nonconformant implementations of INFORMATION_SCHEMA will of course have to override the base class methods with ones that get the information another way. Of course, the tdbc::odbc driver will implement all three methods by making calls to the SQL/CLI SQLGetPrimaryKeys, SQLGetForeignKeys and SQLGetStatistics functions, respectively. In other words, tdbc::odbc will punt the problem a further level down into the vendor-supplied ODBC driver. I've already looked at the vendor docs, and it's at least possible to provide all the proposed functionality in the MySQL, ODBC, Postgres and SQLite3 drivers (and the unreleased Oracle driver as well). The fact that the functionality has all been in ODBC since version 1.0 also bodes well for its being available on other databases. -- 73 de ke9tv/2, Kevin |
From: Kevin K. <ke...@ac...> - 2010-05-18 03:09:58
|
Larry McVoy wrote: > I am _so_ not an expert in all this stuff. > > The only question I have is has anyone looked at what perl/python/etc > do in this area? Yes. -- 73 de ke9tv/2, Kevin |
From: Larry M. <lm...@bi...> - 2010-05-18 03:11:24
|
On Mon, May 17, 2010 at 11:09:50PM -0400, Kevin Kenny wrote: > Larry McVoy wrote: >> I am _so_ not an expert in all this stuff. >> >> The only question I have is has anyone looked at what perl/python/etc >> do in this area? > > Yes. And? What was learned? -- --- Larry McVoy lm at bitmover.com http://www.bitkeeper.com |
From: Arnulf W. <ar...@wi...> - 2010-05-19 06:38:59
|
Am Dienstag 18 Mai 2010 05:34:42 schrieb Kevin Kenny: > Larry McVoy wrote: > > On Mon, May 17, 2010 at 11:09:50PM -0400, Kevin Kenny wrote: > >> Larry McVoy wrote: > >>> I am _so_ not an expert in all this stuff. > >>> > >>> The only question I have is has anyone looked at what perl/python/etc > >>> do in this area? > >> > >> Yes. > > > > And? What was learned? > > ODBC implements this stuff with the SQLGetPrimaryKeys, > SQLGetForeignKeys, and SQLGetStatistics calls. > Perl's DBI essentially copies what ODBC does directly: it has > 'primary_key_info', 'foreign_key_info' and the (misleadingly named) > 'statistics_info' to query primary keys, foreign keys, and indices > respectively. JDBC also does the same thing -- in a typically > Byzantine fashion. Its 'DatabaseMetaData' interface - which must > be constructed/retrieved by a separate call upon the connection > object - supports 'getPrimaryKeys', 'getImportedKeys', > 'getExportedKeys', 'getCrossReference', and 'getIndexInfo'. > > All three of these define a very similar API to what I propose, with > perhaps slightly more awkward names of things. Perl's documentation > also warns that 'statistics_info' is 'experimental and subject to > change. > > Caveat - All three seem to use the ODBC names for things. > I dislike them - all uppercase, and not always as mnemonic as > they should be, and think them un-Tclish. I'm willing to be convinced > that they are the True Names, if enough people feel strongly > about it. > > The JDBC, ODBC and Perl API's require negotiating a result set > rather than returning lists-of-dicts directly. That interface > seemed rather like overkill to me. > > Python's DB-API has no support for schema introspection. It > appears to be under consideration for dbApi3. > http://wiki.python.org/moin/DbApi3#SchemaInformation > > PHP, predictably enough, is a hodgepodge. I'm not even going to > discuss it. > I vote for making them tcl'ish then and I was thinking about asking you to make the proposed interfaces, so you are talking exactly what I will need for my ATWF project Arnulf (apw) |
From: Kevin K. <ke...@ac...> - 2010-05-18 03:34:53
|
Larry McVoy wrote: > On Mon, May 17, 2010 at 11:09:50PM -0400, Kevin Kenny wrote: >> Larry McVoy wrote: >>> I am _so_ not an expert in all this stuff. >>> >>> The only question I have is has anyone looked at what perl/python/etc >>> do in this area? >> Yes. > > And? What was learned? ODBC implements this stuff with the SQLGetPrimaryKeys, SQLGetForeignKeys, and SQLGetStatistics calls. Perl's DBI essentially copies what ODBC does directly: it has 'primary_key_info', 'foreign_key_info' and the (misleadingly named) 'statistics_info' to query primary keys, foreign keys, and indices respectively. JDBC also does the same thing -- in a typically Byzantine fashion. Its 'DatabaseMetaData' interface - which must be constructed/retrieved by a separate call upon the connection object - supports 'getPrimaryKeys', 'getImportedKeys', 'getExportedKeys', 'getCrossReference', and 'getIndexInfo'. All three of these define a very similar API to what I propose, with perhaps slightly more awkward names of things. Perl's documentation also warns that 'statistics_info' is 'experimental and subject to change. Caveat - All three seem to use the ODBC names for things. I dislike them - all uppercase, and not always as mnemonic as they should be, and think them un-Tclish. I'm willing to be convinced that they are the True Names, if enough people feel strongly about it. The JDBC, ODBC and Perl API's require negotiating a result set rather than returning lists-of-dicts directly. That interface seemed rather like overkill to me. Python's DB-API has no support for schema introspection. It appears to be under consideration for dbApi3. http://wiki.python.org/moin/DbApi3#SchemaInformation PHP, predictably enough, is a hodgepodge. I'm not even going to discuss it. -- 73 de ke9tv/2, Kevin |
From: Donal K. F. <don...@ma...> - 2010-05-18 09:06:05
Attachments:
donal_k_fellows.vcf
|
On 18/05/2010 07:31, Kristoffer Lawson wrote: > Sounds cool. You may want to have a look at 'Storm', which I just > released yesterday, and built on XOTcl. The idea is to have virtually > transparent queryable object storage. Ie. object loading and storing > is done automatically, as needed. While the version I put up is very > early, and does not do a lot of things optimally, it still might be > useful to just take a look at. Oh and it also does not even try to > look like an RDBMS (the object-relational mismatch is often a source > for headaches, especially with small projects). So not really what you > are trying to do, but it does work for what it is. That's taking the opposite approach to me (ORM is my baby). Storm seems to be an object-first model, whereas ORM is a database-first model; I want to be able to just point it at a database connection and have it generate all the classes and methods for me. (Neither is superior to the other; they're different problem-spaces that both happen to involve coupling objects and databases.) I don't claim that my code works yet. Need to get a build of 8.6 with DB drivers (which isn't my normal deployment) before I can test... Donal. |
From: Kristoffer L. <se...@fi...> - 2010-05-18 09:32:44
|
On 18 May 2010, at 11:42, Donal K. Fellows wrote: > On 18/05/2010 07:31, Kristoffer Lawson wrote: >> Sounds cool. You may want to have a look at 'Storm', which I just >> released yesterday, and built on XOTcl. The idea is to have virtually > > That's taking the opposite approach to me (ORM is my baby). Storm > seems > to be an object-first model, whereas ORM is a database-first model; I > want to be able to just point it at a database connection and have it > generate all the classes and methods for me. (Neither is superior to > the > other; they're different problem-spaces that both happen to involve > coupling objects and databases.) Yes, absolutely. I wasn't offering it as any kind of 'replacement' to what you are doing. Just as a heads up that I've been thinking about some stuff — albeit at the other end of the pipe (and with XOTcl versus TclOO). -- Kristoffer Lawson, Co-Founder, Scred // http://www.scred.com/ |
From: Kristoffer L. <se...@fi...> - 2010-05-18 06:31:55
|
On 17 May 2010, at 18:38, Kenny, Kevin B (GE, Research) wrote: > Donal Fellows has been experimenting with an extremely > simple object-relational mapping engine for TclOO and TDBC, which he's > begun to discuss at http://wiki.tcl.tk/26254. In a conversation on the > Tcl'ers Chat, he and I have discussed characteristics of TDBC's > introspection of the database - or rather, its lack thereof - that get > in the way. The purpose of this message is to open up discussion of > how best to add the needed functionality to TDBC. Sounds cool. You may want to have a look at 'Storm', which I just released yesterday, and built on XOTcl. The idea is to have virtually transparent queryable object storage. Ie. object loading and storing is done automatically, as needed. While the version I put up is very early, and does not do a lot of things optimally, it still might be useful to just take a look at. Oh and it also does not even try to look like an RDBMS (the object-relational mismatch is often a source for headaches, especially with small projects). So not really what you are trying to do, but it does work for what it is. http://github.com/Setok/Storm -- Kristoffer Lawson, Co-Founder, Scred // http://www.scred.com/ |
From: Donal K. F. <don...@ma...> - 2010-05-19 08:08:32
Attachments:
donal_k_fellows.vcf
|
On 19/05/2010 07:38, Arnulf Wiedemann wrote: > I vote for making them tcl'ish then and I was thinking about asking you to > make the proposed interfaces, so you are talking exactly what I will need for > my ATWF project I also prefer making them Tcl-ish. No need to import other people's ugliness when we know we don't have to! Donal. |
From: Steve L. <st...@di...> - 2010-05-19 08:26:11
|
On 19/05/2010, at 4:08 PM, Donal K. Fellows wrote: > On 19/05/2010 07:38, Arnulf Wiedemann wrote: >> I vote for making them tcl'ish then and I was thinking about asking you to >> make the proposed interfaces, so you are talking exactly what I will need for >> my ATWF project > > I also prefer making them Tcl-ish. No need to import other people's > ugliness when we know we don't have to! The documentation could refer to the ugly equivalents for those coming from ODBC et al Steve |