Thread: [Tcl-tdbc] CALL FOR DISCUSSION: TDBC introspection

CALL FOR DISCUSSION: TDBC introspection of referential integrity
constraints, and of database indices.

Purpose and scope:
==================

Donal Fellows has been experimenting with an extremely
simple object-relational mapping engine for TclOO and TDBC, which he's
begun to discuss at http://wiki.tcl.tk/26254. In a conversation on the
Tcl'ers Chat, he and I have discussed characteristics of TDBC's
introspection of the database - or rather, its lack thereof - that get
in the way. The purpose of this message is to open up discussion of
how best to add the needed functionality to TDBC.

Donal's need, as I understand it, consists of:

(a) Identifying the primary key of a table; the primary key becomes an
object's identity.

(b) Identifying foreign keys: for a table, identifying what other
tables have foreign key constraints that refer to it. Conversely, for
a table, identifying its foreign keys and the tables to which they
refer.  The columns of a table that are foreign keys can be replaced
with object identities in the object-relational mapping.

(c) Identifying the indices that apply to table columns. This
identification has the side effect of identifying unique key
constraints (which again are candidates for replacement with object
IDs). Even nonunique indices can help developers of general
query software suggest appropriate query designs.

Tentative proposal:
===================

This proposal is only partly baked and is going out primarily to
solicit comment. In particular, the names of methods and dictionary keys
are subject to negotiation, as are the parameters of methods.

(1) Determining primary keys.

    Each TDBC connection shall provide a method of the form:

        $connection primarykeys $table

    where $table is the name of a table in the underlying database.
    The return value of the 'primarykeys' call is a list of
    dictionaries. The list will be empty if the table lacks a primary
    key, a singleton if the table has a simple primary key, and of
    length two or more if the table has a compound primary key.
    The list elements shall have at least the keys

        columnName - Name of the column participating in the key

    and may have

        columnSequence - Ordinal position of the column in the key.
        constraintName - Name of the constraint defining the primary
                         key, if known.

    and other keys as defined by the underlying database engine.

For example, if a table has the schema

    CREATE TABLE mytable (
        id1 INT NOT NULL,
        id2 INT NOT NULL,
        moredata VARCHAR(40),
        CONSTRAINT pk_mytable PRIMARY KEY (id1, id2)
    )

then '$connection primarykeys mytable' would be expected to return
a list of two dictionaries:

    columnName id1 columnSequence 1 constraintName pk_mytable
    columnName id2 columnSequence 2 constraintName pk_mytable

(Some databases might omit the 'columnSequence' and 'constraintName'
keys; in any case, the key components are expected to be returned
in left-to-right order.)
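
As a rough sketch of how a caller might consume such a result, the
column names can be pulled out in key order, using 'columnSequence'
where it is present and falling back to list position otherwise. The
proc name 'pkColumns' is hypothetical and not part of the proposal,
and the list below is the example result written out by hand rather
than fetched from a live connection.

```tcl
# Hypothetical helper, not part of the proposal: return the primary key
# column names in key order from a 'primarykeys'-style result.
proc pkColumns {keyDicts} {
    set pairs {}
    set pos 0
    foreach d $keyDicts {
        incr pos
        # Fall back to list position when the driver omits columnSequence.
        if {[dict exists $d columnSequence]} {
            set seq [dict get $d columnSequence]
        } else {
            set seq $pos
        }
        lappend pairs [list $seq [dict get $d columnName]]
    }
    # Sort the {sequence name} pairs by sequence and keep only the names.
    set names {}
    foreach pair [lsort -integer -index 0 $pairs] {
        lappend names [lindex $pair 1]
    }
    return $names
}

# The example result from above, written out by hand.
set result {
    {columnName id1 columnSequence 1 constraintName pk_mytable}
    {columnName id2 columnSequence 2 constraintName pk_mytable}
}
puts [pkColumns $result]
```

An empty result yields an empty list, which a caller can treat as
"no primary key".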

(2) Determining foreign keys.

Each TDBC connection shall provide a method of the form:

        $connection foreignkeys \
            ?-primary tableName? ?-foreign tableName?

If neither the '-primary' nor '-foreign' option is supplied, the
results are unspecified.

If '-primary' is supplied, the set of results is restricted to
foreign keys that reference columns of the given table.  If
'-foreign' is supplied, the set of results is restricted to
foreign keys that appear in the given table. It is meaningful to
specify both, to restrict results to a (usually unique) foreign
key relationship between a given pair of tables.

The result of the method is a list of dictionaries giving the
columns participating in the foreign key relationship. The
dictionaries will have at least the following keys:

        primaryTable - Table referenced by the foreign key
        primaryColumn - Column referenced by the foreign key
        foreignTable - Table in which the foreign key appears
        foreignColumn - Column name of the foreign key
        keySequence - Ordinal position of the column in the foreign
                      constraint.

They may also contain the following keys, where these are meaningful
to the underlying database and specified:

        primaryConstraintName - Name of the constraint as applied to
                                the primary table
        foreignConstraintName - Name of the constraint as applied to
                                the foreign table
        updateAction - Action to take if an UPDATE operation violates
                       the constraint. May be one of 'cascade',
                       'setNull', or 'setDefault', or omitted (in
                       which case an UPDATE that violates the
                       constraint is an error).
        deleteAction - Action to take if a DELETE operation violates
                       the constraint. May be one of 'cascade',
                       'setNull', or 'setDefault', or omitted (in
                       which case a DELETE that violates the
                       constraint is an error).
        defer - Time at which the referential integrity constraint
                is validated. May be one of 'deferred', 'immediate',
                or 'nondeferrable', or omitted.

Example: If a schema contains two tables, 'department' and 'employee',
with the following definitions:

    CREATE TABLE department(
        id INT PRIMARY KEY NOT NULL,
        name VARCHAR(40)
    );
    CREATE TABLE employee(
        id INT PRIMARY KEY NOT NULL,
        departmentId INT,
        surname VARCHAR(40),
        givenname VARCHAR(40),
        CONSTRAINT fk_employee
            FOREIGN KEY (departmentId) REFERENCES department(id)
            ON DELETE SET NULL
    );

then any of the three commands

    $db foreignkeys -primary department
    $db foreignkeys -foreign employee
    $db foreignkeys -primary department -foreign employee

will return a list of dicts that includes the value:

    primaryTable department primaryColumn id
        foreignTable employee foreignColumn departmentId
        foreignConstraintName fk_employee keySequence 1
        deleteAction setNull
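
For the object-relational use case, a caller would most likely want a
map from each foreign key column to the table and column it
references. A minimal sketch follows, assuming the result was
restricted with '-foreign' so that all rows belong to one table. The
proc name 'fkTargets' is hypothetical, and the sample data is the
required part of the example result written out by hand.

```tcl
# Hypothetical helper, not part of the proposal: map each foreign key
# column to a two-element list {referencedTable referencedColumn}.
proc fkTargets {fkDicts} {
    set targets [dict create]
    foreach d $fkDicts {
        dict set targets [dict get $d foreignColumn] \
            [list [dict get $d primaryTable] [dict get $d primaryColumn]]
    }
    return $targets
}

# Required keys of the example result, written out by hand.
set result {
    {primaryTable department primaryColumn id
     foreignTable employee foreignColumn departmentId
     keySequence 1}
}
set targets [fkTargets $result]
puts [dict get $targets departmentId]
```

A compound foreign key would produce one entry per participating
column; grouping those by constraint is left to the caller here.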

(3) Characterizing indices

Each TDBC connection shall support a method

        $db indices tableName

where tableName is the name of a table in the database.  (The
'connection' base class will also provide a synonym, 'indexes', that
forwards to 'indices'.)

If the given table is found, the method returns a list of dictionaries
describing the table columns that participate in index definitions.
Each dictionary shall include the keys,

        tableName - Name of the table (which should be the same
                    name that was passed to the 'indices' method).
        indexName - Name of the index (suitable for passing to a
                    DROP INDEX statement)
        unique - 1 if duplicate values are forbidden, 0 if duplicates
                 are allowed.
        columnName - Name of a column that participates in the index
        keySequence - Ordinal position of the column in the index
                      key.
        direction - 'asc' or 'desc', specifying the sort order of the
                    column within the index.

The dictionaries may also contain the following keys if they
are meaningful in context:

        type - The type of index (one of 'btree', 'clustered',
              'content', 'hashed', or another implementation-dependent 
              type)
        cardinality - The number of rows in the index if known.
        pages - The number of pages in the index if known.
        filterCondition - If the index is a filtered index, gives the
                          filter condition (e.g., SALARY>100000) as
                          a SQL expression. Omitted if the index is
                          not a filtered index or the filter condition
                          cannot be determined.

As an example, if a table has the following definitions:

    CREATE TABLE "value"(
        id INT NOT NULL PRIMARY KEY,
        entityID INT NOT NULL,
        attributeID INT NOT NULL,
        stringValue VARCHAR(40)
    );
    CREATE UNIQUE INDEX idx_value_entity
        ON "value"(entityID ASC, attributeID ASC);
    CREATE INDEX idx_value_attribute
        ON "value"(attributeID ASC, entityID ASC);

then the call:

    $db indices value

should return the following dictionaries:

    tableName value indexName idx_value_entity unique 1
        columnName entityID keySequence 1 direction asc
    tableName value indexName idx_value_entity unique 1
        columnName attributeID keySequence 2 direction asc
    tableName value indexName idx_value_attribute unique 0
        columnName attributeID keySequence 1 direction asc
    tableName value indexName idx_value_attribute unique 0
        columnName entityID keySequence 2 direction asc

A few notes:
============

If any names in the returned data resolve in a catalog or schema other
than the default for the connection, they should be qualified in the
returned data: 'catalogName.schemaName.objectName', or whatever is
appropriate to the database in question.

An open question for discussion is whether the returned data should be
more deeply nested; specifically, should the components of a compound
key be grouped, or should the caller perform the grouping?  
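
To make the question concrete, here is roughly what caller-side
grouping of the flat 'indices' result from the example above might
look like. This is only a sketch: the proc names 'bySequence' and
'groupIndices' and the shape of the grouped dictionary are my own
assumptions, not part of the proposal.

```tcl
# Hypothetical comparison command: order rows by keySequence.
proc bySequence {a b} {
    expr {[dict get $a keySequence] - [dict get $b keySequence]}
}

# Hypothetical helper: group an 'indices'-style result by index name.
# Returns a dict: indexName -> {unique 0|1 columns {col ...}}.
proc groupIndices {indexDicts} {
    set grouped [dict create]
    # Sort by keySequence first so columns are appended in key order.
    foreach d [lsort -command bySequence $indexDicts] {
        set name [dict get $d indexName]
        if {![dict exists $grouped $name]} {
            dict set grouped $name \
                [dict create unique [dict get $d unique] columns {}]
        }
        set cols [dict get $grouped $name columns]
        lappend cols [dict get $d columnName]
        dict set grouped $name columns $cols
    }
    return $grouped
}

# The example result from above, written out by hand.
set result {
    {tableName value indexName idx_value_entity unique 1
     columnName entityID keySequence 1 direction asc}
    {tableName value indexName idx_value_entity unique 1
     columnName attributeID keySequence 2 direction asc}
    {tableName value indexName idx_value_attribute unique 0
     columnName attributeID keySequence 1 direction asc}
    {tableName value indexName idx_value_attribute unique 0
     columnName entityID keySequence 2 direction asc}
}
set grouped [groupIndices $result]
dict for {name info} $grouped {
    puts "$name unique=[dict get $info unique] columns=[dict get $info columns]"
}
```

The grouping is about a dozen lines, so leaving it to the caller is
not a great burden, but every caller that wants whole keys would have
to repeat it.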

OK, time to fire potshots at the idea.

-- 
73 de ke9tv/2, Kevin
