Work at SourceForge, help us to make it a better place! We have an immediate need for a Support Technician in our San Francisco or Denver office.

Close

#251 MySQLdb can't detect primary keys: MySQLdb.ROWID is empty

open
Andy Dustman
MySQLdb (285)
5
2012-09-19
2008-04-02
Christopher Lee
No

This bug causes MySQLdb to fail to detect any primary key by the test prescribed by the DB API 2.0, namely to test cursor.description[icol][1]==MySQLdb.ROWID. On all the platforms I've checked so far (Mac OS X, linux), MySQLdb.ROWID appears to be empty, which explains why the test always fails!

import MySQLdb
MySQLdb.ROWID
DBAPISet([])
By contrast, other datatypes have a defined set of possible values:
MySQLdb.STRING
DBAPISet([253, 254, 247])

This bug therefore causes the officially prescribed test for a primary key to always fail. Using MySQLdb will therefore lead to the erroneous indication that all tables in any MySQL database lack a primary key! That is a serious problem, and a big departure from what is prescribed in PEP 249.

I searched the sourceforge tracker for any report with the keyword ROWID and found nothing. I suspect this bug has been ignored because a developer can just use MySQL's "describe foo" SQL command to detect a primary key. However, this is non-portable and breaks the whole purpose of the DB API to provide a portable interface to many types of databases. By failing to implement the PEP 249 prescribed ROWID type test, this bug forces developers to write non-portable code.

It would be easy to add support for the ROWID capability, by using MySQL's "describe foo" command and looking for columns with Key="PRI" value.

MySQLdb version: 1.2.2.final
MySQL version: 5.0.38
Python version: 2.5.2
OS: Mac OS 10.5.2
CPU: Intel penryn

Discussion

  • Andy Dustman
    Andy Dustman
    2008-04-02

    Logged In: YES
    user_id=71372
    Originator: NO

    Your interpretation of PEP-249 is faulty. ROWID describes a particular set of column types used for a unique row ID. This is not the same thing as the primary key. It is more like what PostgreSQL's system-generated OID column, which is not the primary key, and it looks like PostgreSQL-8 does not add this to user tables by default like 7 and earlier did. PEP-249 does not mention primary keys at all. The implementation of DBAPISet is straight out of PEP-249 so your interpretation cannot be correct as it only meant to compare column types and not by column name.

     
  • Logged In: YES
    user_id=1525006
    Originator: YES

    Hi Andy,
    thanks for the quick response! If I'm understanding right, you are saying that the platform-independent DB API 2.0 lacks the concept of a primary key, which is central to the definition of a database, and instead provides ROWID, a non-standard concept that you seem to indicate is some kind of platform-specific feature, maybe from PostgreSQL. Am I understanding you right?

    I'd be grateful for your advice on how to pursue an optimal resolution of this issue. I searched PEP-249 and db-sig for more clues on this. PEP-249 says that to construct a dictionary for a set of rows (which requires determining the primary key for the table) "use the approach of using the column names defined in the cursor attribute .description as basis for the keys in the row dictionary." On the one hand, the inclusion of this statement in the PEP seems to acknowledge that identifying a primary key is a NECESSARY operation, but on the other hand the suggestion of trying to reverse-engineer the information by guessing what the column names mean is technically unsatisfying (e.g. if a table has two columns "foo_id" and "bar_id", which is the primary key?), especially since the correct information is trivially accessible from the database itself.

    In your view, what is the right forum for raising this issue? db-sig?

    RE: "your interpretation cannot be correct as DBAPISet only meant to compare column types and not by column name." Whether a column is a primary key or not is independent of the column name in the SQL standard, so why do you mention column names? Users specify whether a column is a primary key as part of its type description, completely independently of what they choose as the column name. The key point is that the database knows whether a given column is the primary key or not, and any DB API method that ignores that information and tries to determine it some other way (e.g. from the column name) is unsound. DBAPITypeObject, as specified in PEP-249, explicitly allows matching one type against multiple type values returned by the cursor.description attribute. Thus an int column that is a primary key could compare as equal to both the NUMBER type and the PRIMARYKEY type, which seems like a usable solution. Do you have a better solution in mind?

    At a minimum, it seems like the PEP-249 text needs to be corrected. PEP-249 states: "ROWID: This type object is used to describe the "Row ID" column in a database." It does not say anything further, so the sole basis for interpreting it is the words in that sentence. "Row" is clear. "ID" is clear. The definition of a primary key is that every row has an ID. So if PEP-249 wants "Row ID" to mean something different from "primary key", it needs to specify that meaning exactly. Otherwise many more readers will be misled.

     
  • Andy Dustman
    Andy Dustman
    2008-04-03

    Logged In: YES
    user_id=71372
    Originator: NO

    PEP-249 doesn't even specify that the queries passed to cursor.execute() are SQL. That was left deliberately unspecified as I remember. The DB-SIG has expressed some interest in the past on adding interface features for database introspection, but this has not been specified yet, to the best of my knowledge. The only SQL feature I see mentioned in PEP-249 is the equivalency of SQL NULL and Python None.

    When they are talking about making a dictionary for a set of rows, they are talking about a sequence of dictionaries, where the keys in each dictionary are column names. It is not a dictionary of rows. Or to be exactly, it mentions getting a dictionary for a row instead of a tuple. This is easy to do:

    c = db.cursor()
    c.execute(query)
    for row in c: # assumes cursors are iterator extension
    dict_row = dict([ desc[0], value for desc, value in zip(c.description, row) ])

    And MySQLdb does this natively anyway, so it's besides the point, except to show that the primary key is irrelevant.

    "Whether a column is a primary key or not is independent of the column name in the SQL standard, so why do you mention column names?" Because what PEP-249 describes with it's use of DBAPISet to define STRING, BINARY, ROWID, etc. are only a way to compare column types. "Primary key" is not a column type; it's a flag or modifier on a column type. ROWID cannot mean "primary key".

    if people are being misled, they have kept quiet about it, as you are the first person in ten years to have a public issue with it. All the ORMs (Object Relational Models) and frameworks out there (SQLobject, Django, SQLAlchemy, etc.) get along just fine without this. Django does some introspection but only for generating models from existing databases, and in their case, they have small wrappers around the various backends which implement the introspection; it's not used for normal operation at all. In all of the examples above, you are defining a model, and the primary key is part of your model specification, so introspecting is not necessary.

    So what exactly is your use case for this?

     
  • Logged In: YES
    user_id=1525006
    Originator: YES

    Thanks for all the info! A few clarifications:
    - the concept of a primary key is not limited to SQL, so the need for identifying the primary key is not tied to SQL.
    - You said "primary key... is a flag or modifier on a column type". Exactly. So allocate a bit on the type value for representing whether the column is a primary key, and provide a module-level value or method for testing this. This is so easy that I don't see why we wouldn't want the extra information that it provides.

    Example use case: write a python module for producing an ER diagram for any database. Identifying primary keys is trivial with introspection, because the database server knows.

    example use case: write a class that provides a dictionary interface to any database table, whose keys are simply the primary key values in the table, and whose associated values are objects representing the corresponding row.

    • introspection is pythonic, for good reason. Lack of introspection limits the usability of the API to writing code that is hard-wired to a specific table schema. If even slight changes to the database schema are made (e.g. a user changes a primary key name), the Python database code will no longer work. Lack of introspection would rule out portable code for database analysis tools. The fact that the cursor.description returns useful information like column type is an example of introspection, and demonstrates its value.

    • "introspection...'s not used for normal operation at all": when a program that uses introspection runs, it typically only needs to use introspection to get the necessary information once, after which it just uses that information. However, this is no argument that we don't need introspection; it merely shows that introspection is efficient (a constant-time operation).

    • "you are defining a model, and the primary key is part of your model specification, so introspecting is not necessary". You are assuming that the person who defines the database schema, and the person who writes some code using the Python DB API are the same person. But what if I want to write code that other people can use on their databases, in a general way? e.g.

    • the code is supposed to work with any database schema, even ones (gasp) not written expressly to work with this code?
    • database schemas can and should evolve over time. Why should minor changes (like altering the column name of a primary key) cause the Python code to break?
    • the fact that others (e.g. Django, as you mentioned) have had to write a bunch of platform-specific code to obtain this introspection information, illustrates both the need for introspection and the problems that result when there is no platform-independent way to get it.
    • your same argument applies to other cases of introspection, like returning type information for each column. If introspection is truly not necessary, why then does the DB API mandate it? Obviously, the reason is that introspection solves a lot of important needs.

    I'm sorry to bother you with this issue. Is there anywhere you'd suggest that I should raise this issue to see if other people have any interest?