Work at SourceForge, help us to make it a better place! We have an immediate need for a Support Technician in our San Francisco or Denver office.

Close

#89 Cursor returns array instead of string for blob column type

MySQLdb-1.1
closed
Andy Dustman
MySQLdb (285)
1
2012-09-19
2004-06-19
No

The standard cursor class ('Cursor') used to return
MySQL blob row types as Python string values. In
version 1.0 it returns arrays containing a single item
with the string 'c' as key and the blob string as the
value.

So, where you used to get:

'A value'

you now get:

array('c', 'Another first'))

when selecting values from blobs.

Is there any reason for this, or should this be
considered a bug?

Discussion

  • Logged In: YES
    user_id=1066725

    According to the DB-API specs, Python array objects are the
    preferred representation for Blobs. This appears to have
    changed a couple of times for MySQLdb and the latest change
    seems to be missing from the CHANGELOG.

    But it's probably not a bug.

     
  • Andy Dustman
    Andy Dustman
    2004-06-22

    Logged In: YES
    user_id=71372

    It was pointed out that returning BLOBs as strings didn't
    match the spec; originally BLOBs were returned as strings.
    It took a whlie to figure out how to fix, too, because
    columns like TEXT are also internally a BLOB type, whereas
    real BLOBs have the BINARY flag set, and previously there
    was no conversion mechanism that could pick up on the BINARY
    flag.

    There are a couple options for you:

    • If your BLOB is really containing text, change it to a
      TEXT column.

    • Change the default conversion dictionary, removing the
      FIELD.BLOB entry (default is to return as string)

    I'm leaving this as an Open bug for awhile so it'll be visible.

     
  • Logged In: YES
    user_id=1066725

    Another option is converting in your python code using the
    'tostring()' method of the array object. To make older
    applications work with both the old and the newer versions
    of the db api, you could do something like:

    try:
    field = field.tostring()
    except AttributeError:
    pass

     
  • Logged In: NO

    I've been having a problem with the following code:

    CONCAT(
    IFNULL(
    claim_begin,
    'No Date Entered'
    ),
    ' - ',
    co_name
    ) AS list_string''',

    The type of claim_begin is date, co_name is a VARCHAR(255).
    Which used to return a string value in earlier versions of
    MySQLdb, I read this thread and it convinced me that the
    return type of CONCAT might be a BLOB, in this specific
    case, however I checked the MySQL docs, and they claim that
    the return type should be a String. If I take the
    claim_begin column out the return type that results is a
    string an works properly.

    I'm posting to here because I have no idea where else to
    post, or how to look into the matter further to find out if
    MySQL is indeed converting this to a blob, or if it's
    MySQLdb's fault, any ideas as to what I could do?

     
  • Logged In: NO

    Woops, I should have included the MySQL version in my last
    post: 4.0.20.

     
  • Geert Jansen
    Geert Jansen
    2004-12-31

    Logged In: YES
    user_id=537938

    The DB-API PEP:

    http://www.python.org/peps/pep-0249.html

    implies that the preferred way of storing BLOB columns is
    buffer' and notarray.array'. To quote:

    """The preferred object type for Binary objects are the
    buffer types available in standard Python starting with
    version 1.5.2. Please see the Python documentation for
    details. For information about the the C interface have a
    look at Include/bufferobject.h and Objects/bufferobject.c in
    the Python source distribution."""

    It tells that the preferred way is any buffer object (including
    array.array'), but then refers to the "bare" buffer objectbuffer'.

    The advantages of using buffer over array are:

    • str(buffer) works and gives you a string with the buffer
      contents. str(array) does not give you this.
    • file.write(buffer) works, file.write(array) doesn't.

    The main difference between arrays and buffers seems to be
    that arrays can contain other elements that single bytes.
    This functionality is not required however for BLOB columns.

    Arrays are difficult to work with. If you want to do any I/O on
    them, you basically have to use .tostring() go go back to a
    string again.

    Considering the above, I think MySQLdb should be changed to
    return buffer objects for BLOB columns.

     
  • Andy Dustman
    Andy Dustman
    2004-12-31

    Logged In: YES
    user_id=71372

    You are correct on the contents of PEP-249. Note that since
    PEP-249 was written (07 Apr 1999), there was talk (in 2002)
    of silently deprecating buffer.

    http://mail.python.org/pipermail/python-dev/2002-June/026018.html
    http://mail.python.org/pipermail/python-dev/2000-October/009974.html

    Despite this, buffer is still in Python 2.4. However, the
    only documenation for it is this:

    http://docs.python.org/lib/non-essential-built-in-funcs.html#l2h-86

    buffer( object[, offset[, size]])
    The object argument must be an object that supports the
    buffer call interface (such as strings, arrays, and
    buffers). A new buffer object will be created which
    references the object argument. The buffer object will be a
    slice from the beginning of object (or from the specified
    offset). The slice will extend to the end of object (or will
    have a length given by the size argument).

    This is probably why I used array instead.

    Fortunately, it's not something I need to fix. You can
    change the object types returned by MySQLdb by doing simple
    assignment:

    import MySQLdb
    from MySQLdb.constants import FIELD_TYPE
    from MySQLdb.converters import conversions
    del conversions[FIELD_TYPE.BLOB] # now defaults to string
    conversions[FIELD_TYPE.BLOB] = buffer

    Alternately:

    import MySQLdb
    from MySQLdb.constants import FIELD_TYPE
    db=MySQL.connect(...)
    db.converter[FIELD_TYPE.BLOB] = buffer

    (Note that there are BLOB, LONG_BLOB, MEDIUM_BLOB, and
    TINY_BLOB FIELD_TYPE)

    MySQL (at least prior to the prepared statements API)
    returns all values as strings, or rather in C they are *
    char with a separate length (not NULL terminated). _mysql
    converts these first to Python string objects. It then uses
    the conversions dictionary (or rather the connection
    object's copy of it) to find a conversion function; if it
    does not find one, it leaves it as a string.

    buffer retains a reference to the original object. Once the
    original Python string is used to create the buffer, it
    would not be deallocated since the buffer still references
    it. buffer has less capabilities than string in general,
    although you can write to it. This would end up modifying
    the string, which is supposed to be immutable, but since
    nothing else references it, this is probably not a big problem.

    Try some of the above code and let me know how it works. At
    this point, I may be more inclined to revert to strings
    instead of arrays and avoid buffers. I'll also see what the
    consensus is in the DB-SIG.

     
  • Andy Dustman
    Andy Dustman
    2005-01-13

    Logged In: YES
    user_id=71372

    User feedback indicates that array works well for BLOB, so
    I'm not changing it for 1.2. I may revisit this in 1.3-2.0,
    since the MySQL-4.1 API has a mechanism for getting and
    putting BLOBs piecemeal.

     
  • Geert Jansen
    Geert Jansen
    2005-01-15

    Logged In: YES
    user_id=537938

    Andy,

    apologies for not replying sooner. I've used the conversions
    dictionary similar as you describe and it works as expected.

    How are the users that indicate array is working well using it?
    I think they are primarily converting it to a string with
    the .tostring() method. Also, did you get any consensus from
    the DB-SIG?

    Thanks for doing a great job with MySQLdb.

     
  • Andy Dustman
    Andy Dustman
    2005-01-16

    Logged In: YES
    user_id=71372

    I didn't get any specific details how how array was being
    used, but I expect if .tostring() was being used, they woudl
    have suggested using strings.

    I surveyed a couple of other DB API modules: mx.ODBC uses
    strings, and psycopg (PostgreSQL) uses buffer. Ultimately
    there may be a special BLOB type, but not in 1.2