Menu

A question of efficiency

2004-01-02
2004-01-03
  • Jason M. Wisnieski

    So, SQL doesn't appear to be as complex or flexible as I had previously been lead to believe.

    Let's say that I have a 'promotion' table:
    pirate
    crew
    rank
    timestamp

    To display someone's promotion history is easy - you just SELECT * FROM promotion WHERE pirate = ? ORDER BY timestamp ASC/DESC; No problem.

    Let's say that we want to determine what someone's current rank is. It appears that returning the row that contains the max value requires the creation of a temporary table. Is it just me, or is that asinine?

    To find everyone's current rank, you'd have to:
    CREATE TEMPORARY TABLE tmprank ( blah, blah);
    LOCK TABLE promotion READ;
    INSERT INTO tmprank SELECT pirate, MAX(timestamp) FROM promotion GROUP BY pirate;
    SELECT promotion.pirate, promotion.rank FROM promotion, tmprank WHERE promotion.pirate = tmprank.pirate AND promotion.timestamp = tmprank.timestamp;
    UNLOCK TABLES;
    DROP TABLE tmprank;

    So, the question is how much overhead is there in this? Is it worth it to keep this information on a separate table, or should I just break down and store the current rank with the pirate?

    It's always seemed to me that an 'elegant' database has all the required information in only one location. Putting the current rank in the pirate table seems to duplicate that information across two tables.

     
    • Eric Kerin

      Eric Kerin - 2004-01-03

      The technically correct way is to have the information in one place and only one.  But sometimes you break that rule in the name of speed.

      The problem you're running into is caused my MySQL's sub-par implementation of SQL.  In this case it lacks subselects.

      This would be the way I would write it, given subselects:
      SELECT
         promotion.pirate,
         promotion.rank
      FROM
         promotion,
         (SELECT pirate, MAX(timestamp) as timestamp FROM promotion GROUP BY pirate) tmprank
      WHERE
         promotion.pirate = tmprank.pirate
         AND promotion.timestamp = tmprank.timestamp;

      it's the same thing logically, but it's a little nicer to work with.

      Now, there's a few ways to get around it
      1. The temp table method
      2. add a "current" column using a bit type, 1 means that's the current value, 0 means it's an old one.  to update you'd just clear all the current values for that pirate, and then insert the new data, using 1 for current. (this could be a problem without transactions, if the clear is complete, but the new record never gets inserted)
      3. un-normalize the data into the current values, and a history table (this seems to be the more common method, since it's going to be the fastest, but introduces possible inconsistencies if you don't have transactions)  This is actully my favorite method, but I always have to worry about speed in the stuff I code, and I have transactions.

      How much do you thing you'll need to select in this method?

       
    • Jason M. Wisnieski

      Ok, a talk with another friend and some thinkin' on my own convinces me that 3 is the best option of the bunch. It's not much harder to do an update and an insert than an insert alone, so we'll go that route.

      My embedded systems background screams No! Smaller! But it should also be screaming No! Faster!

      I suppose I just need to get a feel for what types of operations are 'expensive' in SQL...

       

Log in to post a comment.