Menu

Iterating over SSCursor hangs

Help
2011-05-09
2012-09-19
  • Giovanni Luca Ciampaglia

    I am using SSCursor to iterate over the results of a group by query on a very
    large table (250M + rows). The query itself works fine from the mysql client,
    but it get stuck when I am fetch the results inside a loop in Python.

    This is the query:

    select 
        datediff(max(r.rev_timestamp),min(r.rev_timestamp)) as active_days,
        count(r.rev_timestamp) as rev_count
    from revision r left join user_groups u
    on r.rev_user = u.ug_user
    where (u.ug_user is null or u.ug_group <> 'bot') and r.rev_user > 0 
    group by rev_user;
    

    and this is the fetch loop:

        conn = connect(**settings)
        cursor = conn.cursor(SSCursor)
        result = deque()
        qstart = time()
        try:
            cursor.execute(query)
            for i, row in enumerate(iter(cursor.fetchone, None)):
                days, count = row
                if days > 0:
                    result.append((float(count)/days, count))
        finally:
            cursor.close()
    

    I tried from the console and it literally hangs after spitting the first few
    rows. Interestingly, both the mysql and python go into sleep. I am on MySQLdb
    1.2.3 and Python 2.6.5.

    Any idea why it behaves like this?

     
  • Anonymous

    Anonymous - 2011-05-09

    Using iter() around cursor.fetchone fetches all the rows. Try something a
    little simpler:

    for row in cursor:

    Cursors are iterators

     
  • Giovanni Luca Ciampaglia

    Hi,

    thanks for your reply. I followed your advice but it doesn't seem to fix the
    problem. I added a print statement inside the loop and it shows that the loop
    gets stuck after processing exactly 60'820 rows. Same result (after flush
    query cache) with iter(cursor.fetchone.None). In both cases, if I run
    mysqladmin processList it shows the mysql thread sleeping.

    This is very strange.

     
  • Giovanni Luca Ciampaglia

    It is set to 10 MB. It is not exactly clear to me how to set in on the client
    side for mysql-python. For the mysql client I can do

    mysql --max-allowed-packet=1GB -u ....

    then if I run the query it works fine. Since the result is now cached, the
    python script works too

    However, if I run from within Python the original query with prepended to it
    the command:

    set max_allowed_packet=1073741824;

    then -- scarily -- the script executes, mysql sends data, but there is no
    output to the console, even though there is a print statement in the loop! Am
    I doing something wronge here?

     
  • Giovanni Luca Ciampaglia

    Sorry, that was 1MB, not 10MB

     
  • Giovanni Luca Ciampaglia

    Just found I have to issue

    set global max_allowed_packet = ...

    and not just

    set max_allowed_packet

    to affect this setting. But I don't have SUPER privileges on this DB; will ask
    the sysadmin. Thanks for the assistance.

     

Log in to post a comment.

Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.