I am using SSCursor to iterate over the results of a group-by query on a very
large table (250M+ rows). The query itself works fine from the mysql client,
but it gets stuck when I fetch the results inside a loop in Python.
This is the query:
select
    datediff(max(r.rev_timestamp), min(r.rev_timestamp)) as active_days,
    count(r.rev_timestamp) as rev_count
from revision r
left join user_groups u on r.rev_user = u.ug_user
where (u.ug_user is null or u.ug_group <> 'bot') and r.rev_user > 0
group by rev_user;
and this is the fetch loop:
from collections import deque
from time import time
from MySQLdb import connect
from MySQLdb.cursors import SSCursor

conn = connect(**settings)
cursor = conn.cursor(SSCursor)
result = deque()
qstart = time()
try:
    cursor.execute(query)
    for i, row in enumerate(iter(cursor.fetchone, None)):
        days, count = row
        if days > 0:
            result.append((float(count) / days, count))
finally:
    cursor.close()
I tried it from the console and it hangs after printing the first few rows.
Interestingly, both the mysql and python processes go to sleep. I am on
MySQLdb 1.2.3 and Python 2.6.5.
Any idea why it behaves like this?
Anonymous - 2011-05-09
Using iter() around cursor.fetchone fetches all the rows. Try something a
little simpler:
for row in cursor:
Cursors are iterators.
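For example, the fetch loop could shrink to something like this (a sketch,
untested, reusing the settings and query from your post):

from collections import deque
from MySQLdb import connect
from MySQLdb.cursors import SSCursor

conn = connect(**settings)
cursor = conn.cursor(SSCursor)
result = deque()
try:
    cursor.execute(query)
    # the cursor itself yields one row at a time; no fetchone/iter needed
    for days, count in cursor:
        if days > 0:
            result.append((float(count) / days, count))
finally:
    cursor.close()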
Hi, thanks for your reply. I followed your advice, but it doesn't seem to fix
the problem. I added a print statement inside the loop, and it shows that the
loop gets stuck after processing exactly 60,820 rows. Same result (after
flushing the query cache) with iter(cursor.fetchone, None). In both cases, if
I run mysqladmin processlist it shows the mysql thread sleeping.
This is very strange.
Have you set this: http://dev.mysql.com/doc/refman/5.5/en/server-system-variables.html#sysvar_max_allowed_packet
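A minimal sketch of checking the current value from mysql-python (assuming
the same settings dict as in the original post):

from MySQLdb import connect

conn = connect(**settings)
cur = conn.cursor()
cur.execute("show variables like 'max_allowed_packet'")
# e.g. ('max_allowed_packet', '1048576'), i.e. 1 MB
print cur.fetchone()
cur.close()
conn.close()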
It is set to 10 MB. It is not exactly clear to me how to set it on the client
side for mysql-python. For the mysql client I can do

mysql --max-allowed-packet=1GB -u ....

and then if I run the query it works fine. Since the result is now cached,
the python script works too.
However, if I run the original query from within Python with the following
command prepended to it:

set max_allowed_packet=1073741824;

then -- scarily -- the script executes, mysql sends data, but there is no
output to the console, even though there is a print statement in the loop! Am
I doing something wrong here?
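One possible explanation for the missing output (a guess, not confirmed
here): if the SET and the SELECT are sent to the server in a single
execute() call, the cursor is left positioned on the result of the first
statement (the SET, which returns no rows), so the loop never sees the
select's rows. A sketch of issuing the two statements separately:

from MySQLdb import connect
from MySQLdb.cursors import SSCursor

conn = connect(**settings)
cursor = conn.cursor(SSCursor)
# session-level SET; may be rejected on servers where the session value
# of max_allowed_packet is read-only
cursor.execute("set max_allowed_packet = 1073741824")
cursor.execute(query)
for row in cursor:
    print row
cursor.close()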
Sorry, that was 1MB, not 10MB
Just found I have to issue
set global max_allowed_packet = ...
and not just
set max_allowed_packet
to affect this setting. But I don't have SUPER privileges on this DB; will ask
the sysadmin. Thanks for the assistance.
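For completeness, a sketch of what the sysadmin would need to run (SUPER
privilege required; the host and credentials below are placeholders):

from MySQLdb import connect
from MySQLdb.cursors import SSCursor

# raise the server-wide limit to 1 GB (1073741824 bytes); requires SUPER
admin = connect(host='...', user='...', passwd='...')
admin.cursor().execute("set global max_allowed_packet = 1073741824")
admin.close()

# new connections pick up the new global value; stream the results again
conn = connect(**settings)
cursor = conn.cursor(SSCursor)
try:
    cursor.execute(query)
    for days, count in cursor:
        print days, count
finally:
    cursor.close()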