I'm using the CVS version as of 01-24-2005 and I've noticed that my calls to fetchmany() are now returning a single row.
I'm not submitting this as a bug, since it's really the user's responsibility to set cursor.arraysize. Instead, this post is meant to help people who are seeing slowdowns and such related to fetchmany() calls.
In order to fix it, simply set cursor.arraysize to the number of records you want returned with each call to fetchmany().
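For example (just a sketch; the connection parameters and table name are placeholders):

import MySQLdb

conn = MySQLdb.connect(host="localhost", user="user", passwd="secret", db="test")
cur = conn.cursor()
cur.execute("SELECT id, name FROM some_table")

cur.arraysize = 100        # fetchmany() with no argument now returns up to 100 rows
rows = cur.fetchmany()

rows = cur.fetchmany(100)  # or pass the size explicitly on each call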
You're right. Here's why it is the way it is: previously, cursor.arraysize defaulted to 100, which is a sensible limit, IMHO. However, when I ran some DB API conformance tests, MySQLdb failed one of them because the default arraysize is supposed to be 1. I knew this at the time I wrote the code, but I thought 1 was silly. Even so, it is the standard, so I changed it to be standards-compliant.
Personally, I'm surprised anyone uses fetchmany(). With the standard Cursor class, all rows are fetched internally during the call to cursor.execute() (mysql_store_result); cursor.fetch*() only returns slices of that internal list of rows. With the SSCursor, rows are fetched one by one from the server (mysql_use_result).
Another compatibility change that was made in 1.1.x is that autocommit is disabled by default. You can set the autocommit mode with conn.autocommit(boolean).
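For instance (a sketch; the connection details and table are placeholders):

import MySQLdb

conn = MySQLdb.connect(host="localhost", user="user", passwd="secret", db="test")
conn.autocommit(True)    # restore the old autocommit behavior

# Or leave autocommit off and commit explicitly:
cur = conn.cursor()
cur.execute("INSERT INTO some_table (name) VALUES (%s)", ("example",))
conn.commit()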
I was using fetchmany() with the (false?) assumption that if I selected a dataset too large to fit in memory, I could still retrieve it from the database in manageable chunks. I haven't yet tested it on anything "too big", so I didn't realize this wasn't true.
Is there any way to get the desired behavior without rewriting my SQL statements? I could use LIMIT in the SQL to break up the dataset, but that would be a pretty sizeable rewrite for me. Are there any cursors that can return configurable-sized chunks of data without reading the entire dataset into main memory?
It sounds like you want to use the SSCursor, then.
import MySQLdb
from MySQLdb.cursors import SSCursor
then either this:
db = MySQLdb.connect(..., cursorclass=SSCursor)  # every cursor from this connection is an SSCursor
c = db.cursor()
or this:
c = db.cursor(SSCursor)  # request an SSCursor for just this cursor
Be forewarned that you can only have one cursor open at a time with this class, and you must fetch all the rows; this will happen automatically when you close the cursor. This is a limitation of MySQL.
Read this, too:
>>> help("MySQLdb.cursors")
>>> help("MySQLdb.cursors.CursorUseResultMixIn") # esp.
Yes, you could use LIMIT as well. All this is due to limitations in the design of MySQL: you can either get all the rows at once, or you can get them one at a time, but you are required to fetch the entire result set before you can issue another query.
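If you do go the LIMIT route with the ordinary Cursor instead, the pattern is roughly this (just a sketch; the table, columns, and chunk size are made up, and it assumes rows can be ordered by an indexed id column):

import MySQLdb

conn = MySQLdb.connect(host="localhost", user="user", passwd="secret", db="test")
cur = conn.cursor()

chunk = 1000
offset = 0
while True:
    cur.execute("SELECT id, payload FROM big_table ORDER BY id LIMIT %s, %s",
                (offset, chunk))
    rows = cur.fetchall()       # each execute() buffers only one chunk client-side
    if not rows:
        break
    for row in rows:
        pass                    # per-row work goes here
    offset += chunk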
I didn't want my last message to look like I had not even read your posting, so here's a clarification:
It sounds like SSCursor (server-side cursor?) will handle my problems with large datasets, but are there speed penalties associated with retrieving the dataset one row at a time? Are there any built-in cursors that can return the dataset N rows at a time?
Thanks,
Micah
It depends. Your initial results will actually come back faster, but because a network round trip is needed for each row, it might take longer overall, depending on your network latency. Additionally, the result set ties up resources on the server until the whole thing is fetched.
For additional background:
http://dev.mysql.com/doc/mysql/en/mysql-store-result.html # used by standard Cursor
http://dev.mysql.com/doc/mysql/en/mysql-use-result.html # used by SSCursor
OK, I think I mostly understand SSCursor now. I've read a lot of the documentation and have a pretty good handle on mysql_use_result vs. mysql_store_result.
However, in all this, I haven't seen a discussion on the impact on server memory. I assume that the MySQL server is smart enough not to run out of memory from a huge query, correct? In my current setup, server and client are on the same machine. So, I want to be sure that by using an SSCursor, I'm not just offloading my memory problems to the server.
Summary: If I send a query to the server asking for more records than the server has memory, is it smart enough to buffer the results and not run out of memory?
I sure hope so. You will probably consume additional resources on the server, like locks and transaction information, so I think you'll just have to experiment.