Thread: [SQLObject] PickleCol+SQLite size problem/question
SQLObject is a Python ORM.
From: Bart <sca...@gm...> - 2007-10-26 03:06:23
I am creating decent-sized amounts of data (currently strings, since they probably pickle better; on the order of a few MB repr-wise) and want to pickle them using PickleCol. I am running into the following error when I try to create objects/rows, though:

    DataError: String or BLOB exceeded size limit

It seems like SQLObject defaults to using VARCHAR for BLOBCol and PickleCol, which seems to imply a SQLite 1MB row limit. However, forcing sqlType='BLOB' doesn't help, and I don't get why.

Am I right to doomily assume that the current SQLObject version won't handle this with SQLite at all, or is there some magic I still need to sprinkle in?

(I've also read that since SQLite doesn't want SQL statements larger than 1MB, nonparametrized BLOB inserts would still fail at that limit - but I assume I'd get a different error if that were the problem)

Regards,
--Bart Alewijnse
From: Oleg B. <ph...@ph...> - 2007-10-26 07:49:31
Hello!

On Fri, Oct 26, 2007 at 05:06:21AM +0200, Bart wrote:
> DataError: String or BLOB exceeded size limit

What version of SQLite are you using? Could it be you've just bumped into SQLite's limits: http://sqlite.org/limits.html ? The limits apply from version 3.4.0 and up: http://sqlite.org/changes.html#version_3_4_0 ; the actual values depend on how your installation was compiled.

Oleg.
-- 
Oleg Broytmann http://phd.pp.ru/ ph...@ph...
Programmers don't die, they just GOSUB without RETURN.
From: Bart <sca...@gm...> - 2007-10-26 16:38:34
On 10/26/07, Oleg Broytmann <ph...@ph...> wrote:
> What version of SQLite are you using? Could it be you've just bumped
> into SQLite limits: http://sqlite.org/limits.html ? The limits are actual
> from the version 3.4.0 and up, the actual limits depend on how your
> installation is compiled.

I've got 3.4.1 installed, and while I don't know how to inspect these values for my specific installation, I doubt that it matters. It seems to me SQLObject doesn't actually use BLOBs at all - on SQLite, anyway.

That page states the BLOB size limit is 1GB (default), and that isn't a separate size just for show - as long as the library uses the API right (apparently meaning sqlite3_bind_blob and related functions), it can use BLOBs of that size regardless of the 1MB (default) SQL statement limit, or the row limit (not sure how/whether that one applies, but it's likely beside the point too).

My question is another one - since by default SQLObject uses VARCHAR to support BLOBCol and PickleCol on SQLite, it's not likely to be using SQLite's specific BLOB API. It looks like pysqlite is up to using it, so the real question seems to be whether SQLObject can be made to use it, because without it you're likely to run into the much lower non-BLOB limits.

'course, I could be wrong:)

--Bart
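A minimal sketch of the parameterized insert Bart is describing - shown here with the stdlib sqlite3 module (pysqlite's modern descendant), since the binding behaviour is the same; the table and column names are invented for illustration. Binding the data as a parameter routes it through sqlite3_bind_blob(), so only the BLOB size limit (1GB by default) applies, not the SQL statement length limit:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE thing (id INTEGER PRIMARY KEY, data BLOB)")

payload = b"\x00\x01" * (1 << 20)  # 2 MB of binary data

# Wrapping the bytes in Binary() makes the driver bind them as a BLOB
# parameter instead of interpolating them into the SQL text.
cur.execute("INSERT INTO thing (data) VALUES (?)", (sqlite3.Binary(payload),))
con.commit()

cur.execute("SELECT data FROM thing WHERE id = 1")
roundtrip = bytes(cur.fetchone()[0])
assert roundtrip == payload  # the 2 MB blob survives intact
con.close()
```

The same insert written as a literal SQL string would have to pass the whole payload through the statement-length limit, which is the failure mode discussed in this thread.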
From: Bart <sca...@gm...> - 2007-11-29 14:17:45
On Nov 29, 2007 1:24 PM, Oleg Broytmann <ph...@ph...> wrote:
> On Thu, Nov 29, 2007 at 05:22:18AM +0100, Bart wrote:
> > Argh, I just don't know what data format is supposed to be, unpickling
> > keeps choking on the data.
>
> Can you write a short test script that demonstrates the problem?

Well, I don't know what it should do, that's the problem:)

When I store an empty dict with sqlobject, then look at the data via sqlite2, I see u'\x02&bn/\x08,' and I don't know what that is. pickle.dumps({},0), pickle.dumps({},1), and pickle.dumps({},2) don't resemble it too much. Also, storing u'\x02&bn/\x08,' or '\x02&bn/\x08,' via sqlite.Binary (...so in a buffer...) causes sqlobject to give me the "cPickle.UnpicklingError: invalid load key" error, so I really don't know what my hack-around-sqlobject code should be doing.

Huh, I didn't see that the code you pasted before was SQLObject (pseudo)code and not a copy of the quoted page. You're saying 70MB of non-parametrized data works in sqlite right now without BLOBs? That's what *doesn't* work for me. (Is this new in a recent version, or am I not paying enough attention again?)

--Bart
From: Oleg B. <ph...@ph...> - 2007-11-29 14:26:30
On Thu, Nov 29, 2007 at 03:17:43PM +0100, Bart wrote:
> > Can you write a short test script that demonstrates the problem?
>
> Well, I don't know what it should do, that's the problem:)

It should put something in the DB and fail while trying to fetch it back.

> When I store an empty dict with sqlobject, then look at the data via
> sqlite2, I see u'\x02&bn/\x08,'

Where have you got the data from?

> Huh, I didn't see that the code you pasted before was SQLObject
> (pseudo)code and not a copy of the quoted page.

Nothing pseudo there. It was a copy of the exact program I ran. The only thing I skipped was the import and connection setup.

> You're saying 70MB of
> non-parametrized data works in sqlite right now without BLOBs?

Works for me with BLOBCol and PickleCol, no problem.

Oleg.
-- 
Oleg Broytmann http://phd.pp.ru/ ph...@ph...
Programmers don't die, they just GOSUB without RETURN.
From: Oleg B. <ph...@ph...> - 2007-10-26 16:53:10
On Fri, Oct 26, 2007 at 06:38:26PM +0200, Bart wrote:
> My question is another one - since by default SQLObject uses VARCHAR
> to support BLOBCol and PickleCol on SQLite, it's not likely to use its
> specific BLOB API.

We simply cannot - SQLObject is written in pure Python, and uses DB API drivers, PySQLite in this case.

> It looks like pysqlite is up to using it, so the
> real question seems to be whether SQLObject can be made to use it,

Is there any documentation for this API in PySQLite?

Oleg.
-- 
Oleg Broytmann http://phd.pp.ru/ ph...@ph...
Programmers don't die, they just GOSUB without RETURN.
From: Bart <sca...@gm...> - 2007-10-27 03:12:37
On 10/26/07, Oleg Broytmann <ph...@ph...> wrote:
> We simply cannot - SQLObject is written in pure Python, and uses DB API
> drivers, PySQLite in this case.
>
> Is there any documentation for the API in PySQLite?

You don't need to go down to C; it seems pysqlite uses it already (strictly making it a superset of the DB-API). That is to say, the insertion lines in the following:

http://www.initd.org/tracker/pysqlite/wiki/SnippetsBlobs

...work for me when I hand them 500MB of data. It looks like pysqlite2.Binary encases the data in a buffer object, and that pysqlite uses the buffer type to decide to use the BLOB API (not entirely sure, but the pysqlite usage guide seems to hint at this).

But that is also the thing that had and has me confused - there are mentions of pysqlite.Binary in SQLObject's SQLite code, which is why I figured I was just missing something and not using SQLObject correctly; the other option is that SQLObject has some catch-all code that means it actually doesn't use it. Or maybe someone never finished the feature. I'm having a hard time digging through the code to make sure, though, since pysqlite and SQLObject are by design fairly automagical, with a good bunch of wrapping and aliasing - plus I remember pysqlite's adapters seemed an overly complex system back when I looked at them.

(There are also other details, like that SQLite's duck typing seems to extend into the string/blob distinction, so the schema only matters so much - though it would make sense to use BLOB in the schema for SQLObject to know when to use this specific blobbing)

...but you can probably answer all that a lot better:)

Looking forward to where this goes,
--Bart
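Bart's guess about the buffer type can be checked directly with SQLite's typeof(): a value wrapped in Binary is bound and stored as a blob, while a plain string comes out as text. A sketch with the stdlib sqlite3 module (pysqlite2's Binary behaved the same way):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
# No declared column type: SQLite's dynamic typing stores whatever
# storage class the bound value carries.
cur.execute("CREATE TABLE t (v)")
cur.execute("INSERT INTO t (v) VALUES (?)", (sqlite3.Binary(b"\x00\x01"),))
cur.execute("INSERT INTO t (v) VALUES (?)", ("plain string",))
cur.execute("SELECT typeof(v) FROM t ORDER BY rowid")
types = [row[0] for row in cur.fetchall()]
print(types)  # -> ['blob', 'text']
con.close()
```

This matches Bart's point that the declared schema only matters so much: the storage class follows how the value was bound, not the CREATE TABLE statement.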
From: Bart <sca...@gm...> - 2007-11-29 17:11:16
On Nov 29, 2007 3:26 PM, Oleg Broytmann <ph...@ph...> wrote:
> > Well, I don't know what it should do, that's the problem:)
>
> It should put something in the DB and fail while trying to fetch it
> back.

I meant that I have no idea how to format the data I should be setting. Given that, it's trivial to give you code that doesn't work :), say...

def set_data(id, data):
    '''
    Works around the fact that SQLObject does not handle big queries
    and, because it uses literal queries, therefore not big BLOBs.
    '''
    from pysqlite2 import dbapi2 as sqlite
    con = sqlite.connect(Thing._dbpath)  # my own variable. It needed a place.
    cur = con.cursor()
    pickledata = pickle.dumps(data, 2)
    cur.execute("update thing set data=? where id=?", (pickledata, id,))
    #cur.execute("update thing set data=? where id=?", (sqlite.Binary(pickledata), id,))
    con.commit()
    cur.close()
    con.close()

> > When I store an empty dict with sqlobject, then look at the data via
> > sqlite2, I see u'\x02&bn/\x08,'
>
> Where have you got the data from?

That's the data that sqlobject stored. That view of it comes from:

from pysqlite2 import dbapi2 as sqlite
con = sqlite.connect(Thing._dbpath)
cur = con.cursor()
cur.execute("select id,data from thing")
print 't(): ' + `cur.fetchall()`
cur.close()
con.close()

sqlitebrowser shows something similar (that is, it shows don't-know glyphs for the first and last character).

> > Huh, I didn't see that the code you pasted before was SQLObject
> > (pseudo)code and not a copy of the quoted page.
>
> Nothing pseudo there. It was a copy of the exact program I ran. The only
> thing I skipped was import and connection setup.

Well, you didn't import sqlobject or connect to any backing database, so it's not exactly runnable code either, nor does it seem to apply to sqlite. ('course, SQLObject is slightly Weird when it comes to state and connections, so maybe you're doing something interesting)

> > You're saying 70MB of
> > non-parametrized data works in sqlite right now without BLOBs?
>
> Works for me with BLOBCol and PickleCol, no problem.

But sqlite has a 1MB (default) query size limit which causes the "String or BLOB exceeded size limit" DataError that bugs me so much, so unless you have a non-standard sqlite (well... I don't know how different distros do this) or tweaked headers when compiling sqlite, I don't see how this could *not* fail.

Actually, I suppose the quickest short-term (but not so portable) fix for me is to recompile sqlite.

Thanks anyway,
--Bart
From: Oleg B. <ph...@ph...> - 2007-11-29 17:57:10
On Thu, Nov 29, 2007 at 06:11:11PM +0100, Bart wrote:
> I meant that I have no idea how to format the data I should be setting.

PickleCol stores pickle strings and expects a pickle string back from the DB. But PickleCol inherits from BLOBCol, and hence with SQLite uses base64 to encode/decode pickles.

> Given that, it's trivial to give you code that doesn't work :), say...
>
> def set_data(id, data):

I rewrote your code this way:

#! /usr/bin/env python
import base64, pickle
from pysqlite2 import dbapi2 as sqlite

data = {"test1": 1, "test2": "2"}

con = sqlite.connect("test.sqdb")
cur = con.cursor()
cur.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, test BLOB)")
pickledata = pickle.dumps(data, 2)
cur.execute("INSERT INTO test (test) VALUES (?)", (base64.encodestring(pickledata),))
con.commit()
cur.close()
con.close()

And get the data back in SQLObject:

from sqlobject import *
from sqlobject.sqlbuilder import *

__connection__ = "sqlite:///home/phd/work/SQLObject/test-SQLObject/test.sqdb?debug=1"

class Test(SQLObject):
    test = PickleCol()

for row in Test.select():
    print row.test

It prints:

 1/Select :  SELECT test.id, test.test FROM test WHERE 1 = 1
 1/QueryR :  SELECT test.id, test.test FROM test WHERE 1 = 1
 1/COMMIT :  auto
{'test1': 1, 'test2': '2'}

I.e., the data is unpickled successfully.

> Well, you didn't import sqlobject or connect to any backing database,
> so it's not exactly runnable code either, nor does it seem to apply to sqlite.
>
> ('course, SQLObject is slightly Weird when it comes to state and connections,
> so maybe you're doing something interesting)

Nothing interesting. The code that I skipped is:

from sqlobject import *
from sqlobject.sqlbuilder import *

__connection__ = "sqlite:///home/phd/work/SQLObject/test-SQLObject/test.sqdb?debug=1"

> But sqlite has a 1MB (default) query size limit

This is the default. It seems Debian changed the default.

> Actually, I suppose the quickest short-term (but not so portable)
> fix for me is to recompile sqlite.

Certainly. In any case SQLObject cannot work around built-in SQLite limits.

Oleg.
-- 
Oleg Broytmann http://phd.pp.ru/ ph...@ph...
Programmers don't die, they just GOSUB without RETURN.
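Oleg's description of the encoding can be condensed into a round-trip sketch: pickle, then base64 on the way in; base64-decode, then unpickle on the way out. (This uses the stdlib sqlite3 module and Python 3 spellings - base64.encodebytes where the 2007 code says encodestring - but the byte stream is the same.)

```python
import base64
import pickle
import sqlite3

data = {"test1": 1, "test2": "2"}

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, test BLOB)")

# Store: pickle first, then base64 - the ASCII-safe form that can be
# embedded in a literal query.
encoded = base64.encodebytes(pickle.dumps(data, 2))
cur.execute("INSERT INTO test (test) VALUES (?)", (encoded,))
con.commit()

# Fetch: base64-decode, then unpickle - the read path Oleg describes
# for PickleCol on SQLite.
cur.execute("SELECT test FROM test WHERE id = 1")
restored = pickle.loads(base64.decodebytes(cur.fetchone()[0]))
assert restored == data
con.close()
```

The base64 layer also explains why the raw column contents looked like garbage to Bart: the stored value is neither a plain pickle string nor anything an unaided pickle.loads can digest.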
From: Oleg B. <ph...@ph...> - 2007-11-28 15:25:41
On Sat, Oct 27, 2007 at 05:12:33AM +0200, Bart wrote:
> http://www.initd.org/tracker/pysqlite/wiki/SnippetsBlobs
> ...works for me when I hand it 500MB of data.

1. The code

import sys

infile = open(sys.argv[1], "rb")
data = infile.read()
infile.close()

class Images(SQLObject):
    image = BLOBCol()  # or PickleCol, no difference...

Images.createTable()
image = Images(image=data)

Images._connection.cache.clear()
image = Images.get(1)

outfile = open("test.dat", "wb")
outfile.write(image.image)
outfile.close()

works for me for files up to 100M, but for bigger files the OS just kills the python process due to a MemoryError. But if I have enough memory, I don't think there would be any problem with bigger data. At least I don't see a difference between 100M and 500M.

2. The difference between the snippet and the way SQLObject uses Binary is that the snippet uses parameter(s) and SQLObject generates query strings. Well, query strings work for 100M binary files...

3. Are you sure you really want to transfer 500M in one piece to and from a database? Wouldn't it be better to use files and store filenames in the DB?

Oleg.
-- 
Oleg Broytmann http://phd.pp.ru/ ph...@ph...
Programmers don't die, they just GOSUB without RETURN.
From: Bart <sca...@gm...> - 2007-11-28 21:53:10
On Nov 28, 2007 4:25 PM, Oleg Broytmann <ph...@ph...> wrote:
> works for me for files up to 100M, but for bigger files the OS just kills
> the python process due to a MemoryError.
> But if I have enough memory, I don't think there would be any problem
> with bigger data. At least I don't see a difference between 100M and 500M.

A MemoryError seems to signal that the underlying malloc decided you're out of system memory - that's your computer, unrelated to the code or the problem.

> 2. The difference between the snippet and the way SQLObject uses Binary
> is that the snippet uses parameter(s) and SQLObject generates query
> strings. Well, query strings work for 100M binary files...

You meant *don't* work for...? As I recall, the query size limit is/was as low as 64KB for one of the DB-API interfaces, which would mean that SQLObject-wise, the limiting factor for BLOB support is the specific interface and backend, and unpredictable.

> 3. Are you sure you really want to transfer 500M in one piece to and
> from a database? Wouldn't it be better to use files and store filenames in
> the DB?

500MB was just a test case, to be very sure it was being handled with BLOBs. My data is actually on the order of 1 to 3MB for the most complex data I've handled (largely pickling overhead), and yes, I'm sure. In my app the data is a write-once, read-often thing, so this is basically just a cleaner variation of the explicitly file-based solution, if slightly slower.

The file solution would open me up to a list of potential path-access, security, dangling-reference, and cleanup problems I don't want to think about or code around just now. It's more of a hack than a solution when storing pickled data and not *files* files.

I figured I could use SQLObject as a quick-and-dirty object persistence system that would also save me from having to write a lot of dull database code, but it sounds a lot like my objects are too big for its BLOB support to handle in the current design, so I'll have to design my way around it, or code for pysqlite2 directly. It's a pity, but I suppose I'll live:)

Well, thanks anyway,
--Bart
From: Oleg B. <ph...@ph...> - 2007-11-28 22:09:52
On Wed, Nov 28, 2007 at 10:53:05PM +0100, Bart wrote:
> > 2. The difference between the snippet and the way SQLObject uses Binary
> > is that the snippet uses parameter(s) and SQLObject generates query
> > strings. Well, query strings work for 100M binary files...
>
> You meant *don't* work for...?

Work. I slurped a file of 70M, put it into the DB, got it back and wrote it to a test.dat file, then compared the original file with test.dat - no difference. So SQLObject (at least with SQLite/PySQLite2) *can* handle BLOBs/pickles of that size.

Oleg.
-- 
Oleg Broytmann http://phd.pp.ru/ ph...@ph...
Programmers don't die, they just GOSUB without RETURN.
From: Bart <sca...@gm...> - 2007-11-29 03:06:19
On Nov 28, 2007 11:09 PM, Oleg Broytmann <ph...@ph...> wrote:
> Work. I slurped a file of 70M, put it into the DB, got it back and wrote it
> to a test.dat file, then compared the original file with test.dat - no
> difference. So SQLObject (at least with SQLite/PySQLite2) *can* handle
> BLOBs/pickles of that size.

Ah, you inserted it with something other than SQLObject, then fetched it with SQLObject? That'd mean pysqlite2 is doing its job decoding blobs while fetching, which I don't think is too surprising; it's getting the data in there that's the problem.

I've been hacking my sqlobject installation to convert it from literal to parameter style. It seems to work, but there's probably a reason SQLObject uses literal instead of parametrized queries, and it's a bit of an overhaul. It just occurred to me that, for my project at least, it would probably be cleaner to add my own wrapping code that updates the database without SQLObject knowing about it. Always fun to figure out things like that after an hour or two of hacking, particularly when you're doing it in your site-packages:)

--Bart
From: Oleg B. <ph...@ph...> - 2007-11-29 12:23:59
On Thu, Nov 29, 2007 at 04:05:55AM +0100, Bart wrote:
> Ah, you inserted it with something other than SQLObject, then fetched
> it with SQLObject?

No. The program that I posted is a complete program - it INSERTs a huge BLOB and fetches it back. Works for both BLOBCol and PickleCol.

> I've been hacking my sqlobject installation to convert it from
> literal to parameter style.

I had been converting SQLObject to paramstyle queries in this branch: http://svn.colorstudy.com/home/phd/SQLObject/paramstyles/ but stopped working on it for two reasons:

-- different DB API drivers have different problems converting some datatypes; some choked on mxDateTime, some on datetime, etc.

-- Luke Opperman reworked SQLObject in the trunk, unifying SQLObject.select(), sqlbuilder.Select() and dbConnection.queryForSelect(); by itself it is a great job (it will be in SQLObject 0.10), but it is incompatible with what I was doing for parameterized queries, so I have to start almost from the beginning.

> It seems to work, but there's probably a
> reason SQLObject uses literal instead of parametrized queries

There is one - the datatypes problem. Ian Bicking started SQLObject long ago, when DB API drivers were even worse than now.

> It just occurred to me that for my project at least, it would probably
> be cleaner to add my own wrapping code that updates the
> database without SQLObject knowing about it.

Don't forget about the cache coherency problem.

Oleg.
-- 
Oleg Broytmann http://phd.pp.ru/ ph...@ph...
Programmers don't die, they just GOSUB without RETURN.
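A toy illustration of the cache coherency problem Oleg mentions, with a hand-rolled row cache standing in for SQLObject's object cache (the names here are invented; SQLObject's real-world equivalent is something like the Images._connection.cache.clear() call from the earlier example):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE thing (id INTEGER PRIMARY KEY, data TEXT)")
con.execute("INSERT INTO thing (data) VALUES ('old')")

cache = {}

def get_data(id):
    # Cache miss: hit the database; cache hit: never touch it again.
    if id not in cache:
        cache[id] = con.execute(
            "SELECT data FROM thing WHERE id = ?", (id,)).fetchone()[0]
    return cache[id]

assert get_data(1) == "old"

# Out-of-band update, bypassing the "ORM" layer entirely:
con.execute("UPDATE thing SET data = 'new' WHERE id = 1")

assert get_data(1) == "old"  # stale! the cache never saw the UPDATE
cache.clear()                # the moral equivalent of cache.clear()
assert get_data(1) == "new"  # fresh read after invalidation
con.close()
```

This is why Bart's wrap-around-SQLObject plan needs explicit cache invalidation after every direct write, or the ORM will keep serving the old row.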
From: Bart <sca...@gm...> - 2007-11-29 04:22:21
Argh, I just don't know what the data format is supposed to be; unpickling keeps choking on the data. I can't make sense of the data that's in there - it's not a straight pickle string, it doesn't decode, nor does anything I encode look anything like it. On top of that, I get a "cPickle.UnpicklingError: invalid load key" whenever sqlobject tries to unpickle buffer objects.

Can you help with this?

--Bart
From: Oleg B. <ph...@ph...> - 2007-11-29 12:24:24
On Thu, Nov 29, 2007 at 05:22:18AM +0100, Bart wrote:
> Argh, I just don't know what the data format is supposed to be; unpickling
> keeps choking on the data.

Can you write a short test script that demonstrates the problem?

Oleg.
-- 
Oleg Broytmann http://phd.pp.ru/ ph...@ph...
Programmers don't die, they just GOSUB without RETURN.