Thread: [SQLObject] sqlobject test suite weirdness
SQLObject is a Python ORM.
From: Victor Ng <cra...@gm...> - 2005-10-11 15:20:33
|
Hi,

I've branched and modified the SQLObject trunk to use the SQLite
memory tables by default, this seems to have cut down the runtime for
the testsuite on my laptop from 600 seconds down to 23 seconds.

There's a couple oddities I couldn't quite figure out:

Unicode tests:

The data in one of the tests uses 0xf0, this doesn't seem to decode
properly using UTF8 under Python 2.4.1 on OSX.  Not sure if this is
specific to Python 2.4.1 as I've seen some errors on c.l.p regarding
unicode.

You can reproduce the problem using the following command:

unicode("\xf0", 'utf8')

I'm also getting some odd behavior with the test_blob module where
bytes aren't coming back as expected.

py.test also seems to barf up on error which I can't seem to figure
out, but I'm not really used to py.test quite yet.  My brain is still
in unittest/doctest mode. :)

I've noticed a bunch of locking in SQLObject.  Specifically the use of
_SO_writeLock.  Do we really need this?  I'd like to see thread
safety pushed _out_ of SQLObject and make it the responsibility of the
caller.

Any thoughts?

vic

--
"Never attribute to malice that which can be adequately explained by
stupidity."  - Hanlon's Razor
|
From: Robin M. <rob...@gm...> - 2005-10-11 17:21:53
|
On 10/11/05, Victor Ng <cra...@gm...> wrote:
> Hi,
>
> I've branched and modified the SQLObject trunk to use the SQLite
> memory tables by default, this seems to have cut down the runtime for
> the testsuite on my laptop from 600 seconds down to 23 seconds.

With that dramatic an improvement, it seems likely that you're running
into the OS X F_FULLFSYNC issue. SQLite 3 defaults to "PRAGMA
synchronous=FULL" (see http://sqlite.org/pragma.html for details),
which on OS X means it calls the F_FULLFSYNC ioctl. That does a
*complete* flush all the way to the disk platters, including flushing
the disk's own internal cache. Good for reliability, less good for
speed.

If you need speed more than you need reliability (say you're doing unit
testing but your database can't just live in memory), then you may want
to set "PRAGMA synchronous=NORMAL" before you start your testing.
http://www.sqlite.org/changes.html says that F_FULLFSYNC is disabled if
the synchronous pragma is anything other than "full".

I'd be curious to know whether you see a dramatic increase in speed
from doing "PRAGMA synchronous=NORMAL" on OS X before running the test
suite.

As for the rest of your email, I don't feel qualified to comment on it,
except for the following:

> I've noticed a bunch of locking in SQLObject.  Specifically the use of
> _SO_writeLock.  Do we really need this?  I'd like to see thread
> safety pushed _out_ of SQLObject and make it the responsibility of the
> caller.

I almost never use threads, since they're so hard to get right. But
I'd still like to see that locking stay, precisely *because* threads
are so hard to get right. Making thread safety the responsibility of
the caller is begging for thread-unsafe code, IMHO.

What's your rationale for taking those out? Do you think it would
produce better performance in the non-threaded case?

--
Robin Munn
Rob...@gm...
GPG key 0xD6497014
|
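A minimal sketch of the setting Robin describes, written against the standard-library sqlite3 module in modern Python rather than the PySQLite and Python 2.4 used in this thread. The helper names open_with_synchronous and current_synchronous are made up for illustration and are not part of SQLObject. The pragma reads back as an integer (0 for OFF, 1 for NORMAL, 2 for FULL).

```python
import os
import sqlite3
import tempfile

ALLOWED_LEVELS = {"OFF", "NORMAL", "FULL"}


def open_with_synchronous(path, level="NORMAL"):
    """Open a SQLite database file and set PRAGMA synchronous (hypothetical helper)."""
    level = level.upper()
    if level not in ALLOWED_LEVELS:
        raise ValueError("unsupported synchronous level: %r" % level)
    conn = sqlite3.connect(path)
    # PRAGMA statements do not accept bound parameters, so the validated
    # keyword is interpolated into the statement text.
    conn.execute("PRAGMA synchronous = %s" % level)
    return conn


def current_synchronous(conn):
    """Return the numeric synchronous level of this connection."""
    return conn.execute("PRAGMA synchronous").fetchone()[0]


with tempfile.TemporaryDirectory() as tmp:
    conn = open_with_synchronous(os.path.join(tmp, "test.db"), "NORMAL")
    level = current_synchronous(conn)
    conn.close()

print("synchronous level:", level)
```

The setting is per connection, so it has to be issued every time a connection is opened, not once per database file.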
From: Victor Ng <cra...@gm...> - 2005-10-12 23:12:44
|
I'm apparently stupid today and didn't properly read your email, nor
the SQLite link until 10 minutes ago....

I've got the PRAGMA synchronous=normal in the sqliteconnection now,
and I'm not getting significantly different results in my testsuite
run times.

For reference, I made the following patch:

Index: sqlobject/sqlite/sqliteconnection.py
===================================================================
--- sqlobject/sqlite/sqliteconnection.py        (revision 1106)
+++ sqlobject/sqlite/sqliteconnection.py        (working copy)
@@ -45,6 +45,8 @@
             # use only one connection for sqlite - supports multiple)
             # cursors per connection
             self._conn = sqlite.connect(self.filename, **opts)
+            cur = self._conn.cursor()
+            cur.execute("PRAGMA synchronous = NORMAL")
         DBAPI.__init__(self, **kw)

     def connectionFromURI(cls, uri):

That cut runtime down by ~50% to:

tests finished: 135 passed, 5 failed in 346.24 seconds

Which is still better than 625 seconds normally, but it's far from the
20 seconds that I get with in memory tables.

> I'd be curious to know whether you see a dramatic increase in speed
> from doing "PRAGMA synchronous=NORMAL" on OS X before running the test
> suite.
|
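The three configurations compared above (file-backed with synchronous=FULL, file-backed with synchronous=NORMAL, and :memory:) can be timed side by side with a rough sketch like the following. It uses the standard-library sqlite3 module in modern Python; time_commits, the table t, and the row count are all illustrative assumptions, and the absolute numbers depend heavily on the OS and disk, so no expected timings are given.

```python
import os
import sqlite3
import tempfile
import time


def time_commits(database, synchronous=None, rows=50):
    """Insert `rows` rows with one commit each; return (seconds, row count)."""
    conn = sqlite3.connect(database)
    if synchronous is not None:
        # Only fixed literals ("FULL", "NORMAL") are passed in below.
        conn.execute("PRAGMA synchronous = %s" % synchronous)
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, value TEXT)")
    start = time.perf_counter()
    for i in range(rows):
        conn.execute("INSERT INTO t (value) VALUES (?)", ("row %d" % i,))
        conn.commit()  # each commit is where a file-backed database syncs
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    conn.close()
    return elapsed, count


with tempfile.TemporaryDirectory() as tmp:
    results = {
        "file, FULL": time_commits(os.path.join(tmp, "full.db"), "FULL"),
        "file, NORMAL": time_commits(os.path.join(tmp, "normal.db"), "NORMAL"),
        ":memory:": time_commits(":memory:"),
    }

for label, (elapsed, count) in results.items():
    print("%-12s %d rows in %.4f s" % (label, count, elapsed))
```

Committing after every row is deliberate: it mimics a test suite that issues many small transactions, which is the pattern where sync cost dominates.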
From: Victor Ng <cra...@gm...> - 2005-10-13 01:05:51
|
I'm not too keen on changing my SQLite installation - I'm using the
stock sqlite3 binary that comes on OSX 10.4, and I'd like to keep my
machine as plain as possible.

Regardless, I think switching to in-memory tables is better for testing
anyway.  You're not going to beat heap tables for speed, and I'd like
to see unit tests run as quickly as possible.

> I almost never use threads, since they're so hard to get right. But
> I'd still like to see that locking stay, precisely *because* threads
> are so hard to get right. Making thread safety the responsibility of
> the caller is begging for thread-unsafe code, IMHO.
>
> What's your rationale for taking those out? Do you think it would
> produce better performance in the non-threaded case?

At our workplace, we're using a forked version of SQLObject 0.5.4, and
I've actually gutted most of the code to get better performance.

We're handling lots of data over at my work - roughly 5 million records
spread across 200 odd tables.  The application is an ERP system and we
push SQLObject pretty hard so we've had the benefit of seeing where it
falls down and where it wins.

We gutted the locking and we're quite a bit better off.  Locks are
expensive, so we just avoid them.  If you need locking - we just handle
it on our own outside of SQLObject.

The biggest changes we've made are with caching - we have 2 modes of
caching.  Write-safe caching and read-only caching.  The read-only
cache assumes that _get_attr methods are idempotent and turns
sqlobjects into really thin wrappers around dictionaries for a big
speed improvement.

We're using the hotshot profiler to measure all of our tuneups and
we've cut down the runtime by at least 2 orders of magnitude by
reducing disk i/o and by reducing the function call overhead.

I'll try to merge more changes in over the next little while, but I'm
still getting used to the difference in the codebases.

Can someone explain why there are always SOTypeCol and TypeCol?  It
seems like extraneous syntax to me and I'm not sure what the benefit is
of having classes split like this.

vic

--
"Never attribute to malice that which can be adequately explained by
stupidity."  - Hanlon's Razor
|
From: Ian B. <ia...@co...> - 2005-10-11 17:33:27
|
Victor Ng wrote:
> Hi,
>
> I've branched and modified the SQLObject trunk to use the SQLite
> memory tables by default, this seems to have cut down the runtime for
> the testsuite on my laptop from 600 seconds down to 23 seconds.

That sure is slow ;)  But sure, :memory: is convenient.

> There's a couple oddities I couldn't quite figure out:
>
> Unicode tests:
>
> The data in one of the tests uses 0xf0, this doesn't seem to decode
> properly using UTF8 under Python 2.4.1 on OSX.  Not sure if this is
> specific to Python 2.4.1 as I've seen some errors on c.l.p regarding
> unicode.
>
> You can reproduce the problem using the following command:
>
> unicode("\xf0", 'utf8')

OK, I don't entirely understand that.  It is a normal ISO-8859-1
character, though.  Exactly which test is this?  Well,
u'\u00f0'.encode('iso-8859-1') == '\xf0'.

> I'm also getting some odd behavior with the test_blob module where
> bytes aren't coming back as expected.

Is FormEncode up to date?  There was a bug in the ordering of
conversions that caused problems with that.

> py.test also seems to barf up on error which I can't seem to figure
> out, but I'm not really used to py.test quite yet.  My brain is still
> in unittest/doctest mode. :)

There's a couple times py.test's magic doesn't work right.  You can turn
it off with an error.  One thing that can be difficult is Collector
errors, which are essentially errors on import, which happen in
SQLObject fairly often (since all the classes are created on import).

> I've noticed a bunch of locking in SQLObject.  Specifically the use of
> _SO_writeLock.  Do we really need this?  I'd like to see thread
> safety pushed _out_ of SQLObject and make it the responsibility of the
> caller.

I don't want to push it out of SQLObject, though I'll admit
_SO_writeLock is fishy.  It's subtle, I suppose -- there's different
levels of threadsafety, because it all depends on the caller's
intentions.  I feel fairly confident about the locking in the cache.

There's also a question about how much safety we care about.  For
instance, _SO_writeLock seems to keep us from doing duplicate queries in
some conditions.  But only very few conditions; is the very occasional
spurious query a problem?  Probably not.  But in another case it might
be possible to expire an object, and then while the expiration is
happening you could refetch the data, and potentially get the object in
a weird state where it is half-expired and all messed up.

--
Ian Bicking  /  ia...@co...  /  http://blog.ianbicking.org
|
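On the 0xf0 question discussed above: a lone 0xF0 byte is valid ISO-8859-1 but is not valid UTF-8 on its own, because in UTF-8 it is the lead byte of a four-byte sequence. So the decode failure is expected behaviour rather than something specific to Python 2.4.1 or OS X. A short sketch in modern Python, where the thread's unicode("\xf0", 'utf8') corresponds to b"\xf0".decode("utf-8"):

```python
raw = b"\xf0"

# ISO-8859-1 maps every byte to the code point with the same value,
# so 0xF0 decodes to U+00F0 (LATIN SMALL LETTER ETH).
latin1_text = raw.decode("iso-8859-1")

# In UTF-8, 0xF0 announces a four-byte sequence; with no continuation
# bytes following it, decoding fails.
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc

# The same character encoded as UTF-8 takes two bytes.
utf8_bytes = latin1_text.encode("utf-8")

print(repr(latin1_text), repr(utf8_bytes), type(utf8_error).__name__)
```

This suggests the test data is ISO-8859-1 encoded and needs to be decoded with that codec, or stored as its two-byte UTF-8 form, before being treated as UTF-8.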
From: Victor Ng <cra...@gm...> - 2005-10-12 21:18:42
|
On 10/11/05, Ian Bicking <ia...@co...> wrote:
> Victor Ng wrote:
> > Hi,
> >
> > I've branched and modified the SQLObject trunk to use the SQLite
> > memory tables by default, this seems to have cut down the runtime for
> > the testsuite on my laptop from 600 seconds down to 23 seconds.
>
> That sure is slow ;)  But sure, :memory: is convenient.

Indeed. :)  Faster is better. :)

> > There's a couple oddities I couldn't quite figure out:
> >
> > Unicode tests:
> >
> > The data in one of the tests uses 0xf0, this doesn't seem to decode
> > properly using UTF8 under Python 2.4.1 on OSX.  Not sure if this is
> > specific to Python 2.4.1 as I've seen some errors on c.l.p regarding
> > unicode.
> >
> > You can reproduce the problem using the following command:
> >
> > unicode("\xf0", 'utf8')
>
> OK, I don't entirely understand that.  It is a normal ISO-8859-1
> character, though.  Exactly which test is this?  Well,
> u'\u00f0'.encode('iso-8859-1') == '\xf0'.

I've reverted the code and committed it back to SVN.  The test case is
sqlobject/tests/test_unicode.py

How do I run just 1 test function?  I want to do something like:

$ py.test sqlobject/tests/test_unicode.py/test_create

> > I'm also getting some odd behavior with the test_blob module where
> > bytes aren't coming back as expected.
>
> Is FormEncode up to date?  There was a bug in the ordering of
> conversions that caused problems with that.

I just updated and installed from FormEncode's trunk and I still get
this in the test_create test case:

E   assert prof2.image == data
>   assert <ImageData 1 image="'">.image == '\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'

> > py.test also seems to barf up on error which I can't seem to figure
> > out, but I'm not really used to py.test quite yet.  My brain is still
> > in unittest/doctest mode. :)
>
> There's a couple times py.test's magic doesn't work right.  You can turn
> it off with an error.  One thing that can be difficult is Collector
> errors, which are essentially errors on import, which happen in
> SQLObject fairly often (since all the classes are created on import).

How do I turn off the error?

> > I've noticed a bunch of locking in SQLObject.  Specifically the use of
> > _SO_writeLock.  Do we really need this?  I'd like to see thread
> > safety pushed _out_ of SQLObject and make it the responsibility of the
> > caller.
>
> I don't want to push it out of SQLObject, though I'll admit
> _SO_writeLock is fishy.  It's subtle, I suppose -- there's different
> levels of threadsafety, because it all depends on the caller's
> intentions.  I feel fairly confident about the locking in the cache.
>
> There's also a question about how much safety we care about.  For
> instance, _SO_writeLock seems to keep us from doing duplicate queries in
> some conditions.  But only very few conditions; is the very occasional
> spurious query a problem?  Probably not.  But in another case it might
> be possible to expire an object, and then while the expiration is
> happening you could refetch the data, and potentially get the object in
> a weird state where it is half-expired and all messed up.

Ok - I'm willing to buy that answer for now.  I'm going to try and get
my profiler integrated into my branch so that I can actually measure
the cost of all this locking.  If the cost is high, then can we at
least consider dropping the locks?

I've seen enough bad locking code in SQLite and my own code to really
hate threads and critical sections.

vic
|
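One way to put a rough number on uncontended locking cost before profiling the real code is a micro-benchmark like the sketch below. It is an illustration only: the Counter class is made up, threading.Lock stands in for _SO_writeLock (whose actual type and usage in SQLObject are not shown in this thread), and timeit is used instead of the hotshot profiler, which is not available in modern Python.

```python
import threading
import timeit


class Counter:
    """Toy object with a locked and an unlocked write path."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()  # stand-in for a per-object write lock

    def bump_unlocked(self):
        self.value += 1

    def bump_locked(self):
        # Acquire and release the lock around the same update.
        with self._lock:
            self.value += 1


def measure(calls=100000):
    """Time both paths in a single thread; return (unlocked, locked, total)."""
    counter = Counter()
    unlocked = timeit.timeit(counter.bump_unlocked, number=calls)
    locked = timeit.timeit(counter.bump_locked, number=calls)
    return unlocked, locked, counter.value


unlocked, locked, total = measure()
print("unlocked: %.4f s  locked: %.4f s  (%d updates)" % (unlocked, locked, total))
```

This only measures the single-threaded, uncontended case, which is the one Robin asks about. It says nothing about contention between threads or about whether the lock is needed for correctness.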
From: Ian B. <ia...@co...> - 2005-10-12 21:29:40
|
Victor Ng wrote:
>>OK, I don't entirely understand that.  It is a normal ISO-8859-1
>>character, though.  Exactly which test is this?  Well,
>>u'\u00f0'.encode('iso-8859-1') == '\xf0'.
>
>
> I've reverted the code and committed it back to SVN.  The test case is
> sqlobject/tests/test_unicode.py
>
> How do I run just 1 test function?  I want to do something like:
>
> $ py.test sqlobject/tests/test_unicode.py/test_create

I think

$ py.test -k create sqlobject/tests/test_unicode.py

>>>I'm also getting some odd behavior with the test_blob module where
>>>bytes aren't coming back as expected.
>>
>>Is FormEncode up to date?  There was a bug in the ordering of
>>conversions that caused problems with that.
>
>
> I just updated and installed from FormEncode's trunk and I still get
> this in the test_create test case:
>
> E assert prof2.image == data
>
>> assert <ImageData 1 image="'">.image == '\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'

I don't have any particular ideas at the moment.  Further investigation
is required ;)  It doesn't seem to be showing what prof2.image really
is either.  If it is some kind of Binary() object, then it's a case
where the validator/converter isn't coercing it back to a string.

>>>py.test also seems to barf up on error which I can't seem to figure
>>>out, but I'm not really used to py.test quite yet.  My brain is still
>>>in unittest/doctest mode. :)
>>
>>There's a couple times py.test's magic doesn't work right.  You can turn
>>it off with an error.  One thing that can be difficult is Collector
>>errors, which are essentially errors on import, which happen in
>>SQLObject fairly often (since all the classes are created on import).
>
>
> How do I turn off the error?

py.test --nomagic --nocapture --fulltrace

>>There's also a question about how much safety we care about.  For
>>instance, _SO_writeLock seems to keep us from doing duplicate queries in
>>some conditions.  But only very few conditions; is the very occasional
>>spurious query a problem?  Probably not.  But in another case it might
>>be possible to expire an object, and then while the expiration is
>>happening you could refetch the data, and potentially get the object in
>>a weird state where it is half-expired and all messed up.
>
>
> Ok - I'm willing to buy that answer for now.  I'm going to try and get
> my profiler integrated into my branch so that I can actually measure
> the cost of all this locking.  If the cost is high, then can we at
> least consider dropping the locks?

If the cost is high we can try to figure out just what is required.
Correctness coming first, of course ;)

--
Ian Bicking  /  ia...@co...  /  http://blog.ianbicking.org
|
From: Oleg B. <ph...@ma...> - 2005-10-12 21:31:25
|
On Wed, Oct 12, 2005 at 12:45:34AM -0400, Victor Ng wrote:
> How do i run just 1 test function?  I want to do something like:
>
> $ py.test sqlobject/tests/test_unicode.py/test_create

$ py.test -D 'sqlite:/:memory:?debug=1' sqlobject/tests/test_unicode.py/test_create.py

Oleg.
--
Oleg Broytmann    http://phd.pp.ru/    ph...@ph...
Programmers don't die, they just GOSUB without RETURN.
|
From: Oleg B. <ph...@ma...> - 2005-10-12 21:42:03
|
On Wed, Oct 12, 2005 at 12:45:34AM -0400, Victor Ng wrote:
> E assert prof2.image == data
> > assert <ImageData 1 image="'">.image == '\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'

Can I guess you use PySQLite2? SQLObject+PySQLite2 doesn't support
BLOBs with embedded zeros (chr(0)). PySQLite2 lacks encode/decode
functionality from SQLite that PySQLite1 provided, and SQLObject does
not support prepared queries (in PySQLite2 BLOBS could be
INSERTed/UPDATEd using cursor.execute("INSERT ... VALUES (?)",
Binary(value)), but SQLObject cannot do it yet; I am working on it, but
the road in front of me is long and hard.)

Oleg.
--
Oleg Broytmann    http://phd.pp.ru/    ph...@ph...
Programmers don't die, they just GOSUB without RETURN.
|
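The parameter-binding approach Oleg mentions can be sketched with the standard-library sqlite3 module in modern Python, the descendant of PySQLite2. This is a plain DB-API example, not SQLObject code; the table name image_data is made up, and the payload mirrors the 256-byte test string from the failing assertion.

```python
import sqlite3

# Every byte value from 0x00 to 0xff, so the payload includes an embedded zero.
data = bytes(range(256))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE image_data (id INTEGER PRIMARY KEY, image BLOB)")

# Binding the value as a parameter avoids putting raw bytes into the SQL
# text, which is where embedded zeros would otherwise cut the string short.
conn.execute(
    "INSERT INTO image_data (image) VALUES (?)",
    (sqlite3.Binary(data),),
)

stored = conn.execute("SELECT image FROM image_data").fetchone()[0]
conn.close()

print(len(stored), stored == data)
```

Note that the parameters are passed as a one-element tuple, the form the DB-API expects for a single placeholder.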