Thread: [SQLObject] Problem with circular references
SQLObject is a Python ORM.
Brought to you by:
ianbicking,
phd
From: Dan P. <da...@ag...> - 2006-05-29 20:27:06
|
Hi, I've started using sqlobject in a project and I found that it leaks memory after an object derived from SQLObject is destroyed and deleted. The code is like: obj = SomeObject(parameters...) obj.destroySelf() del obj <at this point we have a mem leak caused by circular references> A simple infinite loop like this: while True: o = SomeObject(parameters...) o.destroySelf() del o will make the program go from 4 Mb to 50 Mb in a couple of minutes. I've found that the problem lies in 2 circular references: one is the instance attribute of sqlmeta which points back to the object and the other is the soObject attribute of SQLObjectState which also points back to the object. The attached patch fixes the issue (a test script with the above infinite loop stays still at 4Mb no matter for how long I run it). Also attached is a simple script to show the problem. It uses sqlite with in-memory tables. Run it before and after applying the patch to see the differences. -- Dan |
From: Simon C. <hod...@gm...> - 2006-05-29 21:24:37
|
SGkgRGFuLAoKT24gNS8yOS8wNiwgRGFuIFBhc2N1IDxkYW5AYWctcHJvamVjdHMuY29tPiB3cm90 ZToKPiBJJ3ZlIGZvdW5kIHRoYXQgdGhlIHByb2JsZW0gbGllcyBpbiAyIGNpcmN1bGFyIHJlZmVy ZW5jZXM6IG9uZSBpcyB0aGUKPiBpbnN0YW5jZSBhdHRyaWJ1dGUgb2Ygc3FsbWV0YSB3aGljaCBw b2ludHMgYmFjayB0byB0aGUgb2JqZWN0IGFuZAo+IHRoZSBvdGhlciBpcyB0aGUgc29PYmplY3Qg YXR0cmlidXRlIG9mIFNRTE9iamVjdFN0YXRlIHdoaWNoIGFsc28KPiBwb2ludHMgYmFjayB0byB0 aGUgb2JqZWN0LgoKUHl0aG9uJ3Mgb3B0aW9uYWwgZ2FyYmFnZSBjb2xsZWN0b3IgaGFuZGxlcyBj aXJjdWxhciByZWZlcmVuY2VzIGp1c3QKZmluZSAoYXQgbGVhc3QgdGhpcyBpcyB0aGUgY2FzZSBm b3IgQyBQeXRob24pLgoKVGhlIHByb2JsZW0gaXMgdGhhdAoKIG8gPSBTb21lU1FMT2JqZWN0Q2xh c3MoLi4uKQoKYWRkcyBhbiBuZXcgb2JqZWN0IHRvIHRoZSBkYXRhYmFzZSAoaW4geW91ciBjYXNl IHRoZSBpbiBtZW1vcnkKZGF0YWJhc2Ugc28gbWVtb3J5IHVzYWdlIGdyb3dzKS4KClRoYXQgc2Fp ZCwgaWYgeW91IGNhbiByZW1vdmUgdGhlIGNpcmN1bGFyIHJlZmVyZW5jZXMgd2l0aG91dCByZWR1 Y2luZwpmdW5jdGlvbmFsaXR5LCBJJ20gYWxsIGZvciBpdC4gOikKClNjaGlhdm8KU2ltb24K |
From: Dan P. <da...@ag...> - 2006-05-30 14:08:25
|
On Monday 29 May 2006 23:59, Simon Cross wrote: > Hi Dan, > > On 5/29/06, Dan Pascu <da...@ag...> wrote: > > I've found that the problem lies in 2 circular references: one is the > > instance attribute of sqlmeta which points back to the object and > > the other is the soObject attribute of SQLObjectState which also > > points back to the object. > > Python's optional garbage collector handles circular references just > fine (at least this is the case for C Python). That is true, but why should the application waste time collecting garbage on a constant basis when it can be avoided? The garbage collector is there to protect you against common mistakes like these and to avoid memory to be leaked in such a case, but one shouldn't grow careless just because the garbage collector is there and does the job, because it does the job at a cost (extra CPU cycles) > > The problem is that > > o = SomeSQLObjectClass(...) > > adds an new object to the database (in your case the in memory > database so memory usage grows). I think you misread my example. The test script I attached creates a database object and immediately destroys it with o.destroySelf() so the net result should be zero because the object is removed from both the database and the sqlobject cache when it's destroyed. > > That said, if you can remove the circular references without reducing > functionality, I'm all for it. :) > > Schiavo > Simon -- Dan |
From: Simon C. <hod...@gm...> - 2006-05-30 14:41:59
|
Hi Dan, On 5/30/06, Dan Pascu <da...@ag...> wrote: > That is true, but why should the application waste time collecting > garbage on a constant basis when it can be avoided? Agreed. > > The problem is that > > > > o = SomeSQLObjectClass(...) > > > > adds an new object to the database (in your case the in > > memory database so memory usage grows). > > I think you misread my example. The test script I attached > creates a database object and immediately destroys it with > o.destroySelf() so the net result should be zero because the > object is removed from both the database and the sqlobject > cache when it's destroyed. My bad. However, the reason the objects aren't being freed in your example code is that you tell the garbage collector not to with gc.set_debug(gc.DEBUG_LEAK) which also sets DEBUG_SAVEALL. From the gc Python docs: DEBUG_SAVEALL When set, all unreachable objects found will be appended to garbage rather than being freed. This can be useful for debugging a leaking program. DEBUG_LEAK The debugging flags necessary for the collector to print information about a leaking program (equal to DEBUG_COLLECTABLE | DEBUG_UNCOLLECTABLE | DEBUG_INSTANCES | DEBUG_OBJECTS | DEBUG_SAVEALL). If you comment out gc.set_debug(gc.DEBUG_LEAK) the SQLObject objects should get cleaned up. Note that I have personally used SQLObject 0.7.0 to insert millions of objects into a MySQL database and had the garbage collector automatically work for me, which is why I'm fairly confident that your issue goes a bit deeper than a simple set of cyclic references. One issue I have had with SQLObject when inserting large numbers of objects is that the cache didn't check to see how many items it was storing when objects were inserted into it, only when they were fetched. So storing lots of objects without doing any queries can cause the cache to grow substantially (I don't know if this has been fixed yet, I know I patched the version I was using). Not sure if this might be your issue, just letting you know. Schiavo Simon |
From: Dan P. <da...@ag...> - 2006-05-30 17:42:12
|
On Tuesday 30 May 2006 17:41, Simon Cross wrote: > > I think you misread my example. The test script I attached > > creates a database object and immediately destroys it with > > o.destroySelf() so the net result should be zero because the > > object is removed from both the database and the sqlobject > > cache when it's destroyed. > > My bad. > > However, the reason the objects aren't being freed in your example > code is that you tell the garbage collector not to with > gc.set_debug(gc.DEBUG_LEAK) which also sets DEBUG_SAVEALL. I know that. But that is just to highlight the problem and make it obvious. Without that it's hidden by the gc. It's just a test script to show the issue, not a part of any real application. > Note that I have personally used SQLObject 0.7.0 to insert millions of > objects into a MySQL database and had the garbage collector > automatically work for me, which is why I'm fairly confident that your > issue goes a bit deeper than a simple set of cyclic references. I have no deep issue really. Just a patch to clean up 2 cyclic references. I know very well that if I disable the leak debugging, the memory usage will stay still, but take my test script for what it really is: a simple script to highlight the issue, not an application with problems. So, are you suggesting that since the garbage collector does the job of finding them, we shouldn't bother to remove some cyclic references we know about and avoid some unnecessary garbage collecting? -- Dan |
From: Oleg B. <ph...@ph...> - 2006-09-26 12:41:54
|
Hello! On Mon, May 29, 2006 at 10:26:49PM +0300, Dan Pascu wrote: > def __init__(self, instance): > - self.instance = instance > + self.instance = weakref.proxy(instance) > > def setClass(cls, soClass): > cls.soClass = soClass > @@ -1501,7 +1502,7 @@ > class SQLObjectState(object): > > def __init__(self, soObject): > - self.soObject = soObject > + self.soObject = weakref.proxy(soObject) > self.protocol = 'sql' The patch breaks test_cache.py: def test_cache(): setupClass(CacheTest) s = CacheTest(name='foo') obj_id = id(s) s_id = s.id assert CacheTest.get(s_id) is s assert not s.sqlmeta.expired CacheTest.sqlmeta.expireAll() assert s.sqlmeta.expired del s CacheTest.sqlmeta.expireAll() s = CacheTest.get(s_id) # We should have a new object: E assert id(s) != obj_id > assert -1219232756 != -1219232756 + where -1219232756 = id(<CacheTest 1 name='foo'>) Please investigate if this can be fixed. Oleg. -- Oleg Broytmann http://phd.pp.ru/ ph...@ph... Programmers don't die, they just GOSUB without RETURN. |
From: Oleg B. <ph...@ph...> - 2006-10-02 14:08:29
|
Hello! I'd like to return to the problem the patch caused in test_cache. On Mon, May 29, 2006 at 10:26:49PM +0300, Dan Pascu wrote: > def __init__(self, instance): > - self.instance = instance > + self.instance = weakref.proxy(instance) > > def setClass(cls, soClass): > cls.soClass = soClass > @@ -1501,7 +1502,7 @@ > class SQLObjectState(object): > > def __init__(self, soObject): > - self.soObject = soObject > + self.soObject = weakref.proxy(soObject) > self.protocol = 'sql' What do you think of the following test_cache? def test_cache(): setupClass(CacheTest) s = CacheTest(name='foo') obj_id = id(s) s_id = s.id assert CacheTest.get(s_id) is s weak_s = weakref.ref(s) assert not s.sqlmeta.expired CacheTest.sqlmeta.expireAll() assert s.sqlmeta.expired del s CacheTest.sqlmeta.expireAll() s = CacheTest.get(s_id) # We should have the old object deleted assert weak_s() is None Oleg. -- Oleg Broytmann http://phd.pp.ru/ ph...@ph... Programmers don't die, they just GOSUB without RETURN. |
From: Dan P. <da...@ag...> - 2006-10-02 17:53:51
|
Oleg, I will return to this and check what is wrong when I will get a little time. Currently I'm over my head with some work projects, sorry. It may be something about the expireAll method. When I implemented this, that method wasn't yet implemented. I tried to manually expire the record with rec.expire() and it that case it worked as expected (the cache returned a new object). So I'll have to look if there is something wrong with expireAll or is just the way the test was setup. Regarding your version below, even if you keep a reference to the object during the cache cleanup, the cache shouldn't return a reference to it after that. I'll need to better look at this and I'll get back to you about it as soon as I find some time to investigate. While your test below should work, it may hide some problems inside expireAll On Monday 02 October 2006 17:08, Oleg Broytmann wrote: > Hello! I'd like to return to the problem the patch caused in > test_cache. > > On Mon, May 29, 2006 at 10:26:49PM +0300, Dan Pascu wrote: > > def __init__(self, instance): > > - self.instance = instance > > + self.instance = weakref.proxy(instance) > > > > def setClass(cls, soClass): > > cls.soClass = soClass > > @@ -1501,7 +1502,7 @@ > > class SQLObjectState(object): > > > > def __init__(self, soObject): > > - self.soObject = soObject > > + self.soObject = weakref.proxy(soObject) > > self.protocol = 'sql' > > What do you think of the following test_cache? > > def test_cache(): > setupClass(CacheTest) > s = CacheTest(name='foo') > obj_id = id(s) > s_id = s.id > assert CacheTest.get(s_id) is s > weak_s = weakref.ref(s) > assert not s.sqlmeta.expired > CacheTest.sqlmeta.expireAll() > assert s.sqlmeta.expired > del s > CacheTest.sqlmeta.expireAll() > s = CacheTest.get(s_id) > # We should have the old object deleted > assert weak_s() is None > > Oleg. -- Dan |
From: Oleg B. <ph...@ph...> - 2006-10-03 15:18:09
|
I don't think the problem is in .expireAll(). What is going on, as I understand, is that .expireAll() clears the cache and as now there are only weak reference to the row, Python garbage-collects it immediately. After that Python is free to create another object at the same address. the solution would be either to check weak references or to hold a real reference, like this: def test_cache(): setupClass(CacheTest) s = CacheTest(name='foo') obj_id = id(s) s_id = s.id assert CacheTest.get(s_id) is s assert not s.sqlmeta.expired CacheTest.sqlmeta.expireAll() assert s.sqlmeta.expired CacheTest.sqlmeta.expireAll() s1 = CacheTest.get(s_id) # We should have a new object: assert id(s1) != obj_id obj_id2 = id(s1) CacheTest._connection.expireAll() s2 = CacheTest.get(s_id) assert id(s2) != obj_id and id(s2) != obj_id2 (I have removed "del" statments and created s1 and s2 instead of s.) If I understand it right, the failure in test_cache() actually shows that your patch really works and helps to save memory! (-: Oleg. -- Oleg Broytmann http://phd.pp.ru/ ph...@ph... Programmers don't die, they just GOSUB without RETURN. |
From: Dan P. <da...@ag...> - 2006-10-03 16:10:20
|
On Tuesday 03 October 2006 18:17, Oleg Broytmann wrote: > I don't think the problem is in .expireAll(). What is going on, as I > understand, is that .expireAll() clears the cache and as now there are > only weak reference to the row, Python garbage-collects it immediately. > After that Python is free to create another object at the same address. I thought about this in the beginning, but I tried some simple test which was unsuccessful, so I thought it must be something else. The test showed that addresses were not reused: >>> s = {'foo': 1} >>> id(s) -1210595428 >>> del s >>> s = {'bar': 1} >>> id(s) -1210595972 But today I tried the test differently: >>> class Obj(object): ... def __del__(self): ... print "deleted" ... >>> o = Obj() >>> id(o) -1210983732 >>> del o deleted >>> o = Obj() >>> id(o) -1210983636 >>> class A: ... def __del__(self): ... print "deleted" ... >>> a = A() >>> id(a) -1211041652 >>> del a deleted >>> a = A() >>> id(a) -1211041652 So it seems that for objects that are obtained from classes derived from object (strings, dictionaries, Obj(object), ...) it won't reuse the addresses, but for objects that result from old style (pre 2.2) classes it seems it will. So if those objects that are referenced by the weakref are not derived from new style classes, this would explain it indeed. > the solution would be either to check weak references or to hold a real > reference, like this: > > def test_cache(): > setupClass(CacheTest) > s = CacheTest(name='foo') > obj_id = id(s) > s_id = s.id > assert CacheTest.get(s_id) is s > assert not s.sqlmeta.expired > CacheTest.sqlmeta.expireAll() > assert s.sqlmeta.expired > CacheTest.sqlmeta.expireAll() > s1 = CacheTest.get(s_id) > # We should have a new object: > assert id(s1) != obj_id > obj_id2 = id(s1) > CacheTest._connection.expireAll() > s2 = CacheTest.get(s_id) > assert id(s2) != obj_id and id(s2) != obj_id2 > If this test works I think it's a better version of the original one. This test looks fine to me as it shows that the object the cache returns after an expiration is different that the one before even if it has the same id (or at least it would have the same id if the old object would not be kept around by a reference) > (I have removed "del" statments and created s1 and s2 instead of s.) > > If I understand it right, the failure in test_cache() actually shows > that your patch really works and helps to save memory! (-: That thing I already knew without the test ;) -- Dan |
From: Oleg B. <ph...@ph...> - 2006-10-03 17:28:45
|
The patch is applied and committed in the revision 1974 in the trunk. Thank you! Oleg. -- Oleg Broytmann http://phd.pp.ru/ ph...@ph... Programmers don't die, they just GOSUB without RETURN. |
From: Oleg B. <ph...@ph...> - 2006-10-03 17:10:22
|
On Tue, Oct 03, 2006 at 07:10:09PM +0300, Dan Pascu wrote: > So it seems that for objects that are obtained from classes derived from > object (strings, dictionaries, Obj(object), ...) it won't reuse the > addresses, but for objects that result from old style (pre 2.2) classes > it seems it will. I think this is caused by the different handling of new and old-style classes by the garbage collector. Instances of the old-style classes are freed immediately, instances of the new style classes are left dormant until gc collects them. Or may be the existence of __del__ changes the handling. With __del__ instances are freed immediately, without it are left for later collecting. Oleg. -- Oleg Broytmann http://phd.pp.ru/ ph...@ph... Programmers don't die, they just GOSUB without RETURN. |
From: Dan P. <da...@ag...> - 2006-10-03 17:33:29
|
On Tuesday 03 October 2006 20:10, Oleg Broytmann wrote: > On Tue, Oct 03, 2006 at 07:10:09PM +0300, Dan Pascu wrote: > > So it seems that for objects that are obtained from classes derived > > from object (strings, dictionaries, Obj(object), ...) it won't reuse > > the addresses, but for objects that result from old style (pre 2.2) > > classes it seems it will. > > I think this is caused by the different handling of new and > old-style classes by the garbage collector. Instances of the old-style > classes are freed immediately, instances of the new style classes are > left dormant until gc collects them. > Or may be the existence of __del__ changes the handling. With > __del__ instances are freed immediately, without it are left for later > collecting. I had a __del__ in the Obj(object) class but still the id was different for the new object, even though I've seen the __del__ being called, so the object was actually deleted. > > Oleg. -- Dan |