Thread: [Modeling-users] slooow delete
Status: Abandoned
Brought to you by:
sbigaret
From: John L. <jl...@gm...> - 2005-03-04 20:38:24
|
I have a table that has ~28k rows, and doing a simple ec.delete(obj) ec.saveChanges() takes about 60 seconds. This is for a web app, so the delay is way beyond what a user would find reasonable. Is there any way to speed this up? -- John Lenton (jl...@gm...) -- Random fortune: Don't anthropomorphise computers and cars, They hate that. |
From: John L. <jl...@gm...> - 2005-03-05 15:36:26
|
On 05 Mar 2005 16:31:38 +0100, Sebastien Bigaret <sbi...@us...> wrote: > > That's waaaaayyy to slow, even for a regular app. Before investigating > any further, could you integrate the following patch and send the result > of ec.currentStateSummary() just before the saveChanges()? I'm guessing you forgot the patch? -- John Lenton (jl...@gm...) -- Random fortune: Don't anthropomorphise computers and cars, They hate that. |
From: Sebastien B. <sbi...@us...> - 2005-03-05 21:09:42
|
John Lenton <jl...@gm...> wrote: > I'm guessing you forgot the patch? That's right! Here it is: Index: Modeling/EditingContext.py =================================================================== RCS file: /cvsroot/modeling/ProjectModeling/Modeling/EditingContext.py,v retrieving revision 1.34 diff -u -r1.34 EditingContext.py --- Modeling/EditingContext.py 20 Jul 2004 06:21:37 -0000 1.34 +++ Modeling/EditingContext.py 5 Mar 2005 15:41:26 -0000 @@ -351,6 +351,23 @@ # Registering for notifications __TBD + def currentStateSummary(self): + """ + Returns some informations about the EditingContext, such as the number of + objects it holds, the number of inserted objects, etc. + """ + info={} + import time + info['time'] = time.time() + info['nb_objects'] = len(self._uniquingTable.globalIDs()) + info['inserted'] = len(self._insertedObjects) + info['updated' ] = len(self._updatedObjects) + info['deleted' ] = len(self._deletedObjects) + info['pending_inserted'] = len(self._pendingInsertedObjects) + info['pending_updated' ] = len(self._pendingUpdatedObjects) + info['pending_deleted' ] = len(self._pendingDeletedObjects) + info['auto_insertion'] = self._propagatesInsertionForRelatedObjects + return info # Changes within the editing context def allDeletedGlobalIDs(self): |
From: Sebastien B. <sbi...@us...> - 2005-03-05 15:58:27
|
Hi John, John Lenton <jl...@gm...> wrote: > I have a table that has ~28k rows, and doing a simple >=20 > ec.delete(obj) > ec.saveChanges() >=20 > takes about 60 seconds. This is for a web app, so the delay is way > beyond what a user would find reasonable. Is there any way to speed > this up? > --=20 > John Lenton (jl...@gm...) -- Random fortune: > Don't anthropomorphise computers and cars, They hate that. That's waaaaayyy to slow, even for a regular app. Before investigating any further, could you integrate the following patch and send the result of ec.currentStateSummary() just before the saveChanges()? I'd like to see whether this is actually the only change or not, and how much objects are handled by the ec. Even better, if you can, change ec.delete/ec.saveChange into: ec.delete(obj) print ec.currentStateSummary() # print it or log it if this is a web app ec.processRecentChanges() print ec.currentStateSummary() ec.saveChanges() print ec.currentStateSummary() -- S=E9bastien. |
From: John L. <jl...@gm...> - 2005-03-09 15:55:34
|
On 05 Mar 2005 16:31:38 +0100, Sebastien Bigaret <sbi...@us...> wrote: > > Even better, if you can, change ec.delete/ec.saveChange into: > > ec.delete(obj) > print ec.currentStateSummary() # print it or log it if this is a web app > ec.processRecentChanges() > print ec.currentStateSummary() > ec.saveChanges() > print ec.currentStateSummary() instead of the print I did print >> sys.stderr, '***', pprint.pformat(ec.currentStateSummary()) and here is the output: *** {'auto_insertion': 0, 'deleted': 0, 'inserted': 0, 'nb_objects': 2311, 'pending_deleted': 1, 'pending_inserted': 0, 'pending_updated': 2, 'time': 1110383332.976902, 'updated': 0} *** {'auto_insertion': 0, 'deleted': 1, 'inserted': 0, 'nb_objects': 21184, 'pending_deleted': 0, 'pending_inserted': 0, 'pending_updated': 0, 'time': 1110383439.4831209, 'updated': 3} *** {'auto_insertion': 0, 'deleted': 0, 'inserted': 0, 'nb_objects': 21183, 'pending_deleted': 0, 'pending_inserted': 0, 'pending_updated': 0, 'time': 1110383446.224946, 'updated': 0} -- John Lenton (jl...@gm...) -- Random fortune: Don't anthropomorphise computers and cars, They hate that. |
From: Sebastien B. <sbi...@us...> - 2005-03-10 14:13:12
|
John Lenton <jl...@gm...> wrote: > On 05 Mar 2005 16:31:38 +0100, Sebastien Bigaret > <sbi...@us...> wrote: > >=20 > > Even better, if you can, change ec.delete/ec.saveChange into: > >=20 > > ec.delete(obj) > > print ec.currentStateSummary() # print it or log it if this is a web = app > > ec.processRecentChanges() > > print ec.currentStateSummary() > > ec.saveChanges() > > print ec.currentStateSummary() >=20 > instead of the print I did >=20 > print >> sys.stderr, '***', pprint.pformat(ec.currentStateSummary()) >=20 > and here is the output: >=20 > *** {'auto_insertion': 0, > 'deleted': 0, > 'inserted': 0, > 'nb_objects': 2311, > 'pending_deleted': 1, > 'pending_inserted': 0, > 'pending_updated': 2, > 'time': 1110383332.976902, > 'updated': 0} > *** {'auto_insertion': 0, > 'deleted': 1, > 'inserted': 0, > 'nb_objects': 21184, > 'pending_deleted': 0, > 'pending_inserted': 0, > 'pending_updated': 0, > 'time': 1110383439.4831209, > 'updated': 3} > *** {'auto_insertion': 0, > 'deleted': 0, > 'inserted': 0, > 'nb_objects': 21183, > 'pending_deleted': 0, > 'pending_inserted': 0, > 'pending_updated': 0, > 'time': 1110383446.224946, > 'updated': 0} Bad news, most of the time is eaten by processRecentChanges()... which fetches 18873 objects and modifies 3 of them. I can't see where it can occur except in EntityClassDexcription's propagateDeleteForObject(). Can you be a little more explicit about the model (maybe w/ names like A, B and Cs if you cannot disclose the real names), and esp. the relationship between the deleted object's entity and other entities? As far as I can see, the propagation of the deletion implies the modification of 3 objects, so the relationship linking these objects to the deleted object must be a DELETE_NULLIFY one; now I'd like to know more about the type of the relationships (toone/tomany). In particular, I'd like to understand why so many objects are fetched (is there, by any chance, a to-many relationships without an inverse relationship defined pointing back to the object?) -- S=E9bastien. NB: In case it might help: you can quickly see how many objects of which type are in the ec w/ something like:: d=3D{} # { "entity name": nb_of_objects } for gID in ec._uniquingTable.globalIDs(): d[gID.entityName()] =3D d.setdefault(gID.entityName(), 0) + 1 |
From: John L. <jl...@gm...> - 2005-03-10 16:20:54
|
On 10 Mar 2005 15:10:59 +0100, Sebastien Bigaret <sbi...@us...> wrote: > > I'd like to understand why so many objects are fetched (is there, by any > chance, a to-many relationships without an inverse relationship defined > pointing back to the object?) I'm pretty certain there isn't; this is the entity I'm deleting (sans the non-relationship properties): Entity ('Task', properties= [ RToOne ('need', 'Need', joinSemantic=2, inverse='tasks'), RToOne ('team', 'Team', joinSemantic=2, inverse='tasks'), RToOne ('mealType', 'MealType', joinSemantic=2, inverse='tasks'), ], ), the weird thing is that the entities it loads are all Tasks, the same thing (a whole load of Tasks getting loaded) happens upon insert; what triggers it in that case is team.addToTasks(task) whereas on the previous line I have a need.addToTasks(task) and that doesn't trigger it. Here are the Relationships defined in both Need and Team: Entity ('Team', properties= [ RToOne ('offer', 'Offer', joinSemantic=2, inverse='teams'), RToMany ('tasks', 'Task', inverse='team'), RToMany ('userTeams', 'UserTeam', inverse='team'), ], ), Entity ('Need', properties= [ RToOne ('category', 'Category', joinSemantic=2, inverse='needs'), RToOne ('sponsor', 'User', joinSemantic=2, inverse='needers'), RToOne ('offer', 'Offer', joinSemantic=2, inverse='needs'), RToOne ('user', 'User', joinSemantic=2, inverse='needs'), RToOne ('visibility', 'Visibility', joinSemantic=2, inverse='needs'), RToMany ('tasks', 'Task', inverse='need'), ], ), -- John Lenton (jl...@gm...) -- Random fortune: Don't anthropomorphise computers and cars, They hate that. |
From: Sebastien B. <sbi...@us...> - 2005-03-11 21:19:48
|
John Lenton <jl...@gm...> wrote: > On 10 Mar 2005 15:10:59 +0100, Sebastien Bigaret > <sbi...@us...> wrote: > >=20 > > I'd like to understand why so many objects are fetched (is there, by any > > chance, a to-many relationships without an inverse relationship defined > > pointing back to the object?) >=20 > I'm pretty certain there isn't; this is the entity I'm deleting (sans > the non-relationship properties): Just to be clear: I did not meant that having such a relationship (without an inverse defined) is a bad thing, I was just thinking loudly about the reason why so much objects were fetched... but I did not think about the (now) obvious reasons: >=20 > Entity ('Task', > properties=3D [ > RToOne ('need', 'Need', joinSemantic=3D2, inverse=3D'task= s'), > RToOne ('team', 'Team', joinSemantic=3D2, inverse=3D'task= s'), > RToOne ('mealType', 'MealType', joinSemantic=3D2, > inverse=3D'tasks'), > ], > ), >=20 > the weird thing is that the entities it loads are all Tasks, the same > thing (a whole load of Tasks getting loaded) happens upon insert; what > triggers it in that case is >=20 > team.addToTasks(task) >=20 > whereas on the previous line I have a >=20 > need.addToTasks(task) >=20 > and that doesn't trigger it. Okay: when addToTasks() is called: 1. the to-many fault is triggered, fetching all related objects and populating the corresponding list, here 'self._tasks' probably, 2. the new task is added to the list. The fact that need.addToTasks(task) does not trigger any fetch simply means that the to-many faults is already cleared --or that it also triggers the fetch but you do not notice it since that 'need' object does not have (many) tasks assigned. Back to your problem in ec.delete(task): when ec.processRecentChanges() is called, it examines your object's relationships: 'need', 'team' and 'mealType', all three being of type DELETE_NULLIFY. So: at processRecentChanges() time the ec takes care of the deletion policy. For that purpose, in this case it gets these 3 related objects: need, team and mealType (using KeyValueCoding) and removes the deleted object 'task' from their tasks... but when it does that, *it triggers all three to-many faults*: need.tasks, team.tasks and mealType.tasks... and I suspect that at least one of them is quite big. Now we know why we have 1 deleted object, 3 updated ones (the related need, team and mealType), and a bunch of fetched Tasks in the ec after processRecentChanges(). And your next question will probably be: okay, and how to change that behaviour? Here is a description of what should be a definitive solution, and then a workaround you may use till that solution is implemented. Changing the default baheviour requires that ec.processRecentChanges() becomes smarter: it should be changed so that it detects that the related objects 'need', 'team' and 'mealType' do not need to be fetched if they are not cached yet, or that their to-many faults 'tasks' should not be cleared if they hasn't been yet (including: in an other EC), because simply deleting the object removes the only information where both the relationship and its inverse is stored: the Foreign Key stored in the deleted object tasks's DB-row pointing to resp. tables NEED, TEAM and MEAL_TYPE. [1] I probably won't have the time for this in the coming weeks; I'm currently overwhelmed by work and (luckily!) personal activities. In the mean time you can bypass that behaviour by removing the related objects: task.setNeed(None) task.setTeam(None) task.setMealType(None) ec.delete(task) ec.saveChanges() Note: if, for example, the related 'team' object is the only one that has a big bunch of 'Tasks in stored in its inverse relationship, you can simply setTeam(None) and leave the other objects be fetched and updated by the default behaviour. I'm saying this because there is a=20 *Big fat warning* (!): do that if you're absolutely sure that the related 'need', 'team' and 'mealType' objects are not already fetched in other ECs, *or*, if they are fetched, that the "inverse" tomany faults are not cleared. Respectively, check that (for 'need', same for 'team' and 'mealType'): db=3Dec.rootObjectStore().objectStoreForObject(task).database() task_need_gID=3Dtask.getNeed().globalID() db.snapshotForGlobalID(task_need_gID=3Dtask) # None if unfetched yet and if it's not None: db.snapshotForSourceGlobalID(task_need_gID, 'tasks') # should be () Otherwise you may experience weird side-effects on the corresponding objects already fetched in other ECs, esp. when saving them (no pb. if you use a single EC in the app., however). Hopefully this will help. -- S=E9bastien. [1] some further notes about the internals you can safely ignore (I'll use this post when working on the problem later): to be precise, propagateDeleteForObject() should be changed here. And the Database's cache should also be changed accordingly, esp. in case where the related objects are already fetched and the corresponding to-many faults are already cleared. Plus, when notification of changes is implemented, notification of changes for related objects should be posted as well (even if the objects are not actually fetched nor their tomany faults cleared for efficiency reasons)--this implies that EntityClassDescription's method propagateDeleteForObject() has a mean to communicate that information back to the EC for use at the end of the saveChanges() process when notifications are broadcasted) >=20 > Here are the Relationships defined in both Need and Team: >=20 > Entity ('Team', > properties=3D [ > RToOne ('offer', 'Offer', joinSemantic=3D2, inverse=3D'te= ams'), > RToMany ('tasks', 'Task', inverse=3D'team'), > RToMany ('userTeams', 'UserTeam', inverse=3D'team'), > ], > ), > Entity ('Need', > properties=3D [ > RToOne ('category', 'Category', joinSemantic=3D2, > inverse=3D'needs'), > RToOne ('sponsor', 'User', joinSemantic=3D2, inverse=3D'n= eeders'), > RToOne ('offer', 'Offer', joinSemantic=3D2, inverse=3D'ne= eds'), > RToOne ('user', 'User', joinSemantic=3D2, inverse=3D'need= s'), > RToOne ('visibility', 'Visibility', joinSemantic=3D2, > inverse=3D'needs'), > RToMany ('tasks', 'Task', inverse=3D'need'), > ], > ), >=20 >=20 > --=20 > John Lenton (jl...@gm...) -- Random fortune: > Don't anthropomorphise computers and cars, They hate that. |
From: John L. <jl...@gm...> - 2005-03-15 16:44:45
|
On 11 Mar 2005 22:19:37 +0100, Sebastien Bigaret <sbi...@us...> wrote: > > *Big fat warning* (!): do that if you're absolutely sure that the > related 'need', 'team' and 'mealType' objects are not already > fetched in other ECs, *or*, if they are fetched, that the "inverse" > tomany faults are not cleared. If I read you correctly then I'm safe, because I never do need.setTasks(None) (is this what you mean by clearing the tomany faults?) > > Respectively, check that (for 'need', same for 'team' and > 'mealType'): > > db=ec.rootObjectStore().objectStoreForObject(task).database() > > task_need_gID=task.getNeed().globalID() > db.snapshotForGlobalID(task_need_gID=task) # None if unfetched yet > > and if it's not None: > > db.snapshotForSourceGlobalID(task_need_gID, 'tasks') # should be () > > Otherwise you may experience weird side-effects on the corresponding > objects already fetched in other ECs, esp. when saving them (no > pb. if you use a single EC in the app., however). I'm not clear what the purpose of this is: is it to ensure that the task.setMeal(None) is safe? basically what I'll be doing, unless you tell me otherwise, is: - for creating, do the task.setMeal(meal) but not the meal.addToTasks(task) - for deleteing, do the task.setMeal(None) but not the meal.removeFromTasks(task) given that I never, ever (promise! :D) clear the tasks, and that I always saveChanges after deleteing or creating each task, although I have many editingContexts (inside zope, one per session using ZEditingContextSessioning), it is my understanding that I'll be safe. Right? -- John Lenton (jl...@gm...) -- Random fortune: Don't anthropomorphise computers and cars, They hate that. |
From: Sebastien B. <sbi...@us...> - 2005-03-18 11:19:06
|
John Lenton <jl...@gm...> wrote: > On 11 Mar 2005 22:19:37 +0100, Sebastien Bigaret > <sbi...@us...> wrote: > > > > *Big fat warning* (!): do that if you're absolutely sure that the > > related 'need', 'team' and 'mealType' objects are not already > > fetched in other ECs, *or*, if they are fetched, that the "inverse" > > tomany faults are not cleared. > > If I read you correctly then I'm safe, because I never do > > need.setTasks(None) > > (is this what you mean by clearing the tomany faults?) My apologize John, I'm afraid I was not clear (!) at all, too much jargon I guess. What I meant by "clearing the fault" was the same as "triggering the fault", in clear: fetching the objects corresponding to a to-many fault to turn the fault into a real list populated w/ the corresponding objects. For example, need.getTasks() returns a fault, but calling len() on it, or accessing a member as in need.getTasks()[0], or inserting a new element in the list, triggers the fault and makes the framework fetch the objects==the related tasks. > > > > Respectively, check that (for 'need', same for 'team' and > > 'mealType'): > > > > db=ec.rootObjectStore().objectStoreForObject(task).database() > > > > task_need_gID=task.getNeed().globalID() > > db.snapshotForGlobalID(task_need_gID=task) # None if unfetched yet > > > > and if it's not None: > > > > db.snapshotForSourceGlobalID(task_need_gID, 'tasks') # should be () > > > > Otherwise you may experience weird side-effects on the corresponding > > objects already fetched in other ECs, esp. when saving them (no > > pb. if you use a single EC in the app., however). > > I'm not clear what the purpose of this is: is it to ensure that the > task.setMeal(None) is safe? The purpose of the code I posted was to make sure that there does *not* exist an other EditingContext where the same object task.getNeed() has been fetched and where the list task.getNeed().getTasks() is not a fault anymore. However I must now admit that the warning I wrote in my last post was in fact exagerated [1]: the only problem you can have is to override changes that have been made in an other EC --and the problem we're discussing here is just a particular point in a more general category of problems, the one exposed at http://modeling.sf.net/UserGuide/framework-integration-sessioning-ec.html and which has nothing to do in particular with relationships, but w/ an object's properties in general. So you can forget that code, see below for a simpler one. A related questions BTW: are you using patch #911567 ? https://sf.net/tracker/index.php?func=detail&aid=911567&group_id=58935&atid=489337 (which is a first step toward the resolution of the problem) > basically what I'll be doing, unless you tell me otherwise, is: > > - for creating, do the task.setMeal(meal) but not the meal.addToTasks(task) > - for deleteing, do the task.setMeal(None) but not the > meal.removeFromTasks(task) > > given that I never, ever (promise! :D) clear the tasks, and that I > always saveChanges after deleteing or creating each task, although I > have many editingContexts (inside zope, one per session using > ZEditingContextSessioning), it is my understanding that I'll be safe. > Right? Okay, you can do this to *accelerate* the process of inserting, and removing tasks. However, say, if a meal always have a reasonable number of related tasks, then the overhead of fetching those meal's tasks won't be noticeable, so in this case you'd better update the two sides of the relationships (same for Teams and Needs). So the answer is: yes, do this w/ every kind of objects that *may* hold a real bunch of tasks (the kind(s) of objects that may be related to +20,000 tasks, as seen in the previous posts). And, if you do this: just make sure that - either task.getMeal().isFault() == true - or that, in case it is not, that task.getMeal().getTasks().isFault()==true In this last case, if it is false, then you can do the meal.addToTasks(task) without any overhead, since this indicate that the fetch has already be done. Sorry again for having obfuscated the problem by my previous answer, hopefully this is clearer now. -- Sébastien. PS: my previous remarks about the possibility for the framework to be smarter in the situation where the only modification is the addition or removal of objects from a list are still valid, though. [1] I had in mind that the cache of to-many faults might become desynchronized wrt the database state, but since I checked the code and the TODO list and realized that since cache is not used for now, no such problem can happen ;) |
From: John L. <jl...@gm...> - 2005-03-18 14:22:34
|
Thank you very much for your reply. Just one small doubt remains: On Fri, 18 Mar 2005 12:18:57 +0100, Sebastien Bigaret <sbi...@us...> wrote: > > And, if you do this: just make sure that > > - either task.getMeal().isFault() == true > > - or that, in case it is not, that > task.getMeal().getTasks().isFault()==true > > In this last case, if it is false, then you can do the > meal.addToTasks(task) without any overhead, since this indicate > that > the fetch has already be done. is there any case where the last line would return false, instead of raising an AttributeError? -- John Lenton (jl...@gm...) -- Random fortune: Don't anthropomorphise computers and cars, They hate that. |
From: Sebastien B. <sbi...@us...> - 2005-03-18 18:01:05
|
John Lenton <jl...@gm...> wrote: > Thank you very much for your reply. Just one small doubt > remains: > > > On Fri, 18 Mar 2005 12:18:57 +0100, Sebastien Bigaret > <sbi...@us...> wrote: > > > > And, if you do this: just make sure that > > > > - either task.getMeal().isFault() == true > > > > - or that, in case it is not, that > > task.getMeal().getTasks().isFault()==true > > > > In this last case, if it is false, then you can do the > > meal.addToTasks(task) without any overhead, since this > indicate > > that > > the fetch has already be done. > > is there any case where the last line would return false, > instead of > raising an AttributeError? Good question ;) Here are the details First situation: you have a newly created object ; in this case, meal.getTasks() will be a real python list (actually a list or a tuple), assigned by the constructor, and you'll get an AttributeError. Second situation: the object has been fetched by the framework. In this case, any attribute corresponding to a tomany relationship is initialized with the so-called <<to-many fault>> object, responsible for lazily initializing the list when appropriate. If you're curious, you'll have a look at FaultHandler.AccessArrayFaultHandler, which is the corresponding class proxying the underlying list. So, in this case, you'll get a true or false answer when asking meal.getTasks().isFault(). As a conclusion, you're right, you have to check for AttributeError as well and your code will handle all possible situations. -- Sébastien. -- Sébastien. |