Re: [Modeling-users] Modeling performance for large number of objects?
From: Sebastien B. <sbi...@us...> - 2004-12-21 14:05:21
Hi Wolfgang, John and all,

Thanks John for giving the figures for the overhead induced by the framework when creating/manipulating objects. I'm currently away from my computer and only have web access, so it's hard to try and compare anything in those conditions ;)

Wolfgang Keller <wol...@gm...> wrote:
> The question is for me whether Modeling tries to and/or whether there
> would be some other way to cut the hourglass-display-time to the
> unavoidable minimum (dependent on the database) by some kind of smart
> caching of objects or by maintaining some kind of pool of pre-created
> object instances.

At the moment the framework does not use any kind of pool of pre-created objects. On the other hand, it caches database snapshots, so that data which has already been fetched is not fetched again (this avoids a time-expensive round-trip to the database).

John Lenton <jl...@gm...> wrote:
> Modeling [...] only saves those objects that have effectively changed,
> so depending on your actual use cases you might be surprised at how
> well it works. The loading, modifying and saving of all the objects is
> pretty much the worst case; Modeling isn't meant (AFAICT) for that
> kind of batch processing. It certainly is convenient, though :)

Right: when saving changes, the framework uses that cache to save only the objects that were actually modified or deleted. And I definitely agree with John that performance will depend highly on your particular use-case -- maybe you could be more explicit about it? When you say that you need <<to process (create - transform - store) rather important amounts of data>>, do you mean that every single fetched object will be updated and stored back in the database? If so, as John already pointed out, this is the worst case and the most time-consuming process you'll have with the framework.
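To illustrate the idea (this is a hypothetical sketch, not the framework's actual code or API): a snapshot cache keeps a copy of each row as it was fetched, so that at save time only the attributes that differ from the snapshot need an UPDATE -- unchanged objects cost nothing.

```python
# Hypothetical sketch of snapshot caching: record fetched values,
# then diff against the current state to find what actually changed.

class SnapshotCache:
    def __init__(self):
        self._snapshots = {}  # primary key -> dict of values as fetched

    def record_fetch(self, pk, row):
        # Keep a copy of the values exactly as they came from the database.
        self._snapshots[pk] = dict(row)

    def changed_values(self, pk, current):
        # Return only the attributes that differ from the snapshot;
        # an empty dict means the object does not need to be saved.
        snapshot = self._snapshots.get(pk, {})
        return {k: v for k, v in current.items() if snapshot.get(k) != v}

cache = SnapshotCache()
cache.record_fetch(1, {"name": "Alice", "age": 30})
cache.record_fetch(2, {"name": "Bob", "age": 25})

# Object 1 was modified, object 2 was not: only object 1 gets an UPDATE.
print(cache.changed_values(1, {"name": "Alice", "age": 31}))  # {'age': 31}
print(cache.changed_values(2, {"name": "Bob", "age": 25}))    # {}
```

The same snapshots also serve the other purpose mentioned above: a fetch can be answered from the cache instead of triggering another round-trip to the database.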
John Lenton <jl...@gm...> wrote:
> Of course, in both cases, writing the sql script would've taken a
> *lot* longer than the difference in run time, for me. However, it's
> obvious that there are cases where the runtime difference overpowers
> the developer time difference...

...and when runtime matters, you can also use the framework on sample data, extract the generated SQL statements and then use those statements directly in the real batch.

Wolfgang Keller <wol...@gm...> wrote:
> In fact my question was raised when I read the article about ERP5 on
> pythonology.org, with the performance values they claimed for ZODB with
> their ZSQLCatalog add-on. I would guess that their performance claims are
> only valid if all the queried objects are in fact in memory...?

I didn't read that article (didn't search for it either, I admit -- do you have the hyperlink at hand?), but I suspect that the performance mostly comes from the fact that the ZODB.Persistent mixin class is written in C: while the overhead for object creation is probably still the same, the process of fully initializing an object (assigning values to attributes) is much quicker (as far as I remember, it directly sets the object's __dict__, so yes, that's fast ;)

The framework spends most of its initialization time in KeyValueCoding (http://modeling.sourceforge.net/UserGuide/customobject-key-value-coding.html), examining objects and finding the correct way of setting the attributes. While this allows a certain flexibility, I now tend to believe that most applications pay the price for a feature they do not need (for example, the way attributes' values are assigned by the framework should probably be cached per class rather than systematically determined for every object). [1]

John Lenton <jl...@gm...> wrote:
> Of course, maybe Sébastien has a trick up his sleeve as to how one
> could go about using Modeling for batch processing...
Well, that's always hard to tell without specific use-cases, but the general advice is:

- use the latest Python,
- use new-style classes rather than old-style ones,
- activate MDL_ENABLE_SIMPLE_METHOD_CACHE
  (http://modeling.sourceforge.net/UserGuide/env-vars-core.html),
- specifically for batch processing, be sure to read:
  http://modeling.sourceforge.net/UserGuide/ec-discard-changes.html

And of course, we'll be happy to examine your particular use-cases to help you optimize the process.

-- Sébastien.

[1] And thinking a little more about this, I now realize that the way this is done in the framework at initialization time is pretty stupid (the KVC mechanism should definitely be cached somehow at this point: since the framework creates the object before initializing it, there is absolutely no reason for different objects of the same class to behave differently wrt KVC at this point)... I'll get back on this for sure. For the curious, this is done in DatabaseContext.initializeObject(), lines 1588-1594. For the record, I'll add that the prepareForInitializationWithKeys() stuff is also not needed.
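P.S. -- the batch-processing pattern behind the "discard changes" advice can be sketched in plain Python (illustrative names only, not Modeling's API): process a large fetch in fixed-size chunks, save each chunk, then drop the working set -- snapshots included -- so memory stays bounded instead of growing with every object ever touched.

```python
# Hypothetical sketch of chunked batch processing: handle a large
# stream of rows in fixed-size chunks, discarding each chunk (and any
# per-object bookkeeping) once its changes have been saved.

def process_in_chunks(rows, chunk_size, handle_chunk):
    chunk = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == chunk_size:
            handle_chunk(chunk)   # e.g. save this chunk's changes...
            chunk = []            # ...then discard it, snapshots and all
    if chunk:
        handle_chunk(chunk)       # flush the final, partial chunk

saved = []
process_in_chunks(range(10), 4, lambda c: saved.append(list(c)))
print(saved)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```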