Re: [pygccxml-development] Another performance tweak
From: Allen B. <al...@vr...> - 2006-08-27 16:38:26
I have included comments below. Before that, though... I have found a few more optimizations:

- Making create_identifier return its argument unchanged (identity) adds about 15% to performance.
- Adding a cache to module_builder that caches the global namespace tree after parsing, joining, and running the query optimizer boosts performance by another 40%.

(Both help by saving more startup time and by making query-optimizer initialization cheap enough to use.)

So now I have my run time down to around 59 seconds. Although I would really like to make it go even faster, I think taking the running time down from the original 12 minutes to about 1 minute is a good enough improvement that I can use Py++ much more easily in normal development.

Roman: Do you want a patch for the module_builder caching, or do you just want me to commit it? (All the changes are in the __init__ method and are very clear, so you can see what is in there and change it if you prefer.)

Now on to the comments....

Roman Yakovenko wrote:
> On 8/26/06, Allen Bierbaum <al...@vr...> wrote:
>
>> >> - Caching the results of type_traits.remove_alias improved
>> >> performance by 20%
>> >
>> > Yes, this algorithm is used everywhere. Please don't commit this
>> > change, I'd like to think about it. I see the next problem here:
>> > there are a lot of types, so the cache memory size could go out of
>> > control. Can you find out how much memory Py++ takes with/without
>> > the cache?
>>
>> I can try to take a look at this. I don't have an easy way to do it
>> right now, but I will try later.
>
> I think that I don't want to introduce a cache of type_traits results.
> I don't feel comfortable with it. One of the reasons is that it is not
> that easy to disable.

Maybe I am missing something, but why can't we control the cache by introducing a flag in type_traits like "caching_enabled" and then just testing that whenever the method is called?
It would be a shame not to optimize this method when it gives a 20% performance boost.

>> If you never call add_namespace_usage or add_namespace_alias, will
>> create_identifier ever need to do anything? Maybe we could make an
>> optimization that keeps a global flag around and just skips the work
>> in this method if you never set any namespace information. What do
>> you think?
>
> I think that you can replace create_identifier with an "identity"
> function and this solution will always work :-)
>
> def create_identifier_fast( creator, full_name ):
>     return full_name

Are you suggesting I add this as a new method and change the code that calls it? In my own local copy I have replaced create_identifier with an identity function and am getting very good results.

> And then replace create_identifier with the fast one from goodies.
>
> I committed small changes to the "optimization" feature. New feature:
> it is possible to control the cache across the whole project:
>
> for d in mb.decls():
>     d.cache.disable()
>
> It is not easy to achieve the same goal with the types cache.
>
> Would you mind adding documentation strings to the module?

Right now I am not quite ready to update to your changes. I have some concerns about the complexity and performance of the way you implemented this.

- Why is there an "algorithms_cache_t" object as the base class of "declaration_algs_cache_t"? The split with a base class doesn't seem to serve much of a purpose.
- What is the performance implication of using "properties"? I know you like to use them in your code, but how do they affect performance?
- Handling of the enabled flag: I think the enabled flag should be tested in the "set" method instead of the get method. As it stands with your change, our optimized path requires two if tests (one in the local method to test for None, and one in the get method to test enabled).
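For reference, the identity replacement under discussion is just the following. (How it gets wired in is an assumption on my part; the real hook point in the Py++ code creators may differ from what I show in the comment.)

```python
def create_identifier_fast(creator, full_name):
    # With no namespace usages or aliases registered, the full name is
    # already the correct identifier, so return it unchanged.
    return full_name

# In a local copy one might patch it in at import time, e.g.:
#   some_pyplusplus_module.create_identifier = create_identifier_fast
# (module and attribute names here are assumptions, not the real API).
```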
If we moved the enabled test to the set method, we would only pay the cost of that if test when optimizations are disabled.

> I left an interesting problem for you: it is possible to optimize the
> declaration_path algorithm even more. In most cases its complexity
> could be O(1). This could be done by saving intermediate results.
> Another way to say it: re-use the parent declaration's path cache.
>
> Can you publish the top N lines of your benchmark result?

Sure. Give me a bit to rerun it. I can probably just send you a compressed hotspot file so you can see the entire details.

-Allen
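The declaration_path optimization Roman describes could be sketched like this. This is a toy stand-in: the real pygccxml declaration classes, cache object, and exact path format differ, but it shows the parent-cache reuse that makes repeated calls O(1).

```python
class decl_t:
    """Toy stand-in for a pygccxml declaration node (illustrative only)."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self._cached_path = None

def declaration_path(decl):
    # Reuse the parent's cached path: after the first call, each node's
    # path is answered in O(1) instead of walking to the root each time.
    if decl is None:
        return []
    if decl._cached_path is None:
        decl._cached_path = declaration_path(decl.parent) + [decl.name]
    return decl._cached_path
```

Computing the path for one deeply nested declaration fills the caches for all of its ancestors, so sibling declarations then only pay for the final list append.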