Re: [pygccxml-development] Any recent commits that may affect performance
From: Allen B. <al...@vr...> - 2007-02-16 19:41:48
Roman Yakovenko wrote:
> On 2/16/07, Allen Bierbaum <al...@vr...> wrote:
>> Is there any user API to disable the dependency manager (at least
>> during debugging and development)?
>
> No, it does not exist.
>
>> What is the basic structure of the dependency management search
>> algorithm? Is it something like:
>>
>> - For every part of an interface exposed
>>   - For each item used in the interface
>>     - Look up the item and make sure that it is exposed
>>
>> If it is something like this, then that could definitely be biting me.
>> I have a *huge* number of symbols, and something O(N^2) like this
>> could really consume a lot of time.
>
> Yes.
>
>> In the meantime, I am trying to collect some profiler information to
>> find out exactly where the time is being spent. Once I have those
>> numbers collected I will pass them along in case there is anything
>> that can be done to increase performance.
>
> Let's wait for the results. I want to be sure before I introduce
> functionality which disables the dependency manager.

You are right: removing the dependency manager did not help performance.

I have attached the detailed performance profile results to this e-mail, sorted by total time spent in the given methods. The summary is:

   ncalls    tottime  percall    cumtime  percall  filename:lineno(function)
 49849182    452.827    0.000    755.786    0.000  .../pygccxml/declarations/algorithm.py:37(full_name)
 50225914    338.388    0.000   1228.453    0.000  .../pygccxml/declarations/matchers.py:224(check_name)
 50490184    244.846    0.000   1538.742    0.000  .../pygccxml/declarations/matchers.py:205(__call__)
151030576    198.249    0.000    198.249    0.000  .../pygccxml/declarations/matchers.py:160(_get_name)
102343991    169.654    0.000    169.654    0.000  .../pygccxml/declarations/declaration.py:241(cache)
 49819921    153.485    0.000   1682.796    0.000  .../pygccxml/declarations/scopedef.py:273(<lambda>)
 99681702    136.791    0.000    136.791    0.000  .../pygccxml/declarations/algorithms_cache.py:30(_get_full_name)
    50591     84.602    0.002   1783.335    0.035  .../pygccxml/declarations/matcher.py:33(find)
        1     46.046   46.046     46.047   46.047  .../pygccxml/parser/declarations_cache.py:166(flush)
        1     32.485   32.485     32.535   32.535  .../pygccxml/parser/source_reader.py:151(create_xml_file)
   6483/1     26.067    0.004     49.694   49.694  .../pygccxml/declarations/scopedef.py:144(init_optimizer)
  6219137     12.618    0.000     12.618    0.000  .../pyplusplus/decl_wrappers/algorithm.py:88(<lambda>)
    12960      9.767    0.001     27.079    0.002  .../pyplusplus/decl_wrappers/algorithm.py:80(create_identifier)
  3353123      9.533    0.000     21.040    0.000  .../pygccxml/declarations/declaration.py:153(_get_name)

I need to look into it more closely, but it looks like something in the code is checking and/or getting the name of declarations a *huge* number of times as part of some algorithm. This may be a good opportunity for caching, if we can be sure the value is already set. I know from our previous discussions of caching that you had two big worries (which I share, by the way):

1. Caching doesn't work well in cases where user code may want to change the values being cached (runtime modification).

2. It makes the code ugly, putting all these caches everywhere.

That got me thinking about the Python memoization pattern that can be handled with decorators.
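In its simplest form, the pattern is just a decorator that computes a value once and returns the stored result on subsequent calls. Roughly (a generic, untested sketch along the lines of the recipes linked below, not Py++ code):

------------------------------
def memoize( func ):
    """Cache func's results, keyed by its arguments (which must be hashable)."""
    cache = {}
    def wrapper( *args ):
        if args not in cache:
            cache[args] = func( *args )
        return cache[args]
    return wrapper
------------------------------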
See:

  http://www.phyast.pitt.edu/~micheles/python/documentation.html#memoize
  http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/325205

What I was thinking is that, in the context of Py++, it would be nice if the user could make whatever changes they want to the system and then, once the state is "set", call a method that turns on caching from that point forward. Something like:

------------------------------
mb = ModuleBuilder(....)
<do user changes>
pyplusplus.memoizer.enable()  # everything we cache is set from here on
------------------------------

We could also introduce a mode in which the enabled cache asserts that each cached value equals the value that would have been freshly calculated. That would give us a nice sanity check to use during debugging, to make sure the cache is not breaking anything.

Anyway, with a memoization pattern the Py++ internal code could look something like this:

------------------------------
class declaration_t:
    ...
    @memoize
    def getName( ... ):
        # compute and return the value as normal
        ...
------------------------------

So you see, I think we could address both issues above. The code would allow runtime modification by users until they say they are done, and it would also stay clean, because you could just put a @memoize decorator on the methods that need it and they would automatically work.

What do you think? Is this a way we could increase performance while still keeping the code clean and flexible?

-Allen
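P.S. To make the proposal a little more concrete, here is a rough, untested sketch of what such a memoizer module could look like. The names (pyplusplus.memoizer, enable(), the verify flag) are placeholders for discussion only, not an existing API:

------------------------------
# hypothetical pyplusplus/memoizer.py -- sketch only
_enabled = False   # caching stays off until the user says the state is "set"
_verify  = False   # when True, every cache hit is re-checked against a fresh computation

def enable( verify=False ):
    """Turn caching on; pass verify=True to enable the debugging sanity check."""
    global _enabled, _verify
    _enabled = True
    _verify = verify

def memoize( func ):
    """Per-instance cache for argument-less methods (name accessors and the like)."""
    attr = '_memoized_' + func.__name__
    def wrapper( self ):
        if not _enabled:
            # the user may still be modifying declarations - always recompute
            return func( self )
        if not hasattr( self, attr ):
            setattr( self, attr, func( self ) )
        elif _verify:
            assert getattr( self, attr ) == func( self ), func.__name__
        return getattr( self, attr )
    return wrapper
------------------------------

If something along these lines works for you, the only change inside the pygccxml/Py++ sources would be adding a @memoize line to the hot, effectively read-only accessors, so the code should stay as clean as it is now.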