Re: [pygccxml-development] Another way to speed up the generation process...
From: Matthias B. <ba...@ir...> - 2006-10-24 09:42:37
Hi,

I spent a little more time reducing the execution time of the code generation process, and I'm now below one minute on both Linux and OSX. :)

Linux: 59s (previously 4:40 minutes)
OSX:   44s (previously 7:20 minutes)

Here are the timings of the individual steps on OSX:

Parsing:                 3s
Decoration:             13s
Building code creators: 14s
Writing files:          14s (this is a run where all caches could be fully utilized and where no file had to be written)

What I did were two things: pruning the declaration tree and caching query operations.

I no longer use the pruning function from my earlier mail, because that pruning was only done after the header files had already been parsed and stored in the cache. So while the function could speed up the decoration stage, it could not speed up the parsing stage (which took more than a minute on OSX using the regular cache). Now I prune the declarations at an earlier stage, namely right after the XML file has been created by gccxml and before pygccxml reads it. I wrote a little utility that operates directly on the XML file and outputs a pruned XML file, which I then use as input for Py++ (I think I'll put it into contrib eventually). This should now even be safer than before, because I also keep all dependencies, even when they are not inside any of the allowed headers.

With the pruned tree alone I am already at the above parsing time (when the cache is used, which required a small modification to pygccxml), and decoration was also noticeably faster (~1:40 min) because there were fewer declarations to consider (as already mentioned in my earlier mail).

The next step in speeding things up was to cache the query operations. I did an experimental implementation in pypp_api that works in conjunction with the regular cache. Using this query cache, I get the above results, and the decoration stage is no longer the bottleneck in my case. I'm afraid any further optimizations would require quite a bit of restructuring inside Py++.
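The XML-pruning step can be sketched roughly as follows. This is not the actual utility; it is a minimal illustration, assuming the gccxml output format (elements with `id` attributes, `File` elements naming headers, declarations referencing their header via a `file` attribute, and dependencies expressed through id-valued attributes such as `type`, `returns`, `context`, and `bases`). The real format has more reference attributes and encodes access specifiers inside `bases`, which this sketch simply skips. The sample document at the bottom is synthetic.

```python
# Hypothetical sketch: prune a gccxml XML dump before pygccxml reads it,
# keeping declarations in allowed headers plus all their dependencies.
import xml.etree.ElementTree as ET

# Attributes whose values are id references that create dependencies.
# (Assumption: the real gccxml format has additional reference attributes.)
REF_ATTRS = ("type", "returns", "context", "bases")

def prune_gccxml(xml_text, allowed_headers):
    root = ET.fromstring(xml_text)
    by_id = {el.get("id"): el for el in root if el.get("id")}
    # File elements whose declarations we want to keep.
    allowed_files = {el.get("id") for el in root
                     if el.tag == "File" and el.get("name") in allowed_headers}
    # Seed: every declaration located in an allowed header.
    keep = {i for i, el in by_id.items() if el.get("file") in allowed_files}
    # Transitively pull in referenced declarations, so dependencies survive
    # even when they are not inside any of the allowed headers.
    pending = list(keep)
    while pending:
        el = by_id[pending.pop()]
        for attr in REF_ATTRS:
            for ref in (el.get(attr) or "").split():
                if ref in by_id and ref not in keep:
                    keep.add(ref)
                    pending.append(ref)
    # Drop everything else; File elements stay so file ids remain resolvable.
    for el in list(root):
        if el.tag != "File" and el.get("id") not in keep:
            root.remove(el)
    return root

# Synthetic example: only a.h is allowed, but B is kept as a base of A.
sample = """<GCC_XML>
  <File id="f1" name="a.h"/>
  <File id="f2" name="b.h"/>
  <Class id="c1" name="A" file="f1" bases="c2"/>
  <Class id="c2" name="B" file="f2"/>
  <Class id="c3" name="C" file="f2"/>
</GCC_XML>"""
pruned = prune_gccxml(sample, {"a.h"})
```

In the sample, class A is kept because it lives in an allowed header, B is kept only because A depends on it, and C is dropped.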
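The query-cache idea can be sketched like this. This is not the actual pypp_api implementation, just a stand-alone memoization of the same principle: the decoration stage asks the same query against the same declaration list many times, so the filtered result can be stored under a key describing the query. The class name and key scheme here are illustrative assumptions.

```python
# Hypothetical sketch of caching declaration queries (not the pypp_api code).
class QueryCache:
    """Memoizes query results keyed by a hashable query description."""

    def __init__(self):
        self._results = {}
        self.hits = 0
        self.misses = 0

    def query(self, decls, predicate, key):
        # 'key' must uniquely describe the query, e.g. (scope_id, name, kind);
        # the predicate is only evaluated on a cache miss.
        if key in self._results:
            self.hits += 1
        else:
            self.misses += 1
            self._results[key] = [d for d in decls if predicate(d)]
        return self._results[key]

# Toy usage with plain strings standing in for declarations.
cache = QueryCache()
decls = ["classA", "classB", "funcF"]
classes1 = cache.query(decls, lambda d: d.startswith("class"), ("root", "class"))
classes2 = cache.query(decls, lambda d: d.startswith("class"), ("root", "class"))
```

In a real implementation the cache would of course have to be invalidated whenever declarations are added or modified during decoration.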
Running the profiler on the last two stages (building code creators and writing files) reports the following "hot spots" (the list is sorted by total time):

26608350 function calls (26440374 primitive calls) in 341.650 CPU seconds

   Ordered by: internal time
   List reduced from 856 to 10 due to restriction <10>

        ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       9370513   60.550    0.000   60.550    0.000 :0(isinstance)
       4574860   53.200    0.000   82.480    0.000 decl_wrappers/algorithm.py:87(<lambda>)
   61199/49370   36.710    0.001  194.330    0.004 :0(filter)
        948954   20.230    0.000   35.240    0.000 type_traits.py:42(remove_alias)
        320939    7.480    0.000   23.030    0.000 matchers.py:205(__call__)
        211696    6.120    0.000   12.180    0.000 matchers.py:224(check_name)
 398580/387050    5.300    0.000    8.350    0.000 code_creators/algorithm.py:42(proceed_single)
        369799    5.190    0.000   13.540    0.000 code_creators/algorithm.py:40(make_flatten_generator)
        105595    4.990    0.000    8.060    0.000 class_declaration.py:105(_get_name_impl)
        102200    4.790    0.000   36.450    0.000 container_traits.py:40(get_container_or_none)

- Matthias
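For reference, a report in this shape comes from Python's stdlib profiler and pstats, sorted by internal time and restricted to the top entries. A minimal sketch (the work() function here is just a toy payload, not anything from Py++):

```python
# Minimal sketch: produce a pstats report sorted by internal time,
# restricted to the top 10 entries, like the listing above.
import cProfile
import io
import pstats

def work():
    # Toy payload standing in for the code-creation stages.
    return sum(i * i for i in range(10000))

pr = cProfile.Profile()
pr.enable()
work()
pr.disable()

buf = io.StringIO()
stats = pstats.Stats(pr, stream=buf)
stats.sort_stats("time").print_stats(10)   # "time" == internal time (tottime)
report = buf.getvalue()
```

The columns have the same meaning as above: ncalls, tottime (internal time), cumtime (including callees), and the per-call averages.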