Re: [pygccxml-development] Re: recent changes to pygccxml\pyplusplus


Roman Yakovenko wrote:
>> first needs to be resolved before I can finish the class (so is
>> source_reader_t.__parse_gccxml_created_file() supposed to return the
>> files as a dictionary or as a list?)
> 
> Maybe I missed something, but what cache are you implementing?
> Anyway, the answer: a list. I already fixed this bug.

I've implemented a new cache class because I had some issues with the 
file_cache_t class:

- The cache file is about 39MB, and on a machine with 512MB of main 
memory I couldn't use that cache anymore. The machine started thrashing 
while parsing the headers (when the cache already existed) and CPU 
usage dropped to around 1%.

- When I was creating wrappers for only a few selected classes, using 
the cache took much more time than not using it (because the cache 
always loads all cached declarations, whether they are required or not).

That's why I had a look at the caching mechanism and implemented my own 
class that fixes the above issues. Instead of one single cache file, the 
new cache uses a directory and stores individual files (one file per 
header). Here is a comparison of the time it takes to do the parse() 
step for 222 headers from the Maya SDK (the table is best viewed with a 
fixed-pitch font):

                      | Parsing time | Cache size | Parameters
                      |   (min:sec)  |    (MB)    |
----------------------+--------------+------------+--------------
Without cache         |     2:53     |     -      |
                      |              |            |
File cache (initial)  |     4:12     |    39.1    |
File cache (cached)   |     1:58     |    39.1    |
                      |              |            |
Dir cache (initial)   |     3:40     |    38.4    | -compression
Dir cache (cached)    |     0:34     |    38.4    | -compression
Dir cache (initial)   |     4:03     |    11.8    | +compression
Dir cache (cached)    |     2:18     |    11.8    | +compression
----------------------+--------------+------------+--------------

The "initial" rows refer to the cases where the cache didn't exist yet 
and had to be built. Of course, this only has to be done once, so the 
"cached" rows are the more important ones. The directory cache has an 
option to compress the cache files, which was used in the last two 
rows. Compression shrinks the cache to less than a third of its size, 
but loading from it is about four times slower (2:18 vs. 0:34), so in 
my case compression isn't really useful.
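
To illustrate the trade-off, here is a rough sketch using plain pickle 
and gzip on made-up stand-in data (this is not the code from 
directory_cache.py). The compressed bytes are much smaller, but every 
load now has to decompress before unpickling; whether the saved disk 
I/O outweighs that extra CPU work depends on the machine, and in the 
measurements above it didn't.

```python
import gzip
import pickle
import time

# Made-up stand-in data: many similar "declarations", as an SDK's headers tend to have.
declarations = [
    {"name": "MClass%d" % i, "members": ["x", "y", "z", "length", "normal"]}
    for i in range(5000)
]

raw = pickle.dumps(declarations, pickle.HIGHEST_PROTOCOL)
packed = gzip.compress(raw)

# Loading a compressed entry costs a decompression step on top of unpickling.
start = time.perf_counter()
restored = pickle.loads(gzip.decompress(packed))
elapsed = time.perf_counter() - start

print("uncompressed: %d bytes, compressed: %d bytes" % (len(raw), len(packed)))
print("decompress + unpickle took %.4f s" % elapsed)
```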

Memory usage of the directory cache is much lower, so I could also use 
that cache on the machine with "only" 512MB of main memory. There's also 
no longer a disadvantage when only a few headers are parsed while the 
cache actually contains many more headers: cached declarations that are 
not requested by the main program are never touched.
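
The idea of one file per header can be sketched roughly like this. 
directory_cache_sketch_t and its update()/cached_value() methods are 
hypothetical names for illustration only (they are not the actual 
directory_cache_t interface), and hashing the header path into a file 
name is just one assumed way to map headers to cache files. The point 
is that looking up one header opens exactly one file:

```python
import gzip
import hashlib
import os
import pickle
import tempfile


class directory_cache_sketch_t:
    """Hypothetical per-header cache: one file per header inside dir_name."""

    def __init__(self, dir_name, compression=False):
        self.dir_name = dir_name
        self.compression = compression
        self.loaded = []  # headers whose entry files were actually read
        os.makedirs(dir_name, exist_ok=True)

    def _entry_path(self, header):
        # Hash the absolute header path so every header maps to a flat file name.
        key = os.path.abspath(header).encode("utf-8")
        return os.path.join(self.dir_name, hashlib.sha1(key).hexdigest() + ".cache")

    def _open(self, path, mode):
        # gzip.open() accepts the same binary modes as the built-in open().
        return gzip.open(path, mode) if self.compression else open(path, mode)

    def update(self, header, declarations):
        """Store the declarations of one header in its own file."""
        with self._open(self._entry_path(header), "wb") as f:
            pickle.dump(declarations, f, pickle.HIGHEST_PROTOCOL)

    def cached_value(self, header):
        """Return the stored declarations for header, or None if not cached."""
        path = self._entry_path(header)
        if not os.path.exists(path):
            return None
        with self._open(path, "rb") as f:
            self.loaded.append(header)
            return pickle.load(f)


with tempfile.TemporaryDirectory() as cache_dir:
    cache = directory_cache_sketch_t(cache_dir, compression=True)
    cache.update("MVector.h", ["MVector"])
    cache.update("MPoint.h", ["MPoint"])

    # A fresh instance reads only the entry that is asked for.
    fresh = directory_cache_sketch_t(cache_dir, compression=True)
    print(fresh.cached_value("MPoint.h"))
    print(fresh.loaded)
```

Nothing is loaded up front in the constructor, so memory use grows only 
with the headers that are actually requested.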

Roman, is it ok if I commit that cache into the pygccxml directory?
The implementation consists of a single file, "directory_cache.py", 
which I would put into the pygccxml.parser directory. The new class is 
called "directory_cache_t" (because the user has to specify a directory 
name instead of a file name). There are also a few internal helper 
classes, but those aren't meant to be instantiated by the user. I 
followed your naming conventions as far as I could figure them out 
(lower-case class/method names with underscores between words, a "_t" 
suffix on classes, and a leading underscore on private methods). 
Docstrings are available. I haven't modified any other file, so 
everything will still work as before (and the file cache is still the 
default). To activate the directory cache, the user currently has to 
instantiate the class himself and pass the instance to the parse() 
method.
I can also email you the file first if you want to have a look at it 
before it is actually added to the repository.

There's one more question I have regarding the 
source_reader_t.__parse_gccxml_created_file() method. What is the exact 
meaning of the returned file list? The update() method of a cache class 
receives this list as its "included_files" argument, so one might think 
the list only contains the files that were included by the 
corresponding header file. But I noticed that the list also contains 
the header file itself. Is this intentional (and can I rely on that 
behavior), or is it a bug and the header file does not belong in this 
list?
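
The answer matters for a cache because an entry has to be invalidated 
whenever the header or anything it includes changes. Until that's 
settled, a defensive approach could look like this sketch (the helper 
dependency_signature() is hypothetical, not existing pygccxml code): it 
hashes the header first and skips any copy of it in included_files, so 
the signature is the same whether or not the header appears in the list.

```python
import hashlib
import os


def dependency_signature(header, included_files):
    """Hypothetical helper: one SHA-1 over a header and all files it includes.

    The header is always hashed first; any copy of it inside included_files
    is dropped, so the result does not depend on whether the parser lists
    the header among its own includes.
    """
    def key(path):
        return os.path.normcase(os.path.abspath(path))

    header_key = key(header)
    others = sorted({key(f) for f in included_files} - {header_key})

    sha = hashlib.sha1()
    for path in [header_key] + others:
        sha.update(path.encode("utf-8"))  # file identity
        with open(path, "rb") as f:
            sha.update(f.read())          # file contents
    return sha.hexdigest()
```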

> 2. I update the setup file before a release, when I have something like a
> "feature freeze" period. I suppose that everyone who uses CVS will be able
> to use it without setup. I could be wrong, but right now I prefer to
> concentrate my attention on something else.

Well, it's only two lines that need to be added to setup_pyplusplus.py:

                     , 'pyplusplus.decl_wrappers'
                     , 'pyplusplus.module_builder'

If you want, I can commit that change to the repository myself.

>> Well, you could fill in some extra words (such as class_wrapper_t) but
>> as Python already organizes code in a hierarchy I guess it would be a
>> better idea to use decl_wrapper.class_t explicitly instead of importing
>> the classes into the namespace of another module. Then it's also clear
>> to the reader which class is being referred to.
> 
> So, basically you would like to stay with the names I proposed, but the
> user is forced to use "fully qualified" names, am I right?

No. I don't want the user to have to deal with all those classes anyway; 
the user should only need *one* such class (of each kind), which he can 
import into his namespace if he wishes to.
My suggestion to use fully qualified names only refers to the internal 
implementation. Of course, it's entirely up to you which conventions to 
use, as you're the maintainer of the package. I was just thinking that 
it might prevent confusion among people who also want to have a look at 
the pyplusplus sources... :)

>> In the Maya SDK there are a couple of related classes that basically
>> have the same interface (e.g. vector, float vector, point, float point,
>> color, then the same thing for array versions etc). When decorating
>> those classes I can treat every class of such a group the same and apply
>> the same operations. This is where being able to select stuff from
>> several classes at once can be quite handy.
> 
> May I give you a small piece of advice? You can combine the power of
> pyplusplus and the power of C++. I think that creating a single template
> for every group is a better solution.

I'm not sure I understand what you mean. The above classes aren't 
implemented by me; they are part of the Maya SDK, and of course I'm not 
in a position to change that SDK.
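
To make the "select from several classes at once" point concrete, here 
is a small sketch in plain Python. class_decl_sketch_t and 
select_classes() are hypothetical stand-ins, not pyplusplus API, and the 
member name "get" is only an assumed example of something one might 
exclude from every class in a group:

```python
class class_decl_sketch_t:
    """Hypothetical stand-in for a class declaration in a wrapper generator."""

    def __init__(self, name):
        self.name = name
        self.excluded_members = set()

    def exclude_member(self, member_name):
        self.excluded_members.add(member_name)


def select_classes(classes, names):
    """Return every declaration whose name is in names (a multi-class selection)."""
    wanted = set(names)
    return [c for c in classes if c.name in wanted]


# Groups of Maya SDK classes that share (roughly) the same interface.
VECTOR_LIKE = ["MVector", "MFloatVector", "MPoint", "MFloatPoint", "MColor"]
VECTOR_LIKE_ARRAYS = [name + "Array" for name in VECTOR_LIKE]

all_classes = [class_decl_sketch_t(n)
               for n in VECTOR_LIKE + VECTOR_LIKE_ARRAYS + ["MObject"]]

# Apply one decoration to the whole group instead of class by class.
for decl in select_classes(all_classes, VECTOR_LIKE):
    decl.exclude_member("get")  # assumed example member, for illustration only
```

The same loop could then be repeated for VECTOR_LIKE_ARRAYS with 
whatever operations that group needs.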

- Matthias -