Re: [pygccxml-development] Comments on module_builder cache functionality
From: Allen B. <al...@vr...> - 2006-08-29 14:09:16
Allen Bierbaum wrote:

>Roman Yakovenko wrote:
>
>>On 8/29/06, *Allen Bierbaum* <al...@vr...
>><mailto:al...@vr...>> wrote:
>>
>>    As requested, I have backed out these changes. The implementation is
>>    now available for people to use in goodies.goodies_perf_overrides.py.
>>    Just import that file and you will get the full module caching as well
>>    as the create_identifier override.
>>
>>
>>I don't like the way you do it. There was a good reason to remove it -
>>critical bug.
>>I did not see the code you added to goodies, but I assume it contains
>>the same bug.
>>
>>Bug description:
>>
>>Let's say you have 2 header files:
>>implementation_details.h
>>    ...
>>
>>and
>>
>>to_be_exported.h:
>>
>>    #include "implementation_details.h"
>>    ....
>>
>>Py++ code:
>>
>>mb = module_builder_t( "to_be_exported.h" )
>>
>>The problem with Allen's implementation is that you can change
>>"implementation_details.h"
>>file, but "cache" will remain valid. In this case module_builder_t
>>should rebuild the cache, otherwise
>>Py++ will generate wrong code. This is a very critical bug. pygccxml
>>cache classes know
>>how to deal with this situation.
>>
>>
>
>Actually, if you change any text in implementation_details.h it will
>rebuild the cache because the md5 hash will catch it. This is the same
>with the entire list of header files. I have been using it and I can
>say without a doubt this is how it works and it works exactly the same
>as the pygccxml decl cache.
>
>As I said on IM though there are changes that both of them miss.
>
>The thing that both cache implementations miss is that if a file
>included by to_be_exported.h changes, then the caches will be invalid
>but they will not know it. The module cache misses this and the
>existing pygccxml cache misses it. (If you doubt this, take a look at
>the code. Although pygccxml passes a set of included files into the
>cache update method, those files are not used for the key signature.
>Thus they are not used to validate the cache entry on load.)
>
>I have seen this problem before in the pygccxml cache and there is
>really no good way around it. The only way that pygccxml has to get a
>full list of dependent header files (i.e. build a recursive list of
>includes) is to either a) run pygccxml on the file or b) add a scanner
>that does this internally. Option a is out because it would make the
>cache worthless since we would be doing the exact work we want to
>cache. Option b doesn't exist in the code base and in my case I would
>want its use to be optional because it would take significant time to
>run. I haven't tested for sure but I would guess that in my case it
>would have to find and scan well over 10,000 files if not more.
>
>What both of these bugs mean is that it is possible to make a change in
>your code and have it not get picked up. That is definitely true and I
>don't see a great way around it.
>
>One hybrid solution that may work is that we could add an optional
>parameter to the module builder that is a list of "dependent" headers
>that the user cares about checking. This list could be scanned and made
>part of the signature. Still not perfect, but it would allow the user
>to tailor their caching needs.
>
>

I just committed a change to the module builder override in goodies that
implements this feature. I have started using it for my projects and it
seems to work well so far.

-Allen
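
The "dependent headers" idea discussed in this thread can be illustrated with a rough sketch. This is not the code committed to goodies and it does not use any pygccxml or Py++ API; the names cache_signature, cache_is_valid and dependent_headers are hypothetical and chosen only for illustration. It assumes the signature is an md5 digest over the contents of the headers given to the module builder plus any extra headers the user asks to have checked.

```python
import hashlib
from pathlib import Path


def cache_signature(headers, dependent_headers=()):
    """Return an md5 hex digest over the contents of all listed files.

    Hypothetical helper: 'headers' are the files handed to the module
    builder, 'dependent_headers' are extra files the user wants checked.
    """
    digest = hashlib.md5()
    # Sort the paths so the signature does not depend on argument order.
    paths = sorted(str(p) for p in list(headers) + list(dependent_headers))
    for path in paths:
        # Mix in the path as well, so renaming a file changes the signature.
        digest.update(path.encode("utf-8"))
        digest.update(b"\0")
        digest.update(Path(path).read_bytes())
    return digest.hexdigest()


def cache_is_valid(stored_signature, headers, dependent_headers=()):
    """A cache entry is considered valid only if the signature still matches."""
    return stored_signature == cache_signature(headers, dependent_headers)
```

With only to_be_exported.h in the signature, an edit to implementation_details.h goes unnoticed. Listing implementation_details.h as a dependent header makes the same edit invalidate the cache. Headers that are included but not listed are still missed, which is the "still not perfect" trade-off described above.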