Thread: [pygccxml-development] recent changes to pygccxml\pyplusplus
Brought to you by:
mbaas,
roman_yakovenko
From: Roman Y. <rom...@gm...> - 2006-02-28 15:33:14
Hi.

In order to create a more user-friendly API I had to make a few changes to both projects. Here is a list of the changes:

Allen's changes:
=============
- Added verbose flag for additional output
  - pygccxml: parser (through parser config), project_reader, cache
  - Added timing information to output
  - Outputs gccxml command line

Accepted, but in the future we will replace the print statements with the logging module.

=============
- Cache refactoring

Accepted

=============
Detection of gccxml errors

Not accepted. It does not work with my version of gccxml. Maybe we need to define which version of gccxml we are working with?

=============
- Decl Decorator

Not accepted. I created a declaration wrapper for every declaration. See pyplusplus\decl_wrappers. But "decorators" is a much better name. Should we rename them?

=============
- Additional information when finalize fails

Not accepted. The reason is simple: I think that I don't have to provide this functionality on code creators at all. I will remove it. If you don't want to create a wrapper, you will say so in the standard way, by customizing declarations.

=============
- Extended multiple file writer

Accepted as is.

=============
Spelling fixes

Accepted. Please fix as much as you can :-)

=============
The latest versions of gccxml mark all classes and structs as artificial.

Accepted

=============
Added/extended comments in some places

Accepted

=============

My changes:

pyplusplus:
New classes that derive from the declaration_t class. Every derived decl_wrapper has the relevant properties for customizing code creators\module_creator.

All code creator properties are now just a redirection to the decl_wrapper properties.

pygccxml:
For every declaration class I am going to add *_matcher_t classes. This will help the user to find a declaration by type using some predefined criteria.

Also we should add to pygccxml all the code that Allen wrote in the filter.py module. But it should be changed a little to follow the pygccxml coding convention and, instead of using flags, I think it should use the isinstance function.

I also added a module_builder module. Please take a look at it.

Thanks

--
Roman Yakovenko
C++ Python language binding
http://www.language-binding.net/
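The plan mentioned above, replacing print statements with the logging module while keeping the verbose flag and the timing output, could look roughly like the sketch below. The logger name and the helper names configure_logging and run_timed are assumptions made for illustration; the thread does not say how pygccxml will actually structure this.

```python
import logging
import time

# Hypothetical logger name; the thread does not say which name pygccxml will use.
logger = logging.getLogger("pygccxml.sketch")


def configure_logging(verbose):
    """Map a boolean verbose flag onto a logging level."""
    logger.setLevel(logging.DEBUG if verbose else logging.WARNING)


def run_timed(description, func, *args):
    """Call func(*args) and log the description and elapsed time at DEBUG level."""
    logger.debug("starting: %s", description)
    start = time.time()
    result = func(*args)
    logger.debug("finished: %s (%.3f sec)", description, time.time() - start)
    return result


if __name__ == "__main__":
    logging.basicConfig(format="%(name)s %(levelname)s %(message)s")
    configure_logging(verbose=True)
    # sum() is only a stand-in for real work such as running gccxml on a header.
    run_timed("sum of 0..9", sum, range(10))
```

With this shape the verbose flag only selects a level, so the timing and command-line messages need no "if verbose:" checks around them.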
From: Allen B. <al...@vr...> - 2006-02-28 21:14:19
Roman Yakovenko wrote:

>I send this mail to mailing list, but forgot to send it to you.
>
>Please, subscribe to the mailing list and reply to mailing list
>
>Thank you
>
>On 2/28/06, Roman Yakovenko <rom...@gm...> wrote:
>
>>Hi. In order to create more user friendly API I had to make few
>>changes to both projects.
>>Here is a list of changes:
>>
>>Allen changes:
>>=============
>>- Added verbose flag for additional output
>>  - pygccxml: parser (through parser config), project_reader,
>>    cache
>>  - Added timing information to output
>>  - Outputs gccxml command line
>>
>>Accepted, but in future we will replace print statement with logging module

Sounds good.

>>=============
>>- Cache refactoring
>>
>>Accepted
>>
>>=============
>>Detection of gccxml errors
>>
>>Not accepted, it does not work with my version of gccxml, may be we
>>need to define with what version of gcc xml we are working?

Okay. I will check at home. I believe I have gccxml from CVS, about 1 week old. I am running it on Linux (FC4).

>>=============
>>- Decl Decorator
>>
>>Not accepted. I created declaration wrapper for every declaration. See
>>pyplusplus\decl_wrappers, But decorators is much better name. Should
>>we rename them?

This seems like a lot of code to accomplish something that really amounts to having 4-5 flags per declaration. I can live with the implementation but it seems like overkill IMHO.

>>=============
>>- Additional information when finalize fails
>>
>>Not accepted. The reason is simple, I think that I don't have to
>>provide this functionality on code creators at all. I will remove it.
>>If you don't want to create a wrapper you will say it using standard
>>way - customizing declarations.

I don't understand what you are saying here. How will we finalize now? How will the user be warned if things can't be finalized?

>>=============
>>- Extended multiple file writer
>>
>>Accepted as is.
>>
>>=============
>>Spelling fixes
>>
>>Accepted. Please fix as much as you can :-)
>>
>>=============
>>The latest versions of gccxml mark all classes and structs as artificial.
>>
>>Accepted
>>=============
>>Added/extended comments in some places
>>
>>Accepted
>>=============
>>
>>My changes:
>>
>>pyplusplus:
>>
>>New classes that derives from declaration_t class.
>>Every derived decl_wrapper has relevant properties for customizing code
>>creators\module_creator.
>>
>>All code creators properties it is just an redirection to decl_wrapper
>>properties.
>>
>>pygccxml:
>>For every declaration class I am going to add *_matcher_t classes.
>>This will help user to find declaration by type using some predefined
>>criterias.

Can you give some example of this?

-Allen

>>Also we should add to pygccxml all code that Allen wrote in filter.py module.
>>But it should be changed a little to use pygccxml coding convention +
>>instead of using flags, I think, it should use isinstance function.
>>
>>I also added module_builder module. Please take a look on it.
>>
>>Thanks
>>
>>--
>>Roman Yakovenko
>>C++ Python language binding
>>http://www.language-binding.net/
>
>--
>Roman Yakovenko
>C++ Python language binding
>http://www.language-binding.net/
From: Roman Y. <rom...@gm...> - 2006-03-02 07:05:17
On 2/28/06, Allen Bierbaum <al...@vr...> wrote:
> >>Not accepted. I created declaration wrapper for every declaration. See
> >>pyplusplus\decl_wrappers, But decorators is much better name. Should
> >>we rename them?
>
> This seems like a lot of code to accomplish something that really
> amounts to having a 4-5 flags per declaration. I can live with the
> implementation but it seems like overkill IMHO.

I explain myself in another post.

> I don't understand what you are saying here. How will we finalize now?
> How will the user be warned if things can't be finalized?

:-). I am planning to remove finalize from pyplusplus.code_creators. If you don't want pyplusplus to create a wrapper for some declaration, you will use that declaration itself to say so to pyplusplus:

    decl.finalize()

I think this is better. What do you think? Also from finalize method I have access

--
Roman Yakovenko
C++ Python language binding
http://www.language-binding.net/
From: Matthias B. <ba...@ir...> - 2006-03-01 14:01:33
Allen Bierbaum wrote:
>>> Accepted, but in future we will replace print statement with logging
>>> module
>>>
> Sounds good.

Sounds good to me as well.

>>> - Decl Decorator
>>>
>>> Not accepted. I created declaration wrapper for every declaration. See
>>> pyplusplus\decl_wrappers, But decorators is much better name. Should
>>> we rename them?
>>>
> This seems like a lot of code to accomplish something that really
> amounts to having a 4-5 flags per declaration. I can live with the
> implementation but it seems like overkill IMHO.

I agree with Allen, this looks fairly complex. Is this supposed to be the standardized low-level API that Allen's version should use internally, or is this already supposed to be the high-level API replacing our previous versions?

>>> New classes that derives from declaration_t class.
>>> Every derived decl_wrapper has relevant properties for customizing code
>>> creators\module_creator.
>>>
>>> All code creators properties it is just an redirection to decl_wrapper
>>> properties.

The latest changes confused me quite a bit. It seems pyplusplus is currently under heavy restructuring (source code layout and already existing interfaces), which isn't easy for me to follow and which currently breaks my "driver" script for creating my bindings. (That's what I meant the other day on the c++-sig list when I was against exposing the entire internal API as public API, because the chances that a future pyplusplus version will break a user's script are much higher then. And here we are... ;)

And while I'm all for consistent coding guidelines, I currently find it quite confusing to have several different classes with the same name (I spotted at least three different classes called "class_t"). For me, this doesn't really help in understanding the code.

>>> pygccxml:
>>> For every declaration class I am going to add *_matcher_t classes.
>>> This will help
>>> user to find declaration by type using some predefined criterias.
>>>
> Can you give some example of this?

Yes, what will the interface look like and what do those matchers return?

(Maybe it would be a good idea to add a fictitious FAQ to the wiki that contains some tasks and how each should be done with an "ideal" API. This could then serve as a "checklist" to test new API proposals against.)

By the way, I've started adding some stuff to the discussion page in the wiki. As I've never used a wiki for this kind of collaborative software development before, are there any guidelines or rules about how the issues should be laid out, or how and what information is to be added?

I also still have to reply to some stuff on the c++-sig list. I just wondered if I should add those replies right to the wiki or do so by mail (and I need some more time to do so. Replying to all the stuff and getting the current stuff to work has become rather time-consuming lately...)

- Matthias -
From: Roman Y. <rom...@gm...> - 2006-03-02 06:48:21
On 3/1/06, Matthias Baas <ba...@ir...> wrote:
> Allen Bierbaum wrote:
> >>> Accepted, but in future we will replace print statement with logging
> >>> module
> >>>
> > Sounds good.
>
> Sounds good to me as well.

Done, will check in shortly.

> >>> - Decl Decorator
> >>>
> >>> Not accepted. I created declaration wrapper for every declaration. See
> >>> pyplusplus\decl_wrappers, But decorators is much better name. Should
> >>> we rename them?
> >>>
> > This seems like a lot of code to accomplish something that really
> > amounts to having a 4-5 flags per declaration. I can live with the
> > implementation but it seems like overkill IMHO.
>

I think that I made a mistake: I did not explain what fundamental changes I am going to make, or have already made. I am sorry. (I did not have an internet connection and did not want to sit doing nothing.)

> I agree with Allen, this looks fairly complex. Is this supposed to be
> the standardized low level API that Allen's version should use
> internally or is this already supposed to be the high level API
> replacing our previous versions?

Those decl wrappers should be used as the high-level interface for the user to configure code creators. It does not really make a difference how it is implemented inside. In our versions it is possible to say

    mb = module_builder_t(...)
    mb.namespace( ... ).ignore()

The module_builder_t I created is just a proof of concept: I wanted to see that what you (you and Allen) proposed could be implemented and that it really simplifies the user interface. Also, as you can see, I did not implement a multi decl wrapper; I don't know how to implement it.

In short, I want you and Allen to concentrate your efforts on the user interface (class module_builder_t and the multi decl wrapper), and I will serve you as a programmer who provides all the functionality you need. Do you agree?

> I agree with Allen, this looks fairly complex.

But here is the list of features that this will allow us to have:

1. Analysis. Using the new model I can add code that will analyze a declaration for:
     is exportable, and if not, why
     is finalizable, and if not, why
2. Better interface: it is easy for the user to see what he can customize.
3. It allowed me not to introduce a new concept (declaration type flags), but rather to use Python's isinstance.

> >>> New classes that derives from declaration_t class.
> >>> Every derived decl_wrapper has relevant properties for customizing code
> >>> creators\module_creator.
> >>>
> >>> All code creators properties it is just an redirection to decl_wrapper
> >>> properties.
>
> The latest changes confused me quite a bit. It seems pyplusplus
> currently is under heavy restructuring (source code layout and already
> existing interfaces) which isn't easy to follow for me and which
> currently breaks my "driver" script for creating my bindings (that's
> what I meant the other day on the c++-sig list when I was against
> exposing the entire internal API as public API because the chances a
> future pyplusplus version will break a user's script are much higher
> then. And here we are... ;)

:-(. I tested my changes. They should work as is, without any changes to user code. It seems that I introduced a bug. I would also like to explain the changes. We decided on the following concepts, correct me if I am wrong:

    the declarations tree is used to configure code creators
    the code creators tree will be used to configure file writers

That is exactly what I did. In order to introduce as few changes as I can, I just redirected code_creators properties to declaration properties.

> And while I'm all for consistent coding guidelines I currently find it
> quite confusing to have several different classes with the same name (I
> spotted at least three different classes called "class_t"). For me, this
> doesn't really help understanding the code.

I know, this should be fixed. Can you propose a naming scheme for pyplusplus code creators and declaration wrappers/decorators? Thanks

> >>> pygccxml:
> >>> For every declaration class I am going to add *_matcher_t classes.
> >>> This will help
> >>> user to find declaration by type using some predefined criterias.
> >>>
> > Can you give some example of this?
>
> Yes, what will the interface look like and what do those matchers return?
>
> (Maybe it would be a good idea to add a fictitious FAQ to the wiki that
> contains some tasks and how this should be done with an "ideal" API.
> This could then serve as a "checklist" to test new API proposals against)

Here is what I thought: finding a declaration by some criteria is very useful functionality for pygccxml. I adopted Allen's idea, and I also changed a few implementation details. All code that shows the usage of *matcher classes can be found in the relevant unit testers (variable_matcher_tester.py, namespace_matcher_tester.py, filters_tester.py, calldef_tester.py).

Example:

    criteria = declarations.variable_matcher_t( name='x', type='unsigned int' )
    x = declarations.matcher.get_single( criteria, self.declarations )

    criteria = declarations.variable_matcher_t(
        name='::bit_fields::fields_t::x'
        , type=declarations.unsigned_int_t()
        , value=x.value
        , header_dir=os.path.dirname(x.location.file_name)
        , header_file=x.location.file_name)

Main differences:

1. Better user interface/implementation: declaration_matcher_t.__call__ is smart enough to find out whether the user wants to match by full name or by name only. The type can be either a string or a cpptypes.type_t instance.

As I already said, there is one place where the user can go and find out what criteria he can use.

Take a look at the calldef_matcher_t class. The main difference in implementation is that I allow the user to specify something like this:

    void f( int, double );
    matcher = calldef_matcher_t( arg_types=[None, 'double'] )

This means that I want to select all functions that have 2 arguments and whose second argument type is double.

Another change is in the regular expression matcher. Here is the class:

    class regex_matcher_t:
        def __init__( self, regex, function ):
            self.regex = re.compile( regex )
            self.function = function

        def __call__( self, decl ):
            text = self.function( decl )
            return bool( self.regex.match( text ) )

If the user wants to match by regular expression not only the name but something else, he should not derive from the class, but rather write a function that takes a declaration as its argument and returns some text. I found this approach easier than starting to work out a class hierarchy.

> By the way, I've started adding some stuff to the discussion page in the
> wiki. As I've never used a wiki for this kind of collaborative software
> developement yet, are there any guidelines or rules about how the issues
> should be layed out or how and what information is to be added?

I think we don't have to start with rules, but with common sense. Rules will come later.

> I also still have to reply on some stuff in the c++-sig list, I just
> wondered if I should add those replies right to the wiki or do so by
> mail (and I need some more time to do so. Replying to all the stuff and
> getting the current stuff to work has become rather time-consuming
> lately...)

Matthias, I think you should stop using the latest CVS in a production environment until we stabilize the interface. That said.

> - Matthias -
>

--
Roman Yakovenko
C++ Python language binding
http://www.language-binding.net/
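The two matcher ideas described above (a regular-expression matcher driven by a text-extracting function, and an argument-type list where None means "any type") can be tried out without pygccxml using the sketch below. The regex_matcher_t class follows the one in the message; fake_decl_t and arg_types_match are hypothetical stand-ins invented for this illustration and are not part of pygccxml, whose calldef_matcher_t may well be implemented differently.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for a pygccxml declaration; real ones carry far more data.
fake_decl_t = namedtuple("fake_decl_t", ["name", "arg_types"])


class regex_matcher_t:
    """Same shape as the class shown in the message above."""

    def __init__(self, regex, function):
        self.regex = re.compile(regex)
        self.function = function

    def __call__(self, decl):
        text = self.function(decl)
        return bool(self.regex.match(text))


def arg_types_match(decl, arg_types):
    """Hypothetical helper: None in arg_types accepts any type at that position."""
    if len(decl.arg_types) != len(arg_types):
        return False
    return all(expected is None or expected == actual
               for expected, actual in zip(arg_types, decl.arg_types))


decls = [
    fake_decl_t("get_size", ["int"]),
    fake_decl_t("f", ["int", "double"]),
    fake_decl_t("g", ["char", "double"]),
    fake_decl_t("get_ratio", ["int", "float"]),
]

# Match on the name by passing a function that extracts the text to test.
starts_with_get = regex_matcher_t(r"get_", lambda decl: decl.name)
getters = [d.name for d in decls if starts_with_get(d)]

# Two arguments, the second of which is double; the first may be anything.
two_args_second_double = [d.name for d in decls
                          if arg_types_match(d, [None, "double"])]
```

To match on something other than the name, only the lambda changes (for example one that joins decl.arg_types into a string); no subclass of regex_matcher_t is needed.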
From: Matthias B. <ba...@ir...> - 2006-03-02 20:45:40
Roman Yakovenko wrote:
> Those decl wrappers should be used as high level interface for user to configure
> code creators. It does not really make the different how it implemented inside.
> In our versions it is possible to say
>
> mb = module_builder_t(...)
> mb.namespace( ... ).ignore()
>
> module_builder_t I created it is just proof of concept: I wanted to
> see that what you
> ( you and Allen ), could be implemented and that it really simplifies
> user interface.

Just out of curiosity, have there been any problems in getting our API versions to run? (as the above is just what Allen's version provides, just with different names)
I noticed myself only later that my version depends on a directory called "tmp" being in the current directory....

> In short I want you and Allen concentrate your efforts on user interface:
> class module_builder_t and multi decl wrapper and I will serve you as programmer
> who provide you all functionality you need. Do you agree?

ok

However, apart from that I was implementing a new cache class as mentioned in another mail. It's not yet finished because of that "dict vs list" bug in the handling of the "included_files" parameter. This first needs to be resolved before I can finish the class (so is source_reader_t.__parse_gccxml_created_file() supposed to return the files as a dictionary or as a list?)

>> currently breaks my "driver" script for creating my bindings (that's
>> what I meant the other day on the c++-sig list when I was against
>> exposing the entire internal API as public API because the chances a
>> future pyplusplus version will break a user's script are much higher
>> then. And here we are... ;)
>
> :-(. I tested my changes. They should work as is without any changes
> from user code.
> It seems that I introduce a bug.

It seems that for your tests you're using pyplusplus right from the source tree (instead of building/installing it using distutils), is this correct? You added some new sub-packages to pyplusplus but forgot to add them to the setup script (decl_wrappers, module_builder). So anybody who's installing the package will miss those sub-packages and then, of course, pyplusplus will raise an exception right at import time (btw, obviously the cyclic thing I mentioned on the mailing list was not the source of the exception).

And the things that broke my script were:

1. call_policies.py was moved to a different location (so my import and usage was failing).

2. The recursive flag was removed from the module_creator.creator_t() constructor, which broke my instantiation of that class.

I fixed those in my script so it wasn't a big deal, especially when considering that pyplusplus is still under development (but it sort of confirmed my previous concerns... ;)

But even after fixing all of the above, there still seems to be a genuine bug which still prevents me from generating the bindings. When executing my script, I get the following traceback:

  File "pyplusplus/module_creator/creator.py", line 54, in __init__
    self.__decls = self._filter_decls( self._reorder_decls( self._prepare_decls( decls ) ) )
  File "pyplusplus/module_creator/creator.py", line 68, in _prepare_decls
    decls = filter( lambda x: not x.ignore, decls )
  File "pyplusplus/module_creator/creator.py", line 68, in <lambda>
    decls = filter( lambda x: not x.ignore, decls )
AttributeError: 'class_t' object has no attribute 'ignore'

> Also I would like to explain them:
> We decided on next concepts, correct me if I wrong:
>
> declarations tree is used to configure code creators

Exactly.

> code creators tree will be used to configure file writers

Probably. Sort of. I'm not sure. I haven't put that much thought into this area yet as I have a configuration that actually works for now. It's not ideal, but as it does work, improving it is only of second priority for me right now.

>> And while I'm all for consistent coding guidelines I currently find it
>> quite confusing to have several different classes with the same name (I
>> spotted at least three different classes called "class_t"). For me, this
>> doesn't really help understanding the code.
>
> I know, this should be fixed. Can you propose naming scheme for
> pyplusplus code creators
> and declarations wrappers/decorators ?

Well, you could fill in some extra words (such as class_wrapper_t), but as Python already organizes code in a hierarchy I guess it would be a better idea to use decl_wrapper.class_t explicitly instead of importing the classes into the namespace of another module. Then it's also clear to the reader which class is being referred to.

> [...]
>
> class regex_matcher_t:
>     def __init__( self, regex, function ):
>         self.regex = re.compile( regex )
>         self.function = function
>
>     def __call__( self, decl ):
>         text = self.function( decl )
>         return bool( self.regex.match( text ) )
>
> If user want to match not only name but something else by regular expression,
> he should not derive from the class, but rather to write function that
> takes as argument
> declaration and returns some text. I found this approach easier then
> starting to find out class hierarchy.

I cannot comment on that right now, first I need to have a closer look at how to use that stuff...

> Matthias, I think you should stop to use latest CVS in production environment,
> until we stabilize interface. That's said.

That's ok. I mean, I do want to be able to take advantage of the latest features, so it's ok that this might lead to some inconveniences first. And the sooner someone notices bugs, the sooner they can be fixed. :)

- Matthias -
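The traceback above comes from a filter that reads an "ignore" attribute which some declaration objects do not have. The sketch below reproduces that failure mode with stand-in classes and shows one tolerant way to write such a filter using getattr with a default. The class names plain_class_t and wrapped_class_t and the function prepare_decls are invented for the example; this is not the fix that was applied in pyplusplus, and silently tolerating a missing attribute could also hide the underlying bug.

```python
class plain_class_t:
    """Stand-in for a declaration that was never wrapped (no 'ignore' attribute)."""

    def __init__(self, name):
        self.name = name


class wrapped_class_t(plain_class_t):
    """Stand-in for a wrapped declaration that carries the 'ignore' flag."""

    def __init__(self, name, ignore=False):
        plain_class_t.__init__(self, name)
        self.ignore = ignore


def prepare_decls(decls):
    """Keep declarations not marked as ignored; a missing flag counts as False."""
    return [d for d in decls if not getattr(d, "ignore", False)]


decls = [wrapped_class_t("a"), wrapped_class_t("b", ignore=True), plain_class_t("c")]

# Reading d.ignore directly fails on the unwrapped object, as in the traceback.
try:
    strict = [d for d in decls if not d.ignore]
    strict_failed = False
except AttributeError:
    strict_failed = True

kept = [d.name for d in prepare_decls(decls)]
```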
From: Roman Y. <rom...@gm...> - 2006-03-05 06:35:36
On 3/2/06, Matthias Baas <ba...@ir...> wrote:
> However, apart from that I was implementing a new cache class as
> mentioned in another mail. It's not yet finished because of that "dict
> vs list" bug in the handling of the "included_files" parameter. This
> first needs to be resolved before I can finish the class (so is
> source_reader_t.__parse_gccxml_created_file() supposed to return the
> files as a dictionary or as a list?)

Maybe I missed something, but what cache are you implementing?
Anyway, the answer is: list. I already fixed this bug.

> >> currently breaks my "driver" script for creating my bindings (that's
> >> what I meant the other day on the c++-sig list when I was against
> >> exposing the entire internal API as public API because the chances a
> >> future pyplusplus version will break a user's script are much higher
> >> then. And here we are... ;)
> >
> > :-(. I tested my changes. They should work as is without any changes
> > from user code.
> > It seems that I introduce a bug.
>
> It seems that for your tests you're using pyplusplus right from the
> source tree (instead of building/installing them using distutils), is
> this correct? You added some new sub packages to pyplusplus but forgot
> to add them to the setup script (decl_wrappers, module_builder). So
> anybody who's installing the package will miss those sub package and
> then, of course, pyplusplus will raise an exception right at import time
> (btw, obviously the cyclic thing I mentioned in the mailing list was not
> the source of the exception)

1. I think I found and killed the bug. When I switched to the decl_wrapper classes I started to return wrong "alias"es; this bug has been fixed.

2. I update the setup file before a release, when I have something like a "feature freeze" period. I suppose that everyone who uses CVS will be able to use it without setup. I could be wrong, but right now I prefer to concentrate my attention on something else.

> And the things that broke my script were:
>
> 1. call_policies.py was moved to a different location (so my import and
> usage was failing).

I cannot change history; I learned the lesson. The next time we need to make such changes, I hope you will not be surprised. Sorry.

> 2. The recursive flag was removed from the module_creator.creator_t()
> constructor which broke my instantiation of that class.

:-(

> I fixed those in my script so it wasn't a big deal, especially when
> considering that pyplusplus is still under devlopment (but it sort of
> confirmed my previous concerns... ;)

Guilty :-)

> But even after fixing all of the above, there still seems to be a
> genuine bug which still prevents me from generating the bindings. When
> executing my script, I get the following traceback:
>
>   File "pyplusplus/module_creator/creator.py", line 54, in __init__
>     self.__decls = self._filter_decls( self._reorder_decls( self._prepare_decls( decls ) ) )
>   File "pyplusplus/module_creator/creator.py", line 68, in _prepare_decls
>     decls = filter( lambda x: not x.ignore, decls )
>   File "pyplusplus/module_creator/creator.py", line 68, in <lambda>
>     decls = filter( lambda x: not x.ignore, decls )
> AttributeError: 'class_t' object has no attribute 'ignore'

I hope that by the end of this week we will have a stable interface. At least I will work hard to get there.

> > Also I would like to explain them:
> > We decided on next concepts, correct me if I wrong:
> >
> > declarations tree is used to configure code creators
>
> Exactly.

This change is what broke your scripts.

> >> And while I'm all for consistent coding guidelines I currently find it
> >> quite confusing to have several different classes with the same name (I
> >> spotted at least three different classes called "class_t"). For me, this
> >> doesn't really help understanding the code.
> >
> > I know, this should be fixed. Can you propose naming scheme for
> > pyplusplus code creators
> > and declarations wrappers/decorators ?
>
> Well, you could fill in some extra words (such as class_wrapper_t) but
> as Python already organizes code in a hierarchy I guess it would be a
> better idea to use decl_wrapper.class_t explicitly instead of importing
> the classes into the namespace of another module. Then it's also clear
> to the reader which class is being referred to.

So, basically you would like to stay with the names I proposed, but the user is forced to use "fully qualified" names, am I right?

> > Matthias, I think you should stop to use latest CVS in production environment,
> > until we stabilize interface. That's said.
>
> That's ok. I mean, I do want to be able to take advantage of the latest
> features, so it's ok that this might lead to some inconveniences first.
> And the sooner someone notices bugs, the sooner they can be fixed. :)
>
> - Matthias -
>

--
Roman Yakovenko
C++ Python language binding
http://www.language-binding.net/
From: Matthias B. <ba...@ir...> - 2006-03-05 18:41:53
Roman Yakovenko wrote: >> first needs to be resolved before I can finish the class (so is >> source_reader_t.__parse_gccxml_created_file() supposed to return the >> files as a dictionary or as a list?) > > May be I missed something, but what cache do you implement? > Any way the answer: list, I already fixed this bug. I've implemented a new cache class because I had some issues with the file_cache_t class: - The cache file is about 39MB and on a machine with 512MB main memory I couldn't use that cache anymore. The machine started memory thrashing while parsing the headers (when the cache was already there) and CPU usage dropped to around 1%. - When I was creating wrappers for only a few selected classes using the cache took much more time than without cache (because the cache always loads all cached declarations, no matter if they are required or not). That's why I had a look at the caching mechanism and implemented my own class that fixed the above issues. Instead of one single cache file the cache now uses a directory and stores individual files (one file per header). Here is a comparison of the time it takes to do the parse() step for 222 headers from the Maya SDK (the table is best viewed with a fixed pitch font): | Parsing time | Cache size | Parameters | (min:sec) | (MB) | ----------------------+--------------+------------+-------------- Without cache | 2:53 | - | | | | File cache (initial) | 4:12 | 39.1 | File cache (cached) | 1:58 | 39.1 | | | | Dir cache (initial) | 3:40 | 38.4 | -compression Dir cache (cached) | 0:34 | 38.4 | -compression Dir cache (initial) | 4:03 | 11.8 | +compression Dir cache (cached) | 2:18 | 11.8 | +compression ----------------------+--------------+------------+-------------- The "initial" rows refer to the cases when the cache didn't exist yet and had to be build. But of course, this only has to be done once, so the "cached" rows are the more important ones. 
The directory cache has an option to compress the cache files which was used in the last two rows (so in my case, compression isn't really useful for me).

Memory usage of the directory cache is much lower, so I could also use that cache on the machine with "only" 512MB main memory. There's also no disadvantage anymore when only a few headers are parsed while the cache actually contains a lot more headers. Cached declarations that are not requested by the main program are never touched.

Roman, is it ok when I commit that cache into the pygccxml directory? The implementation consists of one single file "directory_cache.py" which I would put into the pygccxml.parser directory. The new class is called "directory_cache_t" (because the user has to specify a directory name instead of a file name). There are also a few internal helper classes, but those aren't meant to be instantiated by the user. I was using your naming conventions as far as I could figure them out (lower case class/method names with underscores between words and the classes have a "_t" suffix. Private methods have a leading underscore). Doc strings are available. I haven't modified any other file, so everything will still work as before (and the file cache is still the default). To activate the directory cache the user currently has to instantiate a class himself and pass this instance to the parse() method.
I can also email you the file first if you want to have a look at it before it is actually added to the repository.

There's still one more question I have regarding the source_reader_t.__parse_gccxml_created_file() method. What is the exact meaning of the returned file list? The update() method of a cache class receives this list as "included_files" argument, so one might think the list only contains the files that were included from the corresponding header file. But I noticed that the list also contains the header file itself. Is this intentional (and can I rely on that behavior) or is this a bug and the header file does not belong into this list?

> 2. I updated setup file before release, when I have something like
> "feature freeze" period.
> I suppose, that every one who use CVS will be able to use it
> without setup. I could be
> wrong, but right now I prefer to concentrate my attention on something else

Well, it's only two lines that need to be added to setup_pyplusplus.py:

    , 'pyplusplus.decl_wrappers'
    , 'pyplusplus.module_builder'

If you want I can commit that change back into the repository myself.

>> Well, you could fill in some extra words (such as class_wrapper_t) but
>> as Python already organizes code in a hierarchy I guess it would be a
>> better idea to use decl_wrapper.class_t explicitly instead of importing
>> the classes into the namespace of another module. Then it's also clear
>> to the reader which class is being referred to.
>
> So, basically you would like to stay with name I proposed, but user
> is forced to use "fully qualified" names, am I right?

No. I don't want the user having to deal with all those classes anyway, so the user should only need *one* such class (of each kind) which he might import into his namespace if he wishes to.
My suggestion to use the fully qualified names only refers to the internal implementation (of course, this is entirely up to you what conventions you want to use as you're the maintainer of the package. I was just thinking that it might prevent confusion among those people who also want to have a look at the sources of pyplusplus... :)

>> In the Maya SDK there are a couple of related classes that basically
>> have the same interface (e.g. vector, float vector, point, float point,
>> color, then the same thing for array versions etc). When decorating
>> those classes I can treat every class of such a group the same and apply
>> the same operations. This is where being able to select stuff from
>> several classes at once can be quite handy.
>
> May I give you a small advice? You can combine between power of pyplusplus and
> power of C++. I think that using creating single template for every
> group is better solution.

I'm not sure if I understand what you mean. The above classes aren't implemented by myself, they are part of the Maya SDK and, of course, I'm not in the position to change that SDK.

- Matthias -
|
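For readers following the cache discussion, here is a minimal sketch of the "one cache file per header" idea, with optional compression. It is not the actual directory_cache_t from pygccxml: the class name tiny_dir_cache_t, the cached_value() method and the file naming scheme are hypothetical, only update() echoes a name mentioned in the message, and invalidation when a header changes is left out entirely.

```python
import hashlib
import os
import pickle
import tempfile
import zlib


class tiny_dir_cache_t:
    """Hypothetical one-file-per-header cache (not pygccxml's directory_cache_t)."""

    def __init__(self, directory, compression=False):
        self.directory = directory
        self.compression = compression
        os.makedirs(directory, exist_ok=True)

    def _entry_path(self, header):
        # One cache file per header, named after a hash of the header path.
        name = os.path.abspath(header).encode("utf-8")
        return os.path.join(self.directory, hashlib.md5(name).hexdigest() + ".cache")

    def update(self, header, declarations):
        # Serialize the declarations for this header only.
        data = pickle.dumps(declarations)
        if self.compression:
            data = zlib.compress(data)
        with open(self._entry_path(header), "wb") as f:
            f.write(data)

    def cached_value(self, header):
        # Only the entry for the requested header is read from disk;
        # entries for other headers are never touched.
        path = self._entry_path(header)
        if not os.path.exists(path):
            return None
        with open(path, "rb") as f:
            data = f.read()
        if self.compression:
            data = zlib.decompress(data)
        return pickle.loads(data)
```

Because each lookup opens a single small file, memory use scales with the headers actually requested rather than with the size of the whole cache, which is the property described in the message above.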
From: Allen B. <al...@vr...> - 2006-03-05 23:05:47
|
Matthias:

I really like anything that can help increase performance. :)

I don't know if this would help you or not, but one thing I found that helped my cache performance greatly (and was one of the reasons I refactored the code to use md5 signatures) was that I could create a temporary header file that included all the headers I wanted to parse. Then I included that file instead of using a full "project" of header files. This made it so gccxml only ran once and it removed all the redundancy of seeing the same decls from multiple included headers. This was able to help me take a parse that was around 2 hours down to about 1.5 minutes. Overall a very good improvement in speed. :)

-Allen

Matthias Baas wrote:
> Roman Yakovenko wrote:
>
>>> first needs to be resolved before I can finish the class (so is
>>> source_reader_t.__parse_gccxml_created_file() supposed to return the
>>> files as a dictionary or as a list?)
>>
>> May be I missed something, but what cache do you implement?
>> Any way the answer: list, I already fixed this bug.
>
> I've implemented a new cache class because I had some issues with the
> file_cache_t class:
>
> - The cache file is about 39MB and on a machine with 512MB main memory
> I couldn't use that cache anymore. The machine started memory
> thrashing while parsing the headers (when the cache was already there)
> and CPU usage dropped to around 1%.
>
> - When I was creating wrappers for only a few selected classes using
> the cache took much more time than without cache (because the cache
> always loads all cached declarations, no matter if they are required
> or not).
>
> That's why I had a look at the caching mechanism and implemented my
> own class that fixed the above issues. Instead of one single cache
> file the cache now uses a directory and stores individual files (one
> file per header).
> Here is a comparison of the time it takes to do the
> parse() step for 222 headers from the Maya SDK (the table is best
> viewed with a fixed pitch font):
>
>                       | Parsing time | Cache size | Parameters
>                       | (min:sec)    | (MB)       |
> ----------------------+--------------+------------+--------------
> Without cache         | 2:53         | -          |
>                       |              |            |
> File cache (initial)  | 4:12         | 39.1       |
> File cache (cached)   | 1:58         | 39.1       |
>                       |              |            |
> Dir cache (initial)   | 3:40         | 38.4       | -compression
> Dir cache (cached)    | 0:34         | 38.4       | -compression
> Dir cache (initial)   | 4:03         | 11.8       | +compression
> Dir cache (cached)    | 2:18         | 11.8       | +compression
> ----------------------+--------------+------------+--------------
>
> The "initial" rows refer to the cases when the cache didn't exist yet
> and had to be built. But of course, this only has to be done once, so
> the "cached" rows are the more important ones. The directory cache has
> an option to compress the cache files which was used in the last two
> rows (so in my case, compression isn't really useful for me).
>
> Memory usage of the directory cache is much lower, so I could also use
> that cache on the machine with "only" 512MB main memory. There's also
> no disadvantage anymore when only a few headers are parsed while the
> cache actually contains a lot more headers. Cached declarations that
> are not requested by the main program are never touched.
>
> Roman, is it ok when I commit that cache into the pygccxml directory?
> The implementation consists of one single file "directory_cache.py"
> which I would put into the pygccxml.parser directory. The new class is
> called "directory_cache_t" (because the user has to specify a
> directory name instead of a file name). There are also a few internal
> helper classes, but those aren't meant to be instantiated by the user.
> I was using your naming conventions as far as I could figure them out
> (lower case class/method names with underscores between words and the
> classes have a "_t" suffix.
Private methods have a leading > underscore). Doc strings are available. I haven't modified any other > file, so everything will still work as before (and the file cache is > still the default). To activate the directory cache the user currently > has to instantiate a class himself and pass this instance to the > parse() method. > I can also email you the file first if you want to have a look at it > before it is actually added to the repository. > > There's still one more question I have regarding the > source_reader_t.__parse_gccxml_created_file() method. What is the > exact meaning of the returned file list? The update() method of a > cache class receives this list as "included_files" argument, so one > might think the list only contains the files that were included from > the corresponding header file. But I noticed that the list also > contains the header file itself. Is this intentional (and can I rely > on that behavior) or is this a bug and the header file does not belong > into this list? > >> 2. I updated setup file before release, when I have something like >> "feature freeze" period. >> I suppose, that every one who use CVS will be able to use it >> without setup. I could be >> wrong, but right now I prefer to concentrate my attention on >> something else > > > Well, it's only two lines that need to be added to setup_pyplusplus.py: > > , 'pyplusplus.decl_wrappers' > , 'pyplusplus.module_builder' > > If you want I can commit that change back into the repository myself. > >>> Well, you could fill in some extra words (such as class_wrapper_t) but >>> as Python already organizes code in a hierarchy I guess it would be a >>> better idea to use decl_wrapper.class_t explicitly instead of importing >>> the classes into the namespace of another module. Then it's also clear >>> to the reader which class is being referred to. >> >> >> So, basically you would like to stay with name I proposed, but user >> is forced to use "fully qualified" names, am I right? 
> > > No. I don't want the user having to deal with all those classes > anyway, so the user should only need *one* such class (of each kind) > which he might import into his namespace if he wishes to. > My suggestion to use the fully qualified names only refers to the > internal implementation (of course, this is entirely up to you what > conventions you want to use as you're the maintainer of the package. I > was just thinking that it might prevent confusion among those people > who also want to have a look at the sources of pyplusplus... :) > >>> In the Maya SDK there are a couple of related classes that basically >>> have the same interface (e.g. vector, float vector, point, float point, >>> color, then the same thing for array versions etc). When decorating >>> those classes I can treat every class of such a group the same and >>> apply >>> the same operations. This is where being able to select stuff from >>> several classes at once can be quite handy. >> >> >> May I give you a small advice? You can combine between power of >> pyplusplus and >> power of C++. I think that using creating single template for every >> group is better solution. > > > I'm not sure if I understand what you mean. The above classes aren't > implemented by myself, they are part of the Maya SDK and, of course, > I'm not in the position to change that SDK. > > - Matthias - > |
From: Roman Y. <rom...@gm...> - 2006-03-06 04:43:18
|
On 3/6/06, Allen Bierbaum <al...@vr...> wrote:
> Matthias:
>
> I really like anything that can help increase performance. :)

I have not seen the code, but my feeling is that Matthias did pretty good work.

> I don't know if this would help you or not, but one thing I found that
> helped my cache performance greatly (and was one of the reasons I
> refactored the code to use md5 signatures) was that I could create a
> temporary header file that included all the headers I wanted to parse.
> Then I included that file instead of using a full "project" of header
> files. This made it so gccxml only ran once and it removed all the
> redundancy of seeing the same decls from multiple included headers.
> This was able to help me take a parse that was around 2 hours down to
> about 1.5 minutes. Overall a very good improvement in speed. :)

My experience is a little bit different: I learned that instead of using the cache, it is better to keep the gccxml-created files and parse them again. But now, with all your improvements, I should re-check my approach.

> -Allen

--
Roman Yakovenko
C++ Python language binding
http://www.language-binding.net/
|
From: Matthias B. <ba...@ir...> - 2006-03-06 14:42:17
|
Allen Bierbaum wrote:
> I don't know if this would help you or not, but one thing I found that
> helped my cache performance greatly (and was one of the reasons I
> refactored the code to use md5 signatures) was that I could create a
> temporary header file that included all the headers I wanted to parse.
> Then I included that file instead of using a full "project" of header
> files. This made it so gccxml only ran once and it removed all the
> redundancy of seeing the same decls from multiple included headers.
> This was able to help me take a parse that was around 2 hours down to
> about 1.5 minutes. Overall a very good improvement in speed. :)

Wow, you're right! Using a single temporary header I get the following timings:

    Without cache: 12s
    File cache:     4s (cache size: 3.1MB)
    Dir cache:      2s (cache size: 2.4MB)

2 seconds for parsing 222 header files is absolutely ok with me. :)

And now I finally understand what the compilation mode ALL_AT_ONCE as opposed to FILE_BY_FILE actually does (unfortunately, it doesn't use the cache, so it's still more efficient to use FILE_BY_FILE and create a temporary header file manually. This should be fixed eventually and once this is fixed, ALL_AT_ONCE should probably be default).

Now my new cache class is not as useful anymore as I thought, but as it's still smaller and faster than the file cache I committed it anyway.

- Matthias -
|
From: Allen B. <al...@vr...> - 2006-03-06 15:02:25
|
Matthias Baas wrote: > Allen Bierbaum wrote: > >> I don't know if this would help you or not, but one thing I found >> that helped my cache performance greatly (and was on of the reasons I >> refactored the code to use md5 signatures) was that I could create a >> temporary header file that included all the headers I wanted to >> parse. Then I included that file instead of using a full "project" >> of header files. This made it so gccxml only ran once and it removed >> all the redundancy of seeing the same decls from multiple included >> headers. This was able to help me take a parse that was around 2 >> hours down to about 1.5 minutes. Overall a very good improvement in >> speed. :) > > > Wow, you're right! Using a single temporary header I get the following > timings: > > Without cache: 12s > File cache: 4s (cache size: 3.1MB) > Dir cache: 2s (cache size: 2.4MB) > > 2 seconds for parsing 222 header files is absolutely ok with me. :) > > And now I finally understand what the compilation mode ALL_AT_ONCE as > opposed to FILE_BY_FILE actually does (unfortunately, it doesn't use > the cache, so it's still more efficient to use FILE_BY_FILE and create > a temporary header file manually. This should be fixed eventually and > once this is fixed, ALL_AT_ONCE should probably be default). Interesting. I would like to hear more about that. I was never able to figure out what ALL_AT_ONCE did let alone get it to work. I have been creating my own temporary file in the wrapper generation script. Please tell me more. :) -A |
From: Matthias B. <ba...@ir...> - 2006-03-06 15:21:09
|
Allen Bierbaum wrote:
> Interesting. I would like to hear more about that. I was never able to
> figure out what ALL_AT_ONCE did let alone get it to work. I have been
> creating my own temporary file in the wrapper generation script. Please
> tell me more. :)

The parse() function has an option "compilation_mode" which can be set to pygccxml.parser.COMPILATION_MODE.ALL_AT_ONCE (default is FILE_BY_FILE):

    decls = pygccxml.parser.parse(
        ...<your options here>....,
        compilation_mode = pygccxml.parser.COMPILATION_MODE.ALL_AT_ONCE
    )

With this option pyplusplus generates a temporary header (it seems this is even done in memory so no temporary file is used, but I haven't inspected the sources that closely, so I could be wrong) and invokes gccxml only once. So it would relieve you from creating the temporary header yourself. The drawback is that this compilation mode totally ignores the cache. :(

- Matthias -
|
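The manual alternative mentioned in this thread (keep FILE_BY_FILE and hand the parser a single header that includes everything) can be sketched roughly as follows. The function name write_umbrella_header and the choice of a fixed output path are assumptions made for illustration; neither comes from pygccxml. A fixed path is assumed here so that a cache keyed by file name can find the same entry on the next run.

```python
import os
import tempfile


def write_umbrella_header(headers, path):
    """Write one header at `path` that #includes every file in `headers`.

    Hypothetical helper: not part of pygccxml or pyplusplus.
    """
    lines = []
    for header in headers:
        # Forward slashes keep the #include line valid on Windows too.
        full = os.path.abspath(header).replace("\\", "/")
        lines.append('#include "%s"' % full)
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
    return path
```

The returned path would then be the only header given to the parser, so gccxml runs once and each declaration is seen once instead of once per including header.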
From: Matthias B. <ba...@ir...> - 2006-03-07 16:39:13
|
Roman Yakovenko wrote:
>> With this option pyplusplus generates a temporary header (it seems this
>> is even done in memory so no temporary file is used,
>
> You are wrong. parser.source_reader_t creates temporal file.

oops, ok.

>> but I haven't
>> inspected the sources that closely, so I could be wrong) and invokes
>> gccxml only once. So it would relieve you from creating the temporary
>> header yourself. The drawback is that this compilation mode totally
>> ignores the cache. :(
>
> Your thoughts are welcome. The problem I did not solve here is next:
> how I should build the key for cache dictionary? Until now every key we built
> use full file name. If you can build an other key without temporal
> file name, then we can
> implement it.

Well, instead of passing only one file name to the cache we could simply pass all relevant file names to the cache. It would then be up to the cache implementation to deal with it and create a proper key...

> Also I don't think we should: 0.5 - 2 seconds do you
> think it will make the difference to you?

Well, it's 12s vs 2s for me, so it still gives a little bit of performance. But you're right, it's not top priority anymore and I am already at those 2s with the current version (by still using FILE_BY_FILE and creating a temporary header myself). I might get back to it once I'm able to create my bindings again and all the "top priority" issues have been dealt with. :)

- Matthias -
|
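One way the "pass all relevant file names and let the cache build the key" suggestion might look is sketched below. The helper multi_file_key is hypothetical and only illustrates the idea; it is not how pygccxml builds its keys. Include order is kept significant on the assumption that a different order could produce different declarations.

```python
import hashlib
import os


def multi_file_key(file_names):
    """Build a single cache key from an ordered list of file names.

    Hypothetical helper: not part of pygccxml.
    """
    digest = hashlib.md5()
    for name in file_names:
        full = os.path.normcase(os.path.abspath(name))
        digest.update(full.encode("utf-8"))
        # Separator byte so ["ab", "c"] and ["a", "bc"] hash differently.
        digest.update(b"\0")
    return digest.hexdigest()
```

Since the key depends only on the real header names, the name of the temporary file created for ALL_AT_ONCE never enters it.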
From: Roman Y. <rom...@gm...> - 2006-03-07 05:41:52
|
On 3/6/06, Allen Bierbaum <al...@vr...> wrote: > Interesting. I would like to hear more about that. I was never able to > figure out what ALL_AT_ONCE did let alone get it to work. I have been > creating my own temporary file in the wrapper generation script. Please > tell me more. :) I hope this will help: http://www.language-binding.net/pygccxml/pygccxml.html#id12 If not I will clarify all missing details > -A > > -- Roman Yakovenko C++ Python language binding http://www.language-binding.net/ |
From: Roman Y. <rom...@gm...> - 2006-03-06 04:37:54
|
On 3/5/06, Matthias Baas <ba...@ir...> wrote: > Roman, is it ok when I commit that cache into the pygccxml directory? > The implementation consists of one single file "directory_cache.py" > which I would put into the pygccxml.parser directory. The new class is > called "directory_cache_t" (because the user has to specify a directory > name instead of a file name). There are also a few internal helper > classes, but those aren't meant to be instantiated by the user. I was > using your naming conventions as far as I could figure them out (lower > case class/method names with underscores between words and the classes > have a "_t" suffix. Private methods have a leading underscore). Doc > strings are available. I haven't modified any other file, so everything > will still work as before (and the file cache is still the default). To > activate the directory cache the user currently has to instantiate a > class himself and pass this instance to the parse() method. > I can also email you the file first if you want to have a look at it > before it is actually added to the repository. Please commit your changes to repository. Don't forget to put license within the file. Also, I am sure that you tested your directory cache class, right? Can you create also unit test for it? If not I will do that. Can you save your numbers somewhere? I think we will need them for release notes. Thanks > There's still one more question I have regarding the > source_reader_t.__parse_gccxml_created_file() method. What is the exact > meaning of the returned file list? All files that have been parsed > The update() method of a cache class > receives this list as "included_files" argument, so one might think the > list only contains the files that were included from the corresponding > header file. But I noticed that the list also contains the header file > itself. Is this intentional (and can I rely on that behavior) or is this > a bug and the header file does not belong into this list? 
It is not intentional, but we can say that from now on this is the protocol.

> > 2. I updated setup file before release, when I have something like
> > "feature freeze" period.
> > I suppose, that every one who use CVS will be able to use it
> > without setup. I could be
> > wrong, but right now I prefer to concentrate my attention on something else
>
> Well, it's only two lines that need to be added to setup_pyplusplus.py:
>
> , 'pyplusplus.decl_wrappers'
> , 'pyplusplus.module_builder'
>
> If you want I can commit that change back into the repository myself.

Yes, go ahead please.

> >> Well, you could fill in some extra words (such as class_wrapper_t) but
> >> as Python already organizes code in a hierarchy I guess it would be a
> >> better idea to use decl_wrapper.class_t explicitly instead of importing
> >> the classes into the namespace of another module. Then it's also clear
> >> to the reader which class is being referred to.
> >
> > So, basically you would like to stay with name I proposed, but user
> > is forced to use "fully qualified" names, am I right?
>
> No. I don't want the user having to deal with all those classes anyway,
> so the user should only need *one* such class (of each kind) which he
> might import into his namespace if he wishes to.
> My suggestion to use the fully qualified names only refers to the
> internal implementation (of course, this is entirely up to you what
> conventions you want to use as you're the maintainer of the package.
> I
> was just thinking that it might prevent confusion among those people who
> also want to have a look at the sources of pyplusplus... :)

So I don't understand. Within the code I always use the fully qualified name: package.class. Even within a package I use module.class. Do you think that this is the right approach?

> > May I give you a small advice? You can combine between power of pyplusplus and
> > power of C++. I think that using creating single template for every
> > group is better solution.
>
> I'm not sure if I understand what you mean. The above classes aren't
> implemented by myself, they are part of the Maya SDK and, of course, I'm
> not in the position to change that SDK.

Those classes have the same interface, so maybe you can create the boost python wrappers using templates?

> - Matthias -

--
Roman Yakovenko
C++ Python language binding
http://www.language-binding.net/
|
From: Allen B. <al...@vr...> - 2006-03-06 15:27:16
|
Matthias Baas wrote: > Allen Bierbaum wrote: > >> Interesting. I would like to hear more about that. I was never able >> to figure out what ALL_AT_ONCE did let alone get it to work. I have >> been creating my own temporary file in the wrapper generation >> script. Please tell me more. :) > > > The parser() function has an option "compilation_mode" which can be > set to pygccxml.parser.COMPILATION_MODE.ALL_AT_ONCE (default is > FILE_BY_FILE): > > decls = pygccxml.parser.parse( > ...<your options here>...., > compilation_mode = pygccxml.parser.COMPILATION_MODE.ALL_AT_ONCE > ) > > With this option pyplusplus generates a temporary header (it seems > this is even done in memory so no temporary file is used, but I > haven't inspected the sources that closely, so I could be wrong) and > invokes gccxml only once. So it would relieve you from creating the > temporary header yourself. The drawback is that this compilation mode > totally ignores the cache. :( Ah. That is probably why I didn't end up using it. Using temporary files for headers with the md5-based cache is really fairly magical in some cases. (for example I use this to force template template instantiation in the module builder) :) -Allen > > > - Matthias - > |
From: Roman Y. <rom...@gm...> - 2006-03-07 05:48:53
|
> Ah. That is probably why I didn't end up using it. Using temporary
> files for headers with the md5-based cache is really fairly magical in
> some cases. (for example I use this to force template template
> instantiation in the module builder) :)

Right, this is the main difference. In order to use ALL_AT_ONCE you have to pass only source files that exist on disk as arguments. You cannot freely mix in text or gccxml-generated files.

> -Allen

--
Roman Yakovenko
C++ Python language binding
http://www.language-binding.net/
|