Thread: [pygccxml-development] Another performance tweak
From: Allen B. <al...@vr...> - 2006-08-25 22:16:01
|
I just tracked down and fixed another major performance sink.

I saw in the profiler output that the majority of the time from my run was spent in __eq__ in declaration.py (line 121) and __eq__ in calldef.py (line 121). This is the code that compares two calldefs to see if they are equal. (Note this was also where most of the calls to algorithm.declaration_path were coming from.)

I was interested in tracking down where all these calls (over 6 million of them) were coming from, so I added some code to the __eq__ method of calldef to keep track of all the ways it was called and store how many hits it gets (an example of the output is at the end of the e-mail).

As a side note, I also counted the return values of true and false separately just for fun. I found that out of the over 6 million times it was called, it returned True only 33 times, and those only came from a call path starting with _join_declarations. Every other test was false every time, so there may be another optimization hiding in here to just not call this test.

As it ended up, I found that the vast majority of these calls came from the member_functions method in scopedef.py. I traced through there and found that all the __eq__ calls were coming from some nested calls to find_out_member_access_type that were coming from access_type_matcher_t. I never did find out where access_type_matcher_t was coming from, since I was just asking for all the members.

Anyway, the way pygccxml works, the decls don't actually know their access type. Only their parents do. So if you want to know a decl's access type you have to ask the parent, and then it loops over all of its internal members for each access type until it finds the one you are asking about. This meant that the member_functions method was at least O(N^2) and possibly O(N^3).

So back to what I did to fix it. It seemed to me that for pygccxml the access type of a member should remain static through a single execution.
So I added a caching mechanism to find_out_member_access_type that just stores the access type with the member decl. The next time it is checked we return it directly and skip looping over all the lists and calling __eq__ so many millions of times.

In the end the number of __eq__ calls went from 6,010,000 to 271,500. This took my build time from 344 seconds down to 116 seconds.

So when you combine this change with the one from yesterday, the generation process is now 7 times faster. Not bad for just modifying two methods. :)

-Allen

PS. You can see the PerformanceTuning page on the wiki for pointers to the tools I have been using.

---------- Example call chaining for __eq__: Eq: Called 238772 times and *always* returned false ------
[0, 238772]:
  ('gen_bindings.py', 722, '?'),
  ('gen_bindings.py', 673, 'main'),
  ('/home/allenb/python/lib/python/pyplusplus/module_builder/builder.py', 236, 'build_code_creator'),
  ('/home/allenb/python/lib/python/pyplusplus/module_creator/creator.py', 541, 'create'),
  ('/home/allenb/python/lib/python/pygccxml/declarations/algorithm.py', 268, 'apply_visitor'),
  ('/home/allenb/python/lib/python/pyplusplus/module_creator/creator.py', 704, 'visit_class'),
  ('/home/allenb/python/lib/python/pyplusplus/module_creator/creator.py', 348, '_is_wrapper_needed'),
  ('/home/allenb/python/lib/python/pyplusplus/module_creator/creator.py', 287, 'redefined_funcs'),
  ('/home/allenb/python/lib/python/pygccxml/declarations/scopedef.py', 473, 'member_functions'),
  ('/home/allenb/python/lib/python/pygccxml/declarations/scopedef.py', 326, '_find_multiple'),
  ('/home/allenb/python/lib/python/pygccxml/declarations/matcher.py', 49, 'find'),
  ('/home/allenb/python/lib/python/pygccxml/declarations/scopedef.py', 258, '<lambda>'),
  ('/home/allenb/python/lib/python/pygccxml/declarations/matchers.py', 83, '__call__'),
  ('/home/allenb/python/lib/python/pygccxml/declarations/matchers.py', 61, '__call__'),
  ('/home/allenb/python/lib/python/pygccxml/declarations/matchers.py', 478, '__call__'),
  ('/home/allenb/python/lib/python/pygccxml/declarations/class_declaration.py', 321, 'find_out_member_access_type'),
  ('/home/allenb/python/lib/python/pygccxml/declarations/calldef.py', 310, '__eq__'),
  ('/home/allenb/python/lib/python/pygccxml/declarations/calldef.py', 139, '__eq__')] |
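The call-path counting Allen describes can be done with a small decorator. This is a hypothetical sketch, not the actual code he added to pygccxml: it keys a counter by the call stack that reached the function and tallies True and False results separately.

```python
import collections
import traceback

# call path (tuple of (file, line, function)) -> [true_count, false_count]
call_stats = collections.defaultdict(lambda: [0, 0])

def counted(func):
    """Wrap a hot function and record how each call path reached it."""
    def wrapper(*args, **kwargs):
        # extract_stack()[:-1] drops the wrapper's own frame
        path = tuple((frame.filename, frame.lineno, frame.name)
                     for frame in traceback.extract_stack()[:-1])
        result = func(*args, **kwargs)
        call_stats[path][0 if result else 1] += 1
        return result
    return wrapper

@counted
def eq(a, b):
    # stand-in for a comparison like calldef.__eq__
    return a == b

eq(1, 1)  # returns True
eq(1, 2)  # returns False
eq(2, 3)  # returns False
```

Dumping `call_stats` sorted by count then points straight at the hottest callers, which is how the trace at the end of the e-mail was produced.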
From: Roman Y. <rom...@gm...> - 2006-08-26 05:09:37
|
On 8/26/06, Allen Bierbaum <al...@vr...> wrote:
> I just tracked down and fixed another major performance sink.

Allen, this is good news, but I explicitly asked you not to introduce new optimizations until we finish with the previous one.

-- Roman Yakovenko C++ Python language binding http://www.language-binding.net/ |
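For reference, the caching Allen describes can be sketched as below. The class and attribute names here are illustrative, not pygccxml's real API; the real class_t keeps separate member lists per access type, which this stub only imitates.

```python
class class_t:
    """Minimal stand-in for a class declaration with per-access member lists."""
    def __init__(self):
        self.public_members = []
        self.private_members = []

    def find_out_member_access_type(self, member):
        # Fast path: the answer was computed on a previous call and
        # stored on the member itself.
        cached = getattr(member, '_access_type_cache', None)
        if cached is not None:
            return cached
        # Slow path: scan the per-access-type lists (O(N) per member,
        # which is what made member_functions O(N^2) overall).
        if member in self.public_members:
            access = 'public'
        elif member in self.private_members:
            access = 'private'
        else:
            raise RuntimeError('unable to find member')
        member._access_type_cache = access  # remember for next time
        return access

class member_t:
    pass

cls = class_t()
m = member_t()
cls.public_members.append(m)
print(cls.find_out_member_access_type(m))  # first call scans the lists
print(cls.find_out_member_access_type(m))  # second call hits the cache
```

The cache is sound only as long as access types really are static for the lifetime of a run, which is exactly the assumption Allen calls out.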
From: Allen B. <al...@vr...> - 2006-08-26 12:02:57
|
Roman Yakovenko wrote:
> On 8/26/06, Allen Bierbaum <al...@vr...> wrote:
>> I just tracked down and fixed another major performance sink.
>
> Allen, this is good news, but I explicitly asked you not to introduce new
> optimizations until we finish with the previous one.

Sorry about this one then. I can back out the revision if you like.

See my previous e-mail, I don't know how to finish up the previous change because I don't understand the issue involved. I made sure the code passed the test suites, but that is about all I can do. I think you are going to have to look at it or come up with a test case that shows the problem to me.

I will just keep any further performance increases in my own tree for now.

If you are interested in looking into some of this I would point you at check_name. This method is now one of the most time consuming. I can't really understand what the implementation is doing, but there may be room for improvement.

It is also worth noting that when I turn on optimize_queries, the code runs slower.

-Allen |
From: Roman Y. <rom...@gm...> - 2006-08-26 12:12:44
|
On 8/26/06, Allen Bierbaum <al...@vr...> wrote:
> Sorry about this one then. I can back out the revision if you like.

No, no, no! You are doing a great job. I just want to create good unit tests for every case. It can be very difficult to find a bug if there is cache functionality somewhere.

> See my previous e-mail, I don't know how to finish up the previous
> change because I don't understand the issue involved. I made sure the
> code passed the test suites, but that is about all I can do. I think
> you are going to have to look at it or come up with a test case that
> shows the problem to me.

I did not plan to work on this, but I think I will have some free time this evening, so I will work on both changes you introduced.

> I will just keep any further performance increases in my own tree for now.

Yes, please. The better idea is to post them to the list. It will take me a day or two to complete the task.

> If you are interested in looking into some of this I would point you at
> check_name. This method is now one of the most time consuming.

Yes, of course. Let me finish with the previous two and we will fix most (all) of them.

> It is also worth noting that when I turn on optimize_queries, the code
> runs slower.

What do you mean? module_builder_t.__init__ optimize_queries is turned on by default. Maybe I am missing something?

-- Roman Yakovenko C++ Python language binding http://www.language-binding.net/ |
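The kind of unit test Roman is asking for needs to exercise both the cold and the warm path of a cache, and prove that invalidation forces recomputation. A generic sketch (the cache class here is a stand-in, not pygccxml's):

```python
class cached_value(object):
    """Stand-in for one cached algorithm result (hypothetical, not pygccxml's)."""
    def __init__(self, compute):
        self.compute = compute
        self.calls = 0          # exposed so a test can observe recomputation
        self._value = None
    def get(self):
        if self._value is None:
            self.calls += 1
            self._value = self.compute()
        return self._value
    def reset(self):
        self._value = None

# What a unit test should pin down:
cv = cached_value(lambda: 'public')
assert cv.get() == 'public'   # cold path computes the value
assert cv.get() == 'public'   # warm path must return the same answer
assert cv.calls == 1          # ...without recomputing it
cv.reset()
cv.get()
assert cv.calls == 2          # a reset must force recomputation
```

Without the warm-path and reset checks, a stale-cache bug would pass every existing test that only queries each value once, which is exactly why cached code is hard to debug.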
From: Allen B. <al...@vr...> - 2006-08-26 12:22:46
|
Roman Yakovenko wrote:
> On 8/26/06, Allen Bierbaum <al...@vr...> wrote:
>> Sorry about this one then. I can back out the revision if you like.
>
> No, no, no! You are doing a great job. I just want to create good unit
> tests for every case. It can be very difficult to find a bug if there is
> cache functionality somewhere.

Do you think there are cases that are not already covered by the current unit tests?

> I did not plan to work on this, but I think I will have some free time
> this evening, so I will work on both changes you introduced.
>
>> I will just keep any further performance increases in my own tree for now.
>
> Yes, please. The better idea is to post them to the list. It will take
> me a day or two to complete the task.
>
>> It is also worth noting that when I turn on optimize_queries, the code
>> runs slower.
>
> What do you mean? module_builder_t.__init__ optimize_queries is turned on
> by default. Maybe I am missing something?

I mean that a month or so ago I found that if I turned them off, my script would run faster. This is still true with the new optimizations.

-Allen |
From: Roman Y. <rom...@gm...> - 2006-08-26 12:25:16
|
On 8/26/06, Allen Bierbaum <al...@vr...> wrote:
> Do you think there are cases that are not already covered by the current
> unit tests?

Yes.

>> What do you mean? module_builder_t.__init__ optimize_queries is turned on
>> by default. Maybe I am missing something?
>
> I mean that a month or so ago I found that if I turned them off, my
> script would run faster. This is still true with the new optimizations.

Strange. We will take a look at this later, okay?

-- Roman Yakovenko C++ Python language binding http://www.language-binding.net/ |
From: Allen B. <al...@vr...> - 2006-08-26 14:19:30
|
I just found some more tweaks.

- Caching the results of type_traits.remove_alias improved performance by 20%.
- (danger) Having create_identifier just return the exact same string passed as full_name improved performance by another 20%.

This last one was interesting because what I found was that (at least in my project) create_identifier was called about 16,000 times and it always returned the same value passed in. This could be just because of how my project works (no namespace aliases that I know of) and is probably not portable.

But one very strange thing: why is this method being called for names like "boost::python::arg" and "boost::python::default_call_policies"? As far as I understand how Py++ works, these symbols are always going to be valid. So why waste the time calling an expensive method if we already know the outcome?

-Allen |
From: Roman Y. <rom...@gm...> - 2006-08-26 17:55:35
|
On 8/26/06, Allen Bierbaum <al...@vr...> wrote:
> I just found some more tweaks.

Cool. I just committed a set of changes that will allow us to keep the cached results of different algorithms under control. I still have to write unit tests. These changes allow you to continue the great work. Please take a look at them; what do you think?

> - Caching the results of type_traits.remove_alias improved performance
> by 20%

Yes, this algorithm is used everywhere. Please don't commit this change, I'd like to think about it. I see one problem here: there are a lot of types, so the cache memory size could go out of control. Can you find out how much memory Py++ takes with\without the cache?

> - (danger) Having create_identifier just return the exact same string
> passed as full_name improved performance by another 20%.
>
> But one very strange thing: why is this method being called for names
> like "boost::python::arg" and "boost::python::default_call_policies"?
> As far as I understand how Py++ works, these symbols are always
> going to be valid. So why waste the time calling an expensive method if
> we already know the outcome?

:-) It seems that you missed the point. Py++ has a few cool features.

mb = module_builder_t( ... )
mb.build_code_creator(...)
mb.code_creator.add_namespace_alias( "bpl", "::boost::python" )
or
mb.code_creator.add_namespace_usage( "::boost::python" )

See what happens. The short version is that you can add namespace usages and aliases to the code creators tree, and Py++ will generate code that takes them into account. The create_identifier function is responsible for creating such identifiers.

It is safe to add such an optimization, but similar work should be done.

-- Roman Yakovenko C++ Python language binding http://www.language-binding.net/ |
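One portable first approximation to the memory question is to sum the sizes of a cache dictionary, its keys, and its values with sys.getsizeof. This is only a sketch: it does not follow nested objects and it counts shared objects once per reference, but it is enough to compare runs with and without a cache enabled.

```python
import sys

def cache_size_bytes(cache):
    """Rough byte count pinned by a flat cache dict (keys and values
    assumed to be simple objects; nested structures are not followed)."""
    total = sys.getsizeof(cache)
    for key, value in cache.items():
        total += sys.getsizeof(key) + sys.getsizeof(value)
    return total

small = {i: 'int' for i in range(10)}
large = {i: 'int' for i in range(10000)}
print(cache_size_bytes(small), cache_size_bytes(large))
```

For a whole-process answer, comparing peak resident set size across two full Py++ runs (cache on versus cache off) would be more faithful, at the cost of being platform-specific.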
From: Allen B. <al...@vr...> - 2006-08-26 20:43:45
|
Roman Yakovenko wrote:
> On 8/26/06, Allen Bierbaum <al...@vr...> wrote:
>> I just found some more tweaks.
>
> Cool. I just committed a set of changes that will allow us to keep the
> cached results of different algorithms under control. I still have to
> write unit tests. These changes allow you to continue the great work.
> Please take a look at them; what do you think?

Not quite how I would have done it, but it looks like it should work. It may eliminate an extra attribute lookup I had to do, so that could increase the performance even more.

>> - Caching the results of type_traits.remove_alias improved performance
>> by 20%
>
> Yes, this algorithm is used everywhere. Please don't commit this change,
> I'd like to think about it. There are a lot of types, so the cache memory
> size could go out of control. Can you find out how much memory Py++ takes
> with\without the cache?

I can try to take a look at this. I don't have an easy way to do it right now but I will try later.

>> But one very strange thing: why is this method being called for names
>> like "boost::python::arg" and "boost::python::default_call_policies"?
>
> :-) It seems that you missed the point. Py++ has a few cool features.
>
> mb = module_builder_t( ... )
> mb.build_code_creator(...)
> mb.code_creator.add_namespace_alias( "bpl", "::boost::python" )
> or
> mb.code_creator.add_namespace_usage( "::boost::python" )
>
> See what happens. The short version is that you can add namespace usages
> and aliases to the code creators tree, and Py++ will generate code that
> takes them into account. The create_identifier function is responsible
> for creating such identifiers.
>
> It is safe to add such an optimization, but similar work should be done.

If you never call add_namespace_usage or add_namespace_alias, will create_identifier ever need to do anything? Maybe we could make an optimization that keeps a global flag around and just skips the work in this method if you never set any namespace information. What do you think?

-Allen |
From: Roman Y. <rom...@gm...> - 2006-08-27 19:19:35
|
On 8/27/06, Allen Bierbaum <al...@vr...> wrote:
> I had trouble putting it into goodies as an override because it looks
> like the method comes in through multiple module aliases. In other
> words I haven't found a way to override all of the uses of the method.

Well, a global variable will do the job :-(. Can you implement it?

-- Roman Yakovenko C++ Python language binding http://www.language-binding.net/ |
From: Allen B. <al...@vr...> - 2006-08-27 19:40:09
|
Roman Yakovenko wrote:
> On 8/27/06, Allen Bierbaum <al...@vr...> wrote:
>> I had trouble putting it into goodies as an override because it looks
>> like the method comes in through multiple module aliases. In other
>> words I haven't found a way to override all of the uses of the method.
>
> Well, a global variable will do the job :-(. Can you implement it?

How would a global variable fix a symbol alias issue?

-Allen |
From: Roman Y. <rom...@gm...> - 2006-08-27 19:43:24
|
On 8/27/06, Allen Bierbaum <al...@vr...> wrote:
> Roman Yakovenko wrote:
>> On 8/27/06, Allen Bierbaum <al...@vr...> wrote:
>>> I had trouble putting it into goodies as an override because it looks
>>> like the method comes in through multiple module aliases. In other
>>> words I haven't found a way to override all of the uses of the method.
>>
>> Well, a global variable will do the job :-(. Can you implement it?
>
> How would a global variable fix a symbol alias issue?

def create_identifier_smart( ... ):
    ...

def create_identifier_quick( ... ):
    ...

CREATE_IDENTIFIER_USE_SMART = True

def create_identifier( ... ):
    if CREATE_IDENTIFIER_USE_SMART:
        ...
    else:
        ...

Or another solution is to add a function in the same module as create_identifier:

def setup_create_identifier( ... ):
    # replace the reference of create_identifier with another one
    ...

-- Roman Yakovenko C++ Python language binding http://www.language-binding.net/ |
From: Allen B. <al...@vr...> - 2006-08-27 21:22:53
|
>>>Right now I am not quite ready to update to your changes. I have some >>>concerns about the complexity and performance of the way you implemented >>>this. >>> >>> >>Do you have the numbers? I did not read the benchmark results. I will >>do this later. >> >> > >I don't have numbers. I didn't want to update and deal with the >conflicts only to find I didn't want the update. With the changes you >have made now I think it will probably be safe for me to update. I will >let you know if something goes bad. > > I now have preliminary numbers. My build that used to take 58 seconds now takes 76 seconds with your caching changes. So it looks right now like the implementations I was using were about 25% faster for some reason. Any thoughts? -Allen > > >>You are right, I introduced one more "get attribute". I have 1 good >>reason >>for this maintainability. I'd like to keep all "cached" values in >>single place. >>Thus, I don't have to scan sources for "what I am caching" and more over >>it is very easy to create documentation to it. I am sure you will not >>see this >>"get attribute" in the benchmark >> >> >> >> >>>- Why is there an "algorithms_cache_t" object as the base class to >>>"declaration_algs_cache_t"? The split with a base class doesn't seem to >>>serve much of a purpose. >>> >>> >>Enable\disable functionality. But I think it is useless now. The >>better idea is to >>introduce type_algs_cache_t class with class variable , that will >>control all >>instances of type_algs_cache_t. >> >> >> >>>- What is the performance implication of using "properties"? I know you >>>like to use them in your code but how do they affect performance? >>> >>> >>I tried with and without them and I did not see the difference. May be >>you will >>see. If you do see the difference, we can leave set_* methods and remove >>property >> >> >> >>>- Handling of enabled flag. I think the handling of the enabled flag >>>should be done in the "set" method instead of the get method. 
As it >>>stands with your change, our optimized path will require two if tests >>>(one in the local method to test for None, and one in the get method to >>>test enabled). If we moved the enabled test to the set method we would >>>only pay the cost of that if when we have optimizations diabled. >>> >>> >>You are right. Fixed. >> >> >> >>>>I left some interesting problem to you: it is possible to optimize >>>>declaration_path >>>>algorithm even more. In most cases its complexity could be O(1). This >>>>could be done >>>>by saving intermediate results. Another way to say it: to re-use >>>>parent declaration path >>>>cache. >>>> >>>> >>What about this? >> >> > >I haven't looked at this at all. I am not sure I understand what is >going on here well enough to do it right and it isn't showing up high in >any of my traces anymore. So it really isn't a high priority for me. > >-Allen > > >------------------------------------------------------------------------- >Using Tomcat but need to do more? Need to support web services, security? >Get stuff done quickly with pre-integrated technology to make your job easier >Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo >http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642 >_______________________________________________ >pygccxml-development mailing list >pyg...@li... >https://lists.sourceforge.net/lists/listinfo/pygccxml-development > > > |
From: Roman Y. <rom...@gm...> - 2006-08-28 04:51:28
|
On 8/28/06, Allen Bierbaum <al...@vr...> wrote:
> I now have preliminary numbers. My build that used to take 58 seconds
> now takes 76 seconds with your caching changes. So it looks right now
> like the implementations I was using were about 25% faster for some reason.
>
> Any thoughts?

I don't know why, but I think I prefer to pay this price. I prefer code that I can maintain. Sorry.

You already achieved a 10x improvement. This is a great result. Let's stop here.

--
Roman Yakovenko
C++ Python language binding
http://www.language-binding.net/
|
From: Roman Y. <rom...@gm...> - 2006-08-27 07:08:39
|
On 8/26/06, Allen Bierbaum <al...@vr...> wrote:
> >> - Caching the results of type_traits.remove_alias improved performance
> >> by 20%
> >
> > Yes, this algorithm is used everywhere. Please don't commit this
> > change, I'd like to think about it. I see the next problem here: there
> > are a lot of types, so the cache memory size could go out of control.
> > Can you find out how much memory Py++ takes with\without the cache?
>
> I can try to take a look at this. I don't have an easy way to do it
> right now but I will try later.

I think that I don't want to introduce a cache of type_traits results. I don't feel comfortable with them. One of the reasons is that it is not that easy to disable them.

> If you never call add_namespace_usage or add_namespace_alias, then will
> create_identifier ever need to do anything? Maybe we could make an
> optimization that keeps a global flag around and just skips the work in
> this method if you never set any namespace information. What do you think?

I think that you can replace create_identifier with an "identity" function and this solution will always work :-)

def create_identifier_fast( creator, full_name ):
    return full_name

And then replace create_identifier with the fast one from goodies.

I committed small changes to the "optimization" feature. New feature: it is possible to control the cache across the whole project:

for d in mb.decls():
    d.cache.disable()

It is not easy to achieve the same goal with the types cache.

Would you mind adding documentation strings to the module?

I left an interesting problem for you: it is possible to optimize the declaration_path algorithm even more. In most cases its complexity could be O(1). This could be done by saving intermediate results. Another way to say it: re-use the parent's declaration path cache.

Can you publish the top N lines of your benchmark result?

--
Roman Yakovenko
C++ Python language binding
http://www.language-binding.net/
|
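The `d.cache.disable()` interface mentioned above could be backed by a cache object along these lines. The public methods mirror what the thread shows (disable, reset); the internals are assumptions, not pygccxml's actual implementation.

```python
# Minimal sketch of a per-declaration cache: one object holds all cached
# values, so "what is cached" and "how to reset it" live in a single place.
class algorithms_cache_t(object):
    def __init__(self):
        self._enabled = True
        self._values = {}

    def disable(self):
        self._enabled = False
        self._values.clear()

    def enable(self):
        self._enabled = True

    def reset(self):
        # called e.g. after "file by file" declaration trees are joined,
        # when cached values may refer to removed declarations
        self._values.clear()

    def set(self, key, value):
        # the enabled test lives in the setter (as Allen suggests later in
        # the thread), so the read path stays a plain dictionary lookup
        if self._enabled:
            self._values[key] = value

    def get(self, key):
        return self._values.get(key)
```

With this shape, `for d in mb.decls(): d.cache.disable()` is just a loop over declarations calling one method, and a tree join can call `reset()` on every declaration without knowing which values were cached.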
From: Allen B. <al...@vr...> - 2006-08-27 16:38:26
|
I have included comments below. Before that though...

I have found a few more optimizations:

- Making create_identifier return identity adds about 15% to performance
- Adding a cache to module_builder that caches the global namespace tree after parsing, joining, and running the query optimizer. This boosts performance by another 40%. (both by saving more startup and making the query optimization initialization less costly so it can be used)

So now I have my run time down to around 59 seconds. Although I would really like to make it go even faster, I think taking the running time down from the original 12 minutes to 1 minute is a good enough improvement to let me use Py++ much more easily in normal development.

Roman: Do you want a patch for module_builder caching or do you just want me to commit it? (all the changes are in the __init__ method and are very clear, so you can see what is in there and change it if you prefer)

Now on to the comments....

Roman Yakovenko wrote:
> On 8/26/06, Allen Bierbaum <al...@vr...> wrote:
> >> >> - Caching the results of type_traits.remove_alias improved
> >> >> performance by 20%
> >> >
> >> > Yes, this algorithm is used everywhere. Please don't commit this
> >> > change, I'd like to think about it. I see the next problem here:
> >> > there are a lot of types, so the cache memory size could go out of
> >> > control. Can you find out how much memory Py++ takes with\without
> >> > the cache?
> >>
> >> I can try to take a look at this. I don't have an easy way to do it
> >> right now but I will try later.
>
> I think that I don't want to introduce a cache of type_traits results.
> I don't feel comfortable with them. One of the reasons is that it is
> not that easy to disable them.

Maybe I am missing something, but why can't we control them by introducing a flag in type_traits like "caching_enabled" or something and then just testing that whenever the method is called?

It would be a shame not to optimize this method when it gives a 20% performance boost.

> >> If you never call add_namespace_usage or add_namespace_alias, then
> >> will create_identifier ever need to do anything? Maybe we could make
> >> an optimization that keeps a global flag around and just skips the
> >> work in this method if you never set any namespace information. What
> >> do you think?
>
> I think that you can replace create_identifier with an "identity"
> function and this solution will always work :-)
>
> def create_identifier_fast( creator, full_name ):
>     return full_name

Are you suggesting I add this as a new method and change the code that calls it?

In my own local copy I have replaced "create_identifier" with an identity function and am getting very good results.

> And then replace create_identifier with the fast one from goodies.
>
> I committed small changes to the "optimization" feature. New feature:
> it is possible to control the cache across the whole project:
>
> for d in mb.decls():
>     d.cache.disable()
>
> It is not easy to achieve the same goal with the types cache.
>
> Would you mind adding documentation strings to the module?

Right now I am not quite ready to update to your changes. I have some concerns about the complexity and performance of the way you implemented this.

- Why is there an "algorithms_cache_t" object as the base class to "declaration_algs_cache_t"? The split with a base class doesn't seem to serve much of a purpose.

- What is the performance implication of using "properties"? I know you like to use them in your code, but how do they affect performance?

- Handling of the enabled flag. I think the handling of the enabled flag should be done in the "set" method instead of the get method. As it stands with your change, our optimized path requires two if tests (one in the local method to test for None, and one in the get method to test enabled). If we moved the enabled test to the set method, we would only pay the cost of that test when optimizations are disabled.

> I left an interesting problem for you: it is possible to optimize the
> declaration_path algorithm even more. In most cases its complexity
> could be O(1). This could be done by saving intermediate results.
> Another way to say it: re-use the parent's declaration path cache.
>
> Can you publish the top N lines of your benchmark result?

Sure. Give me a bit to rerun it. I can probably just send you a compressed hotspot file so you can see the entire details.

-Allen
|
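Allen's module_builder cache is only described, not shown, in this thread. A hedged sketch of the general idea, with invented names and file layout: serialize the fully processed global namespace tree to disk and reload it on later runs, skipping parsing, joining, and query-optimizer initialization entirely.

```python
# Hypothetical helper: load the processed declarations tree from a cache
# file if one exists, otherwise build it the slow way and save the result.
import os
import pickle

def load_or_build_global_ns(cache_file, build_fn):
    if os.path.exists(cache_file):
        # fast path: reuse the tree cached by a previous run
        with open(cache_file, 'rb') as f:
            return pickle.load(f)
    # slow path: parse, join declarations, init the query optimizer
    global_ns = build_fn()
    with open(cache_file, 'wb') as f:
        pickle.dump(global_ns, f)
    return global_ns
```

One design caveat a real implementation must handle: the cache file has to be invalidated when the header files or GCC-XML configuration change, or stale declarations will be silently reused.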
From: Roman Y. <rom...@gm...> - 2006-08-27 18:57:59
|
On 8/27/06, Allen Bierbaum <al...@vr...> wrote: > I have included comments below. Before that though... > > I have found a few more optimizations: > > - Making create_identifier return identity adds about 15% to performance > - Adding a cache to module_builder that caches the global namespace tree > after parsing, joining, and running the query optimizer. This boosts > performance by another 40%. (both by saving more startup and making the > query optimization initialization less costly so it can be used) Can you explain relationship between it and declarations cache in pygccxml? > So now I have my run time down to aroung 59 seconds. Although I would > really like to make it go even faster I think taking the running time > down from the original 12 minutes to 1 minute now is a good enough > improvement to make it so I can use py++ much easier in normal development. :-) > Roman: Do you want a patch for module_builder caching or do you just > want me to commit it? (all the changes are in the __init__ method and > are very clear so you can see what is in there and change it if you prefer) Please commit them, but still explain the relationship :-) > Now on to the comments.... > > Roman Yakovenko wrote: > > > On 8/26/06, Allen Bierbaum <al...@vr...> wrote: > > > >> >> - Caching the results of type_traits.remove_alias improved > >> performance > >> >> by 20% > >> > > >> > > >> > Yes, this algorithms is used every where. Please don't commit this > >> > change, I'd like > >> > to think about it. I see next problem here: there are a lot of types, > >> > so the cache > >> > memory size could go out of control. Can you find out how many memory > >> > Py++ takes with\without the cache? > >> > >> I can try to take a look at this. I don't have an easy way to do it > >> right now but I will try later. > > > > > > I am think that I don't want to introduce cache of results of > > type_traits. > > I don't feel comfortable with them. 
One of the reasons is that it is > > not that easy to > > disable them. > > Maybe I am missing something, but why can't we control them by > introducing a flag in type_traits like "caching_enabled" or something > and then just testing that whenever the method is called. > > It would be a shame not to optimize this method when it gives a 20% > performance boost. You are right, I don't have good reason. I will commit the patch. By the way it was a tricky one. One of my test failed. It was very difficult ( few hours with debugger ) to find the problem. That is exactly the reason why I hate cache, but what can I do if you like it :-) ? > Are you suggesting I add this as a new method and change the code that > calls it? No. I just committed the results. > In my own local copy I have replaced "created_identifier" with an > identify function and am getting very good results. Why do you have goodies? For a time being please put it in a goodies. > Right now I am not quite ready to update to your changes. I have some > concerns about the complexity and performance of the way you implemented > this. Do you have the numbers? I did not read the benchmark results. I will do this later. You are right, I introduced one more "get attribute". I have 1 good reason for this maintainability. I'd like to keep all "cached" values in single place. Thus, I don't have to scan sources for "what I am caching" and more over it is very easy to create documentation to it. I am sure you will not see this "get attribute" in the benchmark > - Why is there an "algorithms_cache_t" object as the base class to > "declaration_algs_cache_t"? The split with a base class doesn't seem to > serve much of a purpose. Enable\disable functionality. But I think it is useless now. The better idea is to introduce type_algs_cache_t class with class variable , that will control all instances of type_algs_cache_t. > - What is the performance implication of using "properties"? 
I know you > like to use them in your code but how do they affect performance? I tried with and without them and I did not see the difference. May be you will see. If you do see the difference, we can leave set_* methods and remove property > - Handling of enabled flag. I think the handling of the enabled flag > should be done in the "set" method instead of the get method. As it > stands with your change, our optimized path will require two if tests > (one in the local method to test for None, and one in the get method to > test enabled). If we moved the enabled test to the set method we would > only pay the cost of that if when we have optimizations diabled. You are right. Fixed. > > > I left some interesting problem to you: it is possible to optimize > > declaration_path > > algorithm even more. In most cases its complexity could be O(1). This > > could be done > > by saving intermediate results. Another way to say it: to re-use > > parent declaration path > > cache. What about this? > > Can you publish top N lines of your benchmark result? > > > Sure. Give me a bit to rerun it. I can probably just send you a > compress hotspot file so you can see the entire details. Thanks, I got it and will take a look later, okay? -- Roman Yakovenko C++ Python language binding http://www.language-binding.net/ |
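The remove_alias memoization committed above could look roughly like this sketch: results cached per type instance, guarded by a module-level flag so the cache can be switched off (and cleared when the declarations tree is rebuilt, which is the failure mode Roman spent hours debugging). The typedef traversal is a simplified stand-in for pygccxml's real logic.

```python
# Sketch: memoize remove_alias by object identity, behind an enable flag.
caching_enabled = True
_remove_alias_cache = {}

def clear_remove_alias_cache():
    # must be called when declarations trees are joined, so cached results
    # cannot point at removed typedef/class objects
    _remove_alias_cache.clear()

def remove_alias(type_):
    # follow 'aliased' links until a non-typedef type is reached
    if caching_enabled and id(type_) in _remove_alias_cache:
        return _remove_alias_cache[id(type_)]
    naked = type_
    while getattr(naked, 'aliased', None) is not None:
        naked = naked.aliased
    if caching_enabled:
        _remove_alias_cache[id(type_)] = naked
    return naked
```

Keying by `id()` sidesteps any need for the type objects to be hashable, at the cost that entries must be cleared before the keyed objects are garbage collected; that trade-off is part of why this is a sketch rather than a drop-in.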
From: Allen B. <al...@vr...> - 2006-08-27 19:11:12
|
Roman Yakovenko wrote: > On 8/27/06, Allen Bierbaum <al...@vr...> wrote: > >> I have included comments below. Before that though... >> >> I have found a few more optimizations: >> >> - Making create_identifier return identity adds about 15% to performance >> - Adding a cache to module_builder that caches the global namespace tree >> after parsing, joining, and running the query optimizer. This boosts >> performance by another 40%. (both by saving more startup and making the >> query optimization initialization less costly so it can be used) > > > Can you explain relationship between it and declarations cache in > pygccxml? The module builder cache supercedes it. The module builder cache caches the decl tree after all loading, parsing in pygccxml and the additional processing in pyplusplus (optimization initialization). The decl cache can/is still used if you don't use the module builder cache, but when using the module builder cache the decl cache is not really needed. >> So now I have my run time down to aroung 59 seconds. Although I would >> really like to make it go even faster I think taking the running time >> down from the original 12 minutes to 1 minute now is a good enough >> improvement to make it so I can use py++ much easier in normal >> development. > > > :-) > >> Roman: Do you want a patch for module_builder caching or do you just >> want me to commit it? (all the changes are in the __init__ method and >> are very clear so you can see what is in there and change it if you >> prefer) > > > Please commit them, but still explain the relationship :-) Okay. I should have time to commit it later today. > >> Now on to the comments.... >> >> Roman Yakovenko wrote: >> >> > On 8/26/06, Allen Bierbaum <al...@vr...> wrote: >> > >> >> >> - Caching the results of type_traits.remove_alias improved >> >> performance >> >> >> by 20% >> >> > >> >> > >> >> > Yes, this algorithms is used every where. 
Please don't commit this >> >> > change, I'd like >> >> > to think about it. I see next problem here: there are a lot of >> types, >> >> > so the cache >> >> > memory size could go out of control. Can you find out how many >> memory >> >> > Py++ takes with\without the cache? >> >> >> >> I can try to take a look at this. I don't have an easy way to do it >> >> right now but I will try later. >> > >> > >> > I am think that I don't want to introduce cache of results of >> > type_traits. >> > I don't feel comfortable with them. One of the reasons is that it is >> > not that easy to >> > disable them. >> >> Maybe I am missing something, but why can't we control them by >> introducing a flag in type_traits like "caching_enabled" or something >> and then just testing that whenever the method is called. >> >> It would be a shame not to optimize this method when it gives a 20% >> performance boost. > > > You are right, I don't have good reason. I will commit the patch. By > the way it was > a tricky one. One of my test failed. It was very difficult ( few hours > with debugger ) > to find the problem. That is exactly the reason why I hate cache, but > what can I do > if you like it :-) ? > >> Are you suggesting I add this as a new method and change the code that >> calls it? > > > No. I just committed the results. > >> In my own local copy I have replaced "created_identifier" with an >> identify function and am getting very good results. > > > Why do you have goodies? For a time being please put it in a goodies. I had trouble putting it into goodies as an override because it looks like the method comes in through multiple module aliases. In other words I haven't found a way to override all the of the uses of the method. I have run into a problem like this on another project and it is really tough to track down. 
Basically it probably means there are multiple places where the module with create_identifier are being imported and the imports are using slightly different names to import the module. So sys.modules[] ends up with multiple entries that are both pointing to different imports of the same code. I think future versions of python are going to fix this problem by making you have to use the full package.module.name import no matter what file you are in, but for now this is a hard issue to track down. >> Right now I am not quite ready to update to your changes. I have some >> concerns about the complexity and performance of the way you implemented >> this. > > > Do you have the numbers? I did not read the benchmark results. I will > do this later. I don't have numbers. I didn't want to update and deal with the conflicts only to find I didn't want the update. With the changes you have made now I think it will probably be safe for me to update. I will let you know if something goes bad. > You are right, I introduced one more "get attribute". I have 1 good > reason > for this maintainability. I'd like to keep all "cached" values in > single place. > Thus, I don't have to scan sources for "what I am caching" and more over > it is very easy to create documentation to it. I am sure you will not > see this > "get attribute" in the benchmark > > >> - Why is there an "algorithms_cache_t" object as the base class to >> "declaration_algs_cache_t"? The split with a base class doesn't seem to >> serve much of a purpose. > > > Enable\disable functionality. But I think it is useless now. The > better idea is to > introduce type_algs_cache_t class with class variable , that will > control all > instances of type_algs_cache_t. > >> - What is the performance implication of using "properties"? I know you >> like to use them in your code but how do they affect performance? > > > I tried with and without them and I did not see the difference. May be > you will > see. 
If you do see the difference, we can leave set_* methods and remove > property > >> - Handling of enabled flag. I think the handling of the enabled flag >> should be done in the "set" method instead of the get method. As it >> stands with your change, our optimized path will require two if tests >> (one in the local method to test for None, and one in the get method to >> test enabled). If we moved the enabled test to the set method we would >> only pay the cost of that if when we have optimizations diabled. > > > You are right. Fixed. > >> >> > I left some interesting problem to you: it is possible to optimize >> > declaration_path >> > algorithm even more. In most cases its complexity could be O(1). This >> > could be done >> > by saving intermediate results. Another way to say it: to re-use >> > parent declaration path >> > cache. > > > What about this? I haven't looked at this at all. I am not sure I understand what is going on here well enough to do it right and it isn't showing up high in any of my traces anymore. So it really isn't a high priority for me. -Allen |
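The declaration_path optimization Roman keeps pointing at can be sketched as follows: build a declaration's path by reusing the parent's already-cached path instead of walking every ancestor each time, which makes the common cache-hit case O(1). The `decl_t` class here is a hypothetical minimal declaration, not pygccxml's.

```python
# Sketch: O(1) amortized declaration_path via the parent's cached path.
class decl_t(object):
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self._cached_path = None

    def declaration_path(self):
        if self._cached_path is None:
            if self.parent is None:
                self._cached_path = [self.name]
            else:
                # the parent's path is computed (and cached) at most once,
                # so after warm-up this is a single list concatenation
                self._cached_path = self.parent.declaration_path() + [self.name]
        return list(self._cached_path)  # copy, so callers cannot corrupt the cache
```

As with the other caches in this thread, `_cached_path` would need to be reset whenever declaration trees are joined.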
From: Allen B. <al...@vr...> - 2006-08-28 14:55:20
|
Roman Yakovenko wrote: > On 8/28/06, Allen Bierbaum <al...@vr...> wrote: > >> Roman Yakovenko wrote: >> >> > On 8/28/06, Allen Bierbaum <al...@vr...> wrote: >> > >> >> Roman Yakovenko wrote: >> >> >> >> > On 8/28/06, Allen Bierbaum <al...@vr...> wrote: >> >> > >> >> >> I now have preliminary numbers. My build that used to take 58 >> seconds >> >> >> now takes 76 seconds with your caching changes. So it looks >> right now >> >> >> like the implementations I was using were about 25% faster for >> some >> >> >> reason. >> >> >> >> >> >> Any thoughts? >> >> > >> >> > >> >> > I don't know why, but I think I prefer to pay this price. I prefer >> >> > code, that I can maintain. Sorry. >> >> >> >> It is probably no surprise that I disagree. The extra 25% >> performance >> >> seems like a good thing to me. >> > >> > >> > :-). It is not 25%. The original time was 12 minutes. Your >> > optimization brought it to >> > 1 minutes, than my changes brought it to 1.25 minutes. >> >> The 25% is that in my current version, my implementations take 58 >> seconds and with yours it takes 76 seconds. That is a 25% difference >> with my math at least. :) > > > I count from 12 minutes. Well that is just cheating then. :) > >> > >> >> Why were the extra layers of indirection needed in your >> implementation? >> >> >> >> In my implementation I just tried to keep everything local to the >> method >> >> I was optimizing. In that way I thought it was pretty maintainable >> >> because that method was the only place in the code that set or >> used the >> >> cache value. This still allowed for disabling caching by using a >> module >> >> level variable to prevent the cache from being set. >> >> >> >> What caused this local encapsulation to be less maintainable? >> > >> > >> > You can not answer the question "what optimization pygccxml does" >> without >> > scanning the whole source code. pygccxml supports "file by file" >> > compilation mode. 
>> > When it joins the declarations tree, it have to clear all declaration >> > and type caches. >> > It is very easy to write decl.cache.reset(). While in your case, >> > developer ( me ) has >> > always scan all sources and to find out what attributes should be >> > reset. Another problem >> > is when new "cache value" is introduced. I can bet, that I will forget >> > to add it to all >> > places where I need reset. Thus the software will become buggy. >> > >> > This is just an example to problem my implementation solves. There is >> > new concept >> > in pygccxml "cache" and I want it to be presented in a right way. >> >> Why didn't I need the ability to reset in my versions? > > > You do. You didn't run unit tests, right? Otherwise typedef_tester.py > would fail. I ran the unit tests but maybe I missed it for remove_alias. > >> Do all the >> methods really need reset. I mean why do we have to reset something >> that is always constant for a given decl? (ex: access type) > > > You treat pygccxml declarations tree as read only, right? If you think > about > read\write than you really need to reset it. > >> I agree, if there are multiple places in the code where the cache would >> have to be reset or touched then that is an issue. But I didn't run >> across that with my implementations. I know you had a corner case with >> remove_alias that took a long time to track down, was that because of >> needing a reset? > > > Yes. > removed_alias on a typedef, after join of declarations tree contained > reference to > the removed class. Some operation stopped working. After this I > understand, that to be > on a safe side I need to reset declaration cache too. > >> >> > You already achieved x10 improvement. This is a grate result. Lets >> >> > stop here. >> >> > >> >> I don't know if I will ever stop looking for ways to make this >> code run >> >> faster, but I will probably stop soon so I can just use the code >> instead >> >> of trying to improve it. 
>> > >> > >> > Please don't take it personal, without your ideas and work Py++ would >> > not become >> > such powerful and good tool. >> >> I don't take it personal. I just really need py++ to run faster so I >> can work with it more easily. It is much better now then it was, but >> now that I have seen how many opportunities there are for improvement I >> just want to make sure I don't miss any easy fixes. :) > > > I did not optimize Py++ at all. I just make it to run "fast enough" on > my projects. > So please do it. > I am still trying to make it "fast enough" for my projects. :) -Allen |
From: Allen B. <al...@vr...> - 2006-08-28 19:22:59
|
I found one other place. Using make_flatten_generator in pyplusplus (exposing it and then using it actually) shaves off another 8-10%. The profile list now looks pretty flat. In other words there is nothing dominating. All of the cached methods are now showing up as the most costly in the profile but they are all about equal in terms of time. I would still like to see a faster implementation of these things so we can get back that extra 25% increase but I guess i will live with it if you don't want to optimize it further. I think any remaining optimizations will need to be done by making algorithms smarter so they do less work. For now I think I am done with my optimization efforts though. I need to get real work done. -Allen Allen Bierbaum wrote: >Roman Yakovenko wrote: > > > >>On 8/28/06, Allen Bierbaum <al...@vr...> wrote: >> >> >> >>>Roman Yakovenko wrote: >>> >>> >>> >>>>On 8/28/06, Allen Bierbaum <al...@vr...> wrote: >>>> >>>> >>>> >>>>>Roman Yakovenko wrote: >>>>> >>>>> >>>>> >>>>>>On 8/28/06, Allen Bierbaum <al...@vr...> wrote: >>>>>> >>>>>> >>>>>> >>>>>>>I now have preliminary numbers. My build that used to take 58 >>>>>>> >>>>>>> >>>seconds >>> >>> >>>>>>>now takes 76 seconds with your caching changes. So it looks >>>>>>> >>>>>>> >>>right now >>> >>> >>>>>>>like the implementations I was using were about 25% faster for >>>>>>> >>>>>>> >>>some >>> >>> >>>>>>>reason. >>>>>>> >>>>>>>Any thoughts? >>>>>>> >>>>>>> >>>>>>I don't know why, but I think I prefer to pay this price. I prefer >>>>>>code, that I can maintain. Sorry. >>>>>> >>>>>> >>>>>It is probably no surprise that I disagree. The extra 25% >>>>> >>>>> >>>performance >>> >>> >>>>>seems like a good thing to me. >>>>> >>>>> >>>>:-). It is not 25%. The original time was 12 minutes. Your >>>>optimization brought it to >>>>1 minutes, than my changes brought it to 1.25 minutes. >>>> >>>> >>>The 25% is that in my current version, my implementations take 58 >>>seconds and with yours it takes 76 seconds. 
>> I count from 12 minutes.

Well, that is just cheating then. :)

>> You do. You didn't run the unit tests, right? Otherwise
>> typedef_tester.py would fail.

I ran the unit tests, but maybe I missed it for remove_alias.

>> I did not optimize Py++ at all. I just made it run "fast enough" on
>> my projects. So please do it.

I am still trying to make it "fast enough" for my projects. :)

-Allen

-------------------------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
_______________________________________________
pygccxml-development mailing list
pyg...@li...
https://lists.sourceforge.net/lists/listinfo/pygccxml-development
|
From: Allen B. <al...@vr...> - 2006-08-28 19:24:08
|
Allen Bierbaum wrote:

> I found one other place. Using make_flatten_generator in pyplusplus
> (exposing it and then using it, actually) shaves off another 8-10%.
>
> The profile list now looks pretty flat. In other words, there is
> nothing dominating. All of the cached methods now show up as the most
> costly in the profile, but they are all about equal in terms of time.
> I would still like to see a faster implementation of these things so
> we can get back that extra 25% increase, but I guess I will live with
> it if you don't want to optimize it further.

One correction here. remove_alias is definitely the most expensive
method remaining. It currently takes up about 13% of the run-time.

-Allen

> I think any remaining optimizations will need to be done by making the
> algorithms smarter so they do less work. For now I think I am done
> with my optimization efforts, though. I need to get real work done.
>
> -Allen
|
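The make_flatten_generator win mentioned above comes from walking the declarations tree lazily instead of building intermediate lists at every level. A minimal sketch of the two approaches, using a simplified node type (not pygccxml's real declaration classes):

```python
# Simplified sketch: list-building flatten vs. generator-based flatten.
# node_t is a stand-in for pygccxml's declaration classes.

class node_t(object):
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

def make_flatten_list(node):
    # Eager version: allocates and extends a list at every tree level.
    result = [node]
    for child in node.children:
        result.extend(make_flatten_list(child))
    return result

def make_flatten_generator(node):
    # Lazy version: yields nodes one at a time, no intermediate lists,
    # so callers that stop early never pay for the whole tree.
    yield node
    for child in node.children:
        for descendant in make_flatten_generator(child):
            yield descendant
```

Both produce the same pre-order traversal; the generator simply avoids the per-level list allocations, which is where the 8-10% likely came from.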
From: Allen B. <al...@vr...> - 2006-08-28 12:43:22
|
Roman Yakovenko wrote:

> On 8/28/06, Allen Bierbaum <al...@vr...> wrote:
>
>> I now have preliminary numbers. My build that used to take 58 seconds
>> now takes 76 seconds with your caching changes. So it looks right now
>> like the implementations I was using were about 25% faster for some
>> reason.
>>
>> Any thoughts?
>
> I don't know why, but I think I prefer to pay this price. I prefer
> code that I can maintain. Sorry.

It is probably no surprise that I disagree. The extra 25% performance
seems like a good thing to me.

Why were the extra layers of indirection needed in your implementation?

In my implementation I just tried to keep everything local to the
method I was optimizing. In that way I thought it was pretty
maintainable, because that method was the only place in the code that
set or used the cache value. This still allowed for disabling caching
by using a module-level variable to prevent the cache from being set.

What caused this local encapsulation to be less maintainable?

> You already achieved a x10 improvement. This is a great result. Let's
> stop here.

I don't know if I will ever stop looking for ways to make this code run
faster, but I will probably stop soon so I can just use the code instead
of trying to improve it.

-Allen
|
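The method-local caching Allen describes can be sketched roughly as below. All names here (`_cache_enabled`, `demangled_name`) are illustrative, not pygccxml's actual identifiers; the point is that the cached attribute lives next to the one method that reads and writes it, with a module-level switch to disable caching:

```python
# Sketch of method-local caching with a module-level kill switch.
# Names are illustrative; this is not pygccxml's actual code.

_cache_enabled = True  # set to False to disable all caching in this module

class declaration_t(object):
    def __init__(self, name):
        self.name = name
        self._demangled_cache = None  # cache lives right next to its method

    def demangled_name(self):
        # Return the memoized value when caching is on and it exists.
        if _cache_enabled and self._demangled_cache is not None:
            return self._demangled_cache
        result = self._compute_demangled()  # the expensive part
        if _cache_enabled:
            self._demangled_cache = result
        return result

    def _compute_demangled(self):
        return self.name.lower()  # stand-in for real, costly work
```

The trade-off debated in the thread: this is fast and self-contained, but anyone who mutates the tree must know about every such private attribute to invalidate it.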
From: Roman Y. <rom...@gm...> - 2006-08-28 13:01:25
|
On 8/28/06, Allen Bierbaum <al...@vr...> wrote:

>>> I now have preliminary numbers. My build that used to take 58 seconds
>>> now takes 76 seconds with your caching changes. So it looks right now
>>> like the implementations I was using were about 25% faster for some
>>> reason.
>>>
>>> Any thoughts?
>>
>> I don't know why, but I think I prefer to pay this price. I prefer
>> code that I can maintain. Sorry.
>
> It is probably no surprise that I disagree. The extra 25% performance
> seems like a good thing to me.

:-). It is not 25%. The original time was 12 minutes. Your
optimization brought it to 1 minute, then my changes brought it to
1.25 minutes.

> Why were the extra layers of indirection needed in your implementation?
>
> In my implementation I just tried to keep everything local to the
> method I was optimizing. In that way I thought it was pretty
> maintainable, because that method was the only place in the code that
> set or used the cache value. This still allowed for disabling caching
> by using a module-level variable to prevent the cache from being set.
>
> What caused this local encapsulation to be less maintainable?

You cannot answer the question "what optimizations does pygccxml do?"
without scanning the whole source code. pygccxml supports a "file by
file" compilation mode. When it joins the declarations trees, it has
to clear all declaration and type caches. It is very easy to write
decl.cache.reset(). While in your case, the developer (me) always has
to scan all the sources to find out which attributes should be reset.
Another problem is when a new "cache value" is introduced. I can bet
that I will forget to add it to all the places where I need a reset.
Thus the software will become buggy.

This is just one example of the problem my implementation solves. There
is a new concept in pygccxml, the "cache", and I want it to be
presented in the right way.

>> You already achieved a x10 improvement. This is a great result. Let's
>> stop here.
>
> I don't know if I will ever stop looking for ways to make this code
> run faster, but I will probably stop soon so I can just use the code
> instead of trying to improve it.

Please don't take it personally; without your ideas and work Py++ would
not have become such a powerful and good tool.

--
Roman Yakovenko
C++ Python language binding
http://www.language-binding.net/
|
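Roman's centralized design can be sketched as a single cache object hanging off each declaration. The attribute names below are illustrative, but the key point from the message above survives: one `reset()` call per declaration invalidates every cached value when the declarations trees are joined, so no caller needs to know which attributes were memoized:

```python
# Sketch of a centralized per-declaration cache, in the spirit of the
# decl.cache.reset() idiom discussed above; attribute names are
# illustrative, not pygccxml's actual cache fields.

class cache_t(object):
    def __init__(self):
        self.reset()

    def reset(self):
        # The one place that invalidates everything that was memoized.
        # Adding a new cached value means adding exactly one line here.
        self.demangled_name = None
        self.access_type = None
        self.full_name = None

class declaration_t(object):
    def __init__(self, name):
        self.name = name
        self.cache = cache_t()

def join_declarations(decls):
    # After merging "file by file" parse results, any memoized value may
    # be stale, so every declaration's cache is cleared in one sweep.
    for decl in decls:
        decl.cache.reset()
```

Compared with the method-local scheme, this costs an extra attribute lookup per cached read, which is one plausible source of the 58-vs-76-second gap.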
From: Allen B. <al...@vr...> - 2006-08-28 13:29:19
|
Roman Yakovenko wrote:

>> It is probably no surprise that I disagree. The extra 25% performance
>> seems like a good thing to me.
>
> :-). It is not 25%. The original time was 12 minutes. Your
> optimization brought it to 1 minute, then my changes brought it to
> 1.25 minutes.

The 25% is that in my current version, my implementations take 58
seconds and with yours it takes 76 seconds. That is a 25% difference
with my math at least. :)

> You cannot answer the question "what optimizations does pygccxml do?"
> without scanning the whole source code. [...] Another problem is when
> a new "cache value" is introduced. I can bet that I will forget to
> add it to all the places where I need a reset. Thus the software will
> become buggy.

Why didn't I need the ability to reset in my versions? Do all the
methods really need reset? I mean, why do we have to reset something
that is always constant for a given decl (e.g. access type)?

I agree, if there are multiple places in the code where the cache would
have to be reset or touched, then that is an issue. But I didn't run
across that with my implementations. I know you had a corner case with
remove_alias that took a long time to track down; was that because of
needing a reset?

> Please don't take it personally; without your ideas and work Py++
> would not have become such a powerful and good tool.

I don't take it personally. I just really need Py++ to run faster so I
can work with it more easily. It is much better now than it was, but
now that I have seen how many opportunities there are for improvement,
I just want to make sure I don't miss any easy fixes. :)

-Allen
|
From: Roman Y. <rom...@gm...> - 2006-08-28 13:38:17
|
On 8/28/06, Allen Bierbaum <al...@vr...> wrote:

>> :-). It is not 25%. The original time was 12 minutes. Your
>> optimization brought it to 1 minute, then my changes brought it to
>> 1.25 minutes.
>
> The 25% is that in my current version, my implementations take 58
> seconds and with yours it takes 76 seconds. That is a 25% difference
> with my math at least. :)

I count from 12 minutes.

> Why didn't I need the ability to reset in my versions?

You do. You didn't run the unit tests, right? Otherwise
typedef_tester.py would fail.

> Do all the methods really need reset? I mean, why do we have to reset
> something that is always constant for a given decl (e.g. access type)?

You treat the pygccxml declarations tree as read-only, right? If you
think about read/write, then you really do need to reset it.

> I agree, if there are multiple places in the code where the cache
> would have to be reset or touched, then that is an issue. But I
> didn't run across that with my implementations. I know you had a
> corner case with remove_alias that took a long time to track down;
> was that because of needing a reset?

Yes. remove_alias on a typedef, after the join of the declarations
tree, contained a reference to the removed class. Some operations
stopped working. After this I understood that, to be on the safe side,
I need to reset the declaration cache too.

> I don't take it personally. I just really need Py++ to run faster so
> I can work with it more easily. It is much better now than it was,
> but now that I have seen how many opportunities there are for
> improvement, I just want to make sure I don't miss any easy fixes. :)

I did not optimize Py++ at all. I just made it run "fast enough" on my
projects. So please do it.

--
Roman Yakovenko
C++ Python language binding
http://www.language-binding.net/
|