Re: [Pywinauto-users] Re: [pywinauto - Open Discussion] RE: suggestions for speedup
From: Stefaan H. <ste...@gm...> - 2006-03-27 23:57:53
Hello everyone,

Replacing difflib with Levenshtein worked well for a small test dialog, but it doesn't address the deeper cause of the slowness I am experiencing. For a much larger real-life application, the results are only marginally better than before the replacement.

After some profiling I found the main bottleneck to be inside wraphandle.py, specifically the _find_wrapper function. In my test script it repeats the same regex matches millions of times, which accounts for much of the slowness.

So back at home I implemented a caching scheme (a memoization table) that stores the wrapper found for each class name in a dictionary and reads it back from there, instead of rematching on every invocation of WrapHandle. This removes most of the linear searches at the bottom of the call stack and should result in significant time savings.

Of course, an interesting question arises: what causes these regex matches to be redone so often? (This is not entirely clear to me yet; I expect it is related to the liberal calling of WrapHandle everywhere.)

To give some figures, using a freely downloadable program (the PuTTY SSH client) with a not too large dialog:

    from pywinauto import application

    def main():
        application.set_timing(0,0,0,0,0,0,0,0,0,0)  # this seems to work ok in this example
        app = application.Application().start_(r"""putty.exe""")
        ConfigurationDialog = app.PuttyConfiguration
        port = ConfigurationDialog.PortEdit.GetLine(0)

The single GetLine(0) call in "port = ConfigurationDialog.PortEdit.GetLine(0)" results in 8159 regex matches (done in _find_wrapper, called by WrapHandle).

Given the current cost of calling WrapHandle (in the current official pywinauto implementation it usually calls _find_wrapper, which then performs a linear search with a regex match on every item), it really should be called with care. In the above example, a single GetLine triggered more than 200 calls to it. The more controls are present in the dialog, the more often WrapHandle is called. (Is that correct?)
Calling WrapHandle this often scales poorly with the complexity of the dialogs in the application under test. The caching scheme, however, makes a call to _find_wrapper almost free, at least compared to the current situation. With the caching scheme, the above example needs only 1 linear search (41 regex matches); the rest is replaced by a dictionary lookup, which is much faster than a linear search and should lead to very significant savings (this time for real ;). (Warning! I still need to test the impact on my real-life testing script at work, but the test at home on the smaller dialog seems very, very promising.)

A complementary optimization could be to drastically reduce the number of calls to WrapHandle throughout the code, but for now I hope that won't be necessary. A simple thing I tried was to remove the recursion from HwndWrapper.IsTopLevelParent() (there's a WrapHandle call hidden inside the self.Parent() call) and to call WrapHandle only once, at the end of the iteration, instead of at every level of the recursion. The net result was a total of 394 fewer function calls for the given example, but the impact on performance was too small to measure.

You may think that I am obsessed with performance, but that is not really true. I do want to make sure, though, that testing scripts using pywinauto run reasonably fast, so that as many of them as possible can be executed in the nightly test PC time that can be reserved for this kind of test.

p.s. The caching scheme looks as follows:

* Gmail tends to ruin the layout of mails, so I inserted dots to indicate the indentation.
* Also, I have changed some things without retesting, so I cannot guarantee this code is error free.
* I left out the initialization of _wrapper_info as it hasn't changed.
_wrapper_cache = {} # global variable, needed to remember cache entries across function calls

for item in _all_classes: # initialize the cache with the results that would be found using direct matching
....try:
........for classname_ in item.windowclasses:
............_wrapper_cache[classname_] = item
....except AttributeError, e:
........pass

def _find_wrapper(classname):
...."""return the wrapper that handles this classname
........If there is no match found then return None.
...."""
....try:
........#print "try cache"
........return _wrapper_cache[classname]
....except KeyError:
........#print "linear search"
........for regex, wrapper in _wrapper_info.values():
............if regex.match(classname):
................#print "found"
................_wrapper_cache[classname] = wrapper # store wrapper in cache
................return wrapper
...._wrapper_cache[classname] = None # if nothing found, then also remember this through the cache
....return None

---

As always, feel free to give feedback.

Best regards,
Stefaan.