suggestions for speedup

Stefaan
2006-03-24
2013-05-28
  • Stefaan
    2006-03-24

    I have been experiencing some problems with speed for complex dialogs, on a not very recent type of industrial rack (= robust pc meant to be used in factories, but rather limited capabilities).

    I would like to propose some possible speed-ups; feel free to comment. None of these have been tested yet, they are just ideas. Most of them should be easy to implement though, and if I find some time I will probably try them myself, if no one else has done so by then ;)

    - in function find_windows: it is probably better to compile the regular expressions before using them in a loop,
    e.g.
    ...
    elif title_re is not None and windows:
            regex = re.compile(title_re)
            windows = [win for win in windows
                if regex.match(handleprops.text(win))]
    ...

    - in function GetNonTextControlName
    I see you calculate distances, to find
    out which texts are closest to a given control.
    Instead of calculating the distances, you could just as well compare the squared distances, which saves the two square roots otherwise needed for each comparison (in other words, exploit the fact that |x| < |y| <=> x^2 < y^2).
    Small optimizations like this matter if these calculations have to be done several thousand times.

    - it would be an interesting experiment to replace difflib for calculating the similarity between strings (it is mostly optimized for (very) long strings) with the Python Levenshtein package, which also provides a similarity measure between strings and is rumored to be 10x faster than difflib for shorter strings.
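The first two suggestions above can be sketched as follows. This is a minimal illustration, not pywinauto's actual code: `filter_titles` and `closest_point` are hypothetical names, and plain strings and (x, y) tuples stand in for window handles and control rectangles.

```python
import re


def filter_titles(titles, title_re):
    """Keep the titles that match title_re, compiling the pattern once."""
    regex = re.compile(title_re)  # compiled once, outside the loop
    return [title for title in titles if regex.match(title)]


def closest_point(target, candidates):
    """Return the candidate nearest to target by comparing squared distances."""

    def squared_distance(point):
        dx = point[0] - target[0]
        dy = point[1] - target[1]
        # Ordering by squared distance equals ordering by distance,
        # so the square root can be skipped entirely.
        return dx * dx + dy * dy

    return min(candidates, key=squared_distance)
```

Note that the `re` module keeps its own cache of recently compiled patterns, so the gain from compiling up front may be modest; it mainly saves the cache lookup on every iteration.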

    All in all I have found pywinauto to be very impressive, and I hope some of these suggestions can contribute to make it even more so.

     
    • Mark Mc Mahon
      2006-03-24

      Hi,

      Those suggestions make complete and utter sense!

      I have no problem implementing those. I tend to avoid trying to think of performance (because so often I find that it doesn't matter) and with python and tests it is easy to refactor and make sure that it is still correct afterwards.

      Thanks for the suggestions. Even if these do not give a real practical speedup (who can tell without measuring) they should not make the code less understandable.

      Version 0.3.0 is slower than previous releases, because I have not implemented any way to avoid saving information from every run (which is only written to disk if you request it). If you are not using this feature, it just slows down scripts. I need to add an option that will disable this (I think that will probably give a bigger speedup!).

      To try that out you could either comment out the calls to RecordMatch in application.py or make RecordMatch return immediately without doing anything.
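A rough sketch of the second option, making RecordMatch a no-op without editing every call site. The `Application` class below is a hypothetical stand-in written only for this example (its attributes and the `find` method are assumptions, not pywinauto's real code); only the method name RecordMatch comes from the thread.

```python
class Application:
    """Hypothetical stand-in, NOT pywinauto's real Application class."""

    def __init__(self):
        self.match_history = []

    def RecordMatch(self, criteria, matched):
        # Saves information about every lookup, whether or not it is used.
        self.match_history.append((criteria, matched))

    def find(self, criteria, items):
        matched = [item for item in items if criteria in item]
        self.RecordMatch(criteria, matched)
        return matched


def disable_match_recording(cls):
    """Replace cls.RecordMatch with a method that returns immediately."""

    def record_nothing(self, *args, **kwargs):
        return None

    cls.RecordMatch = record_nothing
```

Calling `disable_match_recording(Application)` once at the top of a script would then skip the bookkeeping for every later lookup.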

      Thanks
         Mark

      P.S. Any intention of joining the mailing list - I find it a bit easier to track items there. But feel free to continue here if you prefer that.

       
    • Stefaan
      2006-03-25

      Hello Mark,

      Can the mailing list be accessed (I mean write access, not just reading it) via a web interface? Or alternatively, are the messages from the list tagged to make it easy to classify them automatically?

      I am not too keen on yet another mailing list filling up my mail box. Of course at the current rate of posting messages on the list, this is mostly a theoretical problem, but given the potential of pywinauto, I foresee trouble in the future ;)

      By the way, I tend to agree with the saying that premature optimization is the root of all evil. But at some point, if speed becomes an issue, some optimization should be considered. For now, I only suggested some small changes, because I hope these will already make a noticeable difference (after all, they affect some inner loops in the code). If not, I might dig a little deeper and think about more fundamental, algorithm-related changes.

      E.g. I do not believe that a linear search through all keys is really needed for fuzzy matching: the matching problem seems similar to spell checking, which certainly does not use a linear search through the dictionary! I expect better ways of organizing and searching the data can be found to reduce the work to be done (although I have no concrete suggestions yet). But suggesting and implementing such changes takes quite some work of course, and should be postponed until it is really needed to solve an actual speed problem.

      As a final remark, I did some profiling on my automation script last Friday (unfortunately I cannot access any of my data during the weekend) and I saw that most of the time was spent in the places where I suggested changes (and also in WrapHandle). Python caused 99% processor load during those function calls, indicating that the Python code running on the relatively slow computer is indeed what limits the speed of the automation script.
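For readers who want to repeat this kind of measurement, here is a minimal profiling sketch using the standard library's cProfile and pstats modules. The thread does not say which profiler was actually used, and `slow_lookup` and `profile_call` are hypothetical names; `slow_lookup` is only a stand-in for an automation script's entry point.

```python
import cProfile
import io
import pstats


def slow_lookup(n):
    """Stand-in workload; replace with the automation script's entry point."""
    return sum(i * i for i in range(n))


def profile_call(func, *args):
    """Run func under the profiler and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Show the ten entries with the highest cumulative time.
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()
```

Printing the returned report gives a "top ten" listing of the kind mentioned above, which shows whether functions such as WrapHandle or RecordMatch dominate the run.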

      Thank you for the suggestion of commenting out the RecordMatch calls. I will try that out after the weekend. (But I don't recall seeing RecordMatch in the profiler's top ten output)

      I should probably stop typing now, before you fall asleep.

       
    • Stefaan
      2006-03-26

      I tried out my proposed changes on my PC at home.
      (I even temporarily reinstalled Windows for that purpose!)

      The result is that the code becomes shorter,
      more readable and runs almost twice as fast
      for a simple dialog (i.e. the putty ssh configuration dialog) - without even touching RecordMatch.

      Happy happy, joy joy.
      I can send you my changes if you like,
      they are fairly trivial, consisting mostly
      of removing lines of code ;)

      The only drawbacks I see are
      1) the need to install the Python Levenshtein package (a Windows installer is available at https://dev.livingreviews.org/sec-cgi-bin/epubtktrac/wiki/WinInstall)
      2) the Levenshtein ratio is not exactly the same number as the difflib ratio (usually equal to difflib's or a little higher), which in theory could break some existing automation scripts.

      Both drawbacks could be overcome though:
      maybe the matching algorithm can be made pluggable with difflib's as default?
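One way such a pluggable design could look is sketched below, with difflib's ratio as the default. The names `difflib_ratio` and `best_match` are hypothetical and not part of pywinauto.

```python
import difflib


def difflib_ratio(a, b):
    """Default similarity measure: difflib's ratio in the range [0, 1]."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def best_match(search_text, candidates, ratio_func=difflib_ratio):
    """Return (candidate, ratio) for the most similar candidate.

    ratio_func is the pluggable part: any callable taking two strings
    and returning a similarity score can be passed in.
    """
    best_item, best_ratio = None, -1.0
    for candidate in candidates:
        ratio = ratio_func(search_text, candidate)
        if ratio > best_ratio:
            best_item, best_ratio = candidate, ratio
    return best_item, best_ratio
```

A script that has the Levenshtein package installed could then pass that package's ratio function as `ratio_func`, while existing scripts keep difflib's numbers by default.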

      Best regards,
      Stefaan.

       
    • Mark Mc Mahon
      2006-03-27

      Hi Stefaan,

      OK - I did the re.compile and the distance fix: about 1 second difference in the unit test run (from 61.5 seconds to 60.5 seconds). So the majority of the speedup must come from replacing difflib with Levenshtein.

      The only part of that that I don't like very much is that it has a GPL licence - and I don't really want to go there. But I don't see a problem with having it pluggable. I will look into that.

      Are you able to write to the list using gmane? (I don't know, I haven't used it.) http://news.gmane.org/gmane.comp.python.pywinauto.user

      I did a quick test, and modifying application.Application.RecordMatch to just return immediately shaves another 2 seconds off the unit test run (from 60.5 to 58.7).

      Thanks for your suggestions!!
        Mark