Menu

Could we use mTokType to limit number of calls to Token::Match() in PathAnalysis::forwardRange()?

2022-06-02
2022-07-08
  • Jens Yllman

    Jens Yllman - 2022-06-02

    Currently there seem to be multiple complaints about cppcheck being slow. And in the two cases I have looked at it is because Token::multiCompare() gets called very many times from PathAnalysis::forwardRange().

    For instance, the first thing that happens in the loop of forwaredRange() is

    if (Token::Match(tok, "asm|goto|break|continue"))

    This will trigger multiCompare(). would it not be smart to first do tok->isKeyword()? I will do some testing and see if this might be something to do.

    Just some numbers. In a 'small' example I use to test with at the moment. It is big enough to notice it being slow. And profiling it gives me the numbers that forwardRange() gets called 66 times. And that in turn triggers like 2805817 calls to multiCompare(). Actually all of those calls are from forwardRange(). But most are. The reason why there are 66 calls to forwardRange() is that there are 66 lines of std::string x = y +"something"; in that code.

    Will be back with my results later.

     
  • Jens Yllman

    Jens Yllman - 2022-06-03

    OK, so doing this did speed up all my tests. Even running the example in #10933. It still took way to long but. But now I am trying to run profiling on that example. And I am soon at 24h waiting. But since I now know it can finish I will wait and look at the result.

    The example above about 2805817 calls to multiCompare() now only does 90000+ calls. And the existing test still report OK.

     
  • Jens Yllman

    Jens Yllman - 2022-06-04

    So, now when keeping the multiCompare() 'low' in the profiling the top 20 is things are related to std::string and something that looks like std::list.

    So, are we copying strings to much? Triggered by copying the token list? But the profile output I have does not show where the calls are coming from.

    13495308729 calls to std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_data() const is what seem to use the most time. with 1432802144 calls to void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>(char*, char*, std::forward_iterator_tag) comes second. And there have only been 162 calls to PathAnalysis::forwardRange(). So it looks like the loop in forwardRange() grows a lot when the token grows big. Or I might be wrong. And in time forwardRange() actually puts in in 14th spot, and that is after removing the time in function calls.

    I have to admit I am not sure how much I can trust this profiler results. But it probably gives some guidance anyway.

     
  • Jens Yllman

    Jens Yllman - 2022-06-08

    Just a small comment. Some things might light very big in profiling. But, running profiling removes the benefit of optimization. But at least it shows the number of calls. That can be important.

     
  • Daniel Marjamäki

    Token::multiCompare() gets called very many times from PathAnalysis::forwardRange().

    I think you have profiled a debug build without match compiler.

    In a release build the Token::multiCompare() should not be called from such Token::Match pattern.

    Try:

    make MATCHCOMPILER=yes ...
    
     
  • Jens Yllman

    Jens Yllman - 2022-06-15

    Yes, it is true multiCompare() only show up if you do not use the match compiler. But even with the match compiler adding the isKeyword() gives little performance boost. Running the file in #10933 with isKeyword() takes 41m23s and without it takes 42m41s, both with match compiler and -O2 -DNDEBUG. But it is small, and I think it does not help understanding the reason for the code. -g and no match compiler and no isKeyword() takes 276m23s. Same compile but -pg instead of -g to run the profile takes like over 1300m.

    But still, why does forwardRange() in #10933 take forever.

     
  • Jens Yllman

    Jens Yllman - 2022-06-16

    So I now also did a run where I had the same settings as John Borland had in his run on another thread. And using perf instead of -pg to generate a FlameGraph. It's all forwardRange(). The graph I attach to this thread is run with the isKeyword(), but removing that will give the same graph. I might do a higher sample frequency to see if something else will show.

     
    • john borland

      john borland - 2022-07-04

      I can say increasing the rate of the perf record shouldn't change the overall shape of the flame graph. For the most part it will just increase the amount of overhead perf is using and half to dump more data to user space. I find that the 99 Hertz rate is pretty good for both long and short runs. One important factor in picking a rate is making sure it's an odd numbed rate to avoid sampling in lockstep with other activity and producing misleading results.

       
      • Jens Yllman

        Jens Yllman - 2022-07-05

        No, it did not. And now when I found the real problem the flamegraph is really boring.

         
        • john borland

          john borland - 2022-07-05

          Did you find the flamegraph useful in finding where to start looking for your performance issue?

           
          • Jens Yllman

            Jens Yllman - 2022-07-08

            Yes and no. I could see things you would like to remove/reduce. But the real reason for the really slow execution was that instead of having more or less 'flat' flow with only one level of recursion there was overflow which caused where much recursion and very slow.

             
            • john borland

              john borland - 2022-07-08

              I was thinking it might be useful to add some sort of cmake build type that would turn on the match compiler and set -g -fno-omit-frame-pointer -O2 for the purpose of making flame graphs. I think that would make it easier for people to find performance regressions or at least a starting thread to pull on for finding performance issues. I could even do a little write up for how to do it, but I'm not sure where to put it. Maybe at some point we might see flame graphs and flame graph diffs for some of the daca@home runs.

               
  • Paul Fultz

    Paul Fultz - 2022-06-16

    But even with the match compiler adding the isKeyword() gives little performance boost.

    We could possibly add such checks to the match compiler, so it will automatically be added when checking for keywords.

     
    • Jens Yllman

      Jens Yllman - 2022-06-18

      Yes, that might be one way to go. I 'fear' that the way I put in the isKeyword() check now, works and 'looks' OK when you do not think about match compiler. So if the match compiler knows about it, the resulting code might get more 'correct'.

       
  • Jens Yllman

    Jens Yllman - 2022-06-18

    So some more info. When you look at the flame graph it might be the fact that memory allocation takes up to much of the time in forwardRange()/forewardRecursive(). And since the 'algorithm' is recursive, it adds up to memory allocation takes a lot of the time in the #10933 example.

     
  • Jens Yllman

    Jens Yllman - 2022-06-26

    Why do we want forwardRange() to go to the end of the file and not to the end of the scope?? In the case of #10933, the example file does add 'many' rows to the info.errorPath, which makes the 'copy' into a new call to forwardRange() very costly. If I reduce the number of rows in info.errorPath, the time is very much reduced.

     
  • Jens Yllman

    Jens Yllman - 2022-06-26

    Actually, it seem that the thing that triggered the scan to the end of the file is that in the example file in #10933 the if conditions are actually assignments not compare. So I am now looking on the code in forwardRange() that looks at the condition. If you change the ifs to == instead of = it finishes in no time.

     
  • Jens Yllman

    Jens Yllman - 2022-06-26

    To me now, it looks like even if you have an endToken into forwardRange() it continues past that end token cause some actions will end on the endToken, and then there is a ++ end then a check if endToken. But then we are past already.

     
  • Jens Yllman

    Jens Yllman - 2022-06-27

    So, the use of isKeyword() might be good. But with the 'fix' I put in as a PR #4235 this optimization might not be so needed. The fix instead tries to stop overstepping endToken in the loop in forwardRange().

     

Log in to post a comment.

Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.