Currently there seem to be multiple complaints about cppcheck being slow, and in the two cases I have looked at it is because Token::multiCompare() gets called very many times from PathAnalysis::forwardRange().
For instance, the first thing that happens in the loop of forwardRange() is
if (Token::Match(tok, "asm|goto|break|continue"))
This will trigger multiCompare(). Would it not be smart to first do tok->isKeyword()? I will do some testing and see if this might be something worth doing.
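What I have in mind is roughly this (only a sketch, the real loop in forwardRange() has more around it):

    // Check the cheap keyword flag first, before running the pattern matching.
    if (tok->isKeyword() && Token::Match(tok, "asm|goto|break|continue")) {
        // ... same handling as before ...
    }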
Just some numbers from a 'small' example I use to test with at the moment. It is big enough to notice it being slow. Profiling it shows that forwardRange() gets called 66 times, and that in turn triggers something like 2805817 calls to multiCompare(). Not all of those calls come from forwardRange(), but most do. The reason there are 66 calls to forwardRange() is that there are 66 lines of
std::string x = y + "something";
in that code. Will be back with my results later.
OK, so doing this did speed up all my tests, even running the example in #10933. It still took way too long, though. Now I am trying to run profiling on that example, and I am soon at 24h of waiting. But since I now know it can finish, I will wait and look at the result.
The example above that made about 2805817 calls to multiCompare() now only makes 90000+ calls, and the existing tests still report OK.
So, now that multiCompare() stays 'low' in the profiling, the top 20 entries are all things related to std::string and something that looks like std::list.
So, are we copying strings too much? Triggered by copying the token list? The profile output I have does not show where the calls are coming from.
13495308729 calls to
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_data() const
is what seems to use the most time, with 1432802144 calls to
void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>(char*, char*, std::forward_iterator_tag)
coming second. And there have only been 162 calls to PathAnalysis::forwardRange(). So it looks like the loop in forwardRange() grows a lot when the token list grows big. Or I might be wrong. Measured by its own time, forwardRange() actually lands in 14th place, and that is after subtracting the time spent in the functions it calls. I have to admit I am not sure how much I can trust these profiler results, but they probably give some guidance anyway.
Just a small comment: some things might look very big in profiling, but running the profiler removes the benefit of optimization. At least it shows the number of calls, and that can be important.
I think you have profiled a debug build without the match compiler.
In a release build, Token::multiCompare() should not be called from such a Token::Match pattern.
Try:
Yes, it is true that multiCompare() only shows up if you do not use the match compiler. But even with the match compiler, adding the isKeyword() check gives a small performance boost. Running the file in #10933 with isKeyword() takes 41m23s and without it 42m41s, both with the match compiler and -O2 -DNDEBUG. The gain is small, though, and I think it does not help in understanding the real reason the code is slow. With -g, no match compiler and no isKeyword() it takes 276m23s. The same build but with -pg instead of -g, to run the profiler, takes over 1300m.
But still, why does forwardRange() in #10933 take forever?
So I have now also done a run with the same settings John Borland used in his run in another thread, using perf instead of -pg to generate a FlameGraph. It's all forwardRange(). The graph I attach to this thread is from a run with the isKeyword() change, but removing that gives the same graph. I might try a higher sample frequency to see if something else shows up.
I can say that increasing the rate of the perf record shouldn't change the overall shape of the flame graph. For the most part it will just increase the amount of overhead perf is using and have to dump more data to user space. I find that the 99 Hertz rate is pretty good for both long and short runs. One important factor in picking a rate is making sure it's an odd-numbered rate, to avoid sampling in lockstep with other activity and producing misleading results.
No, it did not. And now that I have found the real problem, the flamegraph is really boring.
Did you find the flamegraph useful in finding where to start looking for your performance issue?
Yes and no. I could see things you would like to remove/reduce. But the real reason for the really slow execution was that, instead of a more or less 'flat' flow with only one level of recursion, it overflowed past the end, which caused much more recursion and made it very slow.
I was thinking it might be useful to add some sort of cmake build type that would turn on the match compiler and set -g -fno-omit-frame-pointer -O2 for the purpose of making flame graphs. I think that would make it easier for people to find performance regressions, or at least give a starting thread to pull on when looking into performance issues. I could even do a little write-up on how to do it, but I'm not sure where to put it. Maybe at some point we might see flame graphs and flame graph diffs for some of the daca@home runs.
We could possibly add such a check to the match compiler, so it will automatically be added when checking for keywords.
Yes, that might be one way to go. I 'fear' that the way I put in the isKeyword() check now works and 'looks' OK only as long as you do not think about the match compiler. So if the match compiler knows about it, the resulting code might turn out more 'correct'.
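To make that concrete, this is roughly the kind of code I imagine the match compiler could generate for a keyword-only pattern like "asm|goto|break|continue" (just an illustration of the idea, not what the match compiler actually emits):

    // Hypothetical generated matcher: test the cheap isKeyword() flag before
    // doing any string comparisons. Token is cppcheck's token class.
    static inline bool match_keyword_pattern(const Token* tok)
    {
        if (!tok || !tok->isKeyword())
            return false;
        const std::string& s = tok->str();
        return s == "asm" || s == "goto" || s == "break" || s == "continue";
    }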
So, some more info. When you look at the flame graph, it might be that memory allocation takes up too much of the time in forwardRange()/forwardRecursive(). And since the algorithm is recursive, it adds up so that memory allocation takes a lot of the time in the #10933 example.
Why do we want forwardRange() to go to the end of the file and not just to the end of the scope? In the case of #10933, the example file adds 'many' rows to info.errorPath, which makes the 'copy' into a new call to forwardRange() very costly. If I reduce the number of rows in info.errorPath, the time is very much reduced.
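To illustrate what I mean about the 'copy' (a simplified, self-contained sketch with made-up names; the real forwardRange()/forwardRecursive() signatures and the Info type look different):

    #include <list>
    #include <string>
    #include <utility>

    // Minimal stand-ins, just enough to show the shape of the problem.
    struct TokenSketch { const TokenSketch* next = nullptr; };

    struct InfoSketch {
        // grows with every note added along the path
        std::list<std::pair<const TokenSketch*, std::string>> errorPath;
    };

    // Taking 'info' by value means every recursive call copies the whole
    // errorPath, i.e. constructs every std::string in it again. With a long
    // errorPath and a scan that runs to the end of the file, that adds up fast.
    void forwardSketch(const TokenSketch* tok, const TokenSketch* endToken, InfoSketch info)
    {
        for (; tok && tok != endToken; tok = tok->next) {
            // at each branch: forwardSketch(branchStart, branchEnd, info);  // another full copy
        }
    }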
Actually, it seems that the thing that triggers the scan to the end of the file is that, in the example file in #10933, the if conditions are actually assignments, not comparisons. So I am now looking at the code in forwardRange() that looks at the condition. If you change the ifs to == instead of =, it finishes in no time.
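For clarity, this is the kind of construct I mean (a reduced, made-up illustration with placeholder functions f() and g(), not the actual code from #10933):

    int f();
    void g(int);

    void example()
    {
        int x = 0;
        if (x = f())      // assignment used as the if condition
            g(x);
        // Written as a comparison instead, the check finishes in no time:
        if (x == f())
            g(x);
    }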
To me it now looks like forwardRange() continues past the endToken even when one is given: some actions can leave the iteration exactly on endToken, then the loop advances one more step and only then checks against endToken, but by then we are already past it.
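Roughly this shape, in other words (a simplified stand-alone sketch, not the actual forwardRange() loop):

    // Minimal stand-in, just to show the loop shape.
    struct TokenSketch { const TokenSketch* next = nullptr; };

    void scanSketch(const TokenSketch* start, const TokenSketch* endToken)
    {
        for (const TokenSketch* tok = start; tok && tok != endToken; tok = tok->next) {
            // (body elided) in the real loop, handling here can advance 'tok' itself and
            // leave it exactly on endToken; the 'tok = tok->next' step above then moves
            // past endToken, so the 'tok != endToken' test never stops the loop and the
            // scan keeps going towards the end of the file.
        }
    }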
So, the use of isKeyword() might be good. But with the 'fix' I put in as PR #4235, this optimization might not be so needed. The fix instead tries to stop the loop in forwardRange() from overstepping endToken.