I think this would be a good compromise in my situation, but I guess you still need to find the extra check that is not performing well for our code.
Is there something obvious in the 1.82 release that might be impacting things? If you could provide a build with it temporarily disabled I'm happy to try it out.
If we find the cause then making it something I can optionally disable would be fine.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Is there something obvious in the 1.82 release that might be impacting things?
Feel free to run both 1.81 and 1.82 with the --showtime=top5 option. I would expect that improved handling of templates or types or something like that is the "problem".
Last edit: Daniel Marjamäki 2019-03-01
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I'll give this a go when I have a chance (at best next week). I wouldn't say we have particular heavy use of templates, but we do use a couple of non-standard containers that are templatised and might be part of the issue here.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
This has nothing to do with the difference in speed between 1.81 and 1.82 but I see the posibility of a significant speedup of the template code if we have perfect detection of instantiations outside of template declarations and perfect detection of new instantiation when we expand template declarations. We currently constant fold all code following each instantiation when we really only need to constant fold the instantiation arguments. That could mean a large speedup when there are lots of instantiations in large files.
We currently don't have perfect detection of intsantiations outside of template declarations and we have poor or no detection of new instaniations when we expand a template definition. That means we need the constant folding of the remaining part of the file after each instantiation and an extra instantiation cleanup pass of the file to find the missed instantiations.
We could possible reduce hundreds or thousands of partial passes over the code for a large file with a lot of instantiations to 2 full passes for unnested templates. Nested templates should add 2 more passes for each level of nesting.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
in my humble opinion, you are just making wild guesses.
The checks do not take so much time. If you comment out all our checks then cppcheck will still be roughly as slow as it is today. If that would give you a 2x speed difference then we would have such option.
In theory I agree that the string comparisons could be an opportunity. I tried to replace those with enum comparisons. And I saw no measureable speed difference.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Did I present educated guesses once more because of experiences from computer science?
The checks do not take so much time.
They represent significant work, don't they?
If you comment out all our checks then cppcheck will still be roughly as slow as it is today.
I find such a feedback strange.
How would you like to avoid undesirable software slowness then?
If that would give you a 2x speed difference then we would have such option.
I would find improvements nice also if they would contribute an effect on a smaller scale.
In theory I agree that the string comparisons could be an opportunity.
Thanks for such feedback.
I tried to replace those with enum comparisons.
Would anybody like to try similar approaches out again?
And I saw no measureable speed difference.
I imagine that there was something to measure. But the possible differences might have looked not big enough for additional software development considerations.
How diverse were the corresponding test environments?
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Preliminary testing shows about a 5% speedup by just doing constant folding on template instantiation arguments on a large file with a lot of templates so it looks promising. Trying the next step of removing the cleanup pass isn't working and doesn't look easy to fix.
I'll clean up the first step code and do some real testing to make sure it doesn't break things that can't be fixed.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
It might be possible to speed up the handling of headers.
Implement "precompiled headers"
As ticket 9007 shows the analysis can be really slow even for a small file that has 3 lines of code. https://trac.cppcheck.net/ticket/9007
Even though we do not use much information from the header all the unused declarations and stuff slows down Cppcheck. I believe we can make cppcheck a lot faster if we remove stuff in the header that we do not need.
Sorry.. but if you like these ideas .. I wonder if any of you are working at a company that could be willing to pay for some such work?
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I consider this a correctness fix rather than an optimization. This could speed up template code in large files or files with many includes that have many instantiations. It could also slow down checking because constant folding is not happening to the remainder of the code following the instantiation. This means the checks could have more code to process and the may have to work harder to figure out the new code is just a constant. The improved constant folding could also cause more templates to be expanded which could decrease performance because there is more code to check.
I did not see a performance change when checking btflag_test.cpp.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Yesterday I observed the Donate CPU client processing a package for several hours. I waited and waited but had to restart the computer so i finally aborted the analysis. The package it analyzed was CImg (http://cimg.eu). A one header file library for image processing that seems to heavily use templates according to the homepage. The header file is about 3 MB in size. When i aborted Cppcheck it had allocated more than 3 GB of RAM. Maybe a good candidate for testing potential improvements or profile the code to find "slow" code parts?
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Yes it sounds like a good candidate. If you good candidates for optimisations / profiling ... please feel free to add it in cppcheck/benchmarks.txt. I have updated that file now.
Last edit: Daniel Marjamäki 2019-03-04
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
For information it takes ~10 seconds to check that file if you use --check-headers=no.
The logic is very aggressive, but I think it shows that if we have an option to properly remove unused templates/etc in headers then users could achieve a lot faster analysis without getting degraded analysis..
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I ran cppcheck on CImg. It took 169 minutes to finish the full analysis. And as I said, with --check-headers=no it takes ~10 seconds. This is an extreme case though, the difference is usually much smaller.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I have been investigating CImg and it is interesting. It instantiates a huge class around 2 dozen times. Each instantiation causes around 9K new instantiations. We end up with 200K instantiations. That's a lot of instantiations to search through for each of the 700 + template definitions.
I noticed we were not always deleting instantiations after processing them and I have a fix for that but it doesn't make that much difference for a worst case scenario like this.
We could split TokenAndName into 2 classes to save some memory for instantiations.
We could split the large instantiations list into per template definition lists so we don't have to search the big list at all. That should turn an D * I complexity into an I complexity.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I think this would be a good compromise in my situation, but I guess you still need to find the extra check that is not performing well for our code.
Is there something obvious in the 1.82 release that might be impacting things? If you could provide a build with it temporarily disabled I'm happy to try it out.
If we find the cause then making it something I can optionally disable would be fine.
Do you use templates also?
Feel free to run both 1.81 and 1.82 with the
--showtime=top5option. I would expect that improved handling of templates or types or something like that is the "problem".Last edit: Daniel Marjamäki 2019-03-01
I'll give this a go when I have a chance (at best next week). I wouldn't say we have particular heavy use of templates, but we do use a couple of non-standard containers that are templatised and might be part of the issue here.
This has nothing to do with the difference in speed between 1.81 and 1.82 but I see the posibility of a significant speedup of the template code if we have perfect detection of instantiations outside of template declarations and perfect detection of new instantiation when we expand template declarations. We currently constant fold all code following each instantiation when we really only need to constant fold the instantiation arguments. That could mean a large speedup when there are lots of instantiations in large files.
We currently don't have perfect detection of intsantiations outside of template declarations and we have poor or no detection of new instaniations when we expand a template definition. That means we need the constant folding of the remaining part of the file after each instantiation and an extra instantiation cleanup pass of the file to find the missed instantiations.
We could possible reduce hundreds or thousands of partial passes over the code for a large file with a lot of instantiations to 2 full passes for unnested templates. Nested templates should add 2 more passes for each level of nesting.
How do you think about other update candidates besides data processing for class templates?
Your question is very vague. So I can't have any opinion about it. If you see a real problem then you should be able to say something concrete.
in my humble opinion, you are just making wild guesses.
The checks do not take so much time. If you comment out all our checks then cppcheck will still be roughly as slow as it is today. If that would give you a 2x speed difference then we would have such option.
In theory I agree that the string comparisons could be an opportunity. I tried to replace those with enum comparisons. And I saw no measureable speed difference.
Did I present educated guesses once more because of experiences from computer science?
They represent significant work, don't they?
I find such a feedback strange.
How would you like to avoid undesirable software slowness then?
I would find improvements nice also if they would contribute an effect on a smaller scale.
Thanks for such feedback.
Would anybody like to try similar approaches out again?
Preliminary testing shows about a 5% speedup by just doing constant folding on template instantiation arguments on a large file with a lot of templates so it looks promising. Trying the next step of removing the cleanup pass isn't working and doesn't look easy to fix.
I'll clean up the first step code and do some real testing to make sure it doesn't break things that can't be fixed.
I am happy that You found possible reason and already have idea how to speed up.
If you want - I can test your version on our project.
It might be possible to speed up the handling of headers.
Implement "precompiled headers"
As ticket 9007 shows the analysis can be really slow even for a small file that has 3 lines of code.
https://trac.cppcheck.net/ticket/9007
Even though we do not use much information from the header all the unused declarations and stuff slows down Cppcheck. I believe we can make cppcheck a lot faster if we remove stuff in the header that we do not need.
Sorry.. but if you like these ideas .. I wonder if any of you are working at a company that could be willing to pay for some such work?
I have a pull request here: https://github.com/danmar/cppcheck/pull/1721 that limits constant folding to template arguments. It also improves constant folding.
I consider this a correctness fix rather than an optimization. This could speed up template code in large files or files with many includes that have many instantiations. It could also slow down checking because constant folding is not happening to the remainder of the code following the instantiation. This means the checks could have more code to process and the may have to work harder to figure out the new code is just a constant. The improved constant folding could also cause more templates to be expanded which could decrease performance because there is more code to check.
I did not see a performance change when checking btflag_test.cpp.
Yesterday I observed the Donate CPU client processing a package for several hours. I waited and waited but had to restart the computer so i finally aborted the analysis. The package it analyzed was CImg (http://cimg.eu). A one header file library for image processing that seems to heavily use templates according to the homepage. The header file is about 3 MB in size. When i aborted Cppcheck it had allocated more than 3 GB of RAM. Maybe a good candidate for testing potential improvements or profile the code to find "slow" code parts?
Thank for the info. It seems to only hang on one file so it's probably a bug. I'm looking into it.
Yes it sounds like a good candidate. If you good candidates for optimisations / profiling ... please feel free to add it in
cppcheck/benchmarks.txt. I have updated that file now.Last edit: Daniel Marjamäki 2019-03-04
For information it takes ~10 seconds to check that file if you use
--check-headers=no.The logic is very aggressive, but I think it shows that if we have an option to properly remove unused templates/etc in headers then users could achieve a lot faster analysis without getting degraded analysis..
I ran cppcheck on CImg. It took 169 minutes to finish the full analysis. And as I said, with
--check-headers=noit takes ~10 seconds. This is an extreme case though, the difference is usually much smaller.I have been investigating CImg and it is interesting. It instantiates a huge class around 2 dozen times. Each instantiation causes around 9K new instantiations. We end up with 200K instantiations. That's a lot of instantiations to search through for each of the 700 + template definitions.
I noticed we were not always deleting instantiations after processing them and I have a fix for that but it doesn't make that much difference for a worst case scenario like this.
We could split TokenAndName into 2 classes to save some memory for instantiations.
We could split the large instantiations list into per template definition lists so we don't have to search the big list at all. That should turn an D * I complexity into an I complexity.