cppcheck / Discussion / Development: Generic forward analyzers

Paul Fultz - 2019-11-08

To better improve our analysis, it would be nice to unify all of our forward analyzers such as valueFlowForward, valueFlowContainerForward, and FwdAnalysis. Each one contain slightly different ways to traverse through the tokens, and when we fix bugs in one, it may still exist in another. It also would allow us to add more complicated forwarding functions such as for multiple variables or iterator sizes.

Instead, we can have a generic forward analysis function, and a class can be passed in to handle the different scenarios. Something like this:

struct ForwardAnalyzer { enum class Action { None, Read, Write, Invalid, Inconclusive }; Action Analyze(const Token* tok) const; void Update(Token* tok, Action a); std::vector<int> Evaluate(const Token* tok) const; }; void valueFlowGenericForward(Token* start, const Token* end, ForwardAnalyzer& fa);

The ForwardAnalyzer class could be an abstract base class with the methods defined as virtual. However, this would make the class not easily copyable and we may want to make copies if we want to bifurcate during traversal(although we could write something like clone_ptr to help with this).

Another option would be to make valueFlowGenericForward a template, then the class could be easily copied and through inlining we could get better performance but compile-time would be much slower.

A third option would be to use type erasure. This makes the class copyable, and uses virtual dispatch so compile-time wont suffer. However, it does require quite a bit of boilerplate.

The first method to provide is the Analyze method that returns an Action as I have called it(although there could be a better name for this). From here it decides:

None: The token is not of interest.

Read: The token represents the variable or expression of interest, but it is not modified

Write: This is the token of the variable or expression of interest that is being modified in a way that it can update the value internally.

Invalid: This means the state has been modified in a way that the analysis cannot proceed. This can be a variabel modified by a function that the analysis doesnt understand, or it could be a function call that modifies the value indirectly(ie modifiy one of the variables of the expression or modifying the contianer size)

Inconclusive: This means that the token could be Invalid, but it is not conclusive.

After Analyze is called it will call Update to either update the ValueFlow for the token or update the analyzer's internal state. The reason these two calls are seperate is so it can analyze code(such as branches with unknown conditions) where it may not be updating the values, but it may decide to stop analysis.

The Evaluate method is used to evaluate the conditions or switch statements. Most likely, it will use something like ProgramMemory to evaluate the value.

We may want to have another method to control how it traverses into different control structures. For example, lifetime values usually traverse directly into the lambda, but other values it doesnt make sense.

I wrote this up to get feedback from the other cppcheck devs on the design before starting to implement such a thing. Since, its a big change it would most likely happen after the release.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Daniel Marjamäki - 2019-11-09

My immediate reaction is that it's definitely good if we can avoid having separate functions. It is stupid to have separate valueflow functions for plain variables and containers as we have today.

I am not sure if it's so easy to reuse it for "redundant assignment" though. In "normal" analysis you handle each execution path totally separately. But in "redundant assignment" analysis we want to have a combined analysis.

Maybe we can still reuse the same Action logic.

I am not enthustiastic about lambdas and templates. If that is necessary then fine. But if there is not a clear advantage (code duplication etc) then try to use inheritance and virtual methods etc instead.

So please feel free to refactor this. I think I would prefer if you can do this in steps.. so each PR is easy to review.

Since, its a big change it would most likely happen after the release.

yes.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Paul Fultz - 2019-11-12

I am not sure if it's so easy to reuse it for "redundant assignment" though. In "normal" analysis you handle each execution path totally separately. But in "redundant assignment" analysis we want to have a combined analysis.

We might be able to combine them both as well, but I am not sure. How does combined analysis exactly work?

I am not enthustiastic about lambdas and templates. If that is necessary then fine. But if there is not a clear advantage (code duplication etc) then try to use inheritance and virtual methods etc instead.

The problem with inheritance is we need to use a pointer, which makes the ability to do rollbacks or bifurcating harder. Type erasure would be ideal but has a lot of boilerplate.

I am thinking about using something like a value_ptr:

https://github.com/martinmoene/value-ptr-lite

This way we can use virtual methods and inheritance but we dont need to use pointer-like semantics. I don't think we need all those featues in that library. I think a simple class with a pointer and a function pointer should be simple to implement.

So please feel free to refactor this. I think I would prefer if you can do this in steps.. so each PR is easy to review.

I will work on a PR for next release. Ideally, I would like to refactor all of them so it can be better tested, but then I can split them up afterwards to make the changes smaller for review.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Daniel Marjamäki - 2019-11-13

We might be able to combine them both as well, but I am not sure. How does combined analysis exactly work?

We report redundant assignment if a variable is reassigned in all execution paths. if there is a path where it is not reassigned no warning is written.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Generic forward analyzers

Static source code analysis tool for C and C++ code

Forums

Help

Generic forward analyzers

Generic forward analyzers

Static source code analysis tool for C and C++ code

Forums

Help

Generic forward analyzers document.SUBSCRIPTION_OPTIONS = { "thing": "topic", "subscribed": false, "url": "subscribe", "icon": { "css": "fa fa-envelope-o" } };

Generic forward analyzers