Do you have some ideas how we can create a flexible AST matcher?
One idea is:
astMatch(token, pattern)
where the pattern can be a string literal that describes what to look for. I don't know how to write this pattern in a elegant way. Any ideas? If we somehow reuse %str% and those stuff from Token::Match I think that would be good.
A + that adds a string and number in RPN notation:
"%str% %num% +"
Another idea is to use some clang-tidy inspired pattern matching. we might be able to specify more sophisticated matchers then that checks valuetype and stuff.
after some thinking I am thinking that a clang-tidy inspired pattern matching would be nice. we will not be able to directly copy code to/from clang-tidy. But if anybody knows clang-query or something then looking at cppcheck code will be familiar. and vice versa.
Last edit: Daniel Marjamäki 2018-11-23
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
The nice thing about clang's matchers is they provide a general-purpose way to compose a bunch of predicates. These matchers can match more than just tokens, it could also match ValueFlow values or symbols. I see matching something like this:
And the above could be implemented with something like this:
structMatcherContext{std::unordered_map<std::string,constToken*>tokens;template<classT>voidbind(std::stringname,constT*x){throwstd::runtime_exception("Type is not bindable");}voidbind(std::stringname,constToken*tok){tokens[name]=tok;}};//Generalpuporsematcherobject,thiscanchangethetypeitstraversingto//supporttraversingmorethantokenstemplate<classT,classR=T>structMatcher{std::function<constR*(MatcherContext&,constT*)>match;//SetupbindingtokensMatcherbind(std::stringname)const{autom=match;automf=[m,name](MatcherContext&ctx,constT*x)->constR*{constR*result=m(ctx,x);if(result)ctx.bind(name,x);returnresult;};returnMatcher{mf};}//Composeasubmatchtemplate<classU>Matcher<T,R>operator()(Matcher<R,U>subMatch)const{autom=match;automf=[m,subMatch](MatcherContext&ctx,constT*x)->constR*{constR*result=m(ctx,x);if(result&&subMatch(ctx,result))returnresult;elsereturnnullptr;};}};structMatchResult{constToken*tok;std::unordered_map<std::string,constToken*>tokens;};MatchResultastMatch(constToken*tok,Matcher<Token>m){MatchResultresult;MatcherContextctx;constToken*tok=m.match(ctx,tok);result.tok=tok;result.tokens=ctx.tokens;returnresult;}//Thepredicates:Matcher<Token>pattern(std::stringp){returnMatcher<Token>{[=](MatcherContext&ctx,constToken*tok){if(Token::Match(tok,p.c_str()))returntok;elsereturnnullptr;}};}Matcher<Token>binary(Matcher<Token>op1,Matcher<Token>op2){returnMatcher<Token>{[=](MatcherContext&ctx,constToken*tok){if(!tok->astOperand1()&&!tok->astOperand2())returnnullptr;if(op1.match(ctx,tok->astOperand1())&&op2.match(ctx,tok->astOperand2()))returntok;}};}
More can be built on top of this to improve it more. We can add allOf matcher so we can match several predicates for the same token. We could also register the matchers and build a parser so users can run them with custom rules or straight cppcheck CLI.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
The nice thing about clang's matchers is they provide a general-purpose way to compose a bunch of predicates. These matchers can match more than just tokens, it could also match ValueFlow values or symbols.
I agree. That is a huge advantage.
We could also register the matchers and build a parser so users can run them with custom rules or straight cppcheck CLI.
Yes imho that should be our goal. The rule files will be a lot more interesting with this functionality.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I envision that the ast matcher will be a hotspot so we will probably need to optimise it as much as possible, especially in release builds.
There is a bottleneck with std::function. Instead we could make it a template parameter so the compiler can more easily inline it.
I assume a new Matcher<token> object will be created every loop?</token>
Possibly. Perhaps with a template the compiler might optimize it out. Alternatively, we could support finders so we can search for multiple matchers at the same time, and CSE can optimize the duplicate conditions across several matchers.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I am not sure about using templates to force inlines, it increases the translation unit size. We can look at it also later and see what the difference will be.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I suggest that you create a working solution first. And we worry about optimisations later.
I do not see why Matcher needs to be a template. It seems to me that a Scope matcher and Token matcher will be quite different. But we can look into that later.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I am not sure about using templates to force inlines
It doesn't force inline, it just allows inline.
it increases the translation unit size.
The goal is to improve runtime performance not compile-time performance. As shown by Vittorio Romeo, a template parameter(over std::function and a function pointer) will produce the smallest assembly when compiled with -O2 or greater:
But more importantly than the zero overhead function call of a template parameter, is that if we combine multiple matchers together CSE can remove the duplicate conditions. This is not possible with std::function or a function pointer.
I do not see why Matcher needs to be a template. It seems to me that a Scope matcher and Token matcher will be quite different.
They should be the same except the types. The uniformity is necessary to be able to use the matchers together. For example, if we want to check for variables declared as a class or struct we could write something like this:
Do you have some ideas how we can create a flexible AST matcher?
One idea is:
where the pattern can be a string literal that describes what to look for. I don't know how to write this pattern in a elegant way. Any ideas? If we somehow reuse %str% and those stuff from Token::Match I think that would be good.
A
+
that adds a string and number in RPN notation:Another idea is to use some clang-tidy inspired pattern matching. we might be able to specify more sophisticated matchers then that checks valuetype and stuff.
What do you think?
Will your software development interest increase for the technology “computation tree logic”?
I don't know how you propose that computation tree logic can be used.
after some thinking I am thinking that a clang-tidy inspired pattern matching would be nice. we will not be able to directly copy code to/from clang-tidy. But if anybody knows clang-query or something then looking at cppcheck code will be familiar. and vice versa.
Last edit: Daniel Marjamäki 2018-11-23
The nice thing about clang's matchers is they provide a general-purpose way to compose a bunch of predicates. These matchers can match more than just tokens, it could also match ValueFlow values or symbols. I see matching something like this:
More importantly, we can support binding tokens as well:
And the above could be implemented with something like this:
More can be built on top of this to improve it more. We can add
allOf
matcher so we can match several predicates for the same token. We could also register the matchers and build a parser so users can run them with custom rules or straight cppcheck CLI.I agree. That is a huge advantage.
Yes imho that should be our goal. The rule files will be a lot more interesting with this functionality.
I think your example is a very good start.
I envision that the ast matcher will be a hotspot so we will probably need to optimise it as much as possible, especially in release builds.
I am guessing that with your code there will be overhead for such code:
I assume a new Matcher<token> object will be created every loop?</token>
I believe we can optimise it with MatchCompiler so the object is created before the loop starts.
Feel free to create a PR with your code and we can start using it!
There is a bottleneck with
std::function
. Instead we could make it a template parameter so the compiler can more easily inline it.Possibly. Perhaps with a template the compiler might optimize it out. Alternatively, we could support finders so we can search for multiple matchers at the same time, and CSE can optimize the duplicate conditions across several matchers.
I am not sure about using templates to force inlines, it increases the translation unit size. We can look at it also later and see what the difference will be.
I suggest that you create a working solution first. And we worry about optimisations later.
I do not see why Matcher needs to be a template. It seems to me that a Scope matcher and Token matcher will be quite different. But we can look into that later.
btw,, I also made a POC. In my approach the
binaryOperator()
etc are creating a AstMatcher struct. But it does not perform any matching.The AstMatcher structs should be created once and reused.
It doesn't force inline, it just allows inline.
The goal is to improve runtime performance not compile-time performance. As shown by Vittorio Romeo, a template parameter(over
std::function
and a function pointer) will produce the smallest assembly when compiled with-O2
or greater:https://vittorioromeo.info/index/blog/passing_functions_to_functions.html
(here is the website from google cache: https://webcache.googleusercontent.com/search?q=cache:-leiPhnPQTYJ:https://vittorioromeo.info/index/blog/passing_functions_to_functions.html+&cd=1&hl=en&ct=clnk&gl=us&client=firefox-b-1)
But more importantly than the zero overhead function call of a template parameter, is that if we combine multiple matchers together CSE can remove the duplicate conditions. This is not possible with
std::function
or a function pointer.They should be the same except the types. The uniformity is necessary to be able to use the matchers together. For example, if we want to check for variables declared as a class or struct we could write something like this:
This starts by matching a
Token
then aVariable
then aScope
and then aToken
again. This is just one example of course.ok!
I am hoping you will open a PR.
Personally I would like to have a proof of concept soonish that we can try out.
I do think it is more important that it's a clean, simple and intuitive framework than a fast framework. Sorry if I "side tracked" these discussions.
I spontaneously think your suggestion seems to be clean and intuitive. The checker code should become much simpler.
Last edit: Daniel Marjamäki 2018-11-25