cppcheck / Discussion / Development: Software evolution around Token::simpleMatch

Markus Elfring - 2018-04-18

I have taken another look at the implementation of the function “Token::simpleMatch” and its usage. Now I see further opportunities for corresponding software adjustments.

I would appreciate if a C++ reference can be used for the parameter “tok”. A sanity check for a passed null pointer could be omitted then.

Can the run time characteristics become a bit nicer if an other data type will be used for the parameter “pattern” (instead of a string)? How are the chances to avoid the search for delimiting space characters?

Would you like to represent any patterns by optimised data structures?
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Markus Elfring - 2018-04-18
  
  Strings are passed for the parameter “pattern” of the function “Token::simpleMatch”. They contain only a single character or a string without spaces in some function calls.
  
  I suggest to introduce functions which will be more efficient in such use cases.
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
  - Markus Elfring - 2018-04-19
    
    How do you think about to integrate the following predicate functions?
    
    static bool is_equal(Token const& to, char const text[]) { return to._str == text; } static bool is_equal(Token const& to, char const c) { return to._str[0] == c; }
    
    Would you like to apply the following script for the semantic patch language?
    
    @initialize:python@ @@ import re @find_update_candidate@ constant input_string; expression input_context; position pos; @@ Token::simpleMatch@pos(input_context, input_string) @script:python selection@ param << find_update_candidate.input_string; pos << find_update_candidate.pos; text; @@ match = re.match('"([^ "]+)"', param) if match: content = match.group(1) if (len(content) == 1) or ((len(content) == 2) and (content[0] == '\\')): content = ''.join(["'", content, "'"]) else: content = param coccinelle.text = cocci.make_expr(content) else: cocci.include_match(False) @replacement@ constant find_update_candidate.input_string, selection.text; expression find_update_candidate.input_context; position find_update_candidate.pos; @@ -Token::simpleMatch@pos +Token::is_equal ( + *( input_context + ) , -input_string +text )
    
    Command example:
    
    elfring@Sonne:~/Projekte/cppcheck/lokal> spatch --c++ --timeout 123 -j 4 --dir . ~/Projekte/Coccinelle/janitor/compare_with_character_or_string2.cocci
    
    If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
    - Daniel Marjamäki - 2018-04-22
      
      I am not against that is_equal is used in some situations instead of Token::simpleMatch but as far as I see, if is_equal is used instead of Token::simpleMatch when the pointer is NULL then we will get a crash.
      
      The name is_equal is more readable than Token::simpleMatch imho.. if you just want to look at a single token.
      
      But the is_equal that takes a char does not check if the Token::str() is equal. It only checks the first char. Imagine that tok is "++" and you call is_equal(tok, '+'); You can check the string length in is_equal but then what is the point to not have a std::string == comparison instead?
      
      If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
      - Markus Elfring - 2018-04-22
        
        I am not against that is_equal is used in some situations instead of Token::simpleMatch
        
        This is nice.
        
        but as far as I see, if is_equal is used instead of Token::simpleMatch when the pointer is NULL then we will get a crash.
        
        It seems that this could occur according to your test case while I do not want it to happen finally.
        
        It only checks the first char.
        
        I expected for my transformation suggestion that the checked token object will not contain additional characters.
        
        You can check the string length in is_equal
        
        Would you like to consider any additional function overloads together with derived classes?
        How do you think about throwing a C++ exception by an other function implementation if a data type mismatch would occur?
        
        but then what is the point to not have a std::string == comparison instead?
        
        There can be a bit more software fine-tuning needed.
        
        If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Daniel Marjamäki - 2018-04-21

Sometimes Token::simpleMatch receives a null pointer. I expect that it returns false then.

I am not 100% against coccinelle scripts but it would imho be better that you just tweak the tools/matchcompiler.py. We currently don't need coccinelle for compilation.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Markus Elfring - 2018-04-21
  
  Sometimes Token::simpleMatch receives a null pointer. I expect that it returns false then.
  
  This member function was designed in this way and is fine for the general use case.
  
  We currently don't need coccinelle for compilation.
  
  I suggest to use this development tool also for your software just for a very specific source code transformation.
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Daniel Marjamäki - 2018-04-21

I also think it is weird looking to have a Token reference. We use pointers for Token (almost?) everywhere else so it is more natural with pointer than reference..

A sanity check for a passed null pointer could be omitted then.

It is technically invalid with a null reference. However compilers happily generates code that produces null references.

The argument to prefer reference instead of pointer is as far as I know that you'll get tool warnings then if you try to call the function with a null pointer. Well if you use a static analysis tool that can analyze the function calls it should not matter.. it should see that you have a null pointer dereference anyway. I am not claiming that static analysis tools are perfect but they are getting better and better.

People should not write unreadable code to help tools write more warnings because sooner or later those tweaks are not needed and all that is accomplished is that the code is less readable.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Markus Elfring - 2018-04-21
  
  I also think it is weird looking to have a Token reference.
  
  I wonder about such a view.
  
  We use pointers for Token (almost?) everywhere else so it is more natural with pointer
  
  You got used to this implementation detail.
  
  than reference..
  
  I hope that the software situation can be extended there.
  
  Well if you use a static analysis tool that can analyze the function calls it should not matter.
  
  It matters if a better software design can prevent you from categories of programming errors at compile-time, doesn't it?
  
  Would you like to care any more if a passed null pointer will be tolerated for a function parameter (or not) as another documentation hint?
  
  … the code is less readable.
  
  There can be different preferences involved for source code readability.
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Daniel Marjamäki - 2018-04-23

There can be different preferences involved for source code readability.

Yes! And I feel that I am somewhat open for discussion.

But it feels like I have told you "no I am not against pointers" 10 times now already.

I am the lead developer of this project so you will just have to live with my opinion. You seem to think that I should adapt to your opinions.

It matters if a better software design can prevent you from categories of programming errors at compile-time, doesn't it?

I like to prevent categories of programming errors at compile-time... but I dislike naive tweaks without caring what tools are doing.

Naive micro optimisations can mean that performance is hurt.

Naive code style tweaks to make safer code can cause bugs or prevent that bugs are found. I am thinking mostly about:
* use auto and always initialize variable
* fix signedness conversion warnings using casts
* prefer '1==x' instead of 'x==1' in conditions to prevent unintended assignments
* use reference instead of pointer : pointless advice if you use static analysis that catch the bugs anyway. And pointless advice if you don't use static analysis - the bugs are not found at compile time anyway.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Markus Elfring - 2018-04-23
  
  You seem to think that I should adapt to your opinions.
  
  I hope so to some degree.
  
  This software implementation was adjusted by various contributors in some ways through the years. It will evolve further after corresponding constructive software development discussions.
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Software evolution around Token::simpleMatch

Static source code analysis tool for C and C++ code

Forums

Help

Software evolution around Token::simpleMatch

Software evolution around Token::simpleMatch

Static source code analysis tool for C and C++ code

Forums

Help

Software evolution around Token::simpleMatch document.SUBSCRIPTION_OPTIONS = { "thing": "topic", "subscribed": false, "url": "subscribe", "icon": { "css": "fa fa-envelope-o" } };

Software evolution around Token::simpleMatch