
Cppcheckdata.parsedump() in loop appends to rawTokens

samed
2021-01-07
2021-01-08
  • samed

    samed - 2021-01-07

    What I'm doing is basically this. I invoke a script with a bunch of different dumps:

    python checker.py .\1.cpp.dump .\2.cpp.dump ...
    

    In code, I parse the arguments and check them in a loop like this:

    for dump in args.dumps:
        parsedDump = Cppcheckdata.parsedump(dump)
        ...
    

    The thing is, in the second iteration, the data in parsedDump.rawTokens from the first iteration is not destroyed, and the second iteration's data is simply appended to the first one's. I'm not very good at Python, but I think it's related to how the Cppcheckdata class is created. Any help is appreciated.

     
  • Daniel Marjamäki

    hmm sounds very strange.. that should not happen. If you get that behavior then it's not your fault but some bug in Cppcheckdata!

     
  • Georgiy Komarov

    Georgiy Komarov - 2021-01-07

    Could you provide the source code of your script or a minimal example that reproduces this problem?

     
  • samed

    samed - 2021-01-08
    import cppcheckdata as f
    import argparse
    
    class CppCheckFormatter(argparse.HelpFormatter):
        """
        Properly formats multiline argument helps
        """
        def _split_lines(self, text, width):
            # this is the RawTextHelpFormatter._split_lines
            if text.startswith('R|'):
                return text[2:].splitlines()
            return argparse.HelpFormatter._split_lines(self, text, width)
    
    def ArgumentParser():
    
        parser = argparse.ArgumentParser(formatter_class=CppCheckFormatter)
        parser.add_argument("dumpfile", nargs='*',
                            help="Path of dump files from cppcheck.")
        parser.add_argument("--cli",
                            help="Addon is executed from command line.",
                            action="store_true")
        return parser
    
    def main():
        #Get path and parse data
        parser = ArgumentParser()
        args = parser.parse_args()
        #Invoke checkers for each dumpfile
        for dump in args.dumpfile: 
            parsedDump = f.parsedump(dump)
            print("Dump name:")
            print(dump)
            print(parsedDump)
            print("Raw tokens:")
            print(*parsedDump.rawTokens, sep ="\n") #Check tokens in rawToken. When you run it with multiple files, tokens just get appended
    
            for cfg in parsedDump.configurations: #For comparison
                print("Configuration")
                print(cfg)
                print("Tokens")
                for token in cfg.tokenlist:
                    print()
                    print(token) #Tokens are okay.
    
    if __name__ == "__main__":
        main()
    

    Create multiple dumps and run like this:
    python code.py 1.cpp.dump 2.cpp.dump

    Edit:
    I think the problem lies in parsedDump = f.parsedump(dump), where the content of rawTokens doesn't get replaced and instead gets appended to. That means the problem should be somewhere in cppcheckdata. parsedump(filename) calls CppcheckData(filename), which creates an instance and then calls __init__(...) to initialize its attributes. So something should be going wrong in the lines below from cppcheckdata:

            # Parse general configuration options from <dumps> node
            # We intentionally don't clean node resources here because we
            # want to serialize in memory only small part of the XML tree.
            for event, node in ElementTree.iterparse(self.filename, events=('start', 'end')):
                if platform_done and rawtokens_done and suppressions_done:
                    break
                if node.tag == 'platform' and event == 'start':
                    self.platform = Platform(node)
                    platform_done = True
                elif node.tag == 'rawtokens' and event == 'end':
                    for rawtokens_node in node:
                        if rawtokens_node.tag == 'file':
                            files.append(rawtokens_node.get('name'))
                            self.files.append(rawtokens_node.get('name'))
                        elif rawtokens_node.tag == 'tok':
                            tok = Token(rawtokens_node)
                            tok.file = files[int(rawtokens_node.get('fileIndex'))]
                            self.rawTokens.append(tok)
                    rawtokens_done = True
                elif node.tag == 'suppressions' and event == 'end':
                    for suppressions_node in node:
                        self.suppressions.append(Suppression(suppressions_node))
                    suppressions_done = True
    
            # Set links between rawTokens.
            for i in range(len(self.rawTokens)-1):
                self.rawTokens[i+1].previous = self.rawTokens[i]
                self.rawTokens[i].next = self.rawTokens[i+1]
    

    I guess it's ElementTree that causes the behaviour, though I have no idea why data persists from one call to the next. Wouldn't every object get destroyed after the function ends?
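
    The lifetime question can be probed with a small toy example. This is only a sketch: ToyDump and raw_tokens are hypothetical stand-ins, not code from cppcheckdata. It assumes the list is declared in the class body rather than in __init__. The instance is discarded, but a list stored on the class itself lives as long as the class does.

```python
class ToyDump:
    # Hypothetical stand-in for illustration; not the real cppcheckdata class.
    raw_tokens = []  # declared in the class body: one list shared by all instances

    def __init__(self, tokens):
        for tok in tokens:
            # Attribute lookup falls through to the class-level list,
            # and append() mutates that shared list in place.
            self.raw_tokens.append(tok)


first = ToyDump(["a", "b"])
del first  # the instance is gone, but the list belongs to the class

second = ToyDump(["c"])
print(second.raw_tokens)                        # ['a', 'b', 'c']
print("raw_tokens" in vars(second))             # False: no per-instance list exists
print(second.raw_tokens is ToyDump.raw_tokens)  # True: it is the class's list
```

    Checking vars(instance) is a quick way to tell whether an attribute lives on the instance or only on its class.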

     

    Last edit: samed 2021-01-08
  • samed

    samed - 2021-01-08

    I guess I have found the culprit. This read explains it (and this one is a good addition). Apparently Python has class attributes and instance attributes. Since rawTokens is a class attribute and is mutable (a list), appending to it does not create an instance attribute; it modifies the one list shared through the class, so when you call parsedump() multiple times the data just keeps getting appended to it. Adding self.rawTokens = [] to __init__() (or simply removing rawTokens from the class body and defining it in __init__() so that it becomes an instance attribute) fixes the situation.
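
    A minimal sketch of the fix described above, using a toy class rather than the real cppcheckdata code (FixedDump and raw_tokens are hypothetical names chosen for illustration). Assigning a fresh list inside __init__ gives every instance its own storage:

```python
class FixedDump:
    # Hypothetical stand-in for illustration; not the real cppcheckdata class.

    def __init__(self, tokens):
        # Assignment through self creates an instance attribute,
        # so each object starts with its own empty list.
        self.raw_tokens = []
        for tok in tokens:
            self.raw_tokens.append(tok)


first = FixedDump(["a", "b"])
second = FixedDump(["c"])
print(first.raw_tokens)   # ['a', 'b']
print(second.raw_tokens)  # ['c']
```

    The difference is assignment versus mutation: self.raw_tokens = [] binds a new attribute on the instance, whereas calling append() on a name that only exists on the class changes the shared object.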

     
  • Georgiy Komarov

    Georgiy Komarov - 2021-01-08

    Nice catch! Thank you very much.

    We've never seen this issue because we normally call the addons through the cppcheck binary, so we never used it with multiple dump files.

    This PR will fix it: https://github.com/danmar/cppcheck/pull/3029

     
  • samed

    samed - 2021-01-08

    Glad I could help. One quick note, though: I use the files attribute in my checker. I suggest you make it accessible from outside.

     
    • Georgiy Komarov

      Georgiy Komarov - 2021-01-08

      https://github.com/danmar/cppcheck/pull/3030

      @samed You are suggesting good improvements in this area. Feel free to open PRs in cppcheck. I think it will be easier for you and will allow us to discuss and merge new features faster.

       
