clonedigger-general Mailing List for Clone Digger
Status: Beta
Brought to you by: peter_bulychev
From: Peter B. <pet...@gm...> - 2011-01-04 19:46:43
|
Hello Neha, thank you for using Clone Digger. Sorry, you can't do that with the current version of the tool. But you can:

*) write a parser of output.html that will remove all the clones that are not interesting to you from the report

*) add something like

    if clone[0].getSourceFile().getFileName() == "your_file" or clone[1].getSourceFile().getFileName() == "your_file":
        continue

to html_report.py:107, where "your_file" is the file you've modified. You can alter the code of Clone Digger and make this filename a parameter. You can solve your other problem in the same manner.

2011/1/5 Neha Maheshwari <neh...@ne...>
> Hi,
> I am currently trying to use CloneDigger in my project. Whenever we run
> clonedigger on source_tree1 it checks all files present in source_tree1
> with themselves and with each other. My requirement is that if I modified
> a certain file in source_tree1, then only that file should be checked
> against all the files present in source_tree1, and the rest of the files
> are neither checked with themselves nor with each other. The other
> requirement is to exclude some irrelevant clones already present in the
> code.
>
> Any idea how I can achieve these requirements with the current version of
> clonedigger?
>
> With Regards,
> Neha Maheshwari

-- Best regards, Peter Bulychev. |
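The filter Peter describes can be sketched in isolation. The accessor names (`getSourceFile()`, `getFileName()`) mirror the ones used in the message, but the stub classes and the `filter_clones` helper below are hypothetical stand-ins, not Clone Digger's actual html_report.py code; the sketch keeps only clones that touch one file of interest, which is Neha's stated goal:

```python
# Hypothetical sketch of the per-file clone filter discussed above.
# SourceFile/CloneHalf stand in for Clone Digger's real classes; only the
# getSourceFile()/getFileName() accessor names come from the message.

class SourceFile:
    def __init__(self, file_name):
        self._file_name = file_name

    def getFileName(self):
        return self._file_name


class CloneHalf:
    def __init__(self, source_file):
        self._source_file = source_file

    def getSourceFile(self):
        return self._source_file


def filter_clones(clones, file_of_interest):
    """Keep only clone pairs where at least one half lies in file_of_interest."""
    kept = []
    for clone in clones:
        names = {clone[0].getSourceFile().getFileName(),
                 clone[1].getSourceFile().getFileName()}
        if file_of_interest in names:
            kept.append(clone)
    return kept


a = CloneHalf(SourceFile("modified.py"))
b = CloneHalf(SourceFile("other.py"))
c = CloneHalf(SourceFile("third.py"))

clones = [(a, b), (b, c), (a, c)]
print(len(filter_clones(clones, "modified.py")))  # 2 pairs touch modified.py
```

Making `file_of_interest` a command-line parameter, as Peter suggests, would then be a small change to the option parsing.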
From: Neha M. <neh...@ne...> - 2011-01-04 11:45:03
|
Hi, I am currently trying to use CloneDigger in my project. Whenever we run clonedigger on source_tree1 it checks all files present in source_tree1 with themselves and with each other. My requirement is that if I modified a certain file in source_tree1, then only that file should be checked against all the files present in source_tree1, and the rest of the files are neither checked with themselves nor with each other. The other requirement is to exclude some irrelevant clones already present in the code. Any idea how I can achieve these requirements with the current version of clonedigger? With Regards, Neha Maheshwari |
From: Peter B. <pet...@gm...> - 2008-10-26 11:48:09
|
Hello, Matthew. Can you try to run the trunk version of CD on your example? (svn co https://clonedigger.svn.sourceforge.net/svnroot/clonedigger/trunk clonedigger) 2008/10/26 Matthew Wilson <ma...@tp...>
> I'm running clonedigger on my SQLObject model files and I'm getting
> this traceback:
>
> $ clonedigger model/user.py
> Parsing model/user.py ... Error: can't parse "model/user.py"
> [...]
>
> Input is empty
>
> Any ideas what is going on? Sorry, I can't provide my input code
> (it's proprietary). It would be helpful to get some information about
> where in my code this is blowing up.
>
> Matt
-- Best regards, Peter Bulychev. |
From: Matthew W. <ma...@tp...> - 2008-10-26 00:34:16
|
I'm running clonedigger on my SQLObject model files and I'm getting this traceback:

$ clonedigger model/user.py
Parsing model/user.py ... Error: can't parse "model/user.py"
: Traceback (most recent call last):
  File "/home/matt/virtualenvs/staffknex/lib/python2.5/site-packages/clonedigger-1.0.9_beta-py2.5.egg/clonedigger/clonedigger.py", line 128, in parse_file
    source_file = supplier(file_name)
  File "/home/matt/virtualenvs/staffknex/lib/python2.5/site-packages/clonedigger-1.0.9_beta-py2.5.egg/clonedigger/python_compiler.py", line 175, in __init__
    self._setTree(rec_build_tree(compiler.parseFile(file_name)))
  File "/home/matt/virtualenvs/staffknex/lib/python2.5/site-packages/clonedigger-1.0.9_beta-py2.5.egg/clonedigger/python_compiler.py", line 167, in rec_build_tree
    t = rec_build_tree(c, is_statement)
  File "/home/matt/virtualenvs/staffknex/lib/python2.5/site-packages/clonedigger-1.0.9_beta-py2.5.egg/clonedigger/python_compiler.py", line 167, in rec_build_tree
    t = rec_build_tree(c, is_statement)
  File "/home/matt/virtualenvs/staffknex/lib/python2.5/site-packages/clonedigger-1.0.9_beta-py2.5.egg/clonedigger/python_compiler.py", line 124, in rec_build_tree
    add_childs([compiler_ast_node.code])
  File "/home/matt/virtualenvs/staffknex/lib/python2.5/site-packages/clonedigger-1.0.9_beta-py2.5.egg/clonedigger/python_compiler.py", line 58, in add_childs
    t = rec_build_tree(child, is_statement)
  File "/home/matt/virtualenvs/staffknex/lib/python2.5/site-packages/clonedigger-1.0.9_beta-py2.5.egg/clonedigger/python_compiler.py", line 167, in rec_build_tree
    t = rec_build_tree(c, is_statement)
  File "/home/matt/virtualenvs/staffknex/lib/python2.5/site-packages/clonedigger-1.0.9_beta-py2.5.egg/clonedigger/python_compiler.py", line 167, in rec_build_tree
    t = rec_build_tree(c, is_statement)
  File "/home/matt/virtualenvs/staffknex/lib/python2.5/site-packages/clonedigger-1.0.9_beta-py2.5.egg/clonedigger/python_compiler.py", line 167, in rec_build_tree
    t = rec_build_tree(c, is_statement)
  File "/home/matt/virtualenvs/staffknex/lib/python2.5/site-packages/clonedigger-1.0.9_beta-py2.5.egg/clonedigger/python_compiler.py", line 156, in rec_build_tree
    add_childs([compiler_ast_node.expr])
  File "/home/matt/virtualenvs/staffknex/lib/python2.5/site-packages/clonedigger-1.0.9_beta-py2.5.egg/clonedigger/python_compiler.py", line 58, in add_childs
    t = rec_build_tree(child, is_statement)
  File "/home/matt/virtualenvs/staffknex/lib/python2.5/site-packages/clonedigger-1.0.9_beta-py2.5.egg/clonedigger/python_compiler.py", line 160, in rec_build_tree
    add_childs(compiler_ast_node.defaults)
  File "/home/matt/virtualenvs/staffknex/lib/python2.5/site-packages/clonedigger-1.0.9_beta-py2.5.egg/clonedigger/python_compiler.py", line 55, in add_childs
    assert(type(childs) == type([]))
AssertionError

Input is empty

Any ideas what is going on? Sorry, I can't provide my input code (it's proprietary). It would be helpful to get some information about where in my code this is blowing up.

Matt
--
Matthew Wilson
ma...@tp...
http://tplus1.com |
From: Peter B. <pet...@gm...> - 2008-09-09 17:31:14
|
Hello. Anatoly Zapadinsky worked on Clone Digger during this Google Summer of Code. He has made great progress; the key features he implemented are listed below:

- Eclipse plugin. Now you can use Clone Digger from Eclipse. Anatoly wrote a manual on how to use it: http://clonedigger.sourceforge.net/eclipse_plugin_manual/eclipse_plugin_manual.html

- The --fast option. It can be used to quickly discover simple clones, which differ only in function and variable names and constants.

- Highlighting on the AST level. Diff highlighting looked bad, so now clones are highlighted on the AST level. This was implemented by pretty-printing ASTs, so the code in the report can differ from your original code (you can turn this mode off with the --force-diff option).

-- Best regards, Peter Bulychev. |
From: Peter B. <pet...@gm...> - 2008-09-04 14:00:02
|
Hello again. Thank you. I've fixed and committed that. The link to the paper which is referenced here is given above in the "Notice" section, so I didn't give it here. 2008/9/4 _ _ <zp...@gm...>
> --hashing-depth=HASHING_DEPTH
>     default value if 1, read the paper for semantics.
>     Compuation can be speed up by increasing increasing
>     this value (but some clones can be list)
>
> This snippet contains multiple typos. [...]
-- Best regards, Peter Bulychev. |
From: _ _ <zp...@gm...> - 2008-09-04 10:10:24
|
--hashing-depth=HASHING_DEPTH
    default value if 1, read the paper for semantics.
    Compuation can be speed up by increasing increasing
    this value (but some clones can be list)

This snippet contains multiple typos. It should read "computation", and the word "increasing" is repeated. Also there are multiple references to "the paper" in the help. I have to assume it's http://clonedigger.sourceforge.net/duplicate_code_detection_bulychev_minea.pdf. I suggest including this file in the trunk, unless it was left out on purpose, for example for license-related reasons. |
From: _ _ <zp...@gm...> - 2008-08-27 11:29:51
|
On Wed, Aug 27, 2008 at 11:39 AM, Peter Bulychev <pet...@gm...> wrote:
> Hello again.
>
> I've applied your patch with minor changes and put it to svn.
Great. |
From: Peter B. <pet...@gm...> - 2008-08-27 08:55:30
|
Hello again. I've applied your patch with minor changes and put it to svn. Sorry for the delay. Thank you. 2008/8/6 Peter Bulychev <pet...@gm...>
> Hello.
>
> Thank you for your patch.
>
> I'm currently away from home; in the worst case I'll be able to check it
> and place it in subversion in two weeks (preferably I'll do it sooner).
>
> Preliminarily it seems ok :)
>
> 2008/8/6 _ _ <zp...@gm...>
>> My previous message is unreadable from the web interface due to destroyed
>> indentation in source code :-(
>> I'm resending the diff and the test files as attachments.
-- Best regards, Peter Bulychev. |
From: Peter B. <pet...@gm...> - 2008-08-06 16:01:08
|
Hello. Thank you for your patch. I'm currently away from home; in the worst case I'll be able to check it and place it in subversion in two weeks (preferably I'll do it sooner). Preliminarily it seems ok :) 2008/8/6 _ _ <zp...@gm...>
> My previous message is unreadable from the web interface due to destroyed
> indentation in source code :-(
> I'm resending the diff and the test files as attachments.
-- Best regards, Peter Bulychev. |
From: _ _ <zp...@gm...> - 2008-08-06 10:10:55
|
My previous message is unreadable from the web interface due to destroyed indentation in source code :-( I'm resending the diff and the test files as attachments. |
From: _ _ <zp...@gm...> - 2008-08-06 09:44:54
|
I've made an attempt to implement the proposed functionality. This is a diff between r165 and my working copy (indentation restored; the web interface flattened it):

--- cut here ---
Index: clonedigger/abstract_syntax_tree.py
===================================================================
--- clonedigger/abstract_syntax_tree.py (revision 165)
+++ clonedigger/abstract_syntax_tree.py (working copy)
@@ -66,8 +66,9 @@
         self._hash = None
         self._source_file = source_file
         self._is_statement = False
-        if name != None:
-            self.setName(name)
+        #if name != None:
+        #    self.setName(name)
+        self.setName(name)
     def getSourceFile(self):
         return self._source_file
     def setMark(self, mark):
Index: clonedigger/python_compiler.py
===================================================================
--- clonedigger/python_compiler.py (revision 165)
+++ clonedigger/python_compiler.py (working copy)
@@ -36,8 +36,9 @@
     distance_threshold = 5
     size_threshold = 5
     ignored_statements = ['Import', 'From']
-    def __init__(self, file_name):
+    def __init__(self, file_name, func_prefixes = ()):
         SourceFile.__init__(self, file_name)
+        self.func_prefixes = func_prefixes
         def rec_build_tree(compiler_ast_node, is_statement=False):
             def flatten(list):
                 l = []
@@ -89,14 +90,19 @@
                         r.addChild(t)
             if isinstance(compiler_ast_node, compiler.ast.Node):
-                name = compiler_ast_node.__class__.__name__
+                name = compiler_ast_node.__class__.__name__
+                if name == 'Function':
+                    for prefix in self.func_prefixes:
+                        if compiler_ast_node.name.startswith(prefix):
+                            # skip function that matches pattern
+                            return AbstractSyntaxTree()
                 if name in ['Function', 'Class']:
                     # ignoring class and function docs
                     compiler_ast_node.doc = None
                 if compiler_ast_node.lineno:
                     lines = [compiler_ast_node.lineno-1]
                 else:
-                    lines = []
+                    lines = []
                 r = AbstractSyntaxTree(name, lines, self)
                 r.ast_node = compiler_ast_node
                 if is_statement and compiler_ast_node.lineno:
Index: clonedigger/clonedigger.py
===================================================================
--- clonedigger/clonedigger.py (revision 165)
+++ clonedigger/clonedigger.py (working copy)
@@ -89,14 +89,23 @@
     cmdline.add_option('--report-unifiers',
                        action='store_true',
                        dest='report_unifiers',
                        help='')
-
+    cmdline.add_option('--func-prefixes',
+                       action='store',
+                       dest='f_prefixes',
+                       help='skip functions/methods with these prefixes (provide a CSV string as argument)')
+
     cmdline.set_defaults(output='output.html',
                          language='python',
                          ingore_dirs=[],
+                         f_prefixes = None,
                          **arguments.__dict__)
     (options, source_file_names) = cmdline.parse_args()
+    if options.f_prefixes != None:
+        func_prefixes = tuple([x.strip() for x in options.f_prefixes.split(',')])
+    else:
+        func_prefixes = ()
     source_files = []
     report = html_report.HTMLReport()
@@ -118,11 +127,11 @@
     report.startTimer('Construction of AST')
-    def parse_file(file_name):
+    def parse_file(file_name, func_prefixes):
         try:
             print 'Parsing ', file_name, '...',
             sys.stdout.flush()
-            source_file = supplier(file_name)
+            source_file = supplier(file_name, func_prefixes)
             source_file.getTree().propagateCoveredLineNumbers()
             source_file.getTree().propagateHeight()
             source_files.append(source_file)
@@ -148,13 +157,13 @@
             files = [os.path.join(file_name, f) for f in os.listdir(file_name) if os.path.splitext(f)[1][1:] == supplier.extension]
             for f in files:
-                parse_file(f)
+                parse_file(f, func_prefixes)
         else:
             for dirpath, dirnames, filenames in walk(file_name):
                 for f in filenames:
-                    parse_file(os.path.join(dirpath, f))
+                    parse_file(os.path.join(dirpath, f), func_prefixes)
     else:
-        parse_file(file_name)
+        parse_file(file_name, func_prefixes)
     report.stopTimer()
     duplicates = clone_detection_algorithm.findDuplicateCode(source_files, report)
--- cut here ---

The test source file:

--- cut here ---
#!/usr/bin/env python

class A:
    def __init__(self):
        self.a = None
    def get_a(self):
        return self.a
    def set_a(self, a):
        self.a = a

class B:
    def __init__(self):
        self.b = None
    def get_b(self):
        return self.b
    def set_b(self, b):
        self.b = b
--- cut here ---

The tests:

$ clonedigger.py 1.py
Parsing 1.py ... done
3 sequences
average sequence length: 2.666667
maximum sequence length: 3
Number of statements: 8
Calculating size for each statement... done
Building statement hash... done
Number of different hash values: 3
Building patterns... 3 patterns were discovered
Choosing pattern for each statement... done
Finding similar sequences of statements... 2 sequences were found
Refining candidates... 2 clones were found
Removing dominated clones... -1 clones were removed

$ clonedigger.py --func-prefixes "get" 1.py
Parsing 1.py ... done
1 sequences
average sequence length: 2.000000
maximum sequence length: 2
Number of statements: 2
Calculating size for each statement... done
Building statement hash... done
Number of different hash values: 1
Building patterns... 1 patterns were discovered
Choosing pattern for each statement... done
Finding similar sequences of statements... 1 sequences were found
Refining candidates... 1 clones were found
Removing dominated clones... 0 clones were removed

$ clonedigger.py --func-prefixes "get,set" 1.py
Parsing 1.py ... done
1 sequences
average sequence length: 2.000000
maximum sequence length: 2
Number of statements: 2
Calculating size for each statement... done
Building statement hash... done
Number of different hash values: 1
Building patterns... 1 patterns were discovered
Choosing pattern for each statement... done
Finding similar sequences of statements... 0 sequences were found
Refining candidates... 0 clones were found
Removing dominated clones... 0 clones were removed |
From: _ _ <zp...@gm...> - 2008-07-27 11:43:05
|
On Sun, Jul 27, 2008 at 9:58 AM, Peter Bulychev <pet...@gm...> wrote:
> It's a good idea to ignore such kinds of functions. Nevertheless this mode
> should be optional
I do agree. The idea is to add flexibility to the tool, not to restrict the way it can be used by programmers.
> Maybe you'd like to write a patch on your own and send it to me (like some
> other people did).
I don't think this could happen in the near future. The main reason is that I've got zero experience in this domain and I can't estimate how much time it would take to understand the algorithms and then the code of Clone Digger. So my only option for now is to wait for you to do it. I would also like to point out that it would be a good idea to clean up and document the code a bit if you expect other people to do some work on it. I strongly hope you won't take this as a flame. I do understand that the code is still at the prototype stage. |
From: Peter B. <pet...@gm...> - 2008-07-27 06:58:19
|
Hello. 2008/7/26 _ _ <zp...@gm...>
> I've been using Clone Digger for a while (to dig Python sources) and I've
> noticed some regularities. There are some generic class methods:
> constructors (__init__() in Python), trivial getters and setters. [...]
> I think it would be a good idea to add some filtering abilities so the
> users could exclude class methods matching certain patterns,
> like __**, get*, set*.
It's a good idea to ignore such kinds of functions. Nevertheless this mode should be optional, because sometimes cloned __init__ functions can be refactored by pulling them up to the base class. I'll put this request on my "what-to-do" list, but I don't know when I'll be able to do it. Maybe you'd like to write a patch on your own and send it to me (like some other people did). -- Best regards, Peter Bulychev. |
From: _ _ <zp...@gm...> - 2008-07-26 08:45:47
|
I've been using Clone Digger for a while (to dig Python sources) and I've noticed some regularities. There are some generic class methods: constructors (__init__() in Python), trivial getters and setters. The differences between them are often just the variable names (and perhaps the docstrings). Clone Digger will detect them as clones, but I think these are false clones (they can't be refactored). I think it would be a good idea to add some filtering abilities so that users could exclude class methods matching certain patterns, like __**, get*, set*. |
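The filtering proposed above can be sketched with the standard-library `fnmatch` module. This is an illustration only; `is_excluded` and its signature are hypothetical and not part of Clone Digger's API:

```python
# Hypothetical sketch of pattern-based method exclusion: drop methods whose
# names match user-supplied glob patterns such as "__*", "get*", "set*".
from fnmatch import fnmatchcase


def is_excluded(method_name, patterns):
    """Return True if method_name matches any exclusion pattern."""
    return any(fnmatchcase(method_name, p) for p in patterns)


patterns = ["__*", "get*", "set*"]
methods = ["__init__", "get_a", "set_a", "compute_total"]

# Keep only methods that match no exclusion pattern.
survivors = [m for m in methods if not is_excluded(m, patterns)]
print(survivors)  # ['compute_total']
```

Applied at AST-construction time, such a predicate would prevent trivial getters, setters, and constructors from ever entering the clone search.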
From: Peter B. <pet...@gm...> - 2008-03-29 06:43:31
|
Hello. I've resolved the bug with very poor performance on large sequences of statements and released version 1.0.1. If you had performance problems during the "Finding similar sequences of statements..." stage, please download the new version. -- Best regards, Peter Bulychev. |
From: Rob H. <ro...@ho...> - 2008-03-27 10:56:16
|
Peter,

In this particular case, it was about 170, 45, and 90 "f.write" statements, respectively. Only some of them have identical arguments, rarely in the same order, and all these "one line clones" are in the same method (nothing shared between two of the three that I undressed):

 3 + f.write("\t\t\t\t\t\t1.0\n")
 3 + f.write("\t\t\t\t\tcylinder\n")
 3 + f.write("\t\t\t\t\t{\n")
 3 + f.write("\t\t\t\t\t}\n")
 3 + f.write("\t\t\t\tmodelTexture(%f, %f, %f)\n"%(col[0], col[1], col[2]))
 3 + f.write("\t\t\tsphere\n")
 3 + f.write("\t\t\tunion\n")
 3 + f.write("\t\t}\n")
 4 + f.write("\t\t\t{\n")
 4 + f.write("\t\t\t}\n")
 5 + f.write("\t\t{\n")
 6 + f.write("}\n\n")
 6 + f.write("\t\t\t\tbox\n")
 6 + f.write("\t\t\tdisc\n")
 7 + f.write("#end\n")
 7 + f.write("{\n")
 8 + f.write("\t\t{\n")
 8 + f.write("\t\t}\n")
 8 + f.write("\t{\n")
 8 + f.write("\t}\n")
 8 + f.write("\t\t\t\t{\n")
 8 + f.write("\t\t\t\t}\n")
14 + f.write("\t\t\t{\n")
14 + f.write("\t\t\t}\n")

Regards,

Rob

2008/3/27, Peter Bulychev <pet...@gm...>:
> It's good :)
>
> Please, tell me the size of the sequences of similar statements in your
> case.
>
> 2008/3/27, Rob Hooft <ro...@ho...>:
> > Peter,
> >
> > This works, thanks! I get:

-- Rob W. W. Hooft || ro...@ho... || http://www.hooft.net/people/rob |
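An aside on the repetition Rob describes: long runs of near-identical `f.write(...)` lines can usually be folded into a small data-driven helper, which removes both the duplication and the long similar-statement sequences that slow Clone Digger down. A minimal sketch, not from the thread; `write_block` and its signature are my invention:

```python
# Hedged sketch: collapse runs of near-identical f.write(...) statements
# into one loop over data. Each call here replaces a whole run of writes.
import io


def write_block(f, depth, lines):
    """Write each line at the given tab depth, wrapped in braces."""
    indent = "\t" * depth
    f.write(indent + "{\n")
    for line in lines:
        f.write(indent + "\t" + line + "\n")
    f.write(indent + "}\n")


buf = io.StringIO()
write_block(buf, 2, ["sphere", "cylinder"])
print(buf.getvalue())
```

Driving the output from lists like this shortens the statement sequences the detector has to compare, which is exactly the condition the performance warnings point at.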
From: Peter B. <pet...@gm...> - 2008-03-27 10:30:39
|
It's good :)

Please, tell me the size of the sequences of similar statements in your case.

2008/3/27, Rob Hooft <ro...@ho...>:
> Peter,
>
> This works, thanks! I get:
>
> Parsing /home/hooft/p/seattle/gui/batomicbasics.py ... done
> 344 sequences
> average sequence length: 4.066860
> maximum sequence length: 160
> Number of statements: 1399
> Calculating size for each statement... done
> Building statement hash... done
> Number of different hash values: 159
> Building patterns... 235 patterns were discovered
> Choosing pattern for each statement... done
> Finding similar sequences of statements... 1139 sequences were found
> -----------------------------------------
> Warning: sequence of statements starting at
> /home/hooft/p/seattle/gui/batomicbasics.py:819
> consists of many similar statements.
> It can result in bad perfomance.
> Please refer to http://clonedigger.sourceforge.net/documentation.html
> -----------------------------------------
> -----------------------------------------
> Warning: sequence of statements starting at
> /home/hooft/p/seattle/gui/batomicbasics.py:1856
> consists of many similar statements.
> It can result in bad perfomance.
> Please refer to http://clonedigger.sourceforge.net/documentation.html
> -----------------------------------------
> -----------------------------------------
> Warning: sequence of statements starting at
> /home/hooft/p/seattle/gui/batomicbasics.py:2007
> consists of many similar statements.
> It can result in bad perfomance.
> Please refer to http://clonedigger.sourceforge.net/documentation.html
> -----------------------------------------
> Refining candidates...
> ^C
>
> And when I remove the three code blocks these refer to:
>
> Parsing batomicbasics.py ... done
> 344 sequences
> average sequence length: 3.226744
> maximum sequence length: 59
> Number of statements: 1110
> Calculating size for each statement... done
> Building statement hash... done
> Number of different hash values: 157
> Building patterns... 234 patterns were discovered
> Choosing pattern for each statement... done
> Finding similar sequences of statements... 45 sequences were found
> Refining candidates... 28 clones were found
> Removing dominated clones... -2 clones were removed
> python2.4 clonedigger.py batomicbasics.py 43.94s user 0.04s system
> 99% cpu 44.045 total
>
> 2008/3/27, Peter Bulychev <pet...@gm...>:
> > Hello.
> >
> > Please, replace clone_detection_algorithm.py by the attached file, and
> > inform me if the warnings are shown before the "refining the
> > candidates" phase.
> >
> > If they are shown, you will not have to manually remove these sequences
> > from your code; I'll quickly modify the Clone Digger project to
> > automatically exclude them.
> >
> > 2008/3/27, Rob Hooft <ro...@ho...>:
> > > Sorry, I forgot to add the list as CC in my earlier answer, copied
> > > below.
> > >
> > > I located two files in the source directory that take a long time to
> > > complete on their own, and that I therefore suspect of being the
> > > origin of the "infinite" time spent on the complete set.
> > >
> > > One of them contains sequences of code like:
> > >
> > > f.write("#macro dot(lx, ly, lz)\n")
> > > f.write("\tsphere\n")
> > > f.write("\t{\n")
> > > f.write("\t\t<lx, ly, lz>, radDot\n")
> > > f.write("\t\tflatTexture(1.0, 1.0, 1.0)\n")
> > > f.write("\t}\n")
> > > f.write("#end\n")
> > > [...]
> > >
> > > The other a keyboard map like
> > >
> > > self.btnKey_q.setText(self.__keymap__['q'][self.__shift__])
> > > self.btnKey_w.setText(self.__keymap__['w'][self.__shift__])
> > > self.btnKey_e.setText(self.__keymap__['e'][self.__shift__])
> > > self.btnKey_r.setText(self.__keymap__['r'][self.__shift__])
> > > self.btnKey_t.setText(self.__keymap__['t'][self.__shift__])
> > > [...]
> > >
> > > I guess I will have to exclude those two to be able to complete the
> > > set. Any further guidance would be welcome!
> > >
> > > Regards,
> > >
> > > Rob Hooft
> > >
> > > 2008/3/27, Rob Hooft <ro...@ho...>:
> > > > Peter,
> > > >
> > > > Thanks for your answer. Yes, I did prevent the inclusion of generated
> > > > code. Below is the stack trace for the run I just terminated. I am not
> > > > aware of long sequences of statements like that; isn't that what
> > > > "maximum sequence length: 169" is telling me? Is 169 excessive?
> > > >
> > > > Rob
> > > >
> > > > Traceback (most recent call last):
> > > >   File "clonedigger.py", line 114, in ?
> > > >     duplicates = clone_detection_algorithm.findDuplicateCode(source_files, report)
> > > >   File "/home/hooft/p/clonedigger/clone_detection_algorithm.py", line 258, in findDuplicateCode
> > > >     clones = refineDuplicates(duplicate_candidates)
> > > >   File "/home/hooft/p/clonedigger/clone_detection_algorithm.py", line 146, in refineDuplicates
> > > >     distance = candidate_sequence.calcDistance()
> > > >   File "/home/hooft/p/clonedigger/abstract_syntax_tree.py", line 252, in calcDistance
> > > >     unifier = anti_unification.Unifier(trees[0], trees[1])
> > > >   File "/home/hooft/p/clonedigger/anti_unification.py", line 100, in __init__
> > > >     (self._unifier, self._substitutions) = unify(t1, t2)
> > > >   File "/home/hooft/p/clonedigger/anti_unification.py", line 96, in unify
> > > >     (ai, si) = unify(node1.getChilds()[i], node2.getChilds()[i])
> > > >   File "/home/hooft/p/clonedigger/anti_unification.py", line 96, in unify
> > > >     (ai, si) = unify(node1.getChilds()[i], node2.getChilds()[i])
> > > >   File "/home/hooft/p/clonedigger/anti_unification.py", line 96, in unify
> > > >     (ai, si) = unify(node1.getChilds()[i], node2.getChilds()[i])
> > > >   File "/home/hooft/p/clonedigger/anti_unification.py", line 97, in unify
> > > >     (ai, s) = combineSubs(ai, si, s)
> > > >   File "/home/hooft/p/clonedigger/anti_unification.py", line 71, in combineSubs
> > > >     newt = (copy.copy(t[0]), copy.copy(t[1]))
> > > >   File "/usr/lib/python2.4/copy.py", line 85, in copy
> > > >     return copier(x)
> > > >   File "/usr/lib/python2.4/copy.py", line 141, in _copy_inst
> > > >     y = _EmptyClass()
> > > > KeyboardInterrupt
> > > > python2.4 clonedigger.py ~/p/seattle/gui/*.py~*ui.py 167342.50s user
> > > > 108.63s system 99% cpu 46:35:33.63 total
> > > >
> > > > 2008/3/27, Peter Bulychev <pet...@gm...>:
> > > > > Hello, Rob.
> > > > >
> > > > > Let's try to solve the problem :)
> > > > >
> > > > > I expect you have removed the automatically generated code from the
> > > > > inspected module (as was recommended on the documentation page).
> > > > >
> > > > > I think that the problem is caused by the following line:
> > > > >
> > > > > > Finding similar sequences of statements... 6948 sequences were found
> > > > >
> > > > > The value of 6948 is abnormally high for 30k statements.
> > > > >
> > > > > First of all, I think you should press Ctrl-C and send me the Python
> > > > > traceback (I don't think that waiting is a good solution; I think we
> > > > > should fix the bug).
> > > > >
> > > > > Secondly, please take a look if there are any long sequences of
> > > > > assignments in your project, like:
> > > > >
> > > > > > const_1 = 1
> > > > > > const_2 = 2
> > > > > > .....
> > > > > > const_100 = 100
> > > > >
> > > > > 2008/3/27, Rob Hooft <ro...@ho...>:
> > > > > > Hi,
> > > > > >
> > > > > > I have read the announcement of CloneDigger with interest. The
> > > > > > department where I work is using Python for all application software
> > > > > > that we ship to customers, and I am very interested in the application
> > > > > > of tools that can be used to "automatically" ensure quality of code.
> > > > > > Unfortunately, the whole package (436k lines excluding automatically
> > > > > > generated parts) will be too large to analyze at once, so I will focus
> > > > > > on packages one at a time.
> > > > > >
> > > > > > I have now made 1.5 attempts at running CloneDigger. I was very happy
> > > > > > with the results on ~3k statements, which ran in about 2 minutes. A bit
> > > > > > more challenging ~30k statement run is still going after 44 hours of
> > > > > > CPU time; the last screen output was generated 40 hours ago. What can
> > > > > > I do to figure out where this time is going, and can I get an
> > > > > > intermediate sample of the result in some way?
> > > > > >
> > > > > > ------------------
> > > > > > ...
> > > > > > 9513 sequences
> > > > > > average sequence length: 3.057290
> > > > > > maximum sequence length: 169
> > > > > > Number of statements: 29084
> > > > > > Calculating size for each statement... done
> > > > > > Building statement hash... done
> > > > > > Number of different hash values: 1100
> > > > > > Building patterns... 5000, 10000, 15000, 20000, 25000, 4677 patterns
> > > > > > were discovered
> > > > > > Choosing pattern for each statement... done
> > > > > > Finding similar sequences of statements... 6948 sequences were found
> > > > > > Refining candidates...
> > > > > > --------------------------------------------------
> > > > > >
> > > > > > Rob Hooft
> > > > > >
> > > > > > --
> > > > > > Rob W. W. Hooft || ro...@ho... || http://www.hooft.net/people/rob |
> > > > > > > > > > > > > > http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace > > > > > > _______________________________________________ > > > > > > Clonedigger-general mailing list > > > > > > Clo...@li... > > > > > > > > > > > > > https://lists.sourceforge.net/lists/listinfo/clonedigger-general > > > > > > > > > > > > > > > > > > > > > -- > > > > > Best regards, > > > > > Peter Bulychev. > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > Rob W. W. Hooft || ro...@ho... || http://www.hooft.net/people/rob > > > > > > > > > > > > > > > > -- > > > > > > Rob W. W. Hooft || ro...@ho... || http://www.hooft.net/people/rob > > > > > > > > > > > -- > > Best regards, > > Peter Bulychev. > > > > > > -- > > Rob W. W. Hooft || ro...@ho... || http://www.hooft.net/people/rob > -- Best regards, Peter Bulychev. |
From: Rob H. <ro...@ho...> - 2008-03-27 10:27:58
|
Peter, This works, thanks! I get: Parsing /home/hooft/p/seattle/gui/batomicbasics.py ... done 344 sequences average sequence length: 4.066860 maximum sequence length: 160 Number of statements: 1399 Calculating size for each statement... done Building statement hash... done Number of different hash values: 159 Building patterns... 235 patterns were discovered Choosing pattern for each statement... done Finding similar sequences of statements... 1139 sequences were found ----------------------------------------- Warning: sequence of statements starting at /home/hooft/p/seattle/gui/batomicbasics.py:819 consists of many similar statements. It can result in bad perfomance. Please refer to http://clonedigger.sourceforge.net/documentation.html ----------------------------------------- ----------------------------------------- Warning: sequence of statements starting at /home/hooft/p/seattle/gui/batomicbasics.py:1856 consists of many similar statements. It can result in bad perfomance. Please refer to http://clonedigger.sourceforge.net/documentation.html ----------------------------------------- ----------------------------------------- Warning: sequence of statements starting at /home/hooft/p/seattle/gui/batomicbasics.py:2007 consists of many similar statements. It can result in bad perfomance. Please refer to http://clonedigger.sourceforge.net/documentation.html ----------------------------------------- Refining candidates... ^C And when I remove the three code blocks these refer to: Parsing batomicbasics.py ... done 344 sequences average sequence length: 3.226744 maximum sequence length: 59 Number of statements: 1110 Calculating size for each statement... done Building statement hash... done Number of different hash values: 157 Building patterns... 234 patterns were discovered Choosing pattern for each statement... done Finding similar sequences of statements... 45 sequences were found Refining candidates... 28 clones were found Removing dominated clones... 
-2 clones were removed python2.4 clonedigger.py batomicbasics.py 43.94s user 0.04s system 99% cpu 44.045 total 2008/3/27, Peter Bulychev <pet...@gm...>: > Hello. > > Please, replace the clone_detection_algorithm.py by the attached file , > and inform me, if the warnings are shown before the "refining the > candidates" phase. > > If they are shown, you will not have to manually remove this sequences from > you code, I'll quickly modify the Clone Digger project to automatically > exclude them. > > > 2008/3/27, Rob Hooft <ro...@ho...>: > > Sorry, I forgot to add the list as CC in my earlier answer, copied below. > > > > I located two files in the source directory that take a long time to > > complete on their own, and that I therefore suspect of being the > > origin of the "infinite" time spent on the complete set. > > > > One of them contains sequences of code like: > > > > f.write("#macro dot(lx, ly, lz)\n") > > f.write("\tsphere\n") > > f.write("\t{\n") > > f.write("\t\t<lx, ly, lz>, radDot\n") > > f.write("\t\tflatTexture(1.0, 1.0, 1.0)\n") > > f.write("\t}\n") > > f.write("#end\n") > > [...] > > > > The other a keyboard map like > > > > > self.btnKey_q.setText(self.__keymap__['q'][self.__shift__]) > > > self.btnKey_w.setText(self.__keymap__['w'][self.__shift__]) > > > self.btnKey_e.setText(self.__keymap__['e'][self.__shift__]) > > > self.btnKey_r.setText(self.__keymap__['r'][self.__shift__]) > > > self.btnKey_t.setText(self.__keymap__['t'][self.__shift__]) > > [...] > > > > I guess I will have to exclude those two to be able to complete the > > set. Any further guidance would be welcome! > > > > Regards, > > > > > > Rob Hooft > > > > > > 2008/3/27, Rob Hooft <ro...@ho...>: > > > Peter, > > > > > > Thanks for your answer. Yes, I did prevent the inclusion of generated > > > code. Below is the stack trace for the run I just terminated. I am not > > > aware of long sequences of statements like that, isn't that what > > > "maximum sequence length: 169" is telling me? 
Is 169 excessive? > > > > > > Rob > > > > > > Traceback (most recent call last): > > > File "clonedigger.py", line 114, in ? > > > duplicates = > > > > clone_detection_algorithm.findDuplicateCode(source_files, > report) > > > File > "/home/hooft/p/clonedigger/clone_detection_algorithm.py", > line > > > 258, in findDuplicateCode > > > clones = refineDuplicates(duplicate_candidates) > > > File > "/home/hooft/p/clonedigger/clone_detection_algorithm.py", > line > > > 146, in refineDuplicates > > > distance = candidate_sequence.calcDistance() > > > File > "/home/hooft/p/clonedigger/abstract_syntax_tree.py", line > 252, > > > in calcDistance > > > unifier = anti_unification.Unifier(trees[0], > trees[1]) > > > File "/home/hooft/p/clonedigger/anti_unification.py", > line 100, in __init__ > > > (self._unifier, self._substitutions) = unify(t1, t2) > > > File "/home/hooft/p/clonedigger/anti_unification.py", > line 96, in unify > > > (ai, si) = unify(node1.getChilds()[i], node2.getChilds()[i]) > > > File "/home/hooft/p/clonedigger/anti_unification.py", > line 96, in unify > > > (ai, si) = unify(node1.getChilds()[i], node2.getChilds()[i]) > > > File "/home/hooft/p/clonedigger/anti_unification.py", > line 96, in unify > > > (ai, si) = unify(node1.getChilds()[i], node2.getChilds()[i]) > > > File "/home/hooft/p/clonedigger/anti_unification.py", > line 97, in unify > > > (ai, s) = combineSubs(ai, si, s) > > > File "/home/hooft/p/clonedigger/anti_unification.py", > line 71, in combineSubs > > > newt = (copy.copy(t[0]), copy.copy(t[1])) > > > File "/usr/lib/python2.4/copy.py", line 85, in copy > > > return copier(x) > > > File "/usr/lib/python2.4/copy.py", line 141, in _copy_inst > > > y = _EmptyClass() > > > KeyboardInterrupt > > > python2.4 clonedigger.py ~/p/seattle/gui/*.py~*ui.py 167342.50s user > > > 108.63s system 99% cpu 46:35:33.63 total > > > > > > > > > 2008/3/27, Peter Bulychev <pet...@gm...>: > > > > > > > Hello, Rob. 
> > > > > > > > Let's try to solve the problem :) > > > > > > > > I expect you have removed the automatically generated code from the > > > > inspected module (as it was recommended in the documentation page). > > > > > > > > I think that the problem is caused by the following line: > > > > > > > > > Finding similar sequences of statements... 6948 sequences were > found > > > > > > > > > The value of 6948 is abnormally high for 30k statements. > > > > > > > > First of all, I think you should press the Ctrl-C and send me the > Python > > > > traceback (I don't think that waiting is a good solution, I think we > should > > > > fix the bug). > > > > > > > > Secondly, please take a look, if there are any long sequences of > assignments > > > > in your project, like: > > > > > > > > > const_1 = 1 > > > > > const_2 = 2 > > > > > ..... > > > > > const_100 = 100 > > > > > > > > > > > > > > > > > 2008/3/27, Rob Hooft <ro...@ho...>: > > > > > > > > > Hi, > > > > > > > > > > I have read the announcement of CloneDigger with interest. The > > > > > department where I work is using python for all application > software > > > > > that we ship to customers, and I am very interested in the > application > > > > > of tools that can be used to "automatically" ensure quality of > code. > > > > > Unfortunately, the whole package (436k lines excluding > automatically > > > > > generated parts) will be too large to analyze at once, so I will > focus > > > > > on packages at a time. > > > > > > > > > > I have now made 1.5 attempt at running CloneDigger. I was very > happy > > > > > with the results on ~3k statements which ran in about 2 minutes. A > bit > > > > > more challenging ~30k statement run is still going after 44 hours > of > > > > > CPU time; the last screen output was generated 40 hours ago. What > can > > > > > I do to figure out where this time is going, and can I get an > > > > > intermediate sample of the result in some way? 
> > > > > > > > > > ------------------ > > > > > ... > > > > > 9513 sequences > > > > > average sequence length: 3.057290 > > > > > maximum sequence length: 169 > > > > > Number of statements: 29084 > > > > > Calculating size for each statement... done > > > > > Building statement hash... done > > > > > Number of different hash values: 1100 > > > > > Building patterns... 5000, 10000, 15000, 20000, 25000, 4677 > patterns > > > > > were discovered > > > > > Choosing pattern for each statement... done > > > > > Finding similar sequences of statements... 6948 sequences were > found > > > > > Refining candidates... > > > > > > -------------------------------------------------- > > > > > > > > > > Rob Hooft > > > > > > > > > > -- > > > > > Rob W. W. Hooft || ro...@ho... || > http://www.hooft.net/people/rob > > > > > > > > > > > > > > > ------------------------------------------------------------------------- > > > > > Check out the new SourceForge.net Marketplace. > > > > > It's the best place to buy or sell services for > > > > > just about anything Open Source. > > > > > > > > > > http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace > > > > > _______________________________________________ > > > > > Clonedigger-general mailing list > > > > > Clo...@li... > > > > > > > > > > https://lists.sourceforge.net/lists/listinfo/clonedigger-general > > > > > > > > > > > > > > > > > -- > > > > Best regards, > > > > Peter Bulychev. > > > > > > > > > > > > > > > > -- > > > > > > Rob W. W. Hooft || ro...@ho... || http://www.hooft.net/people/rob > > > > > > > > > > > -- > > > > Rob W. W. Hooft || ro...@ho... || http://www.hooft.net/people/rob > > > > > > -- > Best regards, > Peter Bulychev. > -- Rob W. W. Hooft || ro...@ho... || http://www.hooft.net/people/rob |
From: Peter B. <pet...@gm...> - 2008-03-27 10:23:10
|
Hello. Please replace clone_detection_algorithm.py with the attached file, and let me know whether the warnings are shown before the "refining the candidates" phase. If they are shown, you will not have to manually remove these sequences from your code; I'll quickly modify Clone Digger to exclude them automatically. 2008/3/27, Rob Hooft <ro...@ho...>: > > Sorry, I forgot to add the list as CC in my earlier answer, copied below. > > I located two files in the source directory that take a long time to > complete on their own, and that I therefore suspect of being the > origin of the "infinite" time spent on the complete set. > > One of them contains sequences of code like: > > f.write("#macro dot(lx, ly, lz)\n") > f.write("\tsphere\n") > f.write("\t{\n") > f.write("\t\t<lx, ly, lz>, radDot\n") > f.write("\t\tflatTexture(1.0, 1.0, 1.0)\n") > f.write("\t}\n") > f.write("#end\n") > [...] > > The other a keyboard map like > > self.btnKey_q.setText > (self.__keymap__['q'][self.__shift__]) > self.btnKey_w.setText > (self.__keymap__['w'][self.__shift__]) > self.btnKey_e.setText > (self.__keymap__['e'][self.__shift__]) > self.btnKey_r.setText > (self.__keymap__['r'][self.__shift__]) > self.btnKey_t.setText > (self.__keymap__['t'][self.__shift__]) > [...] > > I guess I will have to exclude those two to be able to complete the > set. Any further guidance would be welcome! > > Regards, > > > Rob Hooft > > > 2008/3/27, Rob Hooft <ro...@ho...>: > > Peter, > > > > Thanks for your answer. Yes, I did prevent the inclusion of generated > > code. Below is the stack trace for the run I just terminated. I am not > > aware of long sequences of statements like that, isn't that what > > "maximum sequence length: 169" is telling me? Is 169 excessive? > > > > Rob > > > > Traceback (most recent call last): > > File "clonedigger.py", line 114, in ? 
> > duplicates = > > clone_detection_algorithm.findDuplicateCode(source_files, report) > > File "/home/hooft/p/clonedigger/clone_detection_algorithm.py", line > > 258, in findDuplicateCode > > clones = refineDuplicates(duplicate_candidates) > > File "/home/hooft/p/clonedigger/clone_detection_algorithm.py", line > > 146, in refineDuplicates > > distance = candidate_sequence.calcDistance() > > File "/home/hooft/p/clonedigger/abstract_syntax_tree.py", line 252, > > in calcDistance > > unifier = anti_unification.Unifier(trees[0], trees[1]) > > File "/home/hooft/p/clonedigger/anti_unification.py", line 100, in > __init__ > > (self._unifier, self._substitutions) = unify(t1, t2) > > File "/home/hooft/p/clonedigger/anti_unification.py", line 96, in > unify > > (ai, si) = unify(node1.getChilds()[i], node2.getChilds()[i]) > > File "/home/hooft/p/clonedigger/anti_unification.py", line 96, in > unify > > (ai, si) = unify(node1.getChilds()[i], node2.getChilds()[i]) > > File "/home/hooft/p/clonedigger/anti_unification.py", line 96, in > unify > > (ai, si) = unify(node1.getChilds()[i], node2.getChilds()[i]) > > File "/home/hooft/p/clonedigger/anti_unification.py", line 97, in > unify > > (ai, s) = combineSubs(ai, si, s) > > File "/home/hooft/p/clonedigger/anti_unification.py", line 71, in > combineSubs > > newt = (copy.copy(t[0]), copy.copy(t[1])) > > File "/usr/lib/python2.4/copy.py", line 85, in copy > > return copier(x) > > File "/usr/lib/python2.4/copy.py", line 141, in _copy_inst > > y = _EmptyClass() > > KeyboardInterrupt > > python2.4 clonedigger.py ~/p/seattle/gui/*.py~*ui.py 167342.50s user > > 108.63s system 99% cpu 46:35:33.63 total > > > > > > 2008/3/27, Peter Bulychev <pet...@gm...>: > > > > > Hello, Rob. > > > > > > Let's try to solve the problem :) > > > > > > I expect you have removed the automatically generated code from the > > > inspected module (as it was recommended in the documentation page). 
> > > > > > I think that the problem is caused by the following line: > > > > > > > Finding similar sequences of statements... 6948 sequences were > found > > > > > > > The value of 6948 is abnormally high for 30k statements. > > > > > > First of all, I think you should press the Ctrl-C and send me the > Python > > > traceback (I don't think that waiting is a good solution, I think we > should > > > fix the bug). > > > > > > Secondly, please take a look, if there are any long sequences of > assignments > > > in your project, like: > > > > > > > const_1 = 1 > > > > const_2 = 2 > > > > ..... > > > > const_100 = 100 > > > > > > > > > > > > > 2008/3/27, Rob Hooft <ro...@ho...>: > > > > > > > Hi, > > > > > > > > I have read the announcement of CloneDigger with interest. The > > > > department where I work is using python for all application > software > > > > that we ship to customers, and I am very interested in the > application > > > > of tools that can be used to "automatically" ensure quality of > code. > > > > Unfortunately, the whole package (436k lines excluding > automatically > > > > generated parts) will be too large to analyze at once, so I will > focus > > > > on packages at a time. > > > > > > > > I have now made 1.5 attempt at running CloneDigger. I was very > happy > > > > with the results on ~3k statements which ran in about 2 minutes. A > bit > > > > more challenging ~30k statement run is still going after 44 hours > of > > > > CPU time; the last screen output was generated 40 hours ago. What > can > > > > I do to figure out where this time is going, and can I get an > > > > intermediate sample of the result in some way? > > > > > > > > ------------------ > > > > ... > > > > 9513 sequences > > > > average sequence length: 3.057290 > > > > maximum sequence length: 169 > > > > Number of statements: 29084 > > > > Calculating size for each statement... done > > > > Building statement hash... 
done > > > > Number of different hash values: 1100 > > > > Building patterns... 5000, 10000, 15000, 20000, 25000, 4677 > patterns > > > > were discovered > > > > Choosing pattern for each statement... done > > > > Finding similar sequences of statements... 6948 sequences were > found > > > > Refining candidates... > > > > -------------------------------------------------- > > > > > > > > Rob Hooft > > > > > > > > -- > > > > Rob W. W. Hooft || ro...@ho... || > http://www.hooft.net/people/rob > > > > > > > > > > > > ------------------------------------------------------------------------- > > > > Check out the new SourceForge.net Marketplace. > > > > It's the best place to buy or sell services for > > > > just about anything Open Source. > > > > > > > > http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace > > > > _______________________________________________ > > > > Clonedigger-general mailing list > > > > Clo...@li... > > > > > > > https://lists.sourceforge.net/lists/listinfo/clonedigger-general > > > > > > > > > > > > > -- > > > Best regards, > > > Peter Bulychev. > > > > > > > > > > > -- > > > > Rob W. W. Hooft || ro...@ho... || http://www.hooft.net/people/rob > > > > > > -- > > Rob W. W. Hooft || ro...@ho... || http://www.hooft.net/people/rob > -- Best regards, Peter Bulychev. |
From: Peter B. <pet...@gm...> - 2008-03-27 10:22:20
|
I will answer your question about the long sequences below. Consider two functions, each consisting of 100 different print calls. Clone Digger understands that all these 200 statements are very similar and first tries to compare the function bodies as a whole, and it sees that they differ in a lot of places. Then it tries to compare statements #2..100 of the first function against statements #1..99 of the second, and again sees that they are different. Then it tries 3..100 (comparing them with #3..100, #2..99 and #1..98), 2..99, 1..98, 4..100, 3..99 and so on, so, unfortunately, we have an exponential blow-up. It is caused by the presence of a long sequence of *similar statements*. But if these functions contain a mix of different statements (prints, for loops, and so on), Clone Digger will not perform all the comparisons described in the previous paragraph. I think that Clone Digger should simply remove long sequences of similar statements from consideration, because they are useless as clones (in most cases they will be sequences of assignments or prints, as in your case). 2008/3/27, Peter Bulychev <pet...@gm...>: > > Hello. > > Please, replace the clone_detection_algorithm.py by the attached file , > and inform me, if the warnings are shown before the "refining the > candidates" phase. > > If they are shown, you will not have to manually remove this sequences > from you code, I'll quickly modify the Clone Digger project to automatically > exclude them. > > 2008/3/27, Rob Hooft <ro...@ho...>: > > > > Sorry, I forgot to add the list as CC in my earlier answer, copied > > below. > > > > I located two files in the source directory that take a long time to > > complete on their own, and that I therefore suspect of being the > > origin of the "infinite" time spent on the complete set. 
> > > > One of them contains sequences of code like: > > > > f.write("#macro dot(lx, ly, lz)\n") > > f.write("\tsphere\n") > > f.write("\t{\n") > > f.write("\t\t<lx, ly, lz>, radDot\n") > > f.write("\t\tflatTexture(1.0, 1.0, 1.0)\n") > > f.write("\t}\n") > > f.write("#end\n") > > [...] > > > > The other a keyboard map like > > > > self.btnKey_q.setText > > (self.__keymap__['q'][self.__shift__]) > > self.btnKey_w.setText > > (self.__keymap__['w'][self.__shift__]) > > self.btnKey_e.setText > > (self.__keymap__['e'][self.__shift__]) > > self.btnKey_r.setText > > (self.__keymap__['r'][self.__shift__]) > > self.btnKey_t.setText > > (self.__keymap__['t'][self.__shift__]) > > [...] > > > > I guess I will have to exclude those two to be able to complete the > > set. Any further guidance would be welcome! > > > > Regards, > > > > > > Rob Hooft > > > > > > 2008/3/27, Rob Hooft <ro...@ho...>: > > > Peter, > > > > > > Thanks for your answer. Yes, I did prevent the inclusion of generated > > > code. Below is the stack trace for the run I just terminated. I am > > not > > > aware of long sequences of statements like that, isn't that what > > > "maximum sequence length: 169" is telling me? Is 169 excessive? > > > > > > Rob > > > > > > Traceback (most recent call last): > > > File "clonedigger.py", line 114, in ? 
> > > duplicates = > > > clone_detection_algorithm.findDuplicateCode(source_files, report) > > > File "/home/hooft/p/clonedigger/clone_detection_algorithm.py", line > > > 258, in findDuplicateCode > > > clones = refineDuplicates(duplicate_candidates) > > > File "/home/hooft/p/clonedigger/clone_detection_algorithm.py", line > > > 146, in refineDuplicates > > > distance = candidate_sequence.calcDistance() > > > File "/home/hooft/p/clonedigger/abstract_syntax_tree.py", line 252, > > > in calcDistance > > > unifier = anti_unification.Unifier(trees[0], trees[1]) > > > File "/home/hooft/p/clonedigger/anti_unification.py", line 100, in > > __init__ > > > (self._unifier, self._substitutions) = unify(t1, t2) > > > File "/home/hooft/p/clonedigger/anti_unification.py", line 96, in > > unify > > > (ai, si) = unify(node1.getChilds()[i], node2.getChilds()[i]) > > > File "/home/hooft/p/clonedigger/anti_unification.py", line 96, in > > unify > > > (ai, si) = unify(node1.getChilds()[i], node2.getChilds()[i]) > > > File "/home/hooft/p/clonedigger/anti_unification.py", line 96, in > > unify > > > (ai, si) = unify(node1.getChilds()[i], node2.getChilds()[i]) > > > File "/home/hooft/p/clonedigger/anti_unification.py", line 97, in > > unify > > > (ai, s) = combineSubs(ai, si, s) > > > File "/home/hooft/p/clonedigger/anti_unification.py", line 71, in > > combineSubs > > > newt = (copy.copy(t[0]), copy.copy(t[1])) > > > File "/usr/lib/python2.4/copy.py", line 85, in copy > > > return copier(x) > > > File "/usr/lib/python2.4/copy.py", line 141, in _copy_inst > > > y = _EmptyClass() > > > KeyboardInterrupt > > > python2.4 clonedigger.py ~/p/seattle/gui/*.py~*ui.py 167342.50s user > > > 108.63s system 99% cpu 46:35:33.63 total > > > > > > > > > 2008/3/27, Peter Bulychev <pet...@gm...>: > > > > > > > Hello, Rob. 
> > > > > > > > Let's try to solve the problem :) > > > > > > > > I expect you have removed the automatically generated code from the > > > > inspected module (as it was recommended in the documentation page). > > > > > > > > I think that the problem is caused by the following line: > > > > > > > > > Finding similar sequences of statements... 6948 sequences were > > found > > > > > > > > > The value of 6948 is abnormally high for 30k statements. > > > > > > > > First of all, I think you should press the Ctrl-C and send me the > > Python > > > > traceback (I don't think that waiting is a good solution, I think > > we should > > > > fix the bug). > > > > > > > > Secondly, please take a look, if there are any long sequences of > > assignments > > > > in your project, like: > > > > > > > > > const_1 = 1 > > > > > const_2 = 2 > > > > > ..... > > > > > const_100 = 100 > > > > > > > > > > > > > > > > > 2008/3/27, Rob Hooft <ro...@ho...>: > > > > > > > > > Hi, > > > > > > > > > > I have read the announcement of CloneDigger with interest. The > > > > > department where I work is using python for all application > > software > > > > > that we ship to customers, and I am very interested in the > > application > > > > > of tools that can be used to "automatically" ensure quality of > > code. > > > > > Unfortunately, the whole package (436k lines excluding > > automatically > > > > > generated parts) will be too large to analyze at once, so I will > > focus > > > > > on packages at a time. > > > > > > > > > > I have now made 1.5 attempt at running CloneDigger. I was very > > happy > > > > > with the results on ~3k statements which ran in about 2 minutes. > > A bit > > > > > more challenging ~30k statement run is still going after 44 > > hours of > > > > > CPU time; the last screen output was generated 40 hours ago. > > What can > > > > > I do to figure out where this time is going, and can I get an > > > > > intermediate sample of the result in some way? 
> > > > > > > > > > ------------------ > > > > > ... > > > > > 9513 sequences > > > > > average sequence length: 3.057290 > > > > > maximum sequence length: 169 > > > > > Number of statements: 29084 > > > > > Calculating size for each statement... done > > > > > Building statement hash... done > > > > > Number of different hash values: 1100 > > > > > Building patterns... 5000, 10000, 15000, 20000, 25000, 4677 > > patterns > > > > > were discovered > > > > > Choosing pattern for each statement... done > > > > > Finding similar sequences of statements... 6948 sequences were > > found > > > > > Refining candidates... > > > > > -------------------------------------------------- > > > > > > > > > > Rob Hooft > > > > > > > > > > -- > > > > > Rob W. W. Hooft || ro...@ho... || > > http://www.hooft.net/people/rob > > > > > > > > > > > > > > > > ------------------------------------------------------------------------- > > > > > Check out the new SourceForge.net Marketplace. > > > > > It's the best place to buy or sell services for > > > > > just about anything Open Source. > > > > > > > > > > > http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace > > > > > _______________________________________________ > > > > > Clonedigger-general mailing list > > > > > Clo...@li... > > > > > > > > > https://lists.sourceforge.net/lists/listinfo/clonedigger-general > > > > > > > > > > > > > > > > > -- > > > > Best regards, > > > > Peter Bulychev. > > > > > > > > > > > > > > > > -- > > > > > > Rob W. W. Hooft || ro...@ho... || http://www.hooft.net/people/rob > > > > > > > > > > > -- > > > > Rob W. W. Hooft || ro...@ho... || http://www.hooft.net/people/rob > > > > > > -- > Best regards, > Peter Bulychev. > -- Best regards, Peter Bulychev. |
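Peter's blow-up argument above can be made concrete with a back-of-the-envelope count. Assuming, purely as an illustration (this is not Clone Digger's actual code), that every contiguous subsequence of one run of n mutually similar statements is a candidate for comparison against every contiguous subsequence of another such run, the number of candidate pairs grows on the order of n^4:

```python
# Rough illustration (NOT Clone Digger's actual code) of why a long run of
# mutually similar statements explodes the "refining candidates" phase:
# each run of n similar statements has n*(n+1)/2 contiguous subsequences,
# and every subsequence of one run can pair with every subsequence of the
# other, so the candidate count grows roughly as n^4 / 4.

def candidate_pairs(n):
    """Candidate (subsequence, subsequence) pairs between two runs of
    n similar statements, counting all contiguous subsequences."""
    subsequences = n * (n + 1) // 2  # contiguous slices of one run
    return subsequences * subsequences

for n in (10, 50, 100):
    print(n, candidate_pairs(n))
# 10  -> 3025
# 50  -> 1625625
# 100 -> 25502500
```

For the 100-statement runs in this example that is over 25 million candidate pairs, each needing an anti-unification pass, which is consistent with the multi-hour runtimes reported in this thread.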
From: Rob H. <ro...@ho...> - 2008-03-27 09:57:03
|
Sorry, I forgot to add the list as CC in my earlier answer, copied below. I located two files in the source directory that take a long time to complete on their own, and that I therefore suspect of being the origin of the "infinite" time spent on the complete set. One of them contains sequences of code like: f.write("#macro dot(lx, ly, lz)\n") f.write("\tsphere\n") f.write("\t{\n") f.write("\t\t<lx, ly, lz>, radDot\n") f.write("\t\tflatTexture(1.0, 1.0, 1.0)\n") f.write("\t}\n") f.write("#end\n") [...] The other contains a keyboard map like: self.btnKey_q.setText(self.__keymap__['q'][self.__shift__]) self.btnKey_w.setText(self.__keymap__['w'][self.__shift__]) self.btnKey_e.setText(self.__keymap__['e'][self.__shift__]) self.btnKey_r.setText(self.__keymap__['r'][self.__shift__]) self.btnKey_t.setText(self.__keymap__['t'][self.__shift__]) [...] I guess I will have to exclude those two to be able to complete the set. Any further guidance would be welcome! Regards, Rob Hooft 2008/3/27, Rob Hooft <ro...@ho...>: > Peter, > > Thanks for your answer. Yes, I did prevent the inclusion of generated > code. Below is the stack trace for the run I just terminated. I am not > aware of long sequences of statements like that, isn't that what > "maximum sequence length: 169" is telling me? Is 169 excessive? > > Rob > > Traceback (most recent call last): > File "clonedigger.py", line 114, in ? 
> duplicates = > clone_detection_algorithm.findDuplicateCode(source_files, report) > File "/home/hooft/p/clonedigger/clone_detection_algorithm.py", line > 258, in findDuplicateCode > clones = refineDuplicates(duplicate_candidates) > File "/home/hooft/p/clonedigger/clone_detection_algorithm.py", line > 146, in refineDuplicates > distance = candidate_sequence.calcDistance() > File "/home/hooft/p/clonedigger/abstract_syntax_tree.py", line 252, > in calcDistance > unifier = anti_unification.Unifier(trees[0], trees[1]) > File "/home/hooft/p/clonedigger/anti_unification.py", line 100, in __init__ > (self._unifier, self._substitutions) = unify(t1, t2) > File "/home/hooft/p/clonedigger/anti_unification.py", line 96, in unify > (ai, si) = unify(node1.getChilds()[i], node2.getChilds()[i]) > File "/home/hooft/p/clonedigger/anti_unification.py", line 96, in unify > (ai, si) = unify(node1.getChilds()[i], node2.getChilds()[i]) > File "/home/hooft/p/clonedigger/anti_unification.py", line 96, in unify > (ai, si) = unify(node1.getChilds()[i], node2.getChilds()[i]) > File "/home/hooft/p/clonedigger/anti_unification.py", line 97, in unify > (ai, s) = combineSubs(ai, si, s) > File "/home/hooft/p/clonedigger/anti_unification.py", line 71, in combineSubs > newt = (copy.copy(t[0]), copy.copy(t[1])) > File "/usr/lib/python2.4/copy.py", line 85, in copy > return copier(x) > File "/usr/lib/python2.4/copy.py", line 141, in _copy_inst > y = _EmptyClass() > KeyboardInterrupt > python2.4 clonedigger.py ~/p/seattle/gui/*.py~*ui.py 167342.50s user > 108.63s system 99% cpu 46:35:33.63 total > > > 2008/3/27, Peter Bulychev <pet...@gm...>: > > > Hello, Rob. > > > > Let's try to solve the problem :) > > > > I expect you have removed the automatically generated code from the > > inspected module (as it was recommended in the documentation page). > > > > I think that the problem is caused by the following line: > > > > > Finding similar sequences of statements... 
> > > 6948 sequences were found
> >
> > The value of 6948 is abnormally high for 30k statements.
> >
> > First of all, I think you should press Ctrl-C and send me the Python
> > traceback (I don't think waiting is a good solution; I think we should
> > fix the bug).
> >
> > Secondly, please take a look at whether there are any long sequences of
> > assignments in your project, like:
> >
> > > const_1 = 1
> > > const_2 = 2
> > > .....
> > > const_100 = 100
> >
> >
> > 2008/3/27, Rob Hooft <ro...@ho...>:
> > > Hi,
> > >
> > > I have read the announcement of CloneDigger with interest. The
> > > department where I work uses Python for all application software
> > > that we ship to customers, and I am very interested in the application
> > > of tools that can be used to "automatically" ensure quality of code.
> > > Unfortunately, the whole package (436k lines excluding automatically
> > > generated parts) will be too large to analyze at once, so I will focus
> > > on one package at a time.
> > >
> > > I have now made 1.5 attempts at running CloneDigger. I was very happy
> > > with the results on ~3k statements, which ran in about 2 minutes. A bit
> > > more challenging ~30k statement run is still going after 44 hours of
> > > CPU time; the last screen output was generated 40 hours ago. What can
> > > I do to figure out where this time is going, and can I get an
> > > intermediate sample of the result in some way?
> > >
> > > ------------------
> > > ...
> > > 9513 sequences
> > > average sequence length: 3.057290
> > > maximum sequence length: 169
> > > Number of statements: 29084
> > > Calculating size for each statement... done
> > > Building statement hash... done
> > > Number of different hash values: 1100
> > > Building patterns... 5000, 10000, 15000, 20000, 25000, 4677 patterns were discovered
> > > Choosing pattern for each statement... done
> > > Finding similar sequences of statements... 6948 sequences were found
> > > Refining candidates...
> > > --------------------------------------------------
> > >
> > > Rob Hooft
> > >
> > > --
> > > Rob W. W. Hooft || ro...@ho... || http://www.hooft.net/people/rob
> > >
> > > -------------------------------------------------------------------------
> > > Check out the new SourceForge.net Marketplace.
> > > It's the best place to buy or sell services for
> > > just about anything Open Source.
> > > http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace
> > > _______________________________________________
> > > Clonedigger-general mailing list
> > > Clo...@li...
> > > https://lists.sourceforge.net/lists/listinfo/clonedigger-general
> >
> > --
> > Best regards,
> > Peter Bulychev.
>
> --
> Rob W. W. Hooft || ro...@ho... || http://www.hooft.net/people/rob

--
Rob W. W. Hooft || ro...@ho... || http://www.hooft.net/people/rob
|
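[Editorial note: the two files Rob identifies share a pattern, a long run of near-identical statements, which is exactly what stresses the sequence-refinement step. A minimal sketch of one way to collapse such a run; the function name is hypothetical, only the POV-Ray snippet itself comes from the message above.]

```python
import io

def write_dot_macro(f):
    # One multi-line write replaces seven separate f.write() statements,
    # so the clone detector sees a single statement instead of a long run.
    f.write(
        "#macro dot(lx, ly, lz)\n"
        "\tsphere\n"
        "\t{\n"
        "\t\t<lx, ly, lz>, radDot\n"
        "\t\tflatTexture(1.0, 1.0, 1.0)\n"
        "\t}\n"
        "#end\n"
    )

# The keyboard-map run could be collapsed the same way, e.g.:
#   for key in "qwert":
#       getattr(self, "btnKey_" + key).setText(self.__keymap__[key][self.__shift__])

buf = io.StringIO()
write_dot_macro(buf)
print(buf.getvalue().count("\n"))  # 7 lines written by a single statement
```

The output is byte-for-byte identical; only the number of statements the analyzer has to compare shrinks.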
From: Peter B. <pet...@gm...> - 2008-03-27 07:51:52
|
Hello, Rob.

Let's try to solve the problem :)

I expect you have removed the automatically generated code from the inspected module (as recommended on the documentation page).

I think that the problem is caused by the following line:

> Finding similar sequences of statements... 6948 sequences were found

The value of 6948 is abnormally high for 30k statements.

First of all, I think you should press Ctrl-C and send me the Python traceback (I don't think waiting is a good solution; I think we should fix the bug).

Secondly, please take a look at whether there are any long sequences of assignments in your project, like:

> const_1 = 1
> const_2 = 2
> .....
> const_100 = 100


2008/3/27, Rob Hooft <ro...@ho...>:
> Hi,
>
> I have read the announcement of CloneDigger with interest. The
> department where I work uses Python for all application software
> that we ship to customers, and I am very interested in the application
> of tools that can be used to "automatically" ensure quality of code.
> Unfortunately, the whole package (436k lines excluding automatically
> generated parts) will be too large to analyze at once, so I will focus
> on one package at a time.
>
> I have now made 1.5 attempts at running CloneDigger. I was very happy
> with the results on ~3k statements, which ran in about 2 minutes. A bit
> more challenging ~30k statement run is still going after 44 hours of
> CPU time; the last screen output was generated 40 hours ago. What can
> I do to figure out where this time is going, and can I get an
> intermediate sample of the result in some way?
>
> ------------------
> ...
> 9513 sequences
> average sequence length: 3.057290
> maximum sequence length: 169
> Number of statements: 29084
> Calculating size for each statement... done
> Building statement hash... done
> Number of different hash values: 1100
> Building patterns... 5000, 10000, 15000, 20000, 25000, 4677 patterns were discovered
> Choosing pattern for each statement... done
> Finding similar sequences of statements... 6948 sequences were found
> Refining candidates...
> --------------------------------------------------
>
> Rob Hooft
>
> --
> Rob W. W. Hooft || ro...@ho... || http://www.hooft.net/people/rob
>
> -------------------------------------------------------------------------
> Check out the new SourceForge.net Marketplace.
> It's the best place to buy or sell services for
> just about anything Open Source.
> http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace
> _______________________________________________
> Clonedigger-general mailing list
> Clo...@li...
> https://lists.sourceforge.net/lists/listinfo/clonedigger-general

--
Best regards,
Peter Bulychev.
|
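[Editorial note: Peter's suggestion to hunt for long assignment runs can be automated. A rough sketch in modern Python (the thread used Python 2.4; the helper name is invented) that reports the longest run of consecutive simple assignments in a module:]

```python
import ast

def longest_assignment_run(source):
    """Return the longest run of consecutive Assign statements
    found in any statement list of the parsed module."""
    tree = ast.parse(source)
    longest = 0
    for node in ast.walk(tree):
        body = getattr(node, "body", None)
        if not isinstance(body, list):
            continue
        run = 0
        for stmt in body:
            if isinstance(stmt, ast.Assign):
                run += 1
                longest = max(longest, run)
            else:
                run = 0  # any other statement breaks the run
    return longest

# Simulate the const_1 = 1 ... const_100 = 100 pattern from the email:
generated = "\n".join("const_%d = %d" % (i, i) for i in range(1, 101))
print(longest_assignment_run(generated))  # 100
```

Running this over each file in the source tree would have flagged the two pathological files immediately, without waiting 44 hours.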
From: Rob H. <ro...@ho...> - 2008-03-27 06:57:08
|
Hi,

I have read the announcement of CloneDigger with interest. The department where I work uses Python for all application software that we ship to customers, and I am very interested in the application of tools that can be used to "automatically" ensure quality of code. Unfortunately, the whole package (436k lines excluding automatically generated parts) will be too large to analyze at once, so I will focus on one package at a time.

I have now made 1.5 attempts at running CloneDigger. I was very happy with the results on ~3k statements, which ran in about 2 minutes. A bit more challenging ~30k statement run is still going after 44 hours of CPU time; the last screen output was generated 40 hours ago. What can I do to figure out where this time is going, and can I get an intermediate sample of the result in some way?

------------------
...
9513 sequences
average sequence length: 3.057290
maximum sequence length: 169
Number of statements: 29084
Calculating size for each statement... done
Building statement hash... done
Number of different hash values: 1100
Building patterns... 5000, 10000, 15000, 20000, 25000, 4677 patterns were discovered
Choosing pattern for each statement... done
Finding similar sequences of statements... 6948 sequences were found
Refining candidates...
--------------------------------------------------

Rob Hooft

--
Rob W. W. Hooft || ro...@ho... || http://www.hooft.net/people/rob
|
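[Editorial note: Rob's plan to analyze one package at a time can be scripted. A small, hypothetical helper that groups .py files by top-level directory, so each group can be passed to clonedigger.py in a separate run; the paths below are illustrative:]

```python
import os
from collections import defaultdict

def group_by_package(paths, root):
    """Group file paths by their top-level directory under root,
    so each package can be analyzed in its own clonedigger run."""
    groups = defaultdict(list)
    for p in paths:
        rel = os.path.relpath(p, root)
        top = rel.split(os.sep)[0]
        groups[top].append(p)
    return dict(groups)

files = [
    "/src/gui/main.py",
    "/src/gui/dialogs.py",
    "/src/core/engine.py",
]
groups = group_by_package(files, "/src")
print(sorted(groups))      # ['core', 'gui']
print(len(groups["gui"]))  # 2
```

Each group can then be fed to an invocation like the one in the traceback above (`python clonedigger.py <files>`), keeping every run at the ~3k-statement scale that completed in minutes.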
From: Peter B. <pet...@gm...> - 2008-03-23 05:39:37
|
Hello.

The problem was caused by the automatically generated sources in the third-party library pyglet. They contained long runs of consecutive assignments, which led to poor performance of the suffix-tree-based algorithm. I think this situation is only possible with automatically generated code, because a human programmer will not write a function consisting of 5000 consecutive statements.

I have modified the documentation page of the site to recommend removing automatically generated sources, tests, and third-party libraries from the source tree. (http://clonedigger.sourceforge.net/documentation.html)

2008/3/23, Ondrej Certik <on...@ce...>:
> Hi,
>
> I noticed Clone Digger (http://clonedigger.sourceforge.net/) on
> the GSoC Python list, so I tried it on sympy and failed. But then the
> author Peter Bulychev ran it on his machine,
> got it to work, and sent me the result (attached).
>
> It's very interesting; it discovered quite a lot of places that
> we could clearly refactor into functions or something. Very cool.
>
>
> Ondrej

--
Best regards,
Peter Bulychev.
|
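[Editorial note: the documentation advice above (skip generated sources, tests, and third-party libraries) can be sketched as a file collector. The directory names are illustrative assumptions, not Clone Digger options:]

```python
import os
import tempfile

# Hypothetical names of directories to exclude; adjust per project.
SKIP_DIRS = {"tests", "thirdparty", "generated", "pyglet"}

def collect_sources(root):
    """Collect .py files under root, pruning excluded directories."""
    selected = []
    for dirpath, dirnames, filenames in os.walk(root):
        # In-place pruning makes os.walk skip these subtrees entirely.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            if name.endswith(".py"):
                selected.append(os.path.join(dirpath, name))
    return sorted(selected)

# Tiny demonstration on a throwaway tree:
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "pkg"))
os.makedirs(os.path.join(root, "tests"))
open(os.path.join(root, "pkg", "a.py"), "w").close()
open(os.path.join(root, "tests", "b.py"), "w").close()
print([os.path.relpath(p, root) for p in collect_sources(root)])
```

The resulting list can then be passed straight to clonedigger.py, keeping generated and vendored code out of the analysis.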