RE: #107
Brought to you by:
alnd
Hi, following up on #107:
If you create a case where the current solution fails,
please post it as a new bug report and I'll sharpen the logic.
test03.zip
org/
|- src/
| |- module01/
| | |- main.c
| | |- file02.c
mod/
|- src/
| |- file02.c
| |- module01/
| | |- main.c
| | |- bak/
| | | |- main.c
cloc-1.62.exe --skip-uniqueness --diff org mod
=> NG? 1 same, 2 added, 1 removed, but the file recognized as the same is file02.c, not main.c
Also, test04.zip gives a very strange result:
moa/ is an exact copy of mod/, yet diffing org against each of them gives different results.
org/
|- src/
| |- module01/
| | |- main.c
| | |- file01.c
| | |- file02.c
mod/
|- src/
| |- file02.c
| |- module01/
| | |- main.c
| | |- file01.c
| | |- bak/
| | | |- main.c
moa/ ---- exact copy of mod/
cloc-1.62.exe --skip-uniqueness --diff org mod
=> NG (4 added, 3 removed files)
cloc-1.62.exe --skip-uniqueness --diff org moa
=> OK (2 same, 2 added, 1 removed files)
cloc-1.62.exe --skip-uniqueness --diff mod moa
=> OK (4 same files)
Anonymous
Sorry, I can't find a way to attach multiple files at once...
Cloc still produces wrong results when the directory trees contain subdirectories with different versions of the same file(s). I have a structure like this in my project to support different hardware targets:
If I then run perl cloc.pl --diff-alignment=cloc.log --by-file --csv dirB dirA > cloc.csv, cloc.csv only lists
When there are more varX directories (with the same subdirectory structure and files), or the varX directories contain more files and folders, the problem gets worse and unpredictable:
- some of the config.c/h files are not reported at all,
- some are only reported in one varX subdir, as above
- some are reported correctly, for each varX
(Windows 7 Enterprise, cloc trunk rev 432)
(following up to above comment)
The --skip-uniqueness flag seems to solve my problem. Still, I'm a bit worried about the unpredictability of the results above. It is weird that cloc sometimes reports the file from e.g. dirA/var1, sometimes from dir2/var2, and on other occasions from both. This doesn't happen with the example given, but it does when the directory trees get larger...
Step 3 of http://cloc.sourceforge.net/#How_it_works mentions that cloc tries to avoid counting the same code twice by eliminating duplicate files. The unpredictability you mention arises from the fact that the directory traversal (via the File::Find module) is not guaranteed to return files in the same sequence. Therefore, if files A and B have identical contents, one time the code may get to A first and reject B as a duplicate, and another time the opposite happens: B comes first and A is rejected.
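To illustrate the order dependence described above, here is a minimal Python sketch. It is not cloc's actual Perl code. It drops any file whose contents match a file already seen and uses a SHA-1 digest as the uniqueness key (the hashing choice and the function name `drop_duplicates` are assumptions made for this illustration). Whichever identical copy arrives first survives, so a different traversal order rejects a different file.

```python
import hashlib

def drop_duplicates(files):
    """Hypothetical helper: keep the first file seen for each distinct content.

    `files` is a list of (path, contents_bytes) pairs in traversal order.
    Returns (kept_paths, rejected_paths).
    """
    seen = {}  # content digest -> path of the first file with that content
    kept, rejected = [], []
    for path, contents in files:
        digest = hashlib.sha1(contents).hexdigest()
        if digest in seen:
            rejected.append(path)  # identical to an earlier file, so skip it
        else:
            seen[digest] = path
            kept.append(path)
    return kept, rejected

# Two files with identical contents, visited in two different orders.
a = ("dirA/var1/config.h", b"#define X 1\n")
b = ("dirA/var2/config.h", b"#define X 1\n")
print(drop_duplicates([a, b]))  # (['dirA/var1/config.h'], ['dirA/var2/config.h'])
print(drop_duplicates([b, a]))  # (['dirA/var2/config.h'], ['dirA/var1/config.h'])
```

Skipping the duplicate check entirely, which is roughly what --skip-uniqueness does, would keep both paths in either order. That is repeatable, but it counts the same code twice.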
As you discovered, --skip-uniqueness avoids this by never rejecting duplicate files and counting everything. However, results that count the same code more than once may not be desirable either.
I'll see if there's a convenient way to sort the file lists before excluding duplicates to get repeatable behavior.
I might not be understanding your issue correctly. I downloaded your cloc_bug.zip and ran
The outputs look correct to me:
and
If these aren't what you expect, let me know what you think it should be.
Hi Al, thanks for following up.
After finding the --skip-uniqueness flag, reading your comments above and looking at http://cloc.sourceforge.net/#How_it_works, I changed my expectations to match cloc's output ;-)
Also, regarding repeatability, cloc currently sorts on the basename
of like-sized files. This causes non-repeatable results since, in your case,
dirA/var1/config.h and dirB/var1/config.h have the same basename. Sorting on the full name will solve this. I'll change the code.
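A minimal Python sketch of why the sort key matters, using hypothetical (path, size) pairs rather than cloc's internals. Python's `sorted` is stable, so entries that tie on the key keep their input (traversal) order. With a (size, basename) key, dirA/var1/config.h and dirB/var1/config.h tie, and the output depends on the order the traversal produced. With a (size, full path) key, distinct paths never tie, so the output is the same for any input order.

```python
import os

def order_by_basename(files):
    # Key: (size, basename). Equal size plus equal basename is a tie,
    # and the stable sort then leaves tied files in traversal order.
    return sorted(files, key=lambda f: (f[1], os.path.basename(f[0])))

def order_by_full_path(files):
    # Key: (size, full path). Distinct paths never tie, so the result
    # does not depend on the order the traversal returned them in.
    return sorted(files, key=lambda f: (f[1], f[0]))

# The same two like-sized files, as two different traversals might return them.
traversal1 = [("dirB/var1/config.h", 120), ("dirA/var1/config.h", 120)]
traversal2 = list(reversed(traversal1))

print(order_by_basename(traversal1) == order_by_basename(traversal2))    # False
print(order_by_full_path(traversal1) == order_by_full_path(traversal2))  # True
```

Once the file list is in a fixed order, a first-seen-wins duplicate check always keeps the same copy, so the reported files become repeatable.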
svn commit 433 has updated code that should make results repeatable.
v1.64 released