
#114 RE: #107

Milestone: v1.63
Status: closed
Owner: nobody
Labels: None
Priority: 1
Updated: 2015-06-27
Created: 2014-08-22
Creator: tksh
Private: No

Hi, following up on #107:

If you create a case where the current solution fails,
please post it as a new bug report and I'll sharpen the logic.

test03.zip

org/
 |- src/
 |   |- module01/
 |   |   |- main.c
 |   |   |- file02.c

mod/
 |- src/
 |   |- file02.c
 |   |- module01/
 |   |   |- main.c
 |   |   |- bak/
 |   |   |   |- main.c

cloc-1.62.exe --skip-uniqueness --diff org mod
=> NG? It reports 1 same, 2 added, 1 removed, but the file recognized as the same is file02.c, not main.c.


Also, test04.zip gives a very strange result:
moa/ is an exact copy of mod/, but the results are not the same.

org/
 |- src/
 |   |- module01/
 |   |   |- main.c
 |   |   |- file01.c
 |   |   |- file02.c

mod/
 |- src/
 |   |- file02.c
 |   |- module01/
 |   |   |- main.c
 |   |   |- file01.c
 |   |   |- bak/
 |   |   |   |- main.c

moa/  ---- exact copy of mod/

cloc-1.62.exe --skip-uniqueness --diff org mod
=> NG (4 added, 3 removed files)

cloc-1.62.exe --skip-uniqueness --diff org moa
=> OK (2 same, 2 added, 1 removed files)

cloc-1.62.exe --skip-uniqueness --diff mod moa
=> OK (4 same files)

1 Attachments

Discussion

  • tksh

    tksh - 2014-08-22

    Sorry, I can't find a way to attach multiple files at the same time ...

     
  • Anonymous

    Anonymous - 2015-02-12

    Cloc still produces wrong results when the dir-trees contain subdirs with various versions of the same file(s). I have a structure in my project like this to support different hardware targets:

    dirA/
     +- var1/
     |   +- config.c
     |   `- config.h
     `- var2/
         +- config.c
         `- config.h
    dirB/
     +- var1/
     |   +- config.c
     |   `- config.h
     `- var2/
         +- config.c
         `- config.h
    

    If I then run perl cloc.pl --diff-alignment=cloc.log --by-file --csv dirB dirA > cloc.csv, cloc.csv only lists

    dirB\var2\config.h
    dirB\var2\config.c
    

    When there are more varX directories (with "the same" sub-dir structure and files) and/or the varX directories contain more files and folders, the problem gets worse and becomes unpredictable:
    - some of the config.c/h files are not reported at all,
    - some are only reported in one varX subdir, as above
    - some are reported correctly, for each varX

    (Windows 7 Enterprise, cloc trunk rev 432)

     
  • Anonymous

    Anonymous - 2015-02-12

    (following up to above comment)

    The --skip-uniqueness flag seems to solve my problem. Still, I'm a bit worried about the unpredictability of the results above. It is weird that cloc sometimes reports the file from e.g. dirA/var1, sometimes from dir2/var2, and on other occasions both. This does not happen with the example given, only when the dir trees get larger...

     
  • Al Danial

    Al Danial - 2015-02-12

    Step 3 of http://cloc.sourceforge.net/#How_it_works mentions that cloc tries to avoid counting the same code twice by eliminating duplicate files. The unpredictability you mention arises from the fact that the directory traversal (via the File::Find module) is not guaranteed to return files in the same sequence. Therefore if files A and B have identical contents, one time the code may get to A first and reject B because it is a duplicate, and other times the opposite happens--B is first and A is rejected.

    As you discovered, --skip-uniqueness avoids this by never rejecting duplicate files and counting everything. However, those results, which count the same code more than once, might not be desirable either.

    I'll see if there's a convenient way to sort the file lists before excluding duplicates to get repeatable behavior.
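    To illustrate the order dependence described above, here is a minimal sketch in Python (not cloc's actual Perl code). The function name `drop_duplicates`, the use of SHA-256, and the sample paths and contents are all assumptions made for the example. It only shows why "first file seen wins" gives different survivors for different traversal orders, and why sorting first makes the outcome repeatable.

```python
import hashlib


def drop_duplicates(files):
    """Keep the first file seen for each distinct content; reject later copies.

    `files` is a list of (path, content_bytes) pairs in traversal order.
    Hypothetical helper for illustration only, not taken from cloc.
    """
    seen = set()
    kept = []
    for path, content in files:
        digest = hashlib.sha256(content).hexdigest()
        if digest in seen:
            continue  # identical content already counted: reject this copy
        seen.add(digest)
        kept.append(path)
    return kept


# Two files with identical contents (made-up sample data).
a = ("dirA/var1/config.c", b"int x;\n")
b = ("dirA/var2/config.c", b"int x;\n")

# The survivor depends on which file the traversal happens to return first.
order1 = drop_duplicates([a, b])
order2 = drop_duplicates([b, a])

# Sorting the list before rejecting duplicates removes that dependence.
stable1 = drop_duplicates(sorted([a, b]))
stable2 = drop_duplicates(sorted([b, a]))

print(order1, order2)
print(stable1, stable2)
```

    With the unsorted inputs the two runs keep different files; with the sorted inputs both runs keep the same one.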

     
  • Al Danial

    Al Danial - 2015-02-12

    I might not be understanding your issue correctly. I downloaded your cloc_bug.zip and ran

    ../cloc --diff-alignment=cloc.log --by-file --csv dirB dirA > cloc.csv
    

    The outputs look correct to me:

    # cloc.log
    Files added: 0
    
    Files removed: 0
    
    File pairs compared: 2
      == dirB/var1/config.h | dirA/var1/config.h ; C/C++ Header
      == dirB/var1/config.c | dirA/var1/config.c ; C
    

    and

    # cloc.csv
           4 text files.
           4 text files.
    Wrote cloc.lognique files.                          
           0 files ignored.
    
    File, == blank, != blank, + blank, - blank, == comment, != comment, + comment, - comment, == code, != code, + code, - code, "http://cloc.sourceforge.net v 1.63 T=0.0588138103485107 s"
    dirB/var1/config.h, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 
    dirB/var1/config.c, 0, 0, 0, 0, 0, 0, 0, 0, 7, 0, 0, 0,
    

    If these aren't what you expect, let me know what you think it should be.

     
    • Anonymous

      Anonymous - 2015-02-23

      Hi Al, thanks for following up.

      After finding the --skip-uniqueness flag, reading your comments above and looking at http://cloc.sourceforge.net/#How_it_works, I changed my expectations to match cloc's output ;-)

       
  • Al Danial

    Al Danial - 2015-02-12

    Also, regarding repeatability, cloc currently sorts on the basename
    of like-sized files. This causes non-repeatable results since, in your case,
    dirA/var1/config.h and dirB/var1/config.h have the same basename. Sorting on the full name will solve this. I'll change the code.
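    A small sketch of why the sort key matters, again in Python rather than cloc's Perl, with variable names that are only assumptions for the example. When the key is the basename, the two paths tie, so the sorted order is whatever order the input arrived in (Python's sort is stable and keeps ties in input order). The full path is a key without ties, so the result no longer depends on the input order.

```python
import os

# The same two files, as returned by two hypothetical traversals.
paths_run1 = ["dirA/var1/config.h", "dirB/var1/config.h"]
paths_run2 = list(reversed(paths_run1))

# Key = basename: both paths map to "config.h", so the tie is broken
# by input order and the two runs disagree.
by_base_1 = sorted(paths_run1, key=os.path.basename)
by_base_2 = sorted(paths_run2, key=os.path.basename)

# Key = full path: no ties, so both runs agree.
by_full_1 = sorted(paths_run1)
by_full_2 = sorted(paths_run2)

print(by_base_1 == by_base_2)
print(by_full_1 == by_full_2)
```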

     
  • Al Danial

    Al Danial - 2015-02-12
    • status: open --> pending
     
  • Al Danial

    Al Danial - 2015-02-12

    svn commit 433 has updated code that should make results repeatable.

     
  • Al Danial

    Al Danial - 2015-06-27
    • status: pending --> closed
     
  • Al Danial

    Al Danial - 2015-06-27

    v1.64 released

     
