
#17 Support for large file diffs

Status: closed
Owner: nobody
Labels: None
Priority: 5
Updated: 2012-10-11
Created: 2011-05-17
Creator: Andy
Private: No

Do any benchmarks exist for the file sizes CLOC can handle when running diffs?
In my environment I am comparing releases which consist of around 1K files containing 10.5 million blank lines, 30K comments, and 18 million lines of XML; I killed the diff after 25 hours.
I found no mention of bdiff or large-file support in Algorithm::Diff, so I am wondering whether it's even possible.

I am currently testing GNU diff with '--speed-large-files' and, if feasible, may write a wrapper (or perhaps Algorithm::Diff could be enhanced).
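
As a rough illustration of the wrapper idea, here is a minimal Python sketch that runs GNU diff with `--speed-large-files` and tallies added and removed lines from its normal-format output. The function names `count_changes` and `gnu_diff_counts` are hypothetical, and it assumes a GNU `diff` binary on the PATH. It only counts raw lines; it does not separate blank, comment, and code lines the way cloc does.

```python
import subprocess


def count_changes(diff_lines):
    """Tally (added, removed) lines from normal-format diff output.

    In normal format, lines taken from the second file start with '>'
    and lines taken from the first file start with '<'. Change commands
    such as '3a4,5' and the '---' separator are ignored.
    """
    added = removed = 0
    for line in diff_lines:
        if line.startswith(">"):
            added += 1
        elif line.startswith("<"):
            removed += 1
    return added, removed


def gnu_diff_counts(old_path, new_path):
    """Run GNU diff with --speed-large-files and stream its output."""
    cmd = ["diff", "--speed-large-files", old_path, new_path]
    # Stream stdout line by line so a huge diff is never held in memory.
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True,
                          errors="replace") as proc:
        counts = count_changes(proc.stdout)
    # diff exits with 0 (identical), 1 (differences found) or 2 (trouble).
    if proc.returncode > 1:
        raise RuntimeError(f"diff failed with exit status {proc.returncode}")
    return counts


# Example call (paths are placeholders):
# added, removed = gnu_diff_counts("release_a/big.xml", "release_b/big.xml")
```

Pairing files between the two releases and classifying lines by language would still have to be layered on top of this.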

Discussion

  • Al Danial - 2011-05-19

    1.8e7 lines of XML is a lot. Could you let cloc run to completion, perhaps over a weekend, just to establish the baseline performance, which can hopefully be improved later? Better yet, run cloc under a Perl profiler (http://www.perl.org/about/whitepapers/perl-profiling.html) to see where the time is spent. The results would be most valuable.

    As a point of reference my development box can do a straight count of gcc-4.5.3.tar.bz2 in 242 seconds, or 31679.1 lines/s. I'll report back later on how long it takes to do a diff between gcc-4.5.2.tar.bz2 and gcc-4.5.3.tar.bz2.

     
  • Al Danial - 2011-05-20

    More performance info:

    time cloc --diff gcc-4.5.2.tar.bz2 gcc-4.5.3.tar.bz2
    57298 text files.
    57414 text files.
    7029 files ignored.

    [...output trimmed..]

    SUM:
    same        53076      0  1653021  4902805
    modified      320      0      165      996
    added         116    630      799     4943
    removed         1     83      151     1134


    real 29m4.001s
    user 23m56.697s
    sys 3m59.772s

    So a diff of two archives, each with roughly 6.5e+6 lines of code and comments, took just under a half hour.

    Hardware is Intel Core2 Quad CPU Q6600 @ 2.40GHz; runs Fedora 14.
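
    For scale, the timing above can be turned into a rough throughput figure and extrapolated to the workload in the original report. This is a back-of-the-envelope sketch: the helper names are hypothetical, and linear scaling is an assumption that the 25-hour run in the original report suggests does not hold for very large files.

    ```python
    def parse_time(t):
        """Convert a `time`-style value such as '29m4.001s' to seconds."""
        minutes, seconds = t.rstrip("s").split("m")
        return int(minutes) * 60 + float(seconds)


    def lines_per_second(lines, elapsed):
        """Lines processed per wall-clock second."""
        return lines / parse_time(elapsed)


    # gcc diff: roughly 6.5e6 lines of code and comments per archive.
    diff_rate = lines_per_second(6.5e6, "29m4.001s")

    # Reported workload: blank lines + comments + XML lines.
    workload = 10.5e6 + 30e3 + 18e6

    # Naive estimate, ASSUMING diff time grows linearly with line count.
    naive_hours = workload / diff_rate / 3600

    print(f"diff throughput: {diff_rate:.0f} lines/s")
    print(f"naive linear estimate: {naive_hours:.1f} hours")
    ```

    If the linear estimate comes out far below the observed runtime, that points at per-file diff cost growing faster than linearly, which is what a profiler run would confirm or rule out.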

     
  • Al Danial - 2011-07-20

    Changing to pending; will close in two weeks unless more discussion is needed.

     
