#122 find duplicate code

release_3.2
closed
Check (274)
5
2012-10-10
2002-12-09
Lars Kühne
No

We should be able to find duplicate code. This applies
both to copy+pasted code and to structurally identical
code that could be moved to a method.

Example:

snippet 1:
doStuff()
doMore(7)
doOtherStuff()

snippet 2:
doStuff()
doMore(i + j)
doOtherStuff()

These snippets are not really copy+paste but if they
are sufficiently long they should go in a method that
takes the integer as an argument.
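
To make "structurally identical" concrete, here is a minimal sketch that treats two snippets as equal when they differ only in call arguments. The class name `StructuralDuplicates`, the `(_)` placeholder, and the regex-based normalization are assumptions made for illustration; a real check would more likely compare syntax trees, and this version assumes one simple call per line with no nested parentheses.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class StructuralDuplicates {
    // Matches a non-empty argument list without nested parentheses, e.g. "(7)" or "(i + j)".
    private static final Pattern ARGS = Pattern.compile("\\(([^()]+)\\)");

    // Replaces every non-empty argument list with a placeholder, so that
    // doMore(7) and doMore(i + j) both become doMore(_). Empty lists "()" are left alone.
    static String normalize(String line) {
        return ARGS.matcher(line.trim()).replaceAll("(_)");
    }

    // Two snippets are structurally equal if they have the same number of lines
    // and every pair of lines is equal after normalization.
    static boolean structurallyEqual(List<String> a, List<String> b) {
        if (a.size() != b.size()) {
            return false;
        }
        for (int i = 0; i < a.size(); i++) {
            if (!normalize(a.get(i)).equals(normalize(b.get(i)))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> snippet1 = Arrays.asList("doStuff()", "doMore(7)", "doOtherStuff()");
        List<String> snippet2 = Arrays.asList("doStuff()", "doMore(i + j)", "doOtherStuff()");
        System.out.println(structurallyEqual(snippet1, snippet2)); // prints "true"
    }
}
```

The placeholder marks exactly the spot that would become the parameter of the extracted method.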

The PMD project (pmd.sf.net) has a copy and paste
detection tool which works well but is too slow for
regular use. I don't know whether the problem lies in
the underlying algorithm or in the PMD implementation.

If the algorithm is the problem, there has been some
research on this subject, and the algorithm in
http://citeseer.nj.nec.com/539959.html might be an
alternative.

I think our implementation will have to be a new
FileSetCheck, although it would be cool if we did not
have to parse each file twice.
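
To show why a check like this needs to see the whole file set at once, here is a rough sketch of the simplest cross-file approach: slide a window of `min` lines over each file and remember where each window was first seen. It is not PMD's algorithm, not the one from the paper linked above, and it does not use Checkstyle's FileSetCheck API. The class name `LineWindowDuplicates`, the method `findDuplicates`, and the report format are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LineWindowDuplicates {
    // Reports "file:line == file:line" (1-based) for every window of `min`
    // consecutive lines whose trimmed text was already seen earlier.
    static List<String> findDuplicates(Map<String, List<String>> files, int min) {
        Map<String, String> firstSeen = new HashMap<>(); // window text -> first location
        List<String> reports = new ArrayList<>();
        for (Map.Entry<String, List<String>> file : files.entrySet()) {
            List<String> lines = file.getValue();
            for (int start = 0; start + min <= lines.size(); start++) {
                StringBuilder window = new StringBuilder();
                for (String line : lines.subList(start, start + min)) {
                    window.append(line.trim()).append('\n'); // ignore indentation
                }
                String here = file.getKey() + ":" + (start + 1);
                String first = firstSeen.putIfAbsent(window.toString(), here);
                if (first != null) {
                    reports.add(first + " == " + here);
                }
            }
        }
        return reports;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the files in insertion order, so reports are deterministic.
        Map<String, List<String>> files = new LinkedHashMap<>();
        files.put("A.java", Arrays.asList("a();", "b();", "c();"));
        files.put("B.java", Arrays.asList("x();", "  a();", "  b();", "  c();"));
        System.out.println(findDuplicates(files, 3)); // prints "[A.java:1 == B.java:2]"
    }
}
```

Each file is read only once here, but the sketch works on raw lines rather than on a parse tree, so it catches only literal copy+paste and overlapping windows are reported separately.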

Discussion

  • Lars Kühne

    Lars Kühne - 2003-06-29

    Logged In: YES
    user_id=401384

    This is now implemented in a commercial product (free for
    open source projects), which provides a checkstyle plugin:

    http://www.redhillconsulting.com.au/products/simian/

    Lowering priority for this request but keeping it open, as
    duplicate code detection would still be a great feature of
    the checkstyle core distribution:

    • ability to inspect the source code for educational
      purposes (I still wonder how Simian does it, especially why
      it's so much faster than PMD's CPD tool)
    • download in one convenient package
    • ...

    On the other hand Simon Harris (the guy behind redhill) has
    contributed many checks to checkstyle's core, and we sure
    don't want to ruin his business. Hmm... does this mean we'd
    have to purposefully distribute an inferior implementation?

     
  • Oliver Burn

    Oliver Burn - 2003-06-30

    Logged In: YES
    user_id=218824

    > On the other hand Simon Harris (the guy behind redhill) has
    > contributed many checks to checkstyle's core, and we sure
    > don't want to ruin his business. Hmm... does this mean we'd
    > have to purposefully distribute an inferior implementation?

    From my point of view, absolutely not! I will continue to
    write checks that interest me. If that means I end up
    writing a code duplication detector, I will certainly
    release it.

     
  • Lars Kühne

    Lars Kühne - 2003-07-12

    Logged In: YES
    user_id=401384

    I have a working prototype for duplicate code detection.

     
  • Lars Kühne

    Lars Kühne - 2003-10-24

    Logged In: YES
    user_id=401384

    The functionality is available in 3.2; see the StrictDuplicateCode check.

     
