DuMP3 is a duplicate and similar file finder. It finds exact duplicate binaries by hash, similar text files by substring content, similar images (JPG, BMP, GIF, PNG, etc.) by color, and similar audio files (MP3, WAV, OGG, etc.) by wave data. Planned future support: fonts and video.
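The "exact duplicates by hash" step can be illustrated with a minimal Java sketch. It assumes SHA-256 as the digest, which DuMP3 may not use. The class and method names (`DuplicateFinder`, `sha256`, `findDuplicates`) are hypothetical and not DuMP3's API. Files whose contents hash to the same digest are grouped, and only groups with two or more members are reported.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DuplicateFinder {
    // Hypothetical helper: streams a file through SHA-256 (assumed digest) and returns it as hex.
    static String sha256(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Hypothetical helper: groups files by digest and keeps only groups with 2+ files.
    static List<List<Path>> findDuplicates(List<Path> files)
            throws IOException, NoSuchAlgorithmException {
        Map<String, List<Path>> byHash = new HashMap<>();
        for (Path f : files) {
            byHash.computeIfAbsent(sha256(f), k -> new ArrayList<>()).add(f);
        }
        List<List<Path>> groups = new ArrayList<>();
        for (List<Path> group : byHash.values()) {
            if (group.size() > 1) {
                groups.add(group);
            }
        }
        return groups;
    }

    public static void main(String[] args) throws Exception {
        // Two files with identical bytes and one different file.
        Path dir = Files.createTempDirectory("dupes");
        Path a = Files.write(dir.resolve("a.bin"), new byte[] {1, 2, 3});
        Path b = Files.write(dir.resolve("b.bin"), new byte[] {1, 2, 3});
        Path c = Files.write(dir.resolve("c.bin"), new byte[] {4, 5, 6});
        List<List<Path>> groups = findDuplicates(List.of(a, b, c));
        System.out.println("duplicate groups: " + groups.size()); // prints "duplicate groups: 1"
    }
}
```

Practical duplicate finders often group files by size first and hash only same-size candidates, so most files never need to be read in full. This sketch skips that step for brevity. It also covers only exact matches, not the similarity-based matching of text, images, and audio.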
jATLAS is a Java implementation of ATLAS (Architecture and Tools for Linguistic Analysis Systems). For more information, see http://jatlas.sourceforge.net.