Similarity
Text similarity calculation toolkit for Java
Similarity is a Java toolkit for calculating similarity scores between text strings. It provides a collection of algorithms for word similarity, phrase similarity, sentence similarity, paragraph similarity, semantic comparison, sentiment tendency, and approximate word discovery. The project is designed to teach and apply natural language similarity methods while keeping the architecture practical and customizable. It includes approaches such as edit distance, cosine similarity, Euclidean distance, Jaccard similarity, Jaro distance, Jaro-Winkler distance, Manhattan distance, SimHash with Hamming distance, and Sørensen-Dice coefficient. ...
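To make two of the listed measures concrete, here is a minimal sketch of edit distance and Jaccard similarity in plain Java. It is an illustration only and does not show the toolkit's own API: the class name `SimilaritySketch`, its method names, the whitespace tokenization, and the normalization by the longer string's length are all assumptions made for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration class; not part of the Similarity toolkit's API.
final class SimilaritySketch {

    // Levenshtein edit distance: the minimum number of single-character
    // insertions, deletions, and substitutions that turn a into b.
    // Only two rows of the dynamic-programming table are kept in memory.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed normalization: 1 - distance / length of the longer string,
    // which gives a score in [0, 1]. Two empty strings score 1.0.
    static double editSimilarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0;
        }
        return 1.0 - (double) editDistance(a, b) / maxLen;
    }

    // Jaccard similarity over word sets: |A ∩ B| / |A ∪ B|.
    static double jaccard(String a, String b) {
        Set<String> left = tokens(a);
        Set<String> right = tokens(b);
        if (left.isEmpty() && right.isEmpty()) {
            return 1.0;
        }
        Set<String> intersection = new HashSet<>(left);
        intersection.retainAll(right);
        Set<String> union = new HashSet<>(left);
        union.addAll(right);
        return (double) intersection.size() / union.size();
    }

    // Assumed tokenization: lowercase, then split on whitespace.
    private static Set<String> tokens(String s) {
        String trimmed = s.trim().toLowerCase(Locale.ROOT);
        if (trimmed.isEmpty()) {
            return new HashSet<>();
        }
        return new HashSet<>(Arrays.asList(trimmed.split("\\s+")));
    }
}
```

For instance, `SimilaritySketch.editDistance("kitten", "sitting")` evaluates to 3, and `SimilaritySketch.jaccard("the quick brown fox", "the lazy brown dog")` evaluates to 2/6, since the two word sets share "the" and "brown" out of six distinct words. Whitespace splitting only suits space-delimited text; languages written without spaces would need a word segmenter in its place.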