Similarity is a Java toolkit for computing similarity scores between pieces of text. It covers word, phrase, sentence, and paragraph similarity, along with semantic comparison, sentiment tendency analysis, and approximate word discovery. The project aims both to teach natural language similarity methods and to apply them, while keeping the architecture practical and customizable. Its algorithms include edit distance, cosine similarity, Euclidean distance, Jaccard similarity, Jaro distance, Jaro-Winkler distance, Manhattan distance, SimHash with Hamming distance, and the Sørensen-Dice coefficient. It can be added to a Java project as a Maven or Gradle dependency. It is useful for Chinese NLP projects, search features, duplicate detection, recommendation systems, and text analysis experiments.
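To make the idea of a similarity score concrete, here is a minimal sketch of edit distance (Levenshtein) turned into a score between 0 and 1. It is written in plain Java and does not use the toolkit's own API; the class name `EditDistanceSketch` and its method names are hypothetical, and dividing by the longer string's length is one common normalization, assumed here for illustration.

```java
public final class EditDistanceSketch {

    private EditDistanceSketch() {}

    /** Minimum number of single-character insertions, deletions, or substitutions. */
    public static int distance(String a, String b) {
        // Compare by Unicode code point so characters outside the BMP count once.
        int[] s = a.codePoints().toArray();
        int[] t = b.codePoints().toArray();

        // Two rolling rows of the dynamic-programming table.
        int[] prev = new int[t.length + 1];
        int[] curr = new int[t.length + 1];
        for (int j = 0; j <= t.length; j++) {
            prev[j] = j; // turning an empty prefix into t[0..j) takes j insertions
        }

        for (int i = 1; i <= s.length; i++) {
            curr[0] = i; // turning s[0..i) into an empty prefix takes i deletions
            for (int j = 1; j <= t.length; j++) {
                int cost = (s[i - 1] == t[j - 1]) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[t.length];
    }

    /** Assumed normalization: 1 - distance / length of the longer input. */
    public static double similarity(String a, String b) {
        int maxLen = Math.max(a.codePointCount(0, a.length()),
                              b.codePointCount(0, b.length()));
        if (maxLen == 0) {
            return 1.0; // two empty strings are treated as identical
        }
        return 1.0 - (double) distance(a, b) / maxLen;
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting"));   // 3
        System.out.println(similarity("kitten", "sitting")); // 1 - 3/7
    }
}
```

Working on code points rather than `char` values keeps the count correct for text that contains supplementary characters, which matters for some Chinese input.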
Features
- Java text similarity toolkit
- Word, phrase, sentence, and paragraph comparison
- Multiple distance and similarity algorithms, from edit distance and cosine similarity to Jaro-Winkler and SimHash
- Sentiment tendency analysis
- Approximate word discovery
- Maven and Gradle integration
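Two of the set-based measures named above, Jaccard similarity and the Sørensen-Dice coefficient, can be sketched as follows. This is an illustration in plain Java rather than the toolkit's API: the class `SetOverlapSketch` is hypothetical, and representing each text as a set of character bigrams is an assumption made for the example (word tokens or other features would work the same way).

```java
import java.util.HashSet;
import java.util.Set;

public final class SetOverlapSketch {

    private SetOverlapSketch() {}

    /** Splits text into the set of its adjacent code-point pairs. */
    static Set<String> bigrams(String text) {
        int[] cps = text.codePoints().toArray();
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + 1 < cps.length; i++) {
            grams.add(new String(cps, i, 2));
        }
        return grams;
    }

    private static int intersectionSize(Set<String> x, Set<String> y) {
        Set<String> common = new HashSet<>(x);
        common.retainAll(y);
        return common.size();
    }

    /** Jaccard: |X ∩ Y| / |X ∪ Y|. */
    public static double jaccard(String a, String b) {
        Set<String> x = bigrams(a);
        Set<String> y = bigrams(b);
        if (x.isEmpty() && y.isEmpty()) {
            return 1.0; // assumed convention when neither input has a bigram
        }
        int common = intersectionSize(x, y);
        int union = x.size() + y.size() - common;
        return (double) common / union;
    }

    /** Sørensen-Dice: 2 |X ∩ Y| / (|X| + |Y|). */
    public static double dice(String a, String b) {
        Set<String> x = bigrams(a);
        Set<String> y = bigrams(b);
        if (x.isEmpty() && y.isEmpty()) {
            return 1.0; // same assumed convention as above
        }
        return 2.0 * intersectionSize(x, y) / (x.size() + y.size());
    }

    public static void main(String[] args) {
        // "night" and "nacht" share only the bigram "ht".
        System.out.println(jaccard("night", "nacht")); // 1/7
        System.out.println(dice("night", "nacht"));    // 0.25
    }
}
```

Both measures are computed from the same overlap count; Dice weights the shared elements twice, so it is never lower than Jaccard for the same pair of inputs.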