Similarity is a Java toolkit for computing similarity scores between text strings. It provides algorithms for word, phrase, sentence, and paragraph similarity, along with semantic comparison, sentiment tendency analysis, and approximate word discovery. The project aims both to teach natural language similarity methods and to apply them, with an architecture that stays practical and customizable. Its measures include edit distance, cosine similarity, Euclidean distance, Jaccard similarity, Jaro distance, Jaro-Winkler distance, Manhattan distance, SimHash with Hamming distance, and the Sørensen-Dice coefficient. It can be added as a dependency to Maven or Gradle builds. It is suited to Chinese NLP projects, search features, duplicate detection, recommendation systems, and text analysis experiments.
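As an illustration of one listed measure, edit distance, here is a minimal standalone sketch in plain Java. It does not use the toolkit's API: the class name `EditDistanceSketch` and its methods are hypothetical, and normalizing by the longer string's length is one common convention rather than necessarily the one this project uses.

```java
// Hypothetical illustration only; not part of the Similarity toolkit's API.
class EditDistanceSketch {

    // Levenshtein distance: the minimum number of single-character
    // insertions, deletions, and substitutions that turn a into b.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Turning the empty prefix of a into b[0..j) takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            // Turning a[0..i) into the empty string takes i deletions.
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed normalization: 1 - distance / max length, giving a score in [0, 1].
    static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0; // two empty strings are treated as identical
        }
        return 1.0 - (double) distance(a, b) / maxLen;
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // prints 3
        System.out.println(similarity("kitten", "sitting"));
    }
}
```

The sketch compares UTF-16 code units via `charAt`, which is adequate for most Chinese text but would count a character outside the Basic Multilingual Plane as two units.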

Features

  • Java text similarity toolkit
  • Word, phrase, sentence, and paragraph comparison
  • Multiple distance and similarity algorithms
  • Sentiment tendency analysis
  • Approximate word discovery
  • Maven and Gradle integration
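
To make the set-based measures named in the description more concrete, the following sketch computes Jaccard similarity and the Sørensen-Dice coefficient over character bigrams. It is a standalone illustration under stated assumptions, not the toolkit's implementation: the class `SetSimilaritySketch`, its method names, and the choice of character bigrams as the unit of comparison are all hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; not part of the Similarity toolkit's API.
class SetSimilaritySketch {

    // Collect the distinct adjacent character pairs of s (assumed tokenization).
    static Set<String> bigrams(String s) {
        Set<String> out = new HashSet<>();
        for (int i = 0; i + 1 < s.length(); i++) {
            out.add(s.substring(i, i + 2));
        }
        return out;
    }

    // Size of the intersection, computed on a copy so the inputs stay unchanged.
    static int intersectionSize(Set<String> a, Set<String> b) {
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return common.size();
    }

    // Jaccard similarity: |A ∩ B| / |A ∪ B|.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0; // convention chosen here for two empty sets
        }
        int common = intersectionSize(a, b);
        int union = a.size() + b.size() - common;
        return (double) common / union;
    }

    // Sørensen-Dice coefficient: 2 |A ∩ B| / (|A| + |B|).
    static double dice(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0; // same convention as above
        }
        return 2.0 * intersectionSize(a, b) / (a.size() + b.size());
    }

    public static void main(String[] args) {
        Set<String> x = bigrams("night"); // ni, ig, gh, ht
        Set<String> y = bigrams("nacht"); // na, ac, ch, ht
        System.out.println(jaccard(x, y)); // one shared bigram out of seven
        System.out.println(dice(x, y));    // prints 0.25
    }
}
```

Both measures use the same intersection count and differ only in the denominator, so Dice is never lower than Jaccard for the same pair of sets.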

Categories

Libraries

License

Apache License V2.0

Additional Project Details

Programming Language

Java

Related Categories

Java Libraries

Registered

21 hours ago