This is a tool for nearest-neighbor retrieval and clustering of large categorical data sets represented in transactional form.
For speed and scalability, the clustering relies on locality-sensitive hashing (LSH) of the categorical data.
The locality-sensitive hashing method implemented is described in the video lectures under (Chapter 3).
The information needed for LSH, such as shingles/tokens, MinHash signatures, and the hashes of bands to buckets, is stored in several database tables.
The information needed for clustering, such as the most significant pairwise object similarities and the density-based similarities, is also stored in tables.
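
The following is a minimal sketch of how tokens, MinHash signatures, and band buckets might be laid out in database tables and queried for candidate neighbors. It is not the tool's actual schema or code. The table names (tokens, signatures, buckets), the parameters NUM_HASHES and BANDS, the use of SQLite, and the seeded BLAKE2 hash are all assumptions made for illustration.

```python
import hashlib
import sqlite3

NUM_HASHES = 20  # signature length (assumed value)
BANDS = 10       # number of bands (assumed value)
ROWS = NUM_HASHES // BANDS  # signature rows per band


def token_hash(token, seed):
    """Deterministic 32-bit hash of a token; the seed selects the hash function."""
    digest = hashlib.blake2b(
        token.encode("utf-8"), digest_size=4, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(tokens):
    """One minimum per seeded hash function over a non-empty token set."""
    return [min(token_hash(t, seed) for t in tokens) for seed in range(NUM_HASHES)]


def build_index(conn, transactions):
    """Store tokens, signatures, and band buckets in three hypothetical tables."""
    conn.executescript(
        """
        CREATE TABLE tokens (object_id TEXT, token TEXT);
        CREATE TABLE signatures (object_id TEXT, position INTEGER, value INTEGER);
        CREATE TABLE buckets (band INTEGER, bucket_key TEXT, object_id TEXT);
        CREATE INDEX idx_buckets ON buckets (band, bucket_key);
        """
    )
    for object_id, tokens in transactions.items():
        signature = minhash_signature(tokens)
        conn.executemany(
            "INSERT INTO tokens VALUES (?, ?)",
            [(object_id, t) for t in sorted(tokens)],
        )
        conn.executemany(
            "INSERT INTO signatures VALUES (?, ?, ?)",
            [(object_id, i, v) for i, v in enumerate(signature)],
        )
        # Each band's slice of the signature is serialized into one bucket key.
        conn.executemany(
            "INSERT INTO buckets VALUES (?, ?, ?)",
            [
                (b, ",".join(map(str, signature[b * ROWS:(b + 1) * ROWS])), object_id)
                for b in range(BANDS)
            ],
        )
    conn.commit()


def candidate_pairs(conn):
    """Pairs of objects that share a bucket in at least one band."""
    return conn.execute(
        """
        SELECT DISTINCT a.object_id, b.object_id
        FROM buckets AS a JOIN buckets AS b
          ON a.band = b.band AND a.bucket_key = b.bucket_key
         AND a.object_id < b.object_id
        ORDER BY a.object_id, b.object_id
        """
    ).fetchall()


def jaccard(x, y):
    """Exact Jaccard similarity, used to verify candidate pairs."""
    return len(x & y) / len(x | y)


if __name__ == "__main__":
    # Transactional form: each object is a set of attribute=value items.
    example = {
        "a": {"color=red", "shape=round", "size=small"},
        "b": {"color=red", "shape=round", "size=small"},
        "c": {"color=blue", "shape=square", "size=large"},
    }
    conn = sqlite3.connect(":memory:")
    build_index(conn, example)
    for left, right in candidate_pairs(conn):
        print(left, right, jaccard(example[left], example[right]))
```

In this sketch the self-join on the buckets table retrieves candidate neighbors without comparing every pair of objects. The exact Jaccard similarity is then computed only for those candidates, and these are the kind of pairwise similarities that could be kept in a further table for clustering.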

An early version of the fast database-based retrieval of nearest neighbors and clustering in large categorical datasets was published in:
Bill Andreopoulos, Aijun An, Xiaogang Wang, Dirk Labudde. Efficient Layered Density-based Clustering of Categorical Data. Elsevier Journal of Biomedical Informatics, 2009.

Additional Project Details