HIERDENC is a tool for retrieving nearest neighbors and clustering large categorical data sets represented in transactional form.
Clustering relies on locality-sensitive hashing (LSH) of the categorical data for speed and scalability.
The LSH method implemented is described in the video lectures at www.mmds.org (Chapter 3).
Information needed for LSH, such as shingles/tokens, MinHash signatures, and the hashes of bands to buckets, is stored in several database tables.
Information needed for clustering, such as the most significant pairwise object similarities and density-based similarities, is also stored in tables.
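
As a rough illustration of the pipeline described above (tokens, MinHash signatures, band hashes to buckets, all kept in database tables), here is a minimal Python sketch. It is not the project's actual code: the table names (signatures, buckets), the parameters NUM_HASHES and BANDS, the sample transactions, and the choice of SQLite and BLAKE2 are all assumptions made for the example.

```python
import hashlib
import sqlite3

NUM_HASHES = 20              # signature length (assumed value)
BANDS = 10                   # number of LSH bands (assumed value)
ROWS = NUM_HASHES // BANDS   # signature rows per band

def token_hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

def minhash_signature(tokens):
    """One minimum per seeded hash function over the token set."""
    return [min(token_hash(seed, t) for t in tokens) for seed in range(NUM_HASHES)]

def band_buckets(signature):
    """Split a signature into bands and hash each band to a bucket key."""
    buckets = []
    for band in range(BANDS):
        chunk = signature[band * ROWS:(band + 1) * ROWS]
        key = hashlib.blake2b(repr(chunk).encode(), digest_size=8).hexdigest()
        buckets.append((band, key))
    return buckets

def jaccard(a, b):
    """Exact Jaccard similarity, used to verify candidate pairs."""
    return len(a & b) / len(a | b)

def find_candidates(transactions):
    """Store signatures and buckets in tables, then join on shared buckets."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema; hash values are stored as text to avoid
    # overflowing SQLite's signed 64-bit INTEGER type.
    conn.execute("CREATE TABLE signatures (obj TEXT, pos INTEGER, value TEXT)")
    conn.execute("CREATE TABLE buckets (obj TEXT, band INTEGER, bucket TEXT)")
    for obj, tokens in transactions.items():
        sig = minhash_signature(tokens)
        conn.executemany(
            "INSERT INTO signatures VALUES (?, ?, ?)",
            [(obj, pos, str(value)) for pos, value in enumerate(sig)],
        )
        conn.executemany(
            "INSERT INTO buckets VALUES (?, ?, ?)",
            [(obj, band, key) for band, key in band_buckets(sig)],
        )
    # Two objects are a candidate pair if they share a bucket in any band.
    rows = conn.execute(
        "SELECT DISTINCT a.obj, b.obj FROM buckets a JOIN buckets b "
        "ON a.band = b.band AND a.bucket = b.bucket AND a.obj < b.obj"
    ).fetchall()
    conn.close()
    return sorted(
        (a, b, jaccard(transactions[a], transactions[b])) for a, b in rows
    )

# Categorical records in transactional form: object id -> attribute=value tokens.
transactions = {
    "t1": {"color=red", "shape=round", "size=small", "taste=sweet"},
    "t2": {"color=red", "shape=round", "size=small", "taste=sour"},
    "t3": {"color=green", "shape=long", "size=large", "taste=bitter"},
    "t4": {"color=red", "shape=round", "size=small", "taste=sweet"},
}

result = find_candidates(transactions)
for a, b, similarity in result:
    print(a, b, round(similarity, 2))
```

Identical records such as t1 and t4 always share every bucket, so they are always reported as candidates. Near-duplicates such as t1 and t2 are reported only if at least one band of their signatures matches, which becomes more likely as the number of bands grows or the rows per band shrink. The exact Jaccard check at the end filters out false positives from the bucket join.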

An early version of the fast database-based retrieval of nearest neighbors and clustering in large categorical datasets was published in:
Bill Andreopoulos, Aijun An, Xiaogang Wang, Dirk Labudde. Efficient Layered Density-based Clustering of Categorical Data. Elsevier Journal of Biomedical Informatics, 2009.

Additional Project Details

Registered: 2016-11-16