HIERDENC

This is a tool for nearest-neighbor retrieval and clustering of large categorical datasets represented in transactional form.
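As an illustration of what "transactional form" can mean, the sketch below turns each categorical record into a set of `attribute=value` items. It assumes that encoding; HIERDENC's actual input format may differ, and `to_transaction` is a hypothetical helper. The Jaccard similarity of two such sets is the measure that MinHash-based LSH approximates.

```python
from typing import Dict, FrozenSet


def to_transaction(record: Dict[str, str]) -> FrozenSet[str]:
    """Encode one categorical record as a set of 'attribute=value' items (assumed encoding)."""
    return frozenset(f"{attr}={value}" for attr, value in record.items())


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity: size of intersection / size of union (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


r1 = to_transaction({"color": "red", "shape": "round", "size": "small"})
r2 = to_transaction({"color": "red", "shape": "square", "size": "small"})
print(sorted(r1))  # ['color=red', 'shape=round', 'size=small']
print(jaccard(r1, r2))  # 2 shared items / 4 distinct items = 0.5
```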
Clustering relies on locality-sensitive hashing (LSH) of the categorical data for speed and scalability.
The LSH method implemented is the one described in Chapter 3 of the video lectures at www.mmds.org.
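The general Chapter 3 technique can be sketched as follows: compute an n-row MinHash signature for each item set, split it into b bands of r rows, and hash each band to a bucket. Items that share a bucket in any band become candidate pairs, so a pair with Jaccard similarity s is a candidate with probability 1 - (1 - s^r)^b. This is a minimal, self-contained sketch, not HIERDENC's code. The hash family (random linear functions modulo a prime, applied to CRC32 token ids), the helper names, and the parameters b = 10, r = 5 are all assumptions.

```python
import random
import zlib
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

# Mersenne prime 2^61 - 1; larger than any 32-bit CRC32 token id, so each
# h(x) = (a*x + b) mod PRIME with a != 0 maps distinct ids to distinct values.
PRIME = (1 << 61) - 1


def make_hash_funcs(n: int, seed: int = 42) -> List[Tuple[int, int]]:
    """Draw n random (a, b) coefficient pairs for h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(n)]


def minhash(tokens: FrozenSet[str], funcs: List[Tuple[int, int]]) -> List[int]:
    """One signature row per hash function: the minimum hash over the set's tokens."""
    if not tokens:
        raise ValueError("MinHash is undefined for an empty token set")
    ids = [zlib.crc32(t.encode("utf-8")) for t in tokens]
    return [min((a * x + b) % PRIME for x in ids) for a, b in funcs]


def lsh_candidates(
    signatures: Dict[str, List[int]], bands: int, rows: int
) -> Set[Tuple[str, str]]:
    """Split each signature into `bands` bands of `rows` rows; items whose band
    contents are identical land in the same bucket and become candidate pairs."""
    candidates: Set[Tuple[str, str]] = set()
    for band in range(bands):
        # The band's tuple of values is used directly as the bucket key.
        buckets: Dict[Tuple[int, ...], List[str]] = defaultdict(list)
        for item, sig in signatures.items():
            buckets[tuple(sig[band * rows:(band + 1) * rows])].append(item)
        for members in buckets.values():
            candidates.update(combinations(sorted(members), 2))
    return candidates


BANDS, ROWS = 10, 5
funcs = make_hash_funcs(BANDS * ROWS)
transactions = {
    "a": frozenset({"color=red", "shape=round", "size=small"}),
    "b": frozenset({"color=red", "shape=round", "size=small"}),  # same items as "a"
    "c": frozenset({"color=blue", "shape=square", "size=large"}),  # shares nothing with "a"
}
sigs = {item: minhash(toks, funcs) for item, toks in transactions.items()}
cands = lsh_candidates(sigs, BANDS, ROWS)
print(cands)
```

Because "a" and "b" have identical item sets, their signatures match in every band, so ("a", "b") is a candidate; "c" shares no items with either and does not collide with them. Raising r makes buckets stricter (fewer false positives); raising b gives similar pairs more chances to collide (fewer false negatives).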
The information needed for LSH, such as shingles/tokens, MinHash signatures, and the hashing of bands to buckets, is stored in several database tables.
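The table layout below is hypothetical: the table and column names (`tokens`, `signatures`, `band_buckets`) are assumptions, not HIERDENC's actual schema. It only illustrates how tokens, signature rows, and band-to-bucket assignments could be kept in a relational store (here an in-memory SQLite database) so that candidate pairs fall out of a single self-join.

```python
import sqlite3
from typing import Dict, FrozenSet, List

# Hypothetical schema for illustration only; not HIERDENC's actual tables.
SCHEMA = """
CREATE TABLE tokens       (item_id TEXT, token TEXT);
CREATE TABLE signatures   (item_id TEXT, row_idx INTEGER, value INTEGER);
CREATE TABLE band_buckets (item_id TEXT, band INTEGER, bucket TEXT);
CREATE INDEX idx_band_bucket ON band_buckets (band, bucket);
"""

# Candidate pairs: any two distinct items that share a bucket in the same band.
CANDIDATES_SQL = """
SELECT DISTINCT a.item_id, b.item_id
FROM band_buckets AS a
JOIN band_buckets AS b
  ON a.band = b.band AND a.bucket = b.bucket AND a.item_id < b.item_id
ORDER BY a.item_id, b.item_id
"""


def store(
    conn: sqlite3.Connection,
    transactions: Dict[str, FrozenSet[str]],
    signatures: Dict[str, List[int]],
    rows_per_band: int,
) -> None:
    """Write tokens, signature rows, and band-to-bucket assignments."""
    for item, toks in transactions.items():
        conn.executemany("INSERT INTO tokens VALUES (?, ?)", [(item, t) for t in sorted(toks)])
    for item, sig in signatures.items():
        conn.executemany(
            "INSERT INTO signatures VALUES (?, ?, ?)",
            [(item, i, v) for i, v in enumerate(sig)],
        )
        for band in range(len(sig) // rows_per_band):
            chunk = sig[band * rows_per_band:(band + 1) * rows_per_band]
            # The bucket key is simply the band's values joined into a string.
            conn.execute(
                "INSERT INTO band_buckets VALUES (?, ?, ?)",
                (item, band, ",".join(map(str, chunk))),
            )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
transactions = {
    "a": frozenset({"color=red", "size=small"}),
    "b": frozenset({"color=red", "size=large"}),
    "c": frozenset({"color=blue", "size=large"}),
}
signatures = {"a": [1, 2, 3, 4], "b": [1, 2, 9, 9], "c": [7, 8, 9, 9]}  # toy values
store(conn, transactions, signatures, rows_per_band=2)
pairs = conn.execute(CANDIDATES_SQL).fetchall()
print(pairs)  # [('a', 'b'), ('b', 'c')]
```

"a" and "b" share band 0 ("1,2"); "b" and "c" share band 1 ("9,9"). The index on `(band, bucket)` is what keeps the self-join cheap as the table grows.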
The information needed for clustering, such as the most significant pairwise object similarities and density-based similarities, is also stored in tables.
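One way to obtain "most significant" pairwise similarities is to score each LSH candidate pair with its exact Jaccard similarity and keep, per object, the top k neighbors above a threshold. This is a minimal sketch under that interpretation; `top_k_neighbors`, `k`, and `min_sim` are hypothetical, and the density-based similarities mentioned above are not modeled here.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Tuple


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Exact Jaccard similarity: size of intersection / size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def top_k_neighbors(
    transactions: Dict[str, FrozenSet[str]],
    candidate_pairs: Iterable[Tuple[str, str]],
    k: int,
    min_sim: float,
) -> Dict[str, List[Tuple[str, float]]]:
    """Score candidate pairs exactly and keep, per item, its k most similar
    neighbors with similarity >= min_sim (ties broken by neighbor id)."""
    scored: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for x, y in candidate_pairs:
        s = jaccard(transactions[x], transactions[y])
        if s >= min_sim:
            scored[x].append((y, s))
            scored[y].append((x, s))
    return {
        item: sorted(nbrs, key=lambda t: (-t[1], t[0]))[:k]
        for item, nbrs in scored.items()
    }


tx = {
    "a": frozenset({"c=red", "s=round", "z=small"}),
    "b": frozenset({"c=red", "s=round", "z=large"}),
    "c": frozenset({"c=red", "s=square", "z=large"}),
    "d": frozenset({"c=blue", "s=square", "z=large"}),
}
pairs = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]  # e.g. from LSH banding
neighbors = top_k_neighbors(tx, pairs, k=1, min_sim=0.3)
print(neighbors)
```

The pair ("a", "c") has similarity 0.2 and is dropped by the threshold, so "a" keeps only "b" (0.5). Verifying candidates exactly removes LSH false positives before anything is written to the similarity tables.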

An early version of this fast, database-backed approach to nearest-neighbor retrieval and clustering of large categorical datasets was published in:
Bill Andreopoulos, Aijun An, Xiaogang Wang, Dirk Labudde. Efficient Layered Density-based Clustering of Categorical Data. Elsevier Journal of Biomedical Informatics, 2009.

Additional Project Details

Registered

2016-11-16