DeepCluster is a classic clustering-based method for self-supervised representation learning. It iteratively groups image features and uses the resulting cluster assignments as pseudo-labels to train the network. In each round, the features produced by the current network are clustered (e.g. with k-means), and the cluster IDs serve as classification targets for the following training epoch. This pushes the model toward representations that better separate semantic groups. The alternating “cluster & train” scheme lets the model gradually discover meaningful structure without any labels. DeepCluster was one of the early successes in unsupervised visual feature learning, showing that a clustering-based formulation could substantially narrow the gap with supervised pretraining on several downstream tasks. The repository includes code for feature extraction, clustering, training loops, and evaluation benchmarks such as linear probes. Because the method is simple and its components are easy to swap, DeepCluster has served as a starting point for many later methods.
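The alternating loop described above can be illustrated with a small NumPy sketch. It is not the repository's implementation. It assumes a toy one-layer ReLU "backbone" on synthetic vector data instead of a CNN on images, plain Lloyd's k-means on L2-normalized features, and a freshly initialized linear head each round. The names `kmeans` and `deepcluster_sketch` and all hyperparameters are hypothetical. The sketch also omits the safeguards against degenerate solutions (beyond keeping a centroid when its cluster empties) that a real implementation needs.

```python
import numpy as np


def kmeans(feats, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns integer cluster IDs used as pseudo-labels."""
    rng = np.random.default_rng(seed)
    centroids = feats[rng.choice(len(feats), size=k, replace=False)]  # a copy
    for _ in range(iters):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((feats[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for c in range(k):
            members = feats[labels == c]
            if len(members):  # keep the previous centroid if a cluster empties
                centroids[c] = members.mean(0)
    return labels


def softmax(z):
    z = z - z.max(1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(1, keepdims=True)


def deepcluster_sketch(x, k=4, d_feat=16, rounds=5, steps=50, lr=0.5, seed=0):
    """Toy 'cluster & train' loop with a hypothetical one-layer ReLU backbone."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=(x.shape[1], d_feat))  # backbone weights
    history = []
    for r in range(rounds):
        # 1) Cluster: embed every sample, L2-normalize, run k-means.
        feats = np.maximum(x @ w, 0.0)
        feats = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
        pseudo = kmeans(feats, k, seed=seed + r)
        onehot = np.eye(k)[pseudo]
        # 2) Train: new classifier head on the pseudo-labels, then backprop
        #    the mean cross-entropy into both the head and the backbone.
        v = rng.normal(scale=0.1, size=(d_feat, k))
        for _ in range(steps):
            pre = x @ w
            h = np.maximum(pre, 0.0)
            p = softmax(h @ v)
            g_logits = (p - onehot) / len(x)  # dLoss/dlogits
            g_v = h.T @ g_logits
            g_w = x.T @ ((g_logits @ v.T) * (pre > 0))  # ReLU gradient mask
            v -= lr * g_v
            w -= lr * g_w
        # Cross-entropy from the last forward pass of this round.
        loss = float(-np.log(p[np.arange(len(x)), pseudo] + 1e-12).mean())
        history.append((pseudo, loss))
    return w, history


# Usage on synthetic data: 4 Gaussian blobs in 8 dimensions.
data_rng = np.random.default_rng(1)
centers = data_rng.normal(scale=3.0, size=(4, 8))
x = np.concatenate([c + data_rng.normal(size=(50, 8)) for c in centers])
w, history = deepcluster_sketch(x)
for r, (labels, loss) in enumerate(history):
    print(r, np.bincount(labels, minlength=4), round(loss, 3))
```

Re-initializing the head every round is a deliberate choice in this sketch: k-means numbers its clusters arbitrarily, so cluster 2 in one round need not correspond to cluster 2 in the next, and a head carried over between rounds would start from mismatched targets.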
Features
- Unsupervised learning via iterative clustering and pseudo-label supervision
- Alternating pipeline: cluster features → use cluster IDs to train network
- Support for k-means or other clustering algorithms in feature space
- Training and evaluation scripts for downstream tasks (classification, detection)
- Modular code to swap network architectures or clustering methods
- Baseline reference for many later self-supervised approaches
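The linear-probe evaluation mentioned above trains only a linear classifier on top of frozen features and reports its accuracy as a measure of representation quality. The NumPy sketch below illustrates that protocol under stated assumptions. A random ReLU projection stands in for a trained, frozen backbone, the labeled data are synthetic blobs, and the classifier is full-batch softmax regression. The names `linear_probe`, `extract`, and `frozen_w` are hypothetical and not the repository's evaluation scripts.

```python
import numpy as np


def linear_probe(train_feats, train_labels, test_feats, test_labels,
                 n_classes, epochs=200, lr=0.5):
    """Fit a softmax classifier on frozen features; return test accuracy."""
    mu = train_feats.mean(0)
    sd = train_feats.std(0) + 1e-8
    ztr = (train_feats - mu) / sd  # standardize using train statistics only
    zte = (test_feats - mu) / sd
    w = np.zeros((ztr.shape[1], n_classes))
    b = np.zeros(n_classes)
    y = np.eye(n_classes)[train_labels]
    for _ in range(epochs):
        logits = ztr @ w + b
        logits -= logits.max(1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        p /= p.sum(1, keepdims=True)
        g = (p - y) / len(ztr)  # gradient of mean cross-entropy w.r.t. logits
        w -= lr * ztr.T @ g
        b -= lr * g.sum(0)
    preds = (zte @ w + b).argmax(1)
    return float((preds == test_labels).mean())


# Usage: a hypothetical frozen backbone (random ReLU projection) on 3 blobs.
rng = np.random.default_rng(0)
frozen_w = rng.normal(size=(8, 16))


def extract(x):
    return np.maximum(x @ frozen_w, 0.0)  # backbone weights are never updated


centers = rng.normal(scale=3.0, size=(3, 8))
labels = np.repeat(np.arange(3), 60)
x = centers[labels] + rng.normal(size=(180, 8))
perm = rng.permutation(180)
tr, te = perm[:120], perm[120:]
acc = linear_probe(extract(x[tr]), labels[tr], extract(x[te]), labels[te],
                   n_classes=3)
print(f"linear-probe accuracy: {acc:.2f}")
```

Keeping the backbone frozen is the point of the protocol: only the linear layer sees labels, so the score reflects how linearly separable the learned features already are.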