MobileCLIP is a family of efficient image-text embedding models designed for real-time, on-device retrieval and zero-shot classification. The repository provides training, inference, and evaluation code for MobileCLIP models trained on DataCompDR and for the newer MobileCLIP2 models trained on DFNDR. It includes an iOS demo app and Core ML artifacts that showcase practical, offline photo search and classification on iPhone-class hardware. Project notes highlight latency/accuracy trade-offs, with MobileCLIP2 variants matching or surpassing the accuracy of larger baselines while using notably fewer parameters and lower runtime on mobile devices. A companion “mobileclip-dr” repository details the large-scale, distributed data-generation pipelines used to reinforce datasets across billions of samples on thousands of GPUs. Overall, MobileCLIP emphasizes end-to-end practicality: scalable training, deployable models, and consumer-grade demos.
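As a quick illustration of the zero-shot classification workflow, here is a minimal sketch. It assumes the `mobileclip` Python package exposes an OpenCLIP-style `create_model_and_transforms`/`get_tokenizer` interface and that a `mobileclip_s0` checkpoint has been downloaded locally; the model name and checkpoint path are placeholders, so adjust them to the actual release.

```python
import torch
from PIL import Image
import mobileclip  # assumed package name for this repository

# Assumed OpenCLIP-style factory functions; the checkpoint path is a placeholder.
model, _, preprocess = mobileclip.create_model_and_transforms(
    "mobileclip_s0", pretrained="checkpoints/mobileclip_s0.pt"
)
tokenizer = mobileclip.get_tokenizer("mobileclip_s0")
model.eval()

image = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)
labels = ["a photo of a dog", "a photo of a cat", "a diagram"]
text = tokenizer(labels)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalize so the dot product is cosine similarity.
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")
```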
Features
- Efficient image-text embeddings optimized for mobile latency
- Training, inference, and evaluation pipelines for MobileCLIP and MobileCLIP2
- iOS demo app and Core ML models for offline search
- Strong accuracy with fewer parameters and lower runtime than larger baselines
- Dataset reinforcement tooling via the companion DR codebase
- Zero-shot retrieval and classification for on-device experiences (see the retrieval sketch after this list)
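To sketch how the same embeddings support offline photo search, the snippet below indexes a handful of local images and ranks them against a text query. It reuses the hypothetical `mobileclip` interface assumed in the example above; the gallery paths, model name, and checkpoint path are likewise placeholders.

```python
import torch
from PIL import Image
import mobileclip  # assumed package name for this repository

model, _, preprocess = mobileclip.create_model_and_transforms(
    "mobileclip_s0", pretrained="checkpoints/mobileclip_s0.pt"
)
tokenizer = mobileclip.get_tokenizer("mobileclip_s0")
model.eval()

gallery_paths = ["photos/beach.jpg", "photos/birthday.jpg", "photos/receipt.jpg"]

with torch.no_grad():
    # Embed the gallery once; on device this index would typically be cached.
    images = torch.stack(
        [preprocess(Image.open(p).convert("RGB")) for p in gallery_paths]
    )
    gallery = model.encode_image(images)
    gallery = gallery / gallery.norm(dim=-1, keepdim=True)

    # Embed the text query and rank gallery images by cosine similarity.
    query = model.encode_text(tokenizer(["photos from the beach"]))
    query = query / query.norm(dim=-1, keepdim=True)
    scores = (gallery @ query.T).squeeze(-1)

for idx in scores.argsort(descending=True).tolist():
    print(f"{scores[idx].item():.3f}  {gallery_paths[idx]}")
```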