fashion-clip
CLIP model fine-tuned for zero-shot fashion product classification
FashionCLIP is a CLIP model adapted to the fashion domain for zero-shot classification and retrieval of fashion products. Developed by Patrick John Chia and collaborators, it builds on the CLIP ViT-B/32 architecture and was fine-tuned on over 800K image-text pairs from the Farfetch dataset. Training uses a contrastive objective that aligns product images with their text descriptions, so the model can handle a range of fashion-related tasks without additional task-specific supervision. FashionCLIP 2.0, the latest version, uses the laion/CLIP-ViT-B-32-laion2B-s34B-b79K checkpoint and achieves better F1 scores than earlier versions across multiple benchmarks. ...
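To make the zero-shot step concrete, here is a minimal sketch of how a CLIP-style model turns aligned embeddings into class probabilities: normalize the image and label-text embeddings, take cosine similarities, scale them, and apply a softmax. It is an illustration under stated assumptions, not FashionCLIP's own code. The function names, the 3-dimensional toy vectors, the label prompts, and the `logit_scale` value of 100 are all hypothetical; in practice the embeddings would come from the model's image and text encoders.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_emb, label_embs, logit_scale=100.0):
    """Return a {label: probability} dict for one image embedding.

    label_embs maps each candidate label prompt to its text embedding.
    logit_scale is an assumed temperature, not a value taken from FashionCLIP.
    """
    img = l2_normalize(image_emb)
    names = list(label_embs)
    sims = [
        sum(a * b for a, b in zip(img, l2_normalize(label_embs[name])))
        for name in names
    ]
    probs = softmax([logit_scale * s for s in sims])
    return dict(zip(names, probs))


# Hypothetical toy embeddings standing in for real encoder outputs.
TOY_IMAGE = [0.9, 0.1, 0.2]
TOY_LABELS = {
    "a photo of a dress": [1.0, 0.0, 0.1],
    "a photo of sneakers": [0.0, 1.0, 0.0],
    "a photo of a handbag": [0.1, 0.2, 1.0],
}

probs = zero_shot_classify(TOY_IMAGE, TOY_LABELS)
best = max(probs, key=probs.get)
print(best)
```

Because the labels are supplied as text at inference time, the candidate classes can be changed without retraining, which is what makes the classification zero-shot.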