CLIP-ViT-bigG-14-laion2B-39B-b160k is a vision-language model trained with the OpenCLIP framework on LAION-2B, the English subset of the LAION-5B dataset. Developed by LAION and trained by Mitchell Wortsman on Stability AI’s compute infrastructure, it pairs a ViT-bigG/14 vision transformer with a text encoder and learns contrastively from image-text pairs, so that matching images and captions land close together in a shared embedding space. The model is suited to zero-shot image classification and to image-to-text and text-to-image retrieval, and can be adapted for tasks such as...
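
To make the zero-shot classification step concrete, here is a minimal sketch of how a contrastively trained model of this kind typically turns embeddings into class probabilities. It does not load the actual model: the three-dimensional vectors, the label prompts, and the helper names `l2_normalize` and `zero_shot_probs` are hypothetical stand-ins for illustration, and the `logit_scale` of 100.0 is an assumed temperature rather than a value read from this checkpoint.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def zero_shot_probs(image_emb, text_embs, logit_scale=100.0):
    """Softmax over scaled cosine similarities between one image and N prompts.

    logit_scale is an assumed temperature, not taken from the real checkpoint.
    """
    img = l2_normalize(image_emb)
    logits = []
    for text_emb in text_embs:
        txt = l2_normalize(text_emb)
        cosine = sum(a * b for a, b in zip(img, txt))
        logits.append(logit_scale * cosine)
    # Subtract the max logit before exponentiating for numerical stability.
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy embeddings; a real encoder would produce much longer vectors.
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
text_embs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
image_emb = [0.8, 0.3, 0.1]

probs = zero_shot_probs(image_emb, text_embs)
best = labels[probs.index(max(probs))]
print(best, [round(p, 3) for p in probs])
```

In practice the image and text embeddings would come from the model's two encoders, and the same similarity scores can be ranked directly for image-to-text or text-to-image retrieval instead of being passed through a softmax.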