CLIP: predict the most relevant text snippet given an image
4M: Massively Multimodal Masked Modeling
Large Multimodal Models for Video Understanding and Editing
Official implementation of DreamCraft3D
Implementation of "MobileCLIP" (CVPR 2024)
The official PyTorch implementation of Google's Gemma models
Towards Ultimate Expert Specialization in Mixture-of-Experts Language
The ChatGPT Retrieval Plugin lets you easily find personal documents
Inference script for Oasis 500M
ICLR 2024 Spotlight: curation/training code, metadata, distribution
A Customizable Image-to-Video Model based on HunyuanVideo
Implementation of the Surya Foundation Model for Heliophysics
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Official code for Style Aligned Image Generation via Shared Attention
Qwen2.5-Coder is the code version of Qwen2.5, the large language model
Open-source, high-performance Mixture-of-Experts large language model
Powerful open-source image generation model
Open Multilingual Multimodal Chat LMs
Official PyTorch Implementation of "Scalable Diffusion Models"
Code release for ConvNeXt V2 model
Learning to Act by Watching Unlabeled Online Videos
Code release for "Masked-attention Mask Transformer
A library for Multilingual Unsupervised or Supervised word Embeddings
Code for reproducing key results in the paper
Code for "Image Generation from Scene Graphs", Johnson et al, CVPR 201